date:20151014

[AArch64] --with-arch in config.gcc support "."

2015-10-14 Thread Jiong Wang


Since armv8.1 added, we need to improve --with-arch recognition sed
pattern to catch the new "." in the architecture base name.

OK for trunk?

2015-10-14  Jiong Wang  

gcc/
  * config.gcc: Recognize "." in architecture base name for AArch64.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5818663..215ad9a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3544,7 +3544,7 @@ case "${target}" in
 
 			eval "val=\$with_$which"
 			base_val=`echo $val | sed -e 's/\+.*//'`
-			ext_val=`echo $val | sed -e 's/[a-z0-9\-]\+//'`
+			ext_val=`echo $val | sed -e 's/[a-z0-9\.\-]\+//'`
 
 			if [ $which = arch ]; then
 			  def=aarch64-arches.def

Re: [AArch64] --with-arch in config.gcc support "."

2015-10-14 Thread Andreas Schwab

Jiong Wang  writes:

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 5818663..215ad9a 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -3544,7 +3544,7 @@ case "${target}" in
>  
>   eval "val=\$with_$which"
>   base_val=`echo $val | sed -e 's/\+.*//'`
> - ext_val=`echo $val | sed -e 's/[a-z0-9\-]\+//'`
> + ext_val=`echo $val | sed -e 's/[a-z0-9\.\-]\+//'`

Neither backslash is needed inside bracket expressions (in fact, the
backslash is taken literally here) as the set of special characters is
completely different there.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH] c/67925 - update documentation on `inline'

2015-10-14 Thread Martin Sebor


On 10/13/2015 04:47 PM, Arkadiusz Drabczyk wrote:

* gcc/doc/extend.texi: documentation says that functions declared
`inline' would not be integrated if they are called before they are
defined or if they are recursive. Both of these statements is now
false as shown in examples on Bugzilla.


It might also be worth updating the note in the subsequent
paragraph and removing the mention of variable-length data types
which no longer prevent inlining.

FWIW, the list of most -Winline warnings issued by GCC is here
(there are two more in Ada which, AFAICT, have to do with nested
functions):

$ grep -A1 "can never be inlined" gcc/tree-inline.c
= G_("function %q+F can never be inlined because it uses "
 "alloca (override using the always_inline attribute)");
--
= G_("function %q+F can never be inlined because it uses setjmp");
  *handled_ops_p = true;
--
  = G_("function %q+F can never be inlined because it "
   "uses variable argument lists");
--
  = G_("function %q+F can never be inlined because "
   "it uses setjmp-longjmp exception handling");
--
  = G_("function %q+F can never be inlined because "
   "it uses non-local goto");
--
  = G_("function %q+F can never be inlined because "
   "it uses __builtin_return or __builtin_apply_args");
--
= G_("function %q+F can never be inlined "
 "because it contains a computed goto");
--
warning (OPT_Winline, "function %q+F can never be inlined 
because it "

 "is suppressed using -fno-inline", fn);
--
warning (OPT_Winline, "function %q+F can never be inlined 
because it "

 "uses attributes conflicting with inlining", fn);

Martin


---
  gcc/doc/extend.texi | 9 +++--
  1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 79440d3..7ea4b62 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7088,12 +7088,9 @@ function are integrated into the caller, and the 
function's address is
  never used, then the function's own assembler code is never referenced.
  In this case, GCC does not actually output assembler code for the
  function, unless you specify the option @option{-fkeep-inline-functions}.
-Some calls cannot be integrated for various reasons (in particular,
-calls that precede the function's definition cannot be integrated, and
-neither can recursive calls within the definition).  If there is a
-nonintegrated call, then the function is compiled to assembler code as
-usual.  The function must also be compiled as usual if the program
-refers to its address, because that can't be inlined.
+If there is a nonintegrated call, then the function is compiled to
+assembler code as usual.  The function must also be compiled as usual if
+the program refers to its address, because that can't be inlined.

  @opindex Winline
  Note that certain usages in a function definition can make it unsuitable

Re: [PATCH] Fix pr67963

2015-10-14 Thread H.J. Lu

On Wed, Oct 14, 2015 at 8:08 AM, Yulia Koval  wrote:
> Hi,
>
> This patch fixes the issue:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67963
>
>   gcc/config/i386/i386.c (ix86_option_override_internal) Disable
> 80387 mask if lakemont target is set.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 4c25c9e..db722aa 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -4943,6 +4943,12 @@ ix86_option_override_internal (bool main_args_p,
>   break;
>}
>
> +  if (!strcmp (opts->x_ix86_arch_string, "lakemont"))
> +{
> +  opts->x_target_flags &= ~MASK_80387;
> +  opts_set->x_target_flags |= MASK_80387;
> +}
> +
>if (TARGET_X32 && (opts->x_ix86_isa_flags & OPTION_MASK_ISA_MPX))
>  error ("Intel MPX does not support x32");
>
> Ok for trunk?

We should add a bit to "struct pta" to indicate availability of
80387 ISA and turn it off for lakemount if 90387 ISA hasn't be
turned on explicitly.

We also need some testcases.

-- 
H.J.

Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Jeff Law


On 10/14/2015 03:19 AM, Richard Biener wrote:

On Tue, Oct 13, 2015 at 10:59 PM, Richard Henderson  wrote:

On 10/14/2015 02:49 AM, Jeff Law wrote:


The problem here is we don't know what address space the *0 is going to
hit,
right?



Correct, not before we do the walk of stmt to see what's present.


Isn't that also an issue for code generation as well?



What sort of problem are you thinking of?  I haven't seen one yet.


The actual dereference of course has a properly address-space qualified zero.
OK.  That's the bit I was missing and hinted out in the message I just 
sent -- the address-space information is carried outside the address. 
It seems that we're carrying it in the type, which is probably sensible 
at some level.





Only your walking depends on operand_equal_p to treat different address-space
zero addresses as equal (which they are of course not ...):
And that's the key problem with carrying the address-space information 
outside the address.   We have to look at more than just the raw address 
to determine if we've got a faulting *0 vs an address space qualified *0.





int
operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
{
...
   /* Check equality of integer constants before bailing out due to
  precision differences.  */
   if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
 {
   /* Address of INTEGER_CST is not defined; check that we did not forget
  to drop the OEP_ADDRESS_OF/OEP_CONSTANT_ADDRESS_OF flags.  */
   gcc_checking_assert (!(flags
  & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)));
   return tree_int_cst_equal (arg0, arg1);
 }

but only later we do

   /* We cannot consider pointers to different address space equal.  */
   if (POINTER_TYPE_P (TREE_TYPE (arg0))
   && POINTER_TYPE_P (TREE_TYPE (arg1))
   && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
   != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)
 return 0;

So "fixing" that would make the walker only look for default
address-space zero dereferences.

Agreed.



I think we need to fix operand_equal_p anyway because 0 is clearly not
equal to 0 (only if they convert to the same literal)
My worry here is we'd be getting onto a slippery slope.  But it may be 
unavoidable.


jeff

[gomp4.5] Support for monotonic and nonmonotonic schedule modifiers

2015-10-14 Thread Jakub Jelinek

Hi!

I've created gomp-4_5-branch in svn, where further OpenMP 4.5 development
will happen.

The following patch which I've committed there (and after a while plan to
merge to trunk together with other smaller changes) adds support for
monotonic and nonmonotonic schedule modifiers.  The older versions of the
standard can be read either way for dynamic and guided schedules, whether
the chunks must be given in order or randomly; all current OpenMP
implementations (including libgomp) have monotonic behavior, but for better
scalability at least of dynamic scheduling allowing random order of the
chunks is desirable, so that work-stealing can be used.

On the library side, this patch right now just adds aliases which make it
clear whether user wants monotonic or nonmonotonic, static kind as well as
ordered clause force monotonic, and for now we treat lack of nonmonotonic as
monotonic (which is going to change in 5.0).
Once we have a work-stealing implementation, we can just change the library
side.

2015-10-14  Jakub Jelinek  

* tree-core.h (enum omp_clause_schedule_kind): Add
OMP_CLAUSE_SCHEDULE_MASK, OMP_CLAUSE_SCHEDULE_MONOTONIC,
OMP_CLAUSE_SCHEDULE_NONMONOTONIC and change
OMP_CLAUSE_SCHEDULE_LAST value.
* omp-builtins.def (BUILT_IN_GOMP_LOOP_NONMONOTONIC_DYNAMIC_START,
BUILT_IN_GOMP_LOOP_NONMONOTONIC_GUIDED_START,
BUILT_IN_GOMP_LOOP_NONMONOTONIC_DYNAMIC_NEXT,
BUILT_IN_GOMP_LOOP_NONMONOTONIC_GUIDED_NEXT,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_DYNAMIC_START,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_GUIDED_START,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_DYNAMIC_NEXT,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_GUIDED_NEXT,
BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_DYNAMIC,
BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_GUIDED): New built-ins.
* omp-low.c (struct omp_region): Add sched_modifiers field.
(struct omp_for_data): Likewise.
(extract_omp_for_data): Fill in sched_modifiers, and mask out
OMP_CLAUSE_SCHEDULE_KIND bits outside of OMP_CLAUSE_SCHEDULE_MASK
from sched_kind.
(determine_parallel_type): Use only OMP_CLAUSE_SCHEDULE_MASK
bits of OMP_CLAUSE_SCHED_KIND.
(expand_parallel_call): Use nonmonotonic entrypoints for
nonmonotonic: dynamic/guided.
(expand_omp_for): Likewise.  Initialize region->sched_modifiers.
* tree-pretty-print.c (dump_omp_clause): Print schedule clause
modifiers.
gcc/c/
* c-parser.c (c_parser_omp_clause_schedule): Parse schedule
modifiers, diagnose monotonic together with nonmonotonic.
* c-typeck.c (c_finish_omp_clauses): Diagnose nonmonotonic
modifier on kinds other than dynamic or guided or nonmonotonic
modifier together with ordered clause.
gcc/cp/
* parser.c (cp_parser_omp_clause_schedule): Parse schedule
modifiers, diagnose monotonic together with nonmonotonic.
* semantics.c (finish_omp_clauses): Diagnose nonmonotonic
modifier on kinds other than dynamic or guided or nonmonotonic
modifier together with ordered clause.
gcc/testsuite/
* c-c++-common/gomp/schedule-modifiers-1.c: New test.
* gcc.dg/gomp/for-20.c: New test.
* gcc.dg/gomp/for-21.c: New test.
* gcc.dg/gomp/for-22.c: New test.
* gcc.dg/gomp/for-23.c: New test.
* gcc.dg/gomp/for-24.c: New test.
libgomp/
* libgomp.map (GOMP_4.5): Export
GOMP_loop_nonmonotonic_dynamic_next,
GOMP_loop_nonmonotonic_dynamic_start,
GOMP_loop_nonmonotonic_guided_next,
GOMP_loop_nonmonotonic_guided_start,
GOMP_loop_ull_nonmonotonic_dynamic_next,
GOMP_loop_ull_nonmonotonic_dynamic_start,
GOMP_loop_ull_nonmonotonic_guided_next,
GOMP_loop_ull_nonmonotonic_guided_start,
GOMP_parallel_loop_nonmonotonic_dynamic and
GOMP_parallel_loop_nonmonotonic_guided.
* libgomp_g.h (GOMP_loop_nonmonotonic_dynamic_next,
GOMP_loop_nonmonotonic_dynamic_start,
GOMP_loop_nonmonotonic_guided_next,
GOMP_loop_nonmonotonic_guided_start,
GOMP_loop_ull_nonmonotonic_dynamic_next,
GOMP_loop_ull_nonmonotonic_dynamic_start,
GOMP_loop_ull_nonmonotonic_guided_next,
GOMP_loop_ull_nonmonotonic_guided_start,
GOMP_parallel_loop_nonmonotonic_dynamic,
GOMP_parallel_loop_nonmonotonic_guided): New prototypes.
* loop.c (GOMP_parallel_loop_nonmonotonic_dynamic,
GOMP_parallel_loop_nonmonotonic_guided,
GOMP_loop_nonmonotonic_dynamic_start,
GOMP_loop_nonmonotonic_guided_start,
GOMP_loop_nonmonotonic_dynamic_next,
GOMP_loop_nonmonotonic_guided_next): New aliases or functions.
* loop_ull.c (GOMP_loop_ull_nonmonotonic_dynamic_start,
GOMP_loop_ull_nonmonotonic_guided_start,
GOMP_loop_ull_nonmonotonic_dynamic_next,

Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Jeff Law


On 10/14/2015 09:43 AM, Jan Hubicka wrote:

I think he asked for trivial forward threads though due to repeated
tests.
I hacked FRE to do this (I think), but maybe some trivial cleanup
opportunities
are still left here.  Honza?


Well, unthreaded jumps quite confuse profile prediction and create profiles
that we can't fix later. An of course they count in time (and size sometimes)
estimates.

 From cases I commonly see it is the usual lazyness of repeated tests comming
from early inlining/macro expansion and also C++ love to introduce

   if (ptr != NULL)
 ptr2 = >foo;
   else
 ptr2 = NULL

for instances of multiple inheritance. usually ptr is known to be non-NULL.
And also cases where if is uses to check individual cases without having proper
esles.
Yea.  I still  see a variety of trivial jump threads lying around early 
in the pipeline.


The nice thing about the backwards walking stuff in this context is we 
can control how hard it looks for jump threads much better.


The difficult thing is it's not currently prepared to find the implicit 
sets from conditionals.  Re-using the ASSERT_EXPR mechanisms from vrp 
may be the solution.  I haven't tried that yet, but it's in the back of 
my mind for solving that class of problems cleanly.




jeff

Re: [patch 0/6] scalar-storage-order merge (2)

2015-10-14 Thread Trevor Saunders

On Tue, Oct 13, 2015 at 07:32:08PM +0200, Eric Botcazou wrote:
> > My main question about this series is - how generally useful do you
> > expect it to be? I know of some different projects that would like
> > bi-endian capability, but it looks like this series implements something
> > that is a little too limited to be of use in these cases.
> 
> AdaCore has customers who have been using it for a few years.  With the 
> inline 
> pragma and either the configuration pragma (Ada) or the switch (C/C++), you 
> can use it without much code rewriting.
> 
> > It looks like it comes with a nontrivial maintenance cost.
> 
> Nontrivial but manageable IMO and the heavily modified parts (mostly the RTL 
> expander) are "cold" these days.  I suspect that less "limited" versions 
> would 
> be far more intrusive and less manageable.
> 
> Of course I would do the maintenance (I have been doing it for a few years at 
> AdaCore), except for the C++ front-end that I don't know at all; that's why 
> I'm OK to drop the C++ support for now.

I haven't looked at the C++ changes, but I tend to think they mat may be
the language where this is the least useful.  I expect it would be
pretty "trivial" to write some wrapper classes that use bswap in
operators so you could say things like struct { uint32_t_be x; }; and
have x stored in big endian.  At which point all you are really missing
is a flag to have all struct members work that way, but rewriting all
struct members of type uint32_t to uint32_t_be should be straight
forward.

trev

Re: [vec-cmp, patch 3/6] Vectorize comparison

2015-10-14 Thread Ilya Enkovich

On 14 Oct 15:06, Ilya Enkovich wrote:
> 
> Will send an updated version after testing.
> 
> Thanks,
> Ilya
> 

Here is an updated patch version.

Thanks,
Ilya
--
gcc/

2015-10-14  Ilya Enkovich  

* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
(vect_create_destination_var): Likewise.
* tree-vect-stmts.c (vectorizable_comparison): New.
(vect_analyze_stmt): Add vectorizable_comparison.
(vect_transform_stmt): Likewise.
* tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var.
(enum stmt_vec_info_type): Add comparison_vec_info_type.
(vectorizable_comparison): New.


diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 8a4d489..0be0523 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3870,6 +3870,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind 
var_kind, const char *name)
   case vect_scalar_var:
 prefix = "stmp";
 break;
+  case vect_mask_var:
+prefix = "mask";
+break;
   case vect_pointer_var:
 prefix = "vectp";
 break;
@@ -4424,7 +4427,11 @@ vect_create_destination_var (tree scalar_dest, tree 
vectype)
   tree type;
   enum vect_var_kind kind;
 
-  kind = vectype ? vect_simple_var : vect_scalar_var;
+  kind = vectype
+? VECTOR_BOOLEAN_TYPE_P (vectype)
+? vect_mask_var
+: vect_simple_var
+: vect_scalar_var;
   type = vectype ? vectype : TREE_TYPE (scalar_dest);
 
   gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 23cec8a..6a52895 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7516,6 +7516,192 @@ vectorizable_condition (gimple *stmt, 
gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
+
+bool
+vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
+gimple **vec_stmt, tree reduc_def,
+slp_tree slp_node)
+{
+  tree lhs, rhs1, rhs2;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
+  tree vec_compare;
+  tree new_temp;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree def;
+  enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type};
+  unsigned nunits;
+  int ncopies;
+  enum tree_code code;
+  stmt_vec_info prev_stmt_info = NULL;
+  int i, j;
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  vec vec_oprnds0 = vNULL;
+  vec vec_oprnds1 = vNULL;
+  gimple *def_stmt;
+  tree mask_type;
+  tree mask;
+
+  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
+return false;
+
+  mask_type = vectype;
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+ncopies = 1;
+  else
+ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+  && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+  && reduc_def))
+return false;
+
+  if (STMT_VINFO_LIVE_P (stmt_info))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"value used after loop.\n");
+  return false;
+}
+
+  if (!is_gimple_assign (stmt))
+return false;
+
+  code = gimple_assign_rhs_code (stmt);
+
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (!vect_is_simple_use_1 (rhs1, stmt, stmt_info->vinfo,
+_stmt, , [0], ))
+return false;
+
+  if (!vect_is_simple_use_1 (rhs2, stmt, stmt_info->vinfo,
+_stmt, , [1], ))
+   return false;
+
+  if (vectype1 && vectype2
+  && TYPE_VECTOR_SUBPARTS (vectype1) != TYPE_VECTOR_SUBPARTS (vectype2))
+return false;
+
+  vectype = vectype1 ? vectype1 : vectype2;
+
+  /* Invariant comparison.  */
+  if (!vectype)
+{
+  vectype = build_vector_type (TREE_TYPE (rhs1), nunits);
+  if (tree_to_shwi (TYPE_SIZE_UNIT (vectype)) != current_vector_size)
+   return false;
+}
+  else if (nunits != TYPE_VECTOR_SUBPARTS (vectype))
+return false;
+
+  if (!vec_stmt)
+{
+  STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+  vect_model_simple_cost (stmt_info, ncopies, dts, NULL, NULL);
+  return expand_vec_cmp_expr_p (vectype, mask_type);
+}
+
+  /* Transform.  */
+  if (!slp_node)
+

Re: [vec-cmp, patch 4/6] Support vector mask invariants

2015-10-14 Thread Ilya Enkovich

On 14 Oct 13:50, Ilya Enkovich wrote:
> 2015-10-14 11:49 GMT+03:00 Richard Biener :
> > On Tue, Oct 13, 2015 at 4:52 PM, Ilya Enkovich  
> > wrote:
> >> I don't understand what you mean. vect_get_vec_def_for_operand has two
> >> changes made.
> >> 1. For boolean invariants use build_same_sized_truth_vector_type
> >> instead of get_vectype_for_scalar_type in case statement produces a
> >> boolean vector. This covers cases when we use invariants in
> >> comparison, AND, IOR, XOR.
> >
> > Yes, I understand we need this special-casing to differentiate between
> > the vector type
> > used for boolean-typed loads/stores and the type for boolean typed 
> > constants.
> > What happens if we mix them btw, like with
> >
> >   _Bool b = bools[i];
> >   _Bool c = b || d;
> >   ...
> >
> > ?
> 
> Here both statements should get vector of char as a vectype and we
> never go VECTOR_BOOLEAN_TYPE_P way for them
> 
> >
> >> 2. COND_EXPR is an exception because it has built-in boolean vector
> >> result not reflected in its vecinfo. Thus I added additional operand
> >> for vect_get_vec_def_for_operand to directly specify vectype for
> >> vector definition in case it is a loop invariant.
> >> So what do you propose to do with these changes?
> >
> > This is the change I don't like and don't see why we need it.  It works 
> > today
> > and the comparison operands should be of appropriate type already?
> 
> Today it works because we always create vector of integer constant.
> With boolean vectors it may be either integer vector or boolean vector
> depending on context. Consider:
> 
> _Bool _1;
> int _2;
> 
> _2 = _1 != 0 ? 0 : 1
> 
> We have two zero constants here requiring different vectypes.
> 
> Ilya
> 
> >
> > Richard.
> >
> >> Thanks,
> >> Ilya

Here is an updated patch version.

Thanks,
Ilya
--
gcc/

2015-10-14  Ilya Enkovich  

* expr.c (const_vector_mask_from_tree): New.
(const_vector_from_tree): Use const_vector_mask_from_tree
for boolean vectors.
* tree-vect-stmts.c (vect_init_vector): Support boolean vector
invariants.
(vect_get_vec_def_for_operand): Add VECTYPE arg.
(vectorizable_condition): Directly provide vectype for invariants
used in comparison.
* tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE
arg.


diff --git a/gcc/expr.c b/gcc/expr.c
index b5ff598..ab25d1a 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11344,6 +11344,40 @@ try_tablejump (tree index_type, tree index_expr, tree 
minval, tree range,
   return 1;
 }
 
+/* Return a CONST_VECTOR rtx representing vector mask for
+   a VECTOR_CST of booleans.  */
+static rtx
+const_vector_mask_from_tree (tree exp)
+{
+  rtvec v;
+  unsigned i;
+  int units;
+  tree elt;
+  machine_mode inner, mode;
+
+  mode = TYPE_MODE (TREE_TYPE (exp));
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+
+  v = rtvec_alloc (units);
+
+  for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
+{
+  elt = VECTOR_CST_ELT (exp, i);
+
+  gcc_assert (TREE_CODE (elt) == INTEGER_CST);
+  if (integer_zerop (elt))
+   RTVEC_ELT (v, i) = CONST0_RTX (inner);
+  else if (integer_onep (elt)
+  || integer_minus_onep (elt))
+   RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
+  else
+   gcc_unreachable ();
+}
+
+  return gen_rtx_CONST_VECTOR (mode, v);
+}
+
 /* Return a CONST_VECTOR rtx for a VECTOR_CST tree.  */
 static rtx
 const_vector_from_tree (tree exp)
@@ -11359,6 +11393,9 @@ const_vector_from_tree (tree exp)
   if (initializer_zerop (exp))
 return CONST0_RTX (mode);
 
+  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp)))
+return const_vector_mask_from_tree (exp);
+
   units = GET_MODE_NUNITS (mode);
   inner = GET_MODE_INNER (mode);
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 6a52895..01168ae 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1308,7 +1308,22 @@ vect_init_vector (gimple *stmt, tree val, tree type, 
gimple_stmt_iterator *gsi)
   if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
{
  if (CONSTANT_CLASS_P (val))
-   val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
+   {
+ /* Can't use VIEW_CONVERT_EXPR for booleans because
+of possibly different sizes of scalar value and
+vector element.  */
+ if (VECTOR_BOOLEAN_TYPE_P (type))
+   {
+ if (integer_zerop (val))
+   val = build_int_cst (TREE_TYPE (type), 0);
+ else if (integer_onep (val))
+   val = build_int_cst (TREE_TYPE (type), 1);
+ else
+   gcc_unreachable ();
+   }
+ else
+   val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
+   }
  else
{
  new_temp =

[gomp4, committed] Backported param parloops-schedule

2015-10-14 Thread Tom de Vries

[ was: Re: [PATCH, 3/5] Handle original loop tree in 
expand_omp_for_generic ]


On 13/10/15 23:48, Thomas Schwinge wrote:

Hi Tom!

On Mon, 12 Oct 2015 18:56:29 +0200, Tom de Vries  wrote:

>Handle original loop tree in expand_omp_for_generic
>
>2015-09-12  Tom de Vries
>
>PR tree-optimization/67476
>* omp-low.c (expand_omp_for_generic): Handle original loop tree.

Working on a merge from trunk into gomp-4_0-branch, I'm seeing your
change (trunk r228754) conflict with code Chung-Lin changed
(gomp-4_0-branch r224505).  So, would you two please cherry-pick/merge
trunk r228754 into gomp-4_0-branch?  Thanks!  (I'm assuming you can
easily tell what needs to be done here; it's been a long time that
Chung-Lin touched this code, so CCing him just in case.)  Thanks!


Hi Thomas,

I've backport the whole patch series:
 1Handle simple latch in expand_omp_for_generic
 2Add missing phis in expand_omp_for_generic
 3Handle original loop tree in expand_omp_for_generic
 4Support DEFPARAMENUM in params.def
 5Add param parloops-schedule
and committed to gomp-4_0-branch.

I'm only posting patch nr. 3, the only one with a non-trivial conflict.

Thanks,
- Tom
Handle original loop tree in expand_omp_for_generic

2015-10-14  Tom de Vries  

	backport from trunk:
	2015-10-13  Tom de Vries  

	PR tree-optimization/67476
	* omp-low.c (expand_omp_for_generic): Handle original loop tree.
---
 gcc/omp-low.c | 38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 473e2e7..dde3e1b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6924,7 +6924,6 @@ expand_omp_for_generic (struct omp_region *region,
   remove_edge (e);
 
   make_edge (cont_bb, l2_bb, EDGE_FALSE_VALUE);
-  add_bb_to_loop (l2_bb, cont_bb->loop_father);
   e = find_edge (cont_bb, l1_bb);
   if (e == NULL)
 	{
@@ -7002,23 +7001,36 @@ expand_omp_for_generic (struct omp_region *region,
   set_immediate_dominator (CDI_DOMINATORS, l1_bb,
 			   recompute_dominator (CDI_DOMINATORS, l1_bb));
 
-  struct loop *outer_loop;
-  if (seq_loop)
-	outer_loop = l0_bb->loop_father;
-  else
+  /* We enter expand_omp_for_generic with a loop.  This original loop may
+	 have its own loop struct, or it may be part of an outer loop struct
+	 (which may be the fake loop).  */
+  struct loop *outer_loop = entry_bb->loop_father;
+  bool orig_loop_has_loop_struct = l1_bb->loop_father != outer_loop;
+
+  add_bb_to_loop (l2_bb, outer_loop);
+
+  struct loop *new_loop = NULL;
+  if (!seq_loop)
 	{
-	  outer_loop = alloc_loop ();
-	  outer_loop->header = l0_bb;
-	  outer_loop->latch = l2_bb;
-	  add_loop (outer_loop, l0_bb->loop_father);
+	  /* We've added a new loop around the original loop.  Allocate the
+	 corresponding loop struct.  */
+	  new_loop = alloc_loop ();
+	  new_loop->header = l0_bb;
+	  new_loop->latch = l2_bb;
+	  add_loop (new_loop, outer_loop);
 	}
 
-  if (!gimple_omp_for_combined_p (fd->for_stmt))
+  /* Allocate a loop structure for the original loop unless we already
+	 had one.  */
+  if (!orig_loop_has_loop_struct
+	  && !gimple_omp_for_combined_p (fd->for_stmt))
 	{
-	  struct loop *loop = alloc_loop ();
-	  loop->header = l1_bb;
+	  struct loop *orig_loop = alloc_loop ();
+	  orig_loop->header = l1_bb;
 	  /* The loop may have multiple latches.  */
-	  add_loop (loop, outer_loop);
+	  add_loop (orig_loop, (new_loop != NULL
+? new_loop
+: outer_loop));
 	}
 }
 }
-- 
1.9.1

Re: [patch 4/3] Header file reduction - Tools for contrib - second cut

2015-10-14 Thread Andrew MacLeod

Here's the latest version of the tools for a sub directory in contrib.   
I've handled all the feedback, except I have not fully commented the 
python code in the tools, nor followed any particular coding 
convention...   Documentation has been handled, and I've added some 
additional comments to the places which were noted as being unclear. Ive 
also removed all tabs from the source files.


Ive also updated show-headers slightly to be a little more 
error-resistant and to put some emphasis on any header files specified 
on the command as being of interest . (when there are 140 shown, it can 
be hard to find the one you are looking for sometimes)


Do we wish to impose anything in particular on the source for  tools 
going into this sub-directory of contrib? The other tools in contrib 
don't seem to have much in the way of coding standards. I also 
wonder if anyone other than me will look at them much :-)


Andrew




headers/

	* README : New File.
	* count-headers : New File.
	* gcc-order-headers : New File.
	* graph-header-logs : New File.
	* graph-include-web : New File.
	* headerutils.py : New File.
	* included-by : New File.
	* reduce-headers : New File.
	* replace-header : New File.
	* show-headers : New File.

Index: headers/README
===
*** headers/README	(revision 0)
--- headers/README	(working copy)
***
*** 0 
--- 1,283 
+ Quick start documentation for the header file utilities.  
+ 
+ This isn't a full breakdown of the tools, just they typical use scenarios.
+ 
+ - Each tool accepts -h to show it's usage.  Usually no parameters will also
+ trigger the help message.  Help may specify additional functionality to what is
+ listed here.
+ 
+ - For all tools, option format for specifying filenames must have no spaces
+ between the option and filename.
+ ie.: tool -lfilename.h  target.h
+ 
+ - Many of the tools are required to be run from the core gcc source directory
+ containing coretypes.h.  Typically that is in gcc/gcc from a source checkout.
+ For these tools to work on files not in this directory, their path needs to be
+ specified on the command line.
+ ie.: tool c/c-decl.c  lto/lto.c
+ 
+ - options can be intermixed with filenames anywhere on the command line
+ ie.   tool ssa.h rtl.h -a   is equivalent to 
+   tool ssa.h -a rtl.h
+ 
+ 
+ 
+ 
+ 
+ gcc-order-headers
+ -
+   This will reorder any primary backend headers files known to the tool into a
+   canonical order which will resolve any hidden dependencies they may have.
+   Any unknown headers will simply be placed after the recognized files, and
+   retain the same relative ordering they had.
+  
+   This tool must be run in the core gcc source directory.
+ 
+   Simply execute the command listing any files you wish to process on the
+   command line.
+ 
+   Any files which are changed are output, and the original is saved with a
+   .bak extention.
+ 
+   ex.: gcc-order-headers tree-ssa.c c/c-decl.c
+ 
+   -s will list all of the known headers in their canonical order. It does not
+   show which of those headers include other headers, just the final canonical
+   ordering.
+ 
+   if any header files are included within a conditional code block, the tool
+   will issue a message and not change the file.  When this happens, you can
+   manually inspect the file to determine if reordering it is actually OK.  Then
+   rerun the command with the -i option.  This will ignore the conditional error
+   condition and perform the re-ordering anyway.
+   
+   If any #include line has the beginning of a multi-line comment, it will also
+   refuse to process the file until that is resolved by terminating the comment
+   on the same line, or removing it.
+ 
+ 
+ show-headers
+ 
+   This will show the include structure for any given file. Each level of nesting
+   is indented, and when any duplicate headers are seen, they have their
+   duplicate number shown
+ 
+   -i may be used to specify alternate search directories for headers to parse.
+   -s specifies headers to look for and emphasize in the output.
+ 
+   This tool must be run in the core gcc source directory.
+ 
+   ex.: show-headers -sansidecl.h tree-ssa.c
+ 	tree-ssa.c
+ 	  config.h
+ 	auto-host.h
+ 	ansidecl.h  (1)   <<---
+ 	  system.h
+ 	safe-ctype.h
+ 	filenames.h
+ 	  hashtab.h  (1)
+ 		ansidecl.h  (2)<<---
+ 	libiberty.h
+ 	  ansidecl.h  (3)<<---
+ 	hwint.h
+ 	  coretypes.h
+ 	machmode.h  (1)
+ 	  insn-modes.h  (1)
+ 	signop.h
+ 	  <...>
+ 
+ 
+ 
+ 
+ count-headers
+ -
+   simply count all the headers found in the specified files. A summary is 
+   printed showing occurrences from high to low.
+ 
+   ex.:count-headers  tree*.c
+ 	86 : coretypes.h
+ 	86 : config.h
+ 	86 : system.h
+ 	86 : tree.h
+ 	82 : backend.h
+ 	80

Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Jeff Law


On 10/13/2015 02:59 PM, Richard Henderson wrote:

On 10/14/2015 02:49 AM, Jeff Law wrote:

The problem here is we don't know what address space the *0 is going
to hit,
right?


Correct, not before we do the walk of stmt to see what's present.
So the address space information isn't part of the address?  I must 
admit I haven't looked at how that stuff is being implemented.





Isn't that also an issue for code generation as well?


What sort of problem are you thinking of?  I haven't seen one yet.
If the address space information was supposed to be carried in the 
address itself, then we'd need the address to be distinct from 
NULL_POINTER_NODE.


It sounds to me like you're carrying address space information outside 
the address itself, which avoid those issues.  However, it does mean 
that the path isolation code needs some kind of adjustment to 
distinguish between *0 that will fault and *0 which hits a different 
address space and may not fault.


jeff

Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Jan Hubicka

> >>> I think he asked for trivial forward threads though due to repeated
> >>> tests.
> >>> I hacked FRE to do this (I think), but maybe some trivial cleanup
> >>> opportunities
> >>> are still left here.  Honza?

Well, unthreaded jumps quite confuse profile prediction and create profiles
that we can't fix later. An of course they count in time (and size sometimes)
estimates.

>From cases I commonly see it is the usual lazyness of repeated tests comming
from early inlining/macro expansion and also C++ love to introduce

  if (ptr != NULL)
ptr2 = >foo;
  else
ptr2 = NULL

for instances of multiple inheritance. usually ptr is known to be non-NULL.
And also cases where if is uses to check individual cases without having proper
esles.

Honza
> >>
> >>
> >> This or other related patches in the range r228731:228774 has caused a
> >> quite
> >> big jump in SPEC CPU 2000 binary sizes (notably 176.gcc - so maybe
> >> affecting
> >> bootstrap as well, at -O3).  Are you sure this doesn't re-introduce DOM
> >> effectively peeling all loops once?
> >
> > It's possible.  I've actually got a patch in overnight testing that
> > introduces some of the heuristics to avoid mucking up loops to the FSM bits.
> 
> Like never threading a loop exit test to the loop header (but only to the 
> exit).
> At least if it is the only exit in the loop (but maybe better for all exits).
> 
> Richard.
> 
> > jeff
> >

[patch] Minor adjustment to gimplify_addr_expr

2015-10-14 Thread Eric Botcazou

Hi,

this is the regression of ACATS c37213k at -O2 with an upcoming change in
the front-end of the Ada compiler:

eric@polaris:~/gnat/gnat-head/native> gcc/gnat1 -quiet c37213k.adb -I 
/home/eric/gnat/bugs/support -O2 
+===GNAT BUG DETECTED==+
| Pro 7.4.0w (20151014-60) (x86_64-suse-linux) GCC error:  |
| tree check: expected class 'expression', have  |
| 'exceptional' (ssa_name) in tree_operand_check, at tree.h:3431|
| Error detected around c37213k.adb:95:37

It's recompute_tree_invariant_for_addr_expr receiving an SSA_NAME instead of
an ADDR_EXPR when called from gimplify_addr_expr.  The sequence is as follows:
we start with this GIMPLE statement:

  *R.43_60 = c37213k.B_1.B_6.proc6.B_4.B_5.value (); [static-chain: ] 
[return slot optimization]

Then IPA clones the function and turns the statement into:

  MEM[(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46] = 
c37213k.B_1.B_6.proc6.B_4.B_5.value (); [static-chain: ] [return slot 
optimization]

The 'value' function has been NRVed and contains:

  .builtin_memcpy (&, _9, _10);

and gets inlined so the above statement is rewritten into:

.builtin_memcpy ([(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46], 
_174, _175);

so gimplify_addr_expr is invoked on:

  [(struct c37213k__B_1__B_6__proc6__B_4__nrec *)R.43_46]

and gets confused because it doesn't see that it's just R.43_46 (it would have
seen it if the original INDIRECT_REF was still present in lieu of MEM_REF).

Hence the attached fixlet.  Tested on x86_64-suse-linux, OK for the mainline?


2015-10-14  Eric Botcazou  <ebotca...@adacore.com>

* gimplify.c (gimplify_addr_expr) : New case.

-- 
Eric BotcazouIndex: gimplify.c
===
--- gimplify.c	(revision 228794)
+++ gimplify.c	(working copy)
@@ -4984,6 +4984,12 @@ gimplify_addr_expr (tree *expr_p, gimple
   ret = GS_OK;
   break;
 
+case MEM_REF:
+  if (integer_zerop (TREE_OPERAND (op0, 1)))
+	goto do_indirect_ref;
+
+  /* ... fall through ... */
+
 default:
   /* If we see a call to a declared builtin or see its address
 	 being taken (we can unify those cases here) then we can mark

[PATCH] Fix pr67963

2015-10-14 Thread Yulia Koval

Hi,

This patch fixes the issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67963

  gcc/config/i386/i386.c (ix86_option_override_internal) Disable
80387 mask if lakemont target is set.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4c25c9e..db722aa 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4943,6 +4943,12 @@ ix86_option_override_internal (bool main_args_p,
  break;
   }

+  if (!strcmp (opts->x_ix86_arch_string, "lakemont"))
+{
+  opts->x_target_flags &= ~MASK_80387;
+  opts_set->x_target_flags |= MASK_80387;
+}
+
   if (TARGET_X32 && (opts->x_ix86_isa_flags & OPTION_MASK_ISA_MPX))
 error ("Intel MPX does not support x32");

Ok for trunk?

Yulia


patch
Description: Binary data

Re: [PATCH] Fix pr67963

2015-10-14 Thread H.J. Lu

On Wed, Oct 14, 2015 at 8:15 AM, H.J. Lu  wrote:
> On Wed, Oct 14, 2015 at 8:08 AM, Yulia Koval  wrote:
>> Hi,
>>
>> This patch fixes the issue:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67963
>>
>>   gcc/config/i386/i386.c (ix86_option_override_internal) Disable
>> 80387 mask if lakemont target is set.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 4c25c9e..db722aa 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -4943,6 +4943,12 @@ ix86_option_override_internal (bool main_args_p,
>>   break;
>>}
>>
>> +  if (!strcmp (opts->x_ix86_arch_string, "lakemont"))
>> +{
>> +  opts->x_target_flags &= ~MASK_80387;
>> +  opts_set->x_target_flags |= MASK_80387;
>> +}
>> +
>>if (TARGET_X32 && (opts->x_ix86_isa_flags & OPTION_MASK_ISA_MPX))
>>  error ("Intel MPX does not support x32");
>>
>> Ok for trunk?
>
> We should add a bit to "struct pta" to indicate availability of
> 80387 ISA and turn it off for lakemount if 90387 ISA hasn't be
> turned on explicitly.

Something like

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a2314e7..1cea58e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4348,6 +4348,7 @@ ix86_option_override_internal (bool main_args_p,
   const enum processor_type processor;
   const enum attr_cpu schedule;
   const unsigned HOST_WIDE_INT flags;
+  const unsigned HOST_WIDE_INT mask;
 }
   const processor_alias_table[] =
 {

> We also need some testcases.
>
> --
> H.J.

-- 
H.J.

Re: [patch 0/6] scalar-storage-order merge (2)

2015-10-14 Thread Jeff Law


On 10/14/2015 09:25 AM, Trevor Saunders wrote:


I haven't looked at the C++ changes, but I tend to think they mat may be
the language where this is the least useful.  I expect it would be
pretty "trivial" to write some wrapper classes that use bswap in
operators so you could say things like struct { uint32_t_be x; }; and
have x stored in big endian.  At which point all you are really missing
is a flag to have all struct members work that way, but rewriting all
struct members of type uint32_t to uint32_t_be should be straight
forward.
I've seen this kind of thing come up in other languages, even some not 
covered by ACT's work.ACT's work does make supporting the extensions 
in those other languages significantly easier.




jeff

PR67945: Fix oscillation between pow representations

2015-10-14 Thread Richard Sandiford

This patch fixes some fallout from my patch to move the sqrt and cbrt
folding rules to match.pd.  The rules included canonicalisations like:

   sqrt(sqrt(x))->pow(x,1/4)

which in the original code was only ever done at the generic level.
My patch meant that we'd do it whenever we tried to fold a gimple
statement, and eventually it would win over the sincos optimisation
that replaces pow(x,1/4) with sqrt(sqrt(x)).

Following a suggestion from Richard B, the patch adds a new
PROP_gimple_* flag to say whether fp routines have been optimised
for the target.  If so, match.pd should only transform calls to math
functions if the result is actually an optimisation, not just an
IL simplification or canonicalisation.  The question then of course
is: which rules are which?  I've added block comments that describe
the criteria I was using.

A slight wart is that we need to use the cfun global to access
the PROP_gimple_* flag; there's no local function pointer available.

Bootstrapped & regression-tested on x86_64-linux-gnu.  Also tested
on powerc64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/67945
* tree-pass.h (PROP_gimple_opt_math): New property flag.
* generic-match-head.c (canonicalize_math_p): New function.
* gimple-match-head.c: Include tree-pass.h.
(canonicalize_math_p): New function.
* match.pd: Group math built-in rules into simplifications
and canonicalizations.  Guard the latter with canonicalize_math_p.
* tree-ssa-math-opts.c (pass_data_cse_sincos): Provide the
PROP_gimple_opt_math property.

diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
index 0a7038d..94135b3 100644
--- a/gcc/generic-match-head.c
+++ b/gcc/generic-match-head.c
@@ -73,3 +73,12 @@ single_use (tree t ATTRIBUTE_UNUSED)
 {
   return true;
 }
+
+/* Return true if math operations should be canonicalized,
+   e.g. sqrt(sqrt(x)) -> pow(x, 0.25).  */
+
+static inline bool
+canonicalize_math_p ()
+{
+  return true;
+}
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index cab77a4..f29e97f 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "cgraph.h"
 #include "gimple-match.h"
+#include "tree-pass.h"
 
 
 /* Forward declarations of the private auto-generated matchers.
@@ -827,3 +828,12 @@ single_use (tree t)
 {
   return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t);
 }
+
+/* Return true if math operations should be canonicalized,
+   e.g. sqrt(sqrt(x)) -> pow(x, 0.25).  */
+
+static inline bool
+canonicalize_math_p ()
+{
+  return !cfun || (cfun->curr_properties & PROP_gimple_opt_math) == 0;
+}
diff --git a/gcc/match.pd b/gcc/match.pd
index 655c9ff..d319441 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2134,11 +2134,25 @@ along with GCC; see the file COPYING3.  If not see
clearly less optimal and which we'll transform again in forwprop.  */
 
 
-/* Simplification of math builtins.  */
+/* Simplification of math builtins.  These rules must all be optimizations
+   as well as IL simplifications.  If there is a possibility that the new
+   form could be a pessimization, the rule should go in the canonicalization
+   section that follows this one.
 
-/* fold_builtin_logarithm */
-(if (flag_unsafe_math_optimizations)
+   Rules can generally go in this section if they satisfy one of
+   the following:
+
+   - the rule describes an identity
+
+   - the rule replaces calls with something as simple as addition or
+ multiplication
+
+   - the rule contains unary calls only and simplifies the surrounding
+ arithmetic.  (The idea here is to exclude non-unary calls in which
+ one operand is constant and in which the call is known to be cheap
+ when the operand has that value.)  */
 
+(if (flag_unsafe_math_optimizations)
  /* Simplify sqrt(x) * sqrt(x) -> x.  */
  (simplify
   (mult (SQRT@1 @0) @1)
@@ -2151,63 +2165,12 @@ along with GCC; see the file COPYING3.  If not see
(mult (root:s @0) (root:s @1))
 (root (mult @0 @1
 
- /* Simplify pow(x,y) * pow(x,z) -> pow(x,y+z). */
- (simplify
-  (mult (POW:s @0 @1) (POW:s @0 @2))
-   (POW @0 (plus @1 @2)))
-
- /* Simplify pow(x,y) * pow(z,y) -> pow(x*z,y). */
- (simplify
-  (mult (POW:s @0 @1) (POW:s @2 @1))
-   (POW (mult @0 @2) @1))
-
  /* Simplify expN(x) * expN(y) -> expN(x+y). */
  (for exps (EXP EXP2 EXP10 POW10)
   (simplify
(mult (exps:s @0) (exps:s @1))
 (exps (plus @0 @1
 
- /* Simplify tan(x) * cos(x) -> sin(x). */
- (simplify
-  (mult:c (TAN:s @0) (COS:s @0))
-   (SIN @0))
-
- /* Simplify x * pow(x,c) -> pow(x,c+1). */
- (simplify
-  (mult @0 (POW:s @0 REAL_CST@1))
-  (if (!TREE_OVERFLOW (@1))
-   (POW @0 (plus @1 { build_one_cst (type); }
-
- /* Simplify sin(x) / cos(x) -> tan(x). */
- (simplify
-  (rdiv (SIN:s @0) (COS:s @0))
-   (TAN @0))
-
- /* Simplify cos(x) / sin(x) -> 1 / tan(x).

Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-14 Thread Jeff Law


On 10/14/2015 02:28 AM, Eric Botcazou wrote:

If you're using one of the switches that checks for stack overflow at the
start of the function, you certainly don't want to do any such stores.


There is a protection area for -fstack-check (STACK_CHECK_PROTECT bytes) so
you can do stores just below the stack pointer as far as it's concerned.

There is indeed the issue of the mere writing below the stack pointer.  Our
experience with various OSes and architectures shows that this almost always
works.  The only problematic case is x86{-64}/Linux historically, where you
cannot write below the page pointed to by the stack pointer (that's why there
is a specific implementation of -fstack-check for x86{-64}/Linux).

It was problematical on the PA, but I can't recall precisely why.

The thing we need to remember here is that if we do somethig like use 
space just below the stack pointer, valgrind is probably going to start 
complaining (and legitimately so).


While we know the result is throw-away, valgrind doesn't and the 
complains and noise from this would IMHO outweigh the benefits from 
using the trick of reading outside the defined stack area.


jeff

[PATCH] Fix wrong-code when folding X - (X / Y) * Y (PR tree-optimization/67953)

2015-10-14 Thread Marek Polacek

Evidently, the X - (X / Y) * Y -> X % Y pattern can't change the
signedness of X from signed to unsigned, otherwise we'd generate
wrong code.  (But unsigned -> signed should be fine.)

Does anyone see a better fix than this?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-10-14  Marek Polacek  

PR tree-optimization/67953
* match.pd (X - (X / Y) * Y): Don't change signedness of @0.

* gcc.dg/fold-minus-6.c (fn4): Change the type of A to
unsigned.
* gcc.dg/torture/pr67953.c: New test.

diff --git gcc/match.pd gcc/match.pd
index 655c9ff..24e19a9 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -267,7 +267,8 @@ along with GCC; see the file COPYING3.  If not see
 /* X - (X / Y) * Y is the same as X % Y.  */
 (simplify
  (minus (convert1? @0) (convert2? (mult (trunc_div @0 @1) @1)))
- (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (type))
   (trunc_mod (convert @0) (convert @1
 
 /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
diff --git gcc/testsuite/gcc.dg/fold-minus-6.c 
gcc/testsuite/gcc.dg/fold-minus-6.c
index 1c22c25..1535452 100644
--- gcc/testsuite/gcc.dg/fold-minus-6.c
+++ gcc/testsuite/gcc.dg/fold-minus-6.c
@@ -20,7 +20,7 @@ fn3 (long int x)
 }
 
 int
-fn4 (int a, int b)
+fn4 (unsigned int a, int b)
 {
   return a - (unsigned) ((a / b) * b);
 }
diff --git gcc/testsuite/gcc.dg/torture/pr67953.c 
gcc/testsuite/gcc.dg/torture/pr67953.c
index e69de29..5ce399b 100644
--- gcc/testsuite/gcc.dg/torture/pr67953.c
+++ gcc/testsuite/gcc.dg/torture/pr67953.c
@@ -0,0 +1,42 @@
+/* PR tree-optimization/67953 */
+/* { dg-do run } */
+
+unsigned int
+fn1 (signed int a)
+{
+  return (unsigned int) a - ((a / 3) * 3);
+}
+
+unsigned int
+fn2 (signed int a)
+{
+  return a - ((a / 3) * 3);
+}
+
+unsigned int
+fn3 (int a)
+{
+  return a - (unsigned) ((a / 3) * 3);
+}
+
+signed int
+fn4 (int a)
+{
+  return (unsigned) a - (unsigned) ((a / 3) * 3);
+}
+
+signed int
+fn5 (unsigned int a)
+{
+  return (signed) a - (int) ((a / 3) * 3);
+}
+
+int
+main ()
+{
+  if (fn1 (-5) != -2
+  || fn2 (-5) != -2
+  || fn3 (-5) != -2
+  || fn4 (-5) != -2)
+__builtin_abort ();
+}

Marek

Re: Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Richard Biener

On October 14, 2015 6:27:02 PM GMT+02:00, Jan Hubicka  wrote:
>Hi,
>this patch adds the CONSTRUCTOR case discussed while back.  Only empty
>constructors are matched, as those are only appearing in gimple
>operand.
>I tested that during bootstrap about 7500 matches are for empty ctors.
>There are couple hundred for non-empty probably used on generic. 
>
>Bootstrapped/regtested x86_64-linux, OK?
>
>Honza
>
>   * fold-const.c (operand_equal_p): Match empty constructors.
>Index: fold-const.c
>===
>--- fold-const.c   (revision 228735)
>+++ fold-const.c   (working copy)
>@@ -2890,6 +2891,11 @@ operand_equal_p (const_tree arg0, const_
>   return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1,
>0),
>   flags | OEP_ADDRESS_OF
>   | OEP_CONSTANT_ADDRESS_OF);
>+  case CONSTRUCTOR:
>+  /* In GIMPLE empty constructors are allowed in initializers of
>+ vector types.  */

The comment is wrong (or at least odd),
On gimple an empty vector constructor should be folded to a VECTOR_CST.

>+  return (!vec_safe_length (CONSTRUCTOR_ELTS (arg0))
>+  && !vec_safe_length (CONSTRUCTOR_ELTS (arg1)));
>   default:
>   break;
>   }

[PATCH] PR middle-end/67220: GCC fails to properly handle libcall symbol visibility of built functions

2015-10-14 Thread H.J. Lu

By default, there is no visibility on builtin functions.  When there is
explicitly declared visibility on the C library function which a builtin
function fall back on, we should honor the explicit visibility on the
the C library function.

There are 2 issues:

1. We never update visibility of the fall back C library function.
2. init_block_move_fn and init_block_clear_fn used to implement builtin
memcpy and memset generate the library call to memcpy and memset
directly without checking if there is explicitly declared visibility on
them.

This patch updates builtin function with explicit visibility and checks
visibility on builtin memcpy/memset when generating library call.

Tested on Linux/x86-64 without regressions.  OK for trunk?


H.J.
---
gcc/c/

PR middle-end/67220
* c-decl.c (diagnose_mismatched_decls): Copy explicit visibility
to builtin function.

gcc/

PR middle-end/67220
* expr.c (init_block_move_fn): Copy visibility from the builtin
memcpy.
(init_block_clear_fn): Copy visibility from the builtin memset.

gcc/testsuite/

PR middle-end/67220
* gcc.target/i386/pr67220-1.c: New test.
* gcc.target/i386/pr67220-2.c: Likewise.
* gcc.target/i386/pr67220-3.c: Likewise.
* gcc.target/i386/pr67220-4.c: Likewise.
* gcc.target/i386/pr67220-5.c: Likewise.
* gcc.target/i386/pr67220-6.c: Likewise.
---
 gcc/c/c-decl.c| 21 +
 gcc/expr.c| 12 ++--
 gcc/testsuite/gcc.target/i386/pr67220-1.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr67220-2.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr67220-3.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr67220-4.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr67220-5.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr67220-6.c | 14 ++
 8 files changed, 115 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-6.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index ce8406a..26460eb 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -2232,11 +2232,24 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
   /* warnings */
   /* All decls must agree on a visibility.  */
   if (CODE_CONTAINS_STRUCT (TREE_CODE (newdecl), TS_DECL_WITH_VIS)
-  && DECL_VISIBILITY_SPECIFIED (newdecl) && DECL_VISIBILITY_SPECIFIED 
(olddecl)
-  && DECL_VISIBILITY (newdecl) != DECL_VISIBILITY (olddecl))
+  && DECL_VISIBILITY_SPECIFIED (newdecl))
 {
-  warned |= warning (0, "redeclaration of %q+D with different visibility "
-"(old visibility preserved)", newdecl);
+  if (DECL_VISIBILITY_SPECIFIED (olddecl))
+   {
+ if (DECL_VISIBILITY (newdecl) != DECL_VISIBILITY (olddecl))
+   warned |= warning (0, "redeclaration of %q+D with different "
+  "visibility (old visibility preserved)",
+  newdecl);
+   }
+  else if (TREE_CODE (olddecl) == FUNCTION_DECL
+  && DECL_BUILT_IN (olddecl))
+   {
+ enum built_in_function fncode = DECL_FUNCTION_CODE (olddecl);
+ tree fndecl = builtin_decl_explicit (fncode);
+ gcc_assert (fndecl && !DECL_VISIBILITY_SPECIFIED (fndecl));
+ DECL_VISIBILITY (fndecl) = DECL_VISIBILITY (newdecl);
+ DECL_VISIBILITY_SPECIFIED (fndecl) = 1;
+   }
 }
 
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
diff --git a/gcc/expr.c b/gcc/expr.c
index 595324d..a12db96 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1390,7 +1390,11 @@ init_block_move_fn (const char *asmspec)
   TREE_PUBLIC (fn) = 1;
   DECL_ARTIFICIAL (fn) = 1;
   TREE_NOTHROW (fn) = 1;
-  DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
+  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMCPY);
+  if (fndecl)
+   DECL_VISIBILITY (fn) = DECL_VISIBILITY (fndecl);
+  else
+   DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
   DECL_VISIBILITY_SPECIFIED (fn) = 1;
 
   attr_args = build_tree_list (NULL_TREE, build_string (1, "1"));
@@ -2846,7 +2850,11 @@ init_block_clear_fn (const char *asmspec)
   TREE_PUBLIC (fn) = 1;
   DECL_ARTIFICIAL (fn) = 1;
   TREE_NOTHROW (fn) = 1;
-  DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
+  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
+  if (fndecl)
+   DECL_VISIBILITY (fn) = DECL_VISIBILITY (fndecl);
+  else
+   DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
   DECL_VISIBILITY_SPECIFIED (fn) = 1;
 
   block_clear_fn = fn;
diff --git

Re: [PATCH] PR middle-end/67220: GCC fails to properly handle libcall symbol visibility of built functions

2015-10-14 Thread Ramana Radhakrishnan

On Wed, Oct 14, 2015 at 5:21 PM, H.J. Lu  wrote:

> ---
> gcc/c/
>
> PR middle-end/67220
> * c-decl.c (diagnose_mismatched_decls): Copy explicit visibility
> to builtin function.
>
> gcc/
>
> PR middle-end/67220
> * expr.c (init_block_move_fn): Copy visibility from the builtin
> memcpy.
> (init_block_clear_fn): Copy visibility from the builtin memset.
>
> gcc/testsuite/
>
> PR middle-end/67220
> * gcc.target/i386/pr67220-1.c: New test.
> * gcc.target/i386/pr67220-2.c: Likewise.
> * gcc.target/i386/pr67220-3.c: Likewise.
> * gcc.target/i386/pr67220-4.c: Likewise.
> * gcc.target/i386/pr67220-5.c: Likewise.
> * gcc.target/i386/pr67220-6.c: Likewise.

Why aren't these tests in gcc.dg ?  The problem affects all targets
not just x86.


Thanks,
Ramana

> ---
>  gcc/c/c-decl.c| 21 +
>  gcc/expr.c| 12 ++--
>  gcc/testsuite/gcc.target/i386/pr67220-1.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr67220-2.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr67220-3.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr67220-4.c | 15 +++
>  gcc/testsuite/gcc.target/i386/pr67220-5.c | 14 ++
>  gcc/testsuite/gcc.target/i386/pr67220-6.c | 14 ++
>  8 files changed, 115 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67220-6.c
>
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index ce8406a..26460eb 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -2232,11 +2232,24 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
>/* warnings */
>/* All decls must agree on a visibility.  */
>if (CODE_CONTAINS_STRUCT (TREE_CODE (newdecl), TS_DECL_WITH_VIS)
> -  && DECL_VISIBILITY_SPECIFIED (newdecl) && DECL_VISIBILITY_SPECIFIED 
> (olddecl)
> -  && DECL_VISIBILITY (newdecl) != DECL_VISIBILITY (olddecl))
> +  && DECL_VISIBILITY_SPECIFIED (newdecl))
>  {
> -  warned |= warning (0, "redeclaration of %q+D with different visibility 
> "
> -"(old visibility preserved)", newdecl);
> +  if (DECL_VISIBILITY_SPECIFIED (olddecl))
> +   {
> + if (DECL_VISIBILITY (newdecl) != DECL_VISIBILITY (olddecl))
> +   warned |= warning (0, "redeclaration of %q+D with different "
> +  "visibility (old visibility preserved)",
> +  newdecl);
> +   }
> +  else if (TREE_CODE (olddecl) == FUNCTION_DECL
> +  && DECL_BUILT_IN (olddecl))
> +   {
> + enum built_in_function fncode = DECL_FUNCTION_CODE (olddecl);
> + tree fndecl = builtin_decl_explicit (fncode);
> + gcc_assert (fndecl && !DECL_VISIBILITY_SPECIFIED (fndecl));
> + DECL_VISIBILITY (fndecl) = DECL_VISIBILITY (newdecl);
> + DECL_VISIBILITY_SPECIFIED (fndecl) = 1;
> +   }
>  }
>
>if (TREE_CODE (newdecl) == FUNCTION_DECL)
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 595324d..a12db96 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -1390,7 +1390,11 @@ init_block_move_fn (const char *asmspec)
>TREE_PUBLIC (fn) = 1;
>DECL_ARTIFICIAL (fn) = 1;
>TREE_NOTHROW (fn) = 1;
> -  DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
> +  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMCPY);
> +  if (fndecl)
> +   DECL_VISIBILITY (fn) = DECL_VISIBILITY (fndecl);
> +  else
> +   DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
>DECL_VISIBILITY_SPECIFIED (fn) = 1;
>
>attr_args = build_tree_list (NULL_TREE, build_string (1, "1"));
> @@ -2846,7 +2850,11 @@ init_block_clear_fn (const char *asmspec)
>TREE_PUBLIC (fn) = 1;
>DECL_ARTIFICIAL (fn) = 1;
>TREE_NOTHROW (fn) = 1;
> -  DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
> +  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
> +  if (fndecl)
> +   DECL_VISIBILITY (fn) = DECL_VISIBILITY (fndecl);
> +  else
> +   DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
>DECL_VISIBILITY_SPECIFIED (fn) = 1;
>
>block_clear_fn = fn;
> diff --git a/gcc/testsuite/gcc.target/i386/pr67220-1.c 
> b/gcc/testsuite/gcc.target/i386/pr67220-1.c
> new file mode 100644
> index 000..06af0ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67220-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target fpic } } */
> +/* { dg-options "-O2 -fPIC" } */
> +
> +typedef __SIZE_TYPE__ size_t;
> +extern void *memcpy (void *, const void *, size_t);
>

Re: [Patch PR target/67366 2/2] [gimple-fold.c] Support movmisalign optabs in gimple-fold.c

2015-10-14 Thread Jeff Law


On 10/08/2015 08:10 AM, Ramana Radhakrishnan wrote:

This patch by Richard allows for movmisalign optabs to be supported
in gimple-fold.c. This caused a bit of pain in the testsuite with strlenopt-8.c
in conjunction with the ARM support for movmisalign_optabs as the test
was coded up to do different things depending on whether the target
supported misaligned access or not. However now with unaligned access
being allowed for different levels of the architecture in the arm backend,
the concept of the helper function non_strict_align mapping identically
to the definition of STRICT_ALIGNMENT disappears.

Adjusted thusly for ARM. The testsuite/lib changes were tested with an
arm-none-eabi multilib that included architecture variants that did not
support unaligned access and architecture variants that did.

The testing matrix for this patch was:

1. x86_64 bootstrap and regression test - no regressions.
2. armhf bootstrap and regression test - no regressions.
3. arm-none-eabi cross build and regression test for

{-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp}
{-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard}
{-marm/-mcpu=arm7tdmi/-mfloat-abi=soft}
{-mthumb/-mcpu=arm7tdmi/-mfloat-abi=soft}

with no regressions.

Ok to apply ?

Ramana

2015-10-08  Richard Biener  

* gimple-fold.c (optabs-query.h): Include
(gimple_fold_builtin_memory_op): Allow unaligned stores
when movmisalign_optabs are available.

2015-10-08  Ramana Radhakrishnan  

PR target/67366
* lib/target-supports.exp (check_effective_target_non_strict_align):
Adjust for arm*-*-*.
* gcc.target/arm/pr67366.c: New test.

OK.
jeff

Re: Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Jan Hubicka

> On 10/14/2015 10:27 AM, Jan Hubicka wrote:
> >Hi,
> >this patch adds the CONSTRUCTOR case discussed while back.  Only empty
> >constructors are matched, as those are only appearing in gimple operand.
> >I tested that during bootstrap about 7500 matches are for empty ctors.
> >There are couple hundred for non-empty probably used on generic.
> >
> >Bootstrapped/regtested x86_64-linux, OK?
> >
> >Honza
> >
> > * fold-const.c (operand_equal_p): Match empty constructors.
> OK.  It'd be useful to have a test which shows that matching these
> results in some kind of difference we can see in the dump files.

I will try to think of something.  My main motivation is to get ipa-icf-gimple
and operand_equal_p into sync and hopefully comonize the logic.  Having testcase
then will be a lot easier (any function with empty constructor will do)

Honza
> 
> jeff

[PATCH] Fix accounting for num_threaded_edges

2015-10-14 Thread Jeff Law



tree-ssa-threadupdate.c keeps  running total of the number of edges it 
threads.  Those totals are useful debugging tools and are also examined 
by the testsuite.


While looking at the effects of using the FSM threader on 
ssa-dom-thread-2?.c I noticed the counters weren't being updated 
properly for FSM threads.


This patch fixes that minor goof.

Bootstrapped & regression tested on x86_64-linux-gnu.  Installed on the 
trunk.


Jeff
commit 05dc98161472ce2e3d5f68bfcfca907deac03140
Author: Jeff Law 
Date:   Wed Oct 14 11:52:01 2015 -0600

[PATCH] Fix accounting for num_threaded_edges

* tree-ssa-threadupdate.c (thread_through_all_blocks): Bump
num_threaded_edges for successful FSM threads too.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a555d2b..7c64fa8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-10-14  Jeff Law  
+
+   * tree-ssa-threadupdate.c (thread_through_all_blocks): Bump
+   num_threaded_edges for successful FSM threads too.
+
 2015-10-14  Richard Biener  
 
* tree-vectorizer.h (vect_is_simple_use): Remove unused parameters.
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index e426c1d..5632a88 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2672,6 +2672,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
  free_dominance_info (CDI_DOMINATORS);
  bitmap_set_bit (threaded_blocks, entry->src->index);
  retval = true;
+ thread_stats.num_threaded_edges++;
}
 
   delete_jump_thread_path (path);

Re: [PATCH] print help for undocumented options

2015-10-14 Thread Martin Sebor


On 10/14/2015 11:24 AM, Joseph Myers wrote:

On Wed, 14 Oct 2015, Martin Sebor wrote:


+ /* For undocumented options that are aliases for other
+options that are documented, print the other option's
+help and name.  */
+ help = cl_options [option->alias_target].help;
+
+ snprintf (new_help, sizeof new_help, "%s", help);
+ snprintf (new_help + strlen (new_help),
+   sizeof new_help - strlen (new_help),
+   ".  Same as %s",
+   cl_options [option->alias_target].opt_text);


Obviously this English string needs to be marked for translation.

There is no consistency about whether option descriptions in .opt files
end with ".", so this might produce "..  Same as" in some cases.  While I
think the .opt files should be made consistent, I also think it would be
better just to give the "Same as" message without also repeating the
description of the canonical option.


Thanks for the quick review.

I tweaked the patch to work around the .opt file inconsistency and
always print a period (the driver itself doesn't print one but that
can be fixed if/once we agree that this is a good way to deal with
it).  I also annotated the "Same as" text so that it can be
translated.

IMO, printing the aliased option's help text makes using the output
easier for users who find the undocumented option first, in that
they don't then have to go look for the one that does have
documentation, so I left that part in place.  If you or someone
feels strongly that it shouldn't be there I'll remove it.

Let me know your thoughts on this version.

Martin

2015-10-14  Martin Sebor  

* options.c (wrap_help): End last sentence in a period.
(print_filtered_help): Print help for aliased option and its name
instead of undocumented text for undocumented options.


--git a/gcc/opts.c b/gcc/opts.c
index 2bbf653..7debf33 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -960,9 +960,19 @@ wrap_help (const char *help,
 {
   unsigned int col_width = LEFT_COLUMN;
   unsigned int remaining, room, len;
+  char *new_help = NULL;

   remaining = strlen (help);
-
+  if (remaining && help [remaining - 1] != '.')
+{
+  /* Help text in .opt files doesn't always end in a period.  Make
+	 it constistent by appending a period when it's missing.  */
+  new_help = XNEWVEC (char, remaining + 2);
+  memcpy (new_help, help, remaining);
+  new_help [remaining++] = '.';
+  new_help [remaining] = '\0';
+  help = new_help;
+}
   do
 {
   room = columns - 3 - MAX (col_width, item_width);
@@ -995,6 +1005,9 @@ wrap_help (const char *help,
   remaining -= len;
 }
   while (remaining);
+
+  /* Free the help text allocated by this function.  */
+  XDELETEVEC (const_cast(new_help));
 }

 /* Print help for a specific front-end, etc.  */
@@ -1010,7 +1023,7 @@ print_filtered_help (unsigned int include_flags,
   const char *help;
   bool found = false;
   bool displayed = false;
-  char new_help[128];
+  char new_help[256];

   if (include_flags == CL_PARAMS)
 {
@@ -1086,9 +1099,41 @@ print_filtered_help (unsigned int include_flags,
 	{
 	  if (exclude_flags & CL_UNDOCUMENTED)
 	continue;
+
+	  if (option->alias_target < N_OPTS
+	  && cl_options [option->alias_target].help)
+	{
+	  /* For undocumented options that are aliases for other
+		 options that are documented, print the other option's
+		 help and name.  */
+	  help = cl_options [option->alias_target].help;
+
+	  const char *alias_text =
+		cl_options [option->alias_target].opt_text;
+
+	  /* End the alias help with a period if it doesn't.  */
+	  const char *maybe_period =
+		alias_text [strlen (alias_text) - 1] == '.' ? "" : ".";
+
+	  snprintf (new_help, sizeof new_help, "%s", help);
+	  snprintf (new_help + strlen (new_help),
+			sizeof new_help - strlen (new_help),
+			_("%s  Same as %s"),
+			maybe_period,
+			cl_options [option->alias_target].opt_text);
+	  help = new_help;
+	}
+	  else
 	help = undocumented_msg;
 	}

+  if (option->warn_message)
+	{
+	  snprintf (new_help, sizeof new_help, ">>>%s<<<",
+		option->warn_message);
+	  help = new_help;
+	}
+
   /* Get the translation.  */
   help = _(help);

Re: [PATCH] Fix wrong-code when folding X - (X / Y) * Y (PR tree-optimization/67953)

2015-10-14 Thread Richard Biener

On October 14, 2015 8:27:31 PM GMT+02:00, Marek Polacek  
wrote:
>Evidently, the X - (X / Y) * Y -> X % Y pattern can't change the
>signedness of X from signed to unsigned, otherwise we'd generate
>wrong code.  (But unsigned -> signed should be fine.)
>
>Does anyone see a better fix than this?
>
>Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

Thanks,
Richard.

>2015-10-14  Marek Polacek  
>
>   PR tree-optimization/67953
>   * match.pd (X - (X / Y) * Y): Don't change signedness of @0.
>
>   * gcc.dg/fold-minus-6.c (fn4): Change the type of A to
>   unsigned.
>   * gcc.dg/torture/pr67953.c: New test.
>
>diff --git gcc/match.pd gcc/match.pd
>index 655c9ff..24e19a9 100644
>--- gcc/match.pd
>+++ gcc/match.pd
>@@ -267,7 +267,8 @@ along with GCC; see the file COPYING3.  If not see
> /* X - (X / Y) * Y is the same as X % Y.  */
> (simplify
>  (minus (convert1? @0) (convert2? (mult (trunc_div @0 @1) @1)))
>- (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>+  && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (type))
>   (trunc_mod (convert @0) (convert @1
> 
> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
>diff --git gcc/testsuite/gcc.dg/fold-minus-6.c
>gcc/testsuite/gcc.dg/fold-minus-6.c
>index 1c22c25..1535452 100644
>--- gcc/testsuite/gcc.dg/fold-minus-6.c
>+++ gcc/testsuite/gcc.dg/fold-minus-6.c
>@@ -20,7 +20,7 @@ fn3 (long int x)
> }
> 
> int
>-fn4 (int a, int b)
>+fn4 (unsigned int a, int b)
> {
>   return a - (unsigned) ((a / b) * b);
> }
>diff --git gcc/testsuite/gcc.dg/torture/pr67953.c
>gcc/testsuite/gcc.dg/torture/pr67953.c
>index e69de29..5ce399b 100644
>--- gcc/testsuite/gcc.dg/torture/pr67953.c
>+++ gcc/testsuite/gcc.dg/torture/pr67953.c
>@@ -0,0 +1,42 @@
>+/* PR tree-optimization/67953 */
>+/* { dg-do run } */
>+
>+unsigned int
>+fn1 (signed int a)
>+{
>+  return (unsigned int) a - ((a / 3) * 3);
>+}
>+
>+unsigned int
>+fn2 (signed int a)
>+{
>+  return a - ((a / 3) * 3);
>+}
>+
>+unsigned int
>+fn3 (int a)
>+{
>+  return a - (unsigned) ((a / 3) * 3);
>+}
>+
>+signed int
>+fn4 (int a)
>+{
>+  return (unsigned) a - (unsigned) ((a / 3) * 3);
>+}
>+
>+signed int
>+fn5 (unsigned int a)
>+{
>+  return (signed) a - (int) ((a / 3) * 3);
>+}
>+
>+int
>+main ()
>+{
>+  if (fn1 (-5) != -2
>+  || fn2 (-5) != -2
>+  || fn3 (-5) != -2
>+  || fn4 (-5) != -2)
>+__builtin_abort ();
>+}
>
>   Marek

Re: [PATCH 5/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Maxim Ostapenko


On 14/10/15 10:37, Jakub Jelinek wrote:

On Tue, Oct 13, 2015 at 02:20:06PM +0300, Maxim Ostapenko wrote:

This patch removes UBSan stubs from ASan and TSan code. We don't embed UBSan
to ASan and UBSan because that would lead to undefined references to C++
stuff when linking with -static-libasan. AFAIK, sanitizer developers use
different libraries for C and CXX runtimes, but I think this is out of scope
of this merge.

Where is CAN_SANITIZE_UB defined?  I don't see it anywhere in the current
libsanitizer and in the patch only:
grep CAN_SANITIZE_UB libsanitizer-249633-2.diff
+#if CAN_SANITIZE_UB
+# define TSAN_CONTAINS_UBSAN (CAN_SANITIZE_UB && !defined(SANITIZER_GO))
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB


Hm, this is strange, perhaps the patch was malformed.



So, unless I'm missing something, it would be best to arrange for
-DCAN_SANITIZE_UB=1 to be in CXXFLAGS for ubsan/ source files and
-DCAN_SANITIZE_UB=0 to be in CXXFLAGS for {a,t}san/ source files?


CAN_SANITIZE_UB definition is hardcoded into new ubsan/ubsan_platform.h 
file. To use DCAN_SANITIZE_UB from CXXFLAGS, we still need some changes 
in libsanitizer against upstream:


Index: libsanitizer/ubsan/ubsan_platform.h
===
--- libsanitizer/ubsan/ubsan_platform.h(revision 250295)
+++ libsanitizer/ubsan/ubsan_platform.h(working copy)
@@ -13,6 +13,7 @@
 #ifndef UBSAN_PLATFORM_H
 #define UBSAN_PLATFORM_H

+#ifndef CAN_SANITIZE_UB
 // Other platforms should be easy to add, and probably work as-is.
 #if (defined(__linux__) || defined(__FreeBSD__) || defined(__APPLE__)) 
&& \

 (defined(__x86_64__) || defined(__i386__) || defined(__arm__) || \
@@ -23,5 +24,6 @@
 #else
 # define CAN_SANITIZE_UB 0
 #endif
+#endif // CAN_SANITIZE_UB

 #endif


Are there any other defines that are supposedly set from cmake or wherever
upstream and are left undefined?


There is ASAN_DYNAMIC macro, but I see it into current libsanitizer too 
and it's not touched in any Makefile. Same for 
ASAN_DYNAMIC_RUNTIME_THUNK, that is used for Windows build and 
ASAN_LOW_MEMORY, that set explicitly only for Android. Do we need to 
touch them?
Also, ASAN_FLEXIBLE_MAPPING_AND_OFFSET was bumped upstream, so we don't 
need it anymore.


I'm applying the patch mentioned above, redefining CAN_SANITIZE_UB in 
corresponding Makefiles, dropping ASAN_FLEXIBLE_MAPPING_AND_OFFSET and 
resending libsanitizer-249633-2.diff in corresponding thread.

2015-10-13  Maxim Ostapenko  

* tsan/tsan_defs.h: Define TSAN_CONTAINS_UBSAN to 0.
* asan/asan_flags.cc (InitializeFlags): Do not initialize UBSan flags.
* asan/asan_rtl.cc (AsanInitInternal): Do not init UBSan.

Jakub

Re: [PATCH] PR middle-end/67220: GCC fails to properly handle libcall symbol visibility of built functions

2015-10-14 Thread H.J. Lu

On Wed, Oct 14, 2015 at 9:46 AM, Ramana Radhakrishnan
 wrote:
> On Wed, Oct 14, 2015 at 5:21 PM, H.J. Lu  wrote:
>
>> ---
>> gcc/c/
>>
>> PR middle-end/67220
>> * c-decl.c (diagnose_mismatched_decls): Copy explicit visibility
>> to builtin function.
>>
>> gcc/
>>
>> PR middle-end/67220
>> * expr.c (init_block_move_fn): Copy visibility from the builtin
>> memcpy.
>> (init_block_clear_fn): Copy visibility from the builtin memset.
>>
>> gcc/testsuite/
>>
>> PR middle-end/67220
>> * gcc.target/i386/pr67220-1.c: New test.
>> * gcc.target/i386/pr67220-2.c: Likewise.
>> * gcc.target/i386/pr67220-3.c: Likewise.
>> * gcc.target/i386/pr67220-4.c: Likewise.
>> * gcc.target/i386/pr67220-5.c: Likewise.
>> * gcc.target/i386/pr67220-6.c: Likewise.
>
> Why aren't these tests in gcc.dg ?  The problem affects all targets
> not just x86.
>

If I move tests to gcc.dg, would you mind updating them to verify
that they pass on arm?


-- 
H.J.

[PATCH] Split ssa-dom-thread-2.c into separate files/tests

2015-10-14 Thread Jeff Law



ssa-dom-thread-2.c is actually 6 distinct tests crammed into a single 
file.  That's normally not a huge problem, but it can make tests hard to 
write when we're scanning dumps.


This patch splits it into 6 distinct tests.  ssa-dom-thread-2[a-f].c. 
It also tightens the expected output slightly for each test and adds 
further comments to the tests.



Tested on x86_64-linux-gnu.  Installed on the trunk.

Jeff
commit 8715d70d2da7ab22a793437c6298c51a6fbae70f
Author: Jeff Law 
Date:   Wed Oct 14 13:09:29 2015 -0400

[PATCH] Split ssa-dom-thread-2.c into separate files/tests

* gcc.dg/tree-ssa/ssa-dom-thread-2.c: Deleted.  The six functions
contained within have their own file/test now.
* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: New test extracted from
ssa-dom-thread-2.c.  Tighten expected output slightly and comment
expectations a bit more.
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2c.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2d.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2e.c: Likewise.
* gcc.dg/tree-ssa/ssa-dom-thread-2f.c: Likewise.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8009732..f45ab81 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,16 @@
 2015-10-14  Jeff Law  
 
+   * gcc.dg/tree-ssa/ssa-dom-thread-2.c: Deleted.  The six functions
+   contained within have their own file/test now.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2a.c: New test extracted from
+   ssa-dom-thread-2.c.  Tighten expected output slightly and comment
+   expectations a bit more.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Likewise.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2c.c: Likewise.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2d.c: Likewise.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2e.c: Likewise.
+   * gcc.dg/tree-ssa/ssa-dom-thread-2f.c: Likewise.
+
 PR testsuite/67959
* gcc.dg/tree-ssa/ssa-thread-13.c: Avoid bitfield assumptions.
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c
deleted file mode 100644
index bb697d1..000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2.c
+++ /dev/null
@@ -1,117 +0,0 @@
-/* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom1-stats" } */
-
-void foo();
-void bla();
-void bar();
-
-/* In the following two cases, we should be able to thread edge through
-   the loop header.  */
-
-void thread_entry_through_header (void)
-{
-  int i;
-
-  for (i = 0; i < 170; i++)
-bla ();
-}
-
-void thread_latch_through_header (void)
-{
-  int i = 0;
-  int first = 1;
-
-  do
-{
-  if (first)
-   foo ();
-
-  first = 0;
-  bla ();
-} while (i++ < 100);
-}
-
-/* This is a TODO -- it is correct to thread both entry and latch edge through
-   the header, but we do not handle this case yet.  */
-
-void dont_thread_1 (void)
-{
-  int i = 0;
-  int first = 1;
-
-  do
-{
-  if (first)
-   foo ();
-  else
-   bar ();
-
-  first = 0;
-  bla ();
-} while (i++ < 100);
-}
-
-/* Avoid threading in the following two cases, to prevent creating subloops.  
*/
-
-void dont_thread_2 (int first)
-{
-  int i = 0;
-
-  do
-{
-  if (first)
-   foo ();
-  else
-   bar ();
-
-  first = 0;
-  bla ();
-} while (i++ < 100);
-}
-
-void dont_thread_3 (int nfirst)
-{
-  int i = 0;
-  int first = 0;
-
-  do
-{
-  if (first)
-   foo ();
-  else
-   bar ();
-
-  first = nfirst;
-  bla ();
-} while (i++ < 100);
-}
-
-/* Avoid threading in this case, in order to avoid creating loop with
-   multiple entries.  */
-
-void dont_thread_4 (int a, int nfirst)
-{
-  int i = 0;
-  int first;
-
-  if (a)
-first = 0;
-  else
-first = 1;
-
-  do
-{
-  if (first)
-   foo ();
-  else
-   bar ();
-
-  first = nfirst;
-  bla ();
-} while (i++ < 100);
-}
-
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 0 "dom1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 1 "dom1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
new file mode 100644
index 000..73d0ccf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */ 
+/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom1-stats" } */
+
+void bla();
+
+/* In the following case, we should be able to thread edge through
+   the loop header.  */
+
+void thread_entry_through_header (void)
+{
+  int i;
+
+  for (i = 0; i < 170; i++)
+bla ();
+}
+
+/* There's a single jump thread

Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-14 Thread Jeff Law

On 10/13/2015 02:16 PM, Bernd Schmidt wrote:

_Potentially_ so, yes.  However, GCC is free to put the allocation into
an otherwise-unused part of the stack frame.

Well, I looked at code generation changes, and it usually seems to come
with an increase in stack frame size - sometimes causing extra
instructions to be emitted.

I think that's essentially unavoidable when we end up using the scratchpad.

However, why do we need to allocate anything in the first place?

 > If you want to store something that will be thrown away,
 > just pick an address below the stack pointer.

Because allocating a scratchpad should work on all relevant targets.  We
do not have the resources to test on all GCC-supported
CPU ISAs and on all GCC-supported OSes, and we would like to have an
optimization that works on as many targets as makes sense
[those with cmove-like ability and withOUT full-blown conditional
execution].

Yeah, but if you put in a new facility like this, chances are
maintainers for active targets will pick it up and add the necessary
hooks. That's certainly what happened with shrink-wrapping. So I don't
think this is a concern.
But won't you get valgrind warnings if the code loads/stores outside the 
defined stack?  While we know it's safe, the warnings from valgrind will 
likely cause a backlash of user complaints.

I'm afraid I'll have to reject the patch then, on these grounds:
  * it may pessimize code
  * it does not even estimate costs to attempt avoiding this
  * a much simpler, more efficient implementation is possible.

Noted.  I think the pessimization is the area were folks are most concerned.

Obviously some pessimization relative to current code is necessary to 
fix some of the problems WRT thread safety and avoiding things like 
introducing faults in code which did not previously fault.

However, pessimization of safe code is, err, um, bad and needs to be 
avoided.

Jeff

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Adhemerval Zanella

On 14-10-2015 04:54, Jakub Jelinek wrote:
> On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:
>> On 13/10/15 14:15, Maxim Ostapenko wrote:
>>> This is the raw merge itself. I'm bumping SONAME to libasan.so.3.
>>>
>>> -Maxim
>>
>> I have just noticed that I've misused autoconf stuff (used wrong version).
>> Here a fixed version of the same patch. Sorry for inconvenience.
> 
> Is libubsan, libtsan backwards compatible, or do we want to change SONAME
> there too?
> 
> The aarch64 changes are terrible, not just because it doesn't yet have
> runtime decision on what VA to use or that it doesn't support 48-bit VA,
> but also that for the 42-bit VA it uses a different shadow offset from
> 39-bit VA.  But on the compiler side we have just one...
> Though, only the 39-bit VA is enabled right now by default, so out of the
> box the state is as bad as we had in 5.x - users wanting 42-bit VA or 48-bit
> VA have to patch libsanitizer.

Yes we are aware with current deficiencies for aarch64 with a 39-bit and 
42-bit vma and the lack of support of 48-bit vma. On LLVM side current 
approach is to built compiler support for either 39 or 42 bit (and again
we are aware this is not the ideal approach). This approach was used
mainly for validation and buildbot enablement.

I am currently working on a way to make the sanitizer (asan and msan)
VMA independent on aarch64 (aiming to current support 39/42 and later 
48-bit vma) and the approach we decide to use is to change the 
instrumentation to use a parametrized value to compute the shadow memory
regions (based on VMA address) which will be initialized externally
by the  ibsanitizer. TSAN is somewhat easier since instrumentation 
does not  take in consideration the VMA (only the libsanitizer itself).

The idea is to avoid compiler switches a make is transparent to run
the binary regardless of the VMA in the system. The downside is 
instrumentation will require more steps (to lead the parametrized
value to compute shadow memory) and thus slower.

> 
> Have you verified libbacktrace sanitization still works properly (that is
> something upstream does not test)?
> 
> Do you plan to update the asan tests we have to reflect the changes in
> upstream?
> 
>   Jakub
>

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Renato Golin

On 14 October 2015 at 19:21, Evgenii Stepanov  wrote:
> Wait. As Jakub correctly pointed out in the other thread, there is no
> obvious reason why there could not be a single shadow offset value
> that would work for all 3 possible VMA settings. I suggest figuring
> this out first.

We are.

cheers,
--renato

Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-14 Thread Jan Hubicka

Hi,
this patch adds VIEW_CONVERT_EXPR which is another code omitted in
operand_equal_p.  During bootstrap there are about 1000 matches.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (operand_equal_p): Handle VIEW_CONVERT_EXPR.
Index: fold-const.c
===
--- fold-const.c(revision 228735)
+++ fold-const.c(working copy)
@@ -2962,6 +2968,12 @@ operand_equal_p (const_tree arg0, const_
case IMAGPART_EXPR:
  return OP_SAME (0);
 
+   case VIEW_CONVERT_EXPR:
+ if (!(flags & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF))
+ && !types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1)))
+   return false;
+ return OP_SAME (0);
+
case TARGET_MEM_REF:
case MEM_REF:
  if (!(flags & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)))

Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Jan Hubicka

Hi,
this patch adds the CONSTRUCTOR case discussed while back.  Only empty
constructors are matched, as those are only appearing in gimple operand.
I tested that during bootstrap about 7500 matches are for empty ctors.
There are couple hundred for non-empty probably used on generic. 

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (operand_equal_p): Match empty constructors.
Index: fold-const.c
===
--- fold-const.c(revision 228735)
+++ fold-const.c(working copy)
@@ -2890,6 +2891,11 @@ operand_equal_p (const_tree arg0, const_
return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1, 0),
flags | OEP_ADDRESS_OF
| OEP_CONSTANT_ADDRESS_OF);
+  case CONSTRUCTOR:
+   /* In GIMPLE empty constructors are allowed in initializers of
+  vector types.  */
+   return (!vec_safe_length (CONSTRUCTOR_ELTS (arg0))
+   && !vec_safe_length (CONSTRUCTOR_ELTS (arg1)));
   default:
break;
   }

[gomp4] remove dead code

2015-10-14 Thread Nathan Sidwell

I've committed this to gomp4 branch.  It removes some now unreachable code and 
removes the now bogus description about OpenACC.


nathan
2015-10-14  Nathan Sidwell  

	* omp-low.c (lower_reduction_clauses): Correct comment, remove
	unreachable code.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228810)
+++ gcc/omp-low.c	(working copy)
@@ -5088,9 +5115,7 @@ lower_oacc_head_tail (location_t loc, tr
   lower_oacc_loop_marker (loc, false, NULL_TREE, tail);
 }
 
-/* Generate code to implement the REDUCTION clauses.  OpenACC reductions
-   are usually executed in parallel, but they fallback to sequential code for
-   known single-threaded regions.  */
+/* Generate code to implement the REDUCTION clauses.  */
 
 static void
 lower_reduction_clauses (tree clauses, gimple_seq *stmt_seqp, omp_context *ctx)
@@ -5153,23 +5178,11 @@ lower_reduction_clauses (tree clauses, g
 
 	  addr = save_expr (addr);
 
-	  if (is_gimple_omp_oacc (ctx->stmt)
-	  && (ctx->gwv_this == 0))
-	{
-	  /* This reduction is done sequentially in OpenACC by a single
-		 thread.  There is no need to use atomics.  */
-	  x = build2 (code, TREE_TYPE (ref), ref, new_var);
-	  ref = build_outer_var_ref (var, ctx);
-	  gimplify_assign (ref, x, stmt_seqp);
-	}
-	  else
-	{
-	  ref = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (addr)), addr);
-	  x = fold_build2_loc (clause_loc, code, TREE_TYPE (ref), ref,
-   new_var);
-	  x = build2 (OMP_ATOMIC, void_type_node, addr, x);
-	  gimplify_and_add (x, stmt_seqp);
-	}
+	  ref = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (addr)), addr);
+	  x = fold_build2_loc (clause_loc, code, TREE_TYPE (ref), ref,
+			   new_var);
+	  x = build2 (OMP_ATOMIC, void_type_node, addr, x);
+	  gimplify_and_add (x, stmt_seqp);
 
 	  return;
 	}

Re: Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Jeff Law


On 10/14/2015 10:27 AM, Jan Hubicka wrote:

Hi,
this patch adds the CONSTRUCTOR case discussed while back.  Only empty
constructors are matched, as those are only appearing in gimple operand.
I tested that during bootstrap about 7500 matches are for empty ctors.
There are couple hundred for non-empty probably used on generic.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* fold-const.c (operand_equal_p): Match empty constructors.
OK.  It'd be useful to have a test which shows that matching these 
results in some kind of difference we can see in the dump files.


jeff

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Evgenii Stepanov

On Wed, Oct 14, 2015 at 11:03 AM, Adhemerval Zanella
 wrote:
>
>
> On 14-10-2015 04:54, Jakub Jelinek wrote:
>> On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:
>>> On 13/10/15 14:15, Maxim Ostapenko wrote:
 This is the raw merge itself. I'm bumping SONAME to libasan.so.3.

 -Maxim
>>>
>>> I have just noticed that I've misused autoconf stuff (used wrong version).
>>> Here a fixed version of the same patch. Sorry for inconvenience.
>>
>> Is libubsan, libtsan backwards compatible, or do we want to change SONAME
>> there too?
>>
>> The aarch64 changes are terrible, not just because it doesn't yet have
>> runtime decision on what VA to use or that it doesn't support 48-bit VA,
>> but also that for the 42-bit VA it uses a different shadow offset from
>> 39-bit VA.  But on the compiler side we have just one...
>> Though, only the 39-bit VA is enabled right now by default, so out of the
>> box the state is as bad as we had in 5.x - users wanting 42-bit VA or 48-bit
>> VA have to patch libsanitizer.
>
> Yes we are aware with current deficiencies for aarch64 with a 39-bit and
> 42-bit vma and the lack of support of 48-bit vma. On LLVM side current
> approach is to built compiler support for either 39 or 42 bit (and again
> we are aware this is not the ideal approach). This approach was used
> mainly for validation and buildbot enablement.
>
> I am currently working on a way to make the sanitizer (asan and msan)
> VMA independent on aarch64 (aiming to current support 39/42 and later
> 48-bit vma) and the approach we decide to use is to change the
> instrumentation to use a parametrized value to compute the shadow memory
> regions (based on VMA address) which will be initialized externally
> by the  ibsanitizer. TSAN is somewhat easier since instrumentation
> does not  take in consideration the VMA (only the libsanitizer itself).

Wait. As Jakub correctly pointed out in the other thread, there is no
obvious reason why there could not be a single shadow offset value
that would work for all 3 possible VMA settings. I suggest figuring
this out first.

>
> The idea is to avoid compiler switches a make is transparent to run
> the binary regardless of the VMA in the system. The downside is
> instrumentation will require more steps (to lead the parametrized
> value to compute shadow memory) and thus slower.
>
>
>
>>
>> Have you verified libbacktrace sanitization still works properly (that is
>> something upstream does not test)?
>>
>> Do you plan to update the asan tests we have to reflect the changes in
>> upstream?
>>
>>   Jakub
>>

Re: [PATCH 3/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Tue, Oct 13, 2015 at 02:17:46PM +0300, Maxim Ostapenko wrote:
> This is just reapplied patch for SPARC by David S. Miller. I was unable to
> test this, so could anyone help me here?

This is ok if all the other changes are approved.  You don't need to list
my name in there, just list David's.  We don't want 20 copies of Reapply in
a few years.

> 2015-10-12  Maxim Ostapenko  
> 
>   PR sanitizer/63958
>   Reapply:
>   2015-03-09  Jakub Jelinek  
> 
>   PR sanitizer/63958
>   Reapply:
>   2014-10-14  David S. Miller  
> 
>   * sanitizer_common/sanitizer_platform_limits_linux.cc (time_t):
>   Define at __kernel_time_t, as needed for sparc.
>   (struct __old_kernel_stat): Don't check if __sparc__ is defined.
>   * libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
>   (__sanitizer): Define struct___old_kernel_stat_sz,
>   struct_kernel_stat_sz, and struct_kernel_stat64_sz for sparc.
>   (__sanitizer_ipc_perm): Adjust for sparc targets.
>   (__sanitizer_shmid_ds): Likewsie.
>   (__sanitizer_sigaction): Likewise.
>   (IOC_SIZE): Likewsie.

Jakub

Re: [PATCH 2/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Tue, Oct 13, 2015 at 02:16:23PM +0300, Maxim Ostapenko wrote:
> This patch introduces required compiler changes. Now, we don't version
> asan_init, we have a special __asan_version_mismatch_check_v[n] symbol for
> this.

For this, I just have to wonder what is the actual improvement over what we
had.  To me it looks like a step in the wrong direction, it will only bloat
the size of the ctors.  I can live with it, but just want to put on record I
think it is a mistake.

> Also, asan_stack_malloc_[n] doesn't take a local stack as a second parameter
> anymore, so don't pass it.

I think this is another mistake, but this time with actually bad fix on the
compiler side for it.  If I read the code well, previously
__asan_stack_malloc_n would return you the local stack if it failed for
whatever reason, which is actually what you want as fallback.
But, the new code returns NULL instead, so I think you would need to compare
the return value of __asan_stack_malloc_n with NULL and if it is NULL, use
the addr instead of what it returned; which is not what your asan.c change
does.  Now, what is the improvement here?  Bloat the compiler generated
code... :(

> 2015-10-12  Maxim Ostapenko  
> 
> config/
> 
>   * bootstrap-asan.mk: Replace ASAN_OPTIONS=detect_leaks with
>   LSAN_OPTIONS=detect_leaks

Missing . at the end, and the config/ hunk missing from the patch.

> gcc/
> 
>   * asan.c (asan_emit_stack_protection): Don't pass local stack to
>   asan_stack_malloc_[n] anymore.
>   (asan_finish_file): Instert __asan_version_mismatch_check_v[n] call.

s/Instert/Instead/

Jakub

Re: [PATCH 6/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Tue, Oct 13, 2015 at 02:21:21PM +0300, Maxim Ostapenko wrote:
> This patch adjusts the fix for
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771 to extract the last PC
> from the stack frame if no valid FP is available for ARM.

I guess this is ok once all other changes are acked.

> 2015-10-13  Maxim Ostapenko  
> 
>   * sanitizer_common/sanitizer_stacktrace.cc (GetCanonicFrame): Assume we
>   compiled code with GCC when extracting the caller PC for ARM if no
>   valid frame pointer is available.
> 
> Index: libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
> ===
> --- libsanitizer/sanitizer_common/sanitizer_stacktrace.cc (revision 
> 250059)
> +++ libsanitizer/sanitizer_common/sanitizer_stacktrace.cc (working copy)
> @@ -62,8 +62,8 @@
>// Nope, this does not look right either. This means the frame after next 
> does
>// not have a valid frame pointer, but we can still extract the caller PC.
>// Unfortunately, there is no way to decide between GCC and LLVM frame
> -  // layouts. Assume LLVM.
> -  return bp_prev;
> +  // layouts. Assume GCC.
> +  return bp_prev - 1;
>  #else
>return (uhwptr*)bp;
>  #endif


Jakub

Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Tue, Oct 13, 2015 at 02:22:36PM +0300, Maxim Ostapenko wrote:
> This is the final patch. Force libsanitizer to use an old ABI for ubsan
> float cast data descriptors, because for some exprs (e.g. that type of
> tcc_declaration) we can't get the right location for now. I'm not sure about
> this, perhaps it should be fixed in GCC somehow.

I don't like this (neither the heuristics on the libubsan, it wouldn't be a
big deal to add a new library entrypoint).
If because of the heuristics you need to ensure that the SourceLocation is
always known, then either you check in ubsan.c whether expand_location
gives you NULL xloc.file and in that case use old style float cast overflow
(without location) - i.e. pass 0, NULL, otherwise you use new style, i.e.
pass 1,   Or arrange through some special option to emit something like
{ "", 0, 0 } instead of { NULL, 0, 0 } for the float cast case.
And, regardless of this, any progress in making sure we have fewer cases
with UNKNOWN_LOCATION on this will not hurt.  I think at this point I'd
prefer the first choice, i.e. using old style for locations without
filename, and new style otherwise.

> 2015-10-13  Maxim Ostapenko  
> 
>   * ubsan/ubsan_handlers.cc (looksLikeFloatCastOverflowDataV1): Always
>   return true for now.
> 
> Index: libsanitizer/ubsan/ubsan_handlers.cc
> ===
> --- libsanitizer/ubsan/ubsan_handlers.cc  (revision 250059)
> +++ libsanitizer/ubsan/ubsan_handlers.cc  (working copy)
> @@ -307,6 +307,9 @@
>  }
>  
>  static bool looksLikeFloatCastOverflowDataV1(void *Data) {
> +  // (TODO): propagate SourceLocation into DataDescriptor and use this
> +  // heuristic than.
> +  return true;
>// First field is either a pointer to filename or a pointer to a
>// TypeDescriptor.
>u8 *FilenameOrTypeDescriptor;

Jakub

[PATCH] More vectorizer TLC

2015-10-14 Thread Richard Biener


Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-14  Richard Biener  

* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Reset info at start.
(vect_analyze_group_access_1): Add debug print.
* tree-vect-loop.c (vect_get_single_scalar_iteration_cost): Rename ...
(vect_compute_single_scalar_iteration_cost): ... to this.
(vect_analyze_loop_2): Adjust.
* tree-vect-slp.c (struct _slp_oprnd_info): Move from ...
* tree-vectorizer.h: ... here.
(add_stmt_info_to_vec): Remove.
* tree-vect-stmts.c (record_stmt_cost): Inline add_stmt_info_to_vec.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 228759)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_enhance_data_refs_alignment (loop_v
*** 1352,1357 
--- 1352,1361 
  dump_printf_loc (MSG_NOTE, vect_location,
   "=== vect_enhance_data_refs_alignment ===\n");
  
+   /* Reset data so we can safely be called multiple times.  */
+   LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo).truncate (0);
+   LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
+ 
/* While cost model enhancements are expected in the future, the high level
   view of the code at this time is as follows:
  
*** vect_analyze_group_access_1 (struct data
*** 2151,2156 
--- 2155,2164 
return false;
  }
  
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Two or more load stmts share the same dr.\n");
+ 
/* For load use the same data-ref load.  */
GROUP_SAME_DR_STMT (vinfo_for_stmt (next)) = prev;
  
Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 228759)
--- gcc/tree-vect-loop.c(working copy)
*** destroy_loop_vec_info (loop_vec_info loo
*** 1043,1049 
  
  /* Calculate the cost of one scalar iteration of the loop.  */
  static void
! vect_get_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
  {
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
--- 1043,1049 
  
  /* Calculate the cost of one scalar iteration of the loop.  */
  static void
! vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
  {
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
*** vect_analyze_loop_2 (loop_vec_info loop_
*** 1739,1745 
  }
  
/* Compute the scalar iteration cost.  */
!   vect_get_single_scalar_iteration_cost (loop_vinfo);
  
/* This pass will decide on using loop versioning and/or loop peeling in
   order to enhance the alignment of data references in the loop.  */
--- 1739,1745 
  }
  
/* Compute the scalar iteration cost.  */
!   vect_compute_single_scalar_iteration_cost (loop_vinfo);
  
/* This pass will decide on using loop versioning and/or loop peeling in
   order to enhance the alignment of data references in the loop.  */
Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 228759)
--- gcc/tree-vect-slp.c (working copy)
*** vect_create_new_slp_node (vec
*** 135,140 
--- 135,157 
  }
  
  
+ /* This structure is used in creation of an SLP tree.  Each instance
+corresponds to the same operand in a group of scalar stmts in an SLP
+node.  */
+ typedef struct _slp_oprnd_info
+ {
+   /* Def-stmts for the operands.  */
+   vec def_stmts;
+   /* Information about the first statement, its vector def-type, type, the
+  operand itself in case it's constant, and an indication if it's a pattern
+  stmt.  */
+   enum vect_def_type first_dt;
+   tree first_op_type;
+   bool first_pattern;
+   bool second_pattern;
+ } *slp_oprnd_info;
+ 
+ 
  /* Allocate operands info for NOPS operands, and GROUP_SIZE def-stmts for each
 operand.  */
  static vec 
Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 228759)
--- gcc/tree-vect-stmts.c   (working copy)
*** record_stmt_cost (stmt_vector_for_cost *
*** 94,105 
if (body_cost_vec)
  {
tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
!   add_stmt_info_to_vec (body_cost_vec, count, kind,
!   stmt_info ? STMT_VINFO_STMT (stmt_info) : NULL,
!   misalign);
return (unsigned)
(builtin_vectorization_cost (kind, vectype, misalign) * count);
-
  }
else
  return add_stmt_cost

Re: Merge from gomp-4_1-branch to trunk

2015-10-14 Thread Sebastian Huber


On 14/10/15 10:04, Jakub Jelinek wrote:

No, both the above changes are wrong.  There is not a single int32_t
written, but could be many more, it is an array of 32-bit integers.
I'd say you just want to cast explicitly,
   omp_get_place_proc_ids (*place_num, (int *) ids);
and
   omp_get_parition_place_nums ((int *) place_nums);
The reason for int32_t is that on the Fortran side it is integer(kind=4)
and everywhere else for that int32_t is used.
If this works, the patch is preapproved.


I checked this in:

https://gcc.gnu.org/viewcvs/gcc?view=revision=228805

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.

Re: [PATCH] Optimize const1 * copysign (const2, y) in reassoc (PR tree-optimization/67815)

2015-10-14 Thread Richard Biener

On Tue, 13 Oct 2015, Marek Polacek wrote:

> This patch implements the copysign optimization for reassoc I promised
> I'd look into.  I.e.,
> 
> CST1 * copysign (CST2, y) -> copysign (CST1 * CST2, y) if CST1 > 0
> CST1 * copysign (CST2, y) -> -copysign (CST1 * CST2, y) if CST1 < 0
> 
> After getting familiar with reassoc a bit this wasn't that hard.  But
> I'm hopeless when it comes to floating-point stuff, so I'd appreciate
> if you could glance over the tests.  The reassoc-40.c should address
> Joseph's comment in the audit trail (with -fno-rounding-math the
> optimization would take place).
> 
> For 0.0 * copysign (cst, x), the result is folded into 0.0 way before
> reassoc, so we probably don't have to pay attention to this case.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-10-13  Marek Polacek  
> 
>   PR tree-optimization/67815
>   * tree-ssa-reassoc.c (attempt_builtin_copysign): New function.
>   (reassociate_bb): Call it.
> 
>   * gcc.dg/tree-ssa/reassoc-39.c: New test.
>   * gcc.dg/tree-ssa/reassoc-40.c: New test.
> 
> diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c 
> gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> index e69de29..589d06b 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-39.c
> @@ -0,0 +1,41 @@
> +/* PR tree-optimization/67815 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -fdump-tree-reassoc1-details" } */
> +
> +float
> +f0 (float x)
> +{
> +  return 7.5 * __builtin_copysignf (2.0, x);
> +}
> +
> +float
> +f1 (float x)
> +{
> +  return -7.5 * __builtin_copysignf (2.0, x);
> +}
> +
> +double
> +f2 (double x, double y)
> +{
> +  return x * ((1.0/12) * __builtin_copysign (1.0, y));
> +}
> +
> +double
> +f3 (double x, double y)
> +{
> +  return (x * (-1.0/12)) * __builtin_copysign (1.0, y);
> +}
> +
> +double
> +f4 (double x, double y, double z)
> +{
> +  return (x * z) * ((1.0/12) * __builtin_copysign (4.0, y));
> +}
> +
> +double
> +f5 (double x, double y, double z)
> +{
> +  return (x * (-1.0/12)) * z * __builtin_copysign (2.0, y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Optimizing copysign" 6 "reassoc1"} }*/
> diff --git gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c 
> gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> index e69de29..d65bcc1b 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-40.c
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/67815 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -frounding-math -fdump-tree-reassoc1-details" } */
> +
> +/* Test that the copysign reassoc optimization doesn't fire for
> +   -frounding-math (i.e. HONOR_SIGN_DEPENDENT_ROUNDING) if the multiplication
> +   is inexact.  */
> +
> +double
> +f1 (double y)
> +{
> +  return (1.2 * __builtin_copysign (1.1, y));
> +}
> +
> +double
> +f2 (double y)
> +{
> +  return (-1.2 * __builtin_copysign (1.1, y));
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Optimizing copysign" "reassoc1" } } */
> diff --git gcc/tree-ssa-reassoc.c gcc/tree-ssa-reassoc.c
> index 879722e..b8897b7 100644
> --- gcc/tree-ssa-reassoc.c
> +++ gcc/tree-ssa-reassoc.c
> @@ -4622,6 +4622,95 @@ attempt_builtin_powi (gimple *stmt, vec *> *ops)
>return result;
>  }
>  
> +/* Attempt to optimize
> +   CST1 * copysign (CST2, y) -> copysign (CST1 * CST2, y) if CST1 > 0, or
> +   CST1 * copysign (CST2, y) -> -copysign (CST1 * CST2, y) if CST1 < 0.  */
> +
> +static void
> +attempt_builtin_copysign (vec *ops)
> +{
> +  operand_entry *oe;
> +  unsigned int i;
> +  unsigned int length = ops->length ();
> +  tree cst1 = ops->last ()->op;
> +
> +  if (length == 1 || TREE_CODE (cst1) != REAL_CST)
> +return;
> +
> +  FOR_EACH_VEC_ELT (*ops, i, oe)
> +{
> +  if (TREE_CODE (oe->op) == SSA_NAME)

I think you need to check whether the SSA_NAME has a single use only
as you are changing its value.  Which also means you shouldn't be
"reusing" it (because existing debug stmts will then be wrong).
Thus you have to replace it.

> + {
> +   gimple *def_stmt = SSA_NAME_DEF_STMT (oe->op);
> +   if (is_gimple_call (def_stmt))
> + {
> +   tree fndecl = gimple_call_fndecl (def_stmt);
> +   tree cst2;
> +   switch (DECL_FUNCTION_CODE (fndecl))
> + {
> + CASE_FLT_FN (BUILT_IN_COPYSIGN):
> +   cst2 = gimple_call_arg (def_stmt, 0);
> +   /* The first argument of copysign must be a constant,
> +  otherwise there's nothing to do.  */
> +   if (TREE_CODE (cst2) == REAL_CST)
> + {
> +   tree mul = const_binop (MULT_EXPR, TREE_TYPE (cst1),
> +   cst1, cst2);
> +   /* If we couldn't fold to a single constant, skip it.  */
> +   if (mul == NULL_TREE)
> + break;
> +   /* We're going to replace the copysign argument

Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-10-14 Thread Szabolcs Nagy


On 30/09/15 20:23, Andreas Krebbel wrote:

On 09/30/2015 06:21 PM, Szabolcs Nagy wrote:

On 30/09/15 14:47, Bernd Schmidt wrote:

On 09/17/2015 11:15 AM, Szabolcs Nagy wrote:

ping 2.

this patch is needed for working visibility ("protected")
attribute for extern data on targets using default_binds_local_p_2.
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html


I hesitate to review this one since I don't think I understand the
issues on the various affected arches well enough. It looks like Jakub
had some input on the earlier changes, maybe he could take a look? Or
maybe rth knows best. Adding Ccs.

It would help to have examples of code generation demonstrating the
problem and how you would solve it. Input from the s390 maintainers
whether this is correct for their port would also be appreciated.


We are having the same problem on S/390. I think the GCC change is correct for 
S/390 as well.

-Andreas-



i think the approvals of arm and aarch64 maintainers
are needed to apply this fix for pr target/66912.

(only s390, arm and aarch64 use this predicate.)





consider the TU

__attribute__((visibility("protected"))) int n;

int f () { return n; }

if n "binds_local" then gcc -O -fpic -S is like

  .text
  .align  2
  .global f
  .arch armv8-a+fp+simd
  .type   f, %function
f:
  adrpx0, n
  ldr w0, [x0, #:lo12:n]
  ret
  .size   f, .-f
  .protected  n
  .comm   n,4,4

so 'n' is a direct reference, not accessed through
the GOT ('n' will be in the .bss of the dso).
this is the current behavior.

if i remove the protected visibility attribute
then the access goes through GOT:

  .text
  .align  2
  .global f
  .arch armv8-a+fp+simd
  .type   f, %function
f:
  adrpx0, _GLOBAL_OFFSET_TABLE_
  ldr x0, [x0, #:gotpage_lo15:n]
  ldr w0, [x0]
  ret
  .size   f, .-f
  .comm   n,4,4

protected visibility means the definition cannot
be overridden by another module, but it should
still allow extern references.

if the main module references such an object then
(as an implementation detail) it may use copy
relocation against it, which places 'n' in the
main module and the dynamic linker should make
sure that references to 'n' point there.

this is only possible if references to 'n' go
through the GOT (i.e. it should not be "binds_local").

Re: [PATCH 5/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Tue, Oct 13, 2015 at 02:20:06PM +0300, Maxim Ostapenko wrote:
> This patch removes UBSan stubs from ASan and TSan code. We don't embed UBSan
> to ASan and UBSan because that would lead to undefined references to C++
> stuff when linking with -static-libasan. AFAIK, sanitizer developers use
> different libraries for C and CXX runtimes, but I think this is out of scope
> of this merge.

Where is CAN_SANITIZE_UB defined?  I don't see it anywhere in the current
libsanitizer and in the patch only:
grep CAN_SANITIZE_UB libsanitizer-249633-2.diff 
+#if CAN_SANITIZE_UB
+# define TSAN_CONTAINS_UBSAN (CAN_SANITIZE_UB && !defined(SANITIZER_GO))
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB
+#if CAN_SANITIZE_UB
+#endif  // CAN_SANITIZE_UB

So, unless I'm missing something, it would be best to arrange for
-DCAN_SANITIZE_UB=1 to be in CXXFLAGS for ubsan/ source files and
-DCAN_SANITIZE_UB=0 to be in CXXFLAGS for {a,t}san/ source files?
Are there any other defines that are supposedly set from cmake or wherever
upstream and are left undefined?

> 2015-10-13  Maxim Ostapenko  
> 
>   * tsan/tsan_defs.h: Define TSAN_CONTAINS_UBSAN to 0.
>   * asan/asan_flags.cc (InitializeFlags): Do not initialize UBSan flags.
>   * asan/asan_rtl.cc (AsanInitInternal): Do not init UBSan.

Jakub

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:
> On 13/10/15 14:15, Maxim Ostapenko wrote:
> >This is the raw merge itself. I'm bumping SONAME to libasan.so.3.
> >
> >-Maxim
> 
> I have just noticed that I've misused autoconf stuff (used wrong version).
> Here a fixed version of the same patch. Sorry for inconvenience.

Is libubsan, libtsan backwards compatible, or do we want to change SONAME
there too?

The aarch64 changes are terrible, not just because it doesn't yet have
runtime decision on what VA to use or that it doesn't support 48-bit VA,
but also that for the 42-bit VA it uses a different shadow offset from
39-bit VA.  But on the compiler side we have just one...
Though, only the 39-bit VA is enabled right now by default, so out of the
box the state is as bad as we had in 5.x - users wanting 42-bit VA or 48-bit
VA have to patch libsanitizer.

Have you verified libbacktrace sanitization still works properly (that is
something upstream does not test)?

Do you plan to update the asan tests we have to reflect the changes in
upstream?

Jakub

Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Richard Biener

On Wed, Oct 14, 2015 at 11:19 AM, Richard Biener
 wrote:
> On Tue, Oct 13, 2015 at 10:59 PM, Richard Henderson  wrote:
>> On 10/14/2015 02:49 AM, Jeff Law wrote:
>>>
>>> The problem here is we don't know what address space the *0 is going to
>>> hit,
>>> right?
>>
>>
>> Correct, not before we do the walk of stmt to see what's present.
>>
>>> Isn't that also an issue for code generation as well?
>>
>>
>> What sort of problem are you thinking of?  I haven't seen one yet.
>
> The actual dereference of course has a properly address-space qualified zero.
>
> Only your walking depends on operand_equal_p to treat different address-space
> zero addresses as equal (which they are of course not ...):
>
>
> int
> operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
> {
> ...
>   /* Check equality of integer constants before bailing out due to
>  precision differences.  */
>   if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
> {
>   /* Address of INTEGER_CST is not defined; check that we did not forget
>  to drop the OEP_ADDRESS_OF/OEP_CONSTANT_ADDRESS_OF flags.  */
>   gcc_checking_assert (!(flags
>  & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)));
>   return tree_int_cst_equal (arg0, arg1);
> }
>
> but only later we do
>
>   /* We cannot consider pointers to different address space equal.  */
>   if (POINTER_TYPE_P (TREE_TYPE (arg0))
>   && POINTER_TYPE_P (TREE_TYPE (arg1))
>   && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
>   != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)
> return 0;
>
> So "fixing" that would make the walker only look for default
> address-space zero dereferences.
>
> I think we need to fix operand_equal_p anyway because 0 is clearly not
> equal to 0 (only if
> they convert to the same literal)

I think you could trigger bogus CSE of dereferences of literal addresses
from different address-spaces.

Richard.

> Richard.
>
>
>>
>> r~

Re: Merge from gomp-4_1-branch to trunk

2015-10-14 Thread Sebastian Huber

On 14/10/15 10:04, Jakub Jelinek wrote:

On Wed, Oct 14, 2015 at 09:34:48AM +0200, Sebastian Huber wrote:

>/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:28:0:
>/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:73:18: note: expected
>'int *' but argument is of type 'int32_t * {aka long int *}'

Ugh, wasn't aware that some targets use long int for int32_t:(.

This is actually a feature of the newlib-stdint.h:

[...]
/* newlib uses 32-bit long in certain cases for all non-SPU
   targets.  */
#ifndef STDINT_LONG32
#define STDINT_LONG32 (LONG_TYPE_SIZE == 32)
#endif

#define SIG_ATOMIC_TYPE "int"

/* The newlib logic actually checks for sizes greater than 32 rather
   than equal to 64 for various 64-bit types.  */

#define INT8_TYPE (CHAR_TYPE_SIZE == 8 ? "signed char" : 0)
#define INT16_TYPE (SHORT_TYPE_SIZE == 16 ? "short int" : INT_TYPE_SIZE 
== 16 ? "int" : CHAR_TYPE_SIZE == 16 ? "signed char" : 0)
#define INT32_TYPE (STDINT_LONG32 ? "long int" : INT_TYPE_SIZE == 32 ? 
"int" : SHORT_TYPE_SIZE == 32 ? "short int" : CHAR_TYPE_SIZE == 32 ? 
"signed char" : 0)

[...]

This regularly causes problems like this. In addition it leads to a 
complicated PRI* macro definition in .

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.

Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-14 Thread Eric Botcazou

> If you're using one of the switches that checks for stack overflow at the
> start of the function, you certainly don't want to do any such stores.

There is a protection area for -fstack-check (STACK_CHECK_PROTECT bytes) so 
you can do stores just below the stack pointer as far as it's concerned.

There is indeed the issue of the mere writing below the stack pointer.  Our 
experience with various OSes and architectures shows that this almost always 
works.  The only problematic case is x86{-64}/Linux historically, where you 
cannot write below the page pointed to by the stack pointer (that's why there 
is a specific implementation of -fstack-check for x86{-64}/Linux).

-- 
Eric Botcazou

Re: Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))

2015-10-14 Thread Richard Biener

On Tue, Oct 13, 2015 at 5:32 PM, David Malcolm  wrote:
> On Thu, 2015-09-24 at 10:15 +0200, Richard Biener wrote:
>> On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm  wrote:
>> > On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
>> >> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz  wrote:
>> >> > Hi,
>> >> >
>> >> > On Tue, 22 Sep 2015, David Malcolm wrote:
>> >> >
>> >> >> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
>> >> >> table ever get smaller, or does it only ever get inserted into?
>> >> >
>> >> > It only ever grows.
>> >> >
>> >> >> An idea I had is that we could stash short ranges directly into the 32
>> >> >> bits of location_t, by offsetting the per-column-bits somewhat.
>> >> >
>> >> > It's certainly worth an experiment: let's say you restrict yourself to
>> >> > tokens less than 8 characters, you need an additional 3 bits (using one
>> >> > value, e.g. zero, as the escape value).  That leaves 20 bits for the 
>> >> > line
>> >> > numbers (for the normal 8 bit columns), which might be enough for most
>> >> > single-file compilations.  For LTO compilation this often won't be 
>> >> > enough.
>> >> >
>> >> >> My plan is to investigate the impact these patches have on the time and
>> >> >> memory consumption of the compiler,
>> >> >
>> >> > When you do so, make sure you're also measuring an LTO compilation with
>> >> > debug info of something big (firefox).  I know that we already had 
>> >> > issues
>> >> > with the size of the linemap data in the past for these cases (probably
>> >> > when we added columns).
>> >>
>> >> The issue we have with LTO is that the linemap gets populated in quite
>> >> random order and thus we repeatedly switch files (we've mitigated this
>> >> somewhat for GCC 5).  We also considered dropping column info
>> >> (and would drop range info) as diagnostics are from optimizers only
>> >> with LTO and we keep locations merely for debug info.
>> >
>> > Thanks.  Presumably the mitigation you're referring to is the
>> > lto_location_cache class in lto-streamer-in.c?
>> >
>> > Am I right in thinking that, right now, the LTO code doesn't support
>> > ad-hoc locations? (presumably the block pointers only need to exist
>> > during optimization, which happens after the serialization)
>>
>> LTO code does support ad-hoc locations but they are "restored" only
>> when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).
>>
>> > The obvious simplification would be, as you suggest, to not bother
>> > storing range information with LTO, falling back to just the existing
>> > representation.  Then there's no need to extend LTO to serialize ad-hoc
>> > data; simply store the underlying locus into the bit stream.  I think
>> > that this happens already: lto-streamer-out.c calls expand_location and
>> > stores the result, so presumably any ad-hoc location_t values made by
>> > the v2 patches would have dropped their range data there when I ran the
>> > test suite.
>>
>> Yep.  We only preserve BLOCKs, so if you don't add extra code to
>> preserve ranges they'll be "dropped".
>>
>> > If it's acceptable to not bother with ranges for LTO, one way to do the
>> > "stashing short ranges into the location_t" idea might be for the
>> > bits-per-range of location_t values to be a property of the line_table
>> > (or possibly the line map), set up when the struct line_maps is created.
>> > For non-LTO it could be some tuned value (maybe from a param?); for LTO
>> > it could be zero, so that we have as many bits as before for line/column
>> > data.
>>
>> That could be a possibility (likewise for column info?)
>>
>> Richard.
>>
>> > Hope this sounds sane
>> > Dave
>
> I did some crude benchmarking of the patchkit, using these scripts:
>   https://github.com/davidmalcolm/gcc-benchmarking
> (specifically, bb0222b455df8cefb53bfc1246eb0a8038256f30),
> using the "big-code.c" and "kdecore.cc" files Michael posted as:
>   https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00062.html
> and "influence.i", a preprocessed version of SPEC2006's 445.gobmk
> engine/influence.c (as an example of a moderate-sized pure C source
> file).
>
> This doesn't yet cover very large autogenerated C files, and the .cc
> file is only being measured to see the effect on the ad-hoc table (and
> tokenization).
>
> "control" was r227977.
> "experiment" was the same revision with the v2 patchkit applied.
>
> Recall that this patchkit captures ranges for tokens as an extra field
> within tokens within libcpp and the C FE, and adds ranges to the ad-hoc
> location lookaside, storing them for all tree nodes within the C FE that
> have a location_t, and passing them around within c_expr for all C
> expressions (including those that don't have a location_t).
>
> Both control and experiment were built with
>   --enable-checking=release \
>   --disable-bootstrap \
>   --disable-multilib \
>

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Yury Gribov


On 10/14/2015 12:34 PM, Maxim Ostapenko wrote:

On 14/10/15 10:54, Jakub Jelinek wrote:

Do you plan to update the asan tests we have to reflect the changes in
upstream?


Hm, there aren't changes into instrumentation, so the only thing is new
interceptors. If it is desirable, I can migrate some tests for new
interceptors from upstream.


What about e.g. "-Improvements for ASan deactivated start were performed"?

-Y

Re: Merge from gomp-4_1-branch to trunk

2015-10-14 Thread Sebastian Huber


Hello,

I get now the following error:

libtool: compile: /scratch/git-build/b-gcc-git-arm-rtems4.12/./gcc/xgcc 
-B/scratch/git-build/b-gcc-git-arm-rtems4.12/./gcc/ -nostdinc 
-B/scratch/git-build/b-gcc-git-arm-rtems4.12/arm-rtems4.12/newlib/ 
-isystem 
/scratch/git-build/b-gcc-git-arm-rtems4.12/arm-rtems4.12/newlib/targ-include 
-isystem /home/EB/sebastian_h/archive/gcc-git/newlib/libc/include 
-B/opt/rtems-4.12/arm-rtems4.12/bin/ 
-B/opt/rtems-4.12/arm-rtems4.12/lib/ -isystem 
/opt/rtems-4.12/arm-rtems4.12/include -isystem 
/opt/rtems-4.12/arm-rtems4.12/sys-include -DHAVE_CONFIG_H -I. 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp/config/rtems 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp/config/posix 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp 
-I/home/EB/sebastian_h/archive/gcc-git/libgomp/../include -Wall -Werror 
-g -O2 -MT fortran.lo -MD -MP -MF .deps/fortran.Tpo -c 
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c -o fortran.o
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c: In function 
'omp_get_place_proc_ids_':
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:484:39: error: 
passing argument 2 of 'omp_get_place_proc_ids' from incompatible pointer 
type [-Werror=incompatible-pointer-types]

   omp_get_place_proc_ids (*place_num, ids);
   ^
In file included from 
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:28:0:
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:73:18: note: 
expected 'int *' but argument is of type 'int32_t * {aka long int *}'

 ialias_redirect (omp_get_place_proc_ids)
  ^
/home/EB/sebastian_h/archive/gcc-git/libgomp/libgomp.h:1011:24: note: in 
definition of macro 'ialias_redirect'
   extern __typeof (fn) fn __asm__ (ialias_ulp "gomp_ialias_" #fn) 
attribute_hidden;

^
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c: In function 
'omp_get_partition_place_nums_':
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:508:33: error: 
passing argument 1 of 'omp_get_partition_place_nums' from incompatible 
pointer type [-Werror=incompatible-pointer-types]

   omp_get_partition_place_nums (place_nums);
 ^
In file included from 
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:28:0:
/home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:76:18: note: 
expected 'int *' but argument is of type 'int32_t * {aka long int *}'

 ialias_redirect (omp_get_partition_place_nums)
  ^
/home/EB/sebastian_h/archive/gcc-git/libgomp/libgomp.h:1011:24: note: in 
definition of macro 'ialias_redirect'
   extern __typeof (fn) fn __asm__ (ialias_ulp "gomp_ialias_" #fn) 
attribute_hidden;


We have for example (libgomp/omp_lib.f90.in):

  subroutine omp_get_place_proc_ids (place_num, ids)
integer (4), intent(in) :: place_num
integer (4), intent(out) :: ids(*)
  end subroutine omp_get_place_proc_ids

So this interface is different to (libgomp/omp.h.in):

extern void omp_get_place_proc_ids (int, int *) __GOMP_NOTHROW;

The following patch fixes the problem, but I am not sure if this is 
really the best way to address this issue:


diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index ceff9ac..44aaf92 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -481,7 +481,9 @@ omp_get_place_num_procs_8_ (const int64_t *place_num)
 void
 omp_get_place_proc_ids_ (const int32_t *place_num, int32_t *ids)
 {
-  omp_get_place_proc_ids (*place_num, ids);
+  int int_ids;
+  omp_get_place_proc_ids (*place_num, _ids);
+  *ids = int_ids;
 }

 void
@@ -505,7 +507,9 @@ omp_get_partition_num_places_ (void)
 void
 omp_get_partition_place_nums_ (int32_t *place_nums)
 {
-  omp_get_partition_place_nums (place_nums);
+  int int_place_nums;
+  omp_get_partition_place_nums (_place_nums);
+  *place_nums = int_place_nums;
 }

 void

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Maxim Ostapenko


On 14/10/15 10:54, Jakub Jelinek wrote:

On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:

On 13/10/15 14:15, Maxim Ostapenko wrote:

This is the raw merge itself. I'm bumping SONAME to libasan.so.3.

-Maxim

I have just noticed that I've misused autoconf stuff (used wrong version).
Here a fixed version of the same patch. Sorry for inconvenience.

Is libubsan, libtsan backwards compatible, or do we want to change SONAME
there too?


No, they are not (for UBSan heuristic doesn't work well for GCC, TSan 
has some type changes into interceptors and data structures, e.g. in 
struct ReportStack). I  can share more details, if desired.




The aarch64 changes are terrible, not just because it doesn't yet have
runtime decision on what VA to use or that it doesn't support 48-bit VA,
but also that for the 42-bit VA it uses a different shadow offset from
39-bit VA.  But on the compiler side we have just one...
Though, only the 39-bit VA is enabled right now by default, so out of the
box the state is as bad as we had in 5.x - users wanting 42-bit VA or 48-bit
VA have to patch libsanitizer.

Have you verified libbacktrace sanitization still works properly (that is
something upstream does not test)?


I'm sorry, didn't catch well your words about libbacktrace sanitization. 
Did you mean symbolization? If so, I didn't perform any special 
validation here (thought output patterns tests use libbacktrace output, 
no?). But I wonder how can I verify this more or less automatically.




Do you plan to update the asan tests we have to reflect the changes in
upstream?


Hm, there aren't changes into instrumentation, so the only thing is new 
interceptors. If it is desirable, I can migrate some tests for new 
interceptors from upstream.




Jakub

Re: [PATCH 4/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Tue, Oct 13, 2015 at 02:18:41PM +0300, Maxim Ostapenko wrote:
> This is a reapplied Jakub's patch for disabling ODR violation detection.
> More details can be found here
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63888).

This is ok when all the other changes are acked.

> 2015-10-12  Maxim Ostapenko  
> 
>   PR bootstrap/63888
>   Reapply:
>   2015-02-20  Jakub Jelinek  
> 
>   * asan/asan_globals.cc (RegisterGlobal): Disable detect_odr_violation
>   support until it is rewritten upstream.
> 
>   * c-c++-common/asan/pr63888.c: New test.
> 
> Index: libsanitizer/asan/asan_globals.cc
> ===
> --- libsanitizer/asan/asan_globals.cc (revision 250059)
> +++ libsanitizer/asan/asan_globals.cc (working copy)
> @@ -146,7 +146,9 @@
>CHECK(AddrIsInMem(g->beg));
>CHECK(AddrIsAlignedByGranularity(g->beg));
>CHECK(AddrIsAlignedByGranularity(g->size_with_redzone));
> -  if (flags()->detect_odr_violation) {
> +  // This "ODR violation" detection is fundamentally incompatible with
> +  // how GCC registers globals.  Disable as useless until rewritten upstream.
> +  if (0 && flags()->detect_odr_violation) {
>  // Try detecting ODR (One Definition Rule) violation, i.e. the situation
>  // where two globals with the same name are defined in different modules.
>  if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {


Jakub

Re: [PATCH 2/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Yury Gribov


On 10/13/2015 02:16 PM, Maxim Ostapenko wrote:

This patch introduces required compiler changes. Now, we don't version
asan_init, we have a special __asan_version_mismatch_check_v[n] symbol
for this.

Also, asan_stack_malloc_[n] doesn't take a local stack as a second
parameter anymore, so don't pass it.


Did you compare libasan.so and libclang_rt-asan.so for other ABI 
incompatibilities e.g. via libabigail?


-Y

Re: Merge from gomp-4_1-branch to trunk

2015-10-14 Thread Jakub Jelinek

On Wed, Oct 14, 2015 at 09:34:48AM +0200, Sebastian Huber wrote:
> /home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:28:0:
> /home/EB/sebastian_h/archive/gcc-git/libgomp/fortran.c:73:18: note: expected
> 'int *' but argument is of type 'int32_t * {aka long int *}'

Ugh, wasn't aware that some targets use long int for int32_t :(.

> The following patch fixes the problem, but I am not sure if this is really
> the best way to address this issue:
> 
> diff --git a/libgomp/fortran.c b/libgomp/fortran.c
> index ceff9ac..44aaf92 100644
> --- a/libgomp/fortran.c
> +++ b/libgomp/fortran.c
> @@ -481,7 +481,9 @@ omp_get_place_num_procs_8_ (const int64_t *place_num)
>  void
>  omp_get_place_proc_ids_ (const int32_t *place_num, int32_t *ids)
>  {
> -  omp_get_place_proc_ids (*place_num, ids);
> +  int int_ids;
> +  omp_get_place_proc_ids (*place_num, _ids);
> +  *ids = int_ids;
>  }
> 
>  void
> @@ -505,7 +507,9 @@ omp_get_partition_num_places_ (void)
>  void
>  omp_get_partition_place_nums_ (int32_t *place_nums)
>  {
> -  omp_get_partition_place_nums (place_nums);
> +  int int_place_nums;
> +  omp_get_partition_place_nums (_place_nums);
> +  *place_nums = int_place_nums;
>  }

No, both the above changes are wrong.  There is not a single int32_t
written, but could be many more, it is an array of 32-bit integers.
I'd say you just want to cast explicitly,
  omp_get_place_proc_ids (*place_num, (int *) ids);
and
  omp_get_parition_place_nums ((int *) place_nums);
The reason for int32_t is that on the Fortran side it is integer(kind=4)
and everywhere else for that int32_t is used.
If this works, the patch is preapproved.

As for aliasing, it will always be int stores vs. integer(kind=4) reads,
int32_t is just a type used in the wrappers.

Jakub

Re: [vec-cmp, patch 4/6] Support vector mask invariants

2015-10-14 Thread Richard Biener

On Tue, Oct 13, 2015 at 4:52 PM, Ilya Enkovich  wrote:
> 2015-10-13 16:54 GMT+03:00 Richard Biener :
>> On Thu, Oct 8, 2015 at 5:11 PM, Ilya Enkovich  wrote:
>>> Hi,
>>>
>>> This patch adds a special handling of boolean vector invariants.  We need 
>>> additional code to determine type of generated invariant.  For 
>>> VEC_COND_EXPR case we even provide this type directly because statement 
>>> vectype doesn't allow us to compute it.  Separate code is used to generate 
>>> and expand such vectors.
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2015-10-08  Ilya Enkovich  
>>>
>>> * expr.c (const_vector_mask_from_tree): New.
>>> (const_vector_from_tree): Use const_vector_mask_from_tree
>>> for boolean vectors.
>>> * tree-vect-stmts.c (vect_init_vector): Support boolean vector
>>> invariants.
>>> (vect_get_vec_def_for_operand): Add VECTYPE arg.
>>> (vectorizable_condition): Directly provide vectype for invariants
>>> used in comparison.
>>> * tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE
>>> arg.
>>>
>>>
>>> diff --git a/gcc/expr.c b/gcc/expr.c
>>> index 88da8cb..a624a34 100644
>>> --- a/gcc/expr.c
>>> +++ b/gcc/expr.c
>>> @@ -11320,6 +11320,40 @@ try_tablejump (tree index_type, tree index_expr, 
>>> tree minval, tree range,
>>>return 1;
>>>  }
>>>
>>> +/* Return a CONST_VECTOR rtx representing vector mask for
>>> +   a VECTOR_CST of booleans.  */
>>> +static rtx
>>> +const_vector_mask_from_tree (tree exp)
>>> +{
>>> +  rtvec v;
>>> +  unsigned i;
>>> +  int units;
>>> +  tree elt;
>>> +  machine_mode inner, mode;
>>> +
>>> +  mode = TYPE_MODE (TREE_TYPE (exp));
>>> +  units = GET_MODE_NUNITS (mode);
>>> +  inner = GET_MODE_INNER (mode);
>>> +
>>> +  v = rtvec_alloc (units);
>>> +
>>> +  for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
>>> +{
>>> +  elt = VECTOR_CST_ELT (exp, i);
>>> +
>>> +  gcc_assert (TREE_CODE (elt) == INTEGER_CST);
>>> +  if (integer_zerop (elt))
>>> +   RTVEC_ELT (v, i) = CONST0_RTX (inner);
>>> +  else if (integer_onep (elt)
>>> +  || integer_minus_onep (elt))
>>> +   RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
>>> +  else
>>> +   gcc_unreachable ();
>>> +}
>>> +
>>> +  return gen_rtx_CONST_VECTOR (mode, v);
>>> +}
>>> +
>>>  /* Return a CONST_VECTOR rtx for a VECTOR_CST tree.  */
>>>  static rtx
>>>  const_vector_from_tree (tree exp)
>>> @@ -11335,6 +11369,9 @@ const_vector_from_tree (tree exp)
>>>if (initializer_zerop (exp))
>>>  return CONST0_RTX (mode);
>>>
>>> +  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp)))
>>> +  return const_vector_mask_from_tree (exp);
>>> +
>>>units = GET_MODE_NUNITS (mode);
>>>inner = GET_MODE_INNER (mode);
>>>
>>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>>> index 6949c71..337ea7b 100644
>>> --- a/gcc/tree-vect-stmts.c
>>> +++ b/gcc/tree-vect-stmts.c
>>> @@ -1308,27 +1308,61 @@ vect_init_vector_1 (gimple *stmt, gimple *new_stmt, 
>>> gimple_stmt_iterator *gsi)
>>>  tree
>>>  vect_init_vector (gimple *stmt, tree val, tree type, gimple_stmt_iterator 
>>> *gsi)
>>>  {
>>> +  tree val_type = TREE_TYPE (val);
>>> +  machine_mode mode = TYPE_MODE (type);
>>> +  machine_mode val_mode = TYPE_MODE(val_type);
>>>tree new_var;
>>>gimple *init_stmt;
>>>tree vec_oprnd;
>>>tree new_temp;
>>>
>>>if (TREE_CODE (type) == VECTOR_TYPE
>>> -  && TREE_CODE (TREE_TYPE (val)) != VECTOR_TYPE)
>>> -{
>>> -  if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
>>> +  && TREE_CODE (val_type) != VECTOR_TYPE)
>>> +{
>>> +  /* Handle vector of bool represented as a vector of
>>> +integers here rather than on expand because it is
>>> +a default mask type for targets.  Vector mask is
>>> +built in a following way:
>>> +
>>> +tmp = (int)val
>>> +vec_tmp = {tmp, ..., tmp}
>>> +vec_cst = VIEW_CONVERT_EXPR(vec_tmp);  */
>>> +  if (TREE_CODE (val_type) == BOOLEAN_TYPE
>>> + && VECTOR_MODE_P (mode)
>>> + && SCALAR_INT_MODE_P (GET_MODE_INNER (mode))
>>> + && GET_MODE_INNER (mode) != val_mode)
>>> {
>>> - if (CONSTANT_CLASS_P (val))
>>> -   val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
>>> - else
>>> + unsigned size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
>>> + tree stype = build_nonstandard_integer_type (size, 1);
>>> + tree vectype = get_vectype_for_scalar_type (stype);
>>> +
>>> + new_temp = make_ssa_name (stype);
>>> + init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
>>> + vect_init_vector_1 (stmt, init_stmt, gsi);
>>> +
>>> + val = make_ssa_name (vectype);
>>> + new_temp = build_vector_from_val (vectype, new_temp);
>>> + init_stmt =

Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Richard Biener

On Tue, Oct 13, 2015 at 10:59 PM, Richard Henderson  wrote:
> On 10/14/2015 02:49 AM, Jeff Law wrote:
>>
>> The problem here is we don't know what address space the *0 is going to
>> hit,
>> right?
>
>
> Correct, not before we do the walk of stmt to see what's present.
>
>> Isn't that also an issue for code generation as well?
>
>
> What sort of problem are you thinking of?  I haven't seen one yet.

The actual dereference of course has a properly address-space qualified zero.

Only your walking depends on operand_equal_p to treat different address-space
zero addresses as equal (which they are of course not ...):

int
operand_equal_p (const_tree arg0, const_tree arg1, unsigned int flags)
{
...
  /* Check equality of integer constants before bailing out due to
 precision differences.  */
  if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
{
  /* Address of INTEGER_CST is not defined; check that we did not forget
 to drop the OEP_ADDRESS_OF/OEP_CONSTANT_ADDRESS_OF flags.  */
  gcc_checking_assert (!(flags
 & (OEP_ADDRESS_OF | OEP_CONSTANT_ADDRESS_OF)));
  return tree_int_cst_equal (arg0, arg1);
}

but only later we do

  /* We cannot consider pointers to different address space equal.  */
  if (POINTER_TYPE_P (TREE_TYPE (arg0))
  && POINTER_TYPE_P (TREE_TYPE (arg1))
  && (TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg0)))
  != TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (arg1)
return 0;

So "fixing" that would make the walker only look for default
address-space zero dereferences.

I think we need to fix operand_equal_p anyway because 0 is clearly not
equal to 0 (only if
they convert to the same literal)

Richard.

>
> r~

Re: [PR67891] drop is_gimple_reg test from set_parm_rtl

2015-10-14 Thread Richard Biener

On Wed, Oct 14, 2015 at 5:25 AM, Alexandre Oliva  wrote:
> On Oct 12, 2015, Richard Biener  wrote:
>
>> On Sat, Oct 10, 2015 at 3:16 PM, Alexandre Oliva  wrote:
>>> On Oct  9, 2015, Richard Biener  wrote:
>>>
 Ok.  Note that I think emit_block_move shouldn't mess with the addressable 
 flag.
>>>
>>> I have successfully tested a patch that stops it from doing so,
>>> reverting https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49429#c11 but
>>> according to bugs 49429 and 49454, it looks like removing it would mess
>>> with escape analysis introduced in r175063 for bug 44194.  The thread
>>> that introduces the mark_addressable calls suggests some discomfort with
>>> this solution, and even a suggestion that the markings should be
>>> deferred past the end of expand, but in the end there was agreement to
>>> go with it.  https://gcc.gnu.org/ml/gcc-patches/2011-06/msg01746.html
>
>> Aww, indeed.  Of course the issue is that we don't track pointers to the
>> stack introduced during RTL properly.
>
>> Thanks for checking.  Might want to add a comment before that
>> addressable setting now that you've done the archeology.
>
> I decided to give the following approach a try instead.  The following
> patch was regstrapped on x86_64-linux-gnu and i686-linux-gnu.
> Ok to install?

It looks ok to me but lacks a comment in mark_addressable_1 why we
do this queueing when currently expanding to RTL.

Richard.

> Would anyone with access to hpux (pa and ia64 are both affected) give it
> a spin?
>
>
> defer mark_addressable calls during expand till the end of expand
>
> From: Alexandre Oliva 
>
> for  gcc/ChangeLog
>
> * gimple-expr.c: Include hash-set.h and rtl.h.
> (mark_addressable_queue): New var.
> (mark_addressable): Factor actual marking into...
> (mark_addressable_1): ... this.  Queue it up during expand.
> (mark_addressable_2): New.
> (flush_mark_addressable_queue): New.
> * gimple-expr.h (flush_mark_addressable_queue): Declare.
> * cfgexpand.c: Include gimple-expr.h.
> (pass_expand::execute): Flush mark_addressable queue.
> ---
>  gcc/cfgexpand.c   |3 +++
>  gcc/gimple-expr.c |   50 --
>  gcc/gimple-expr.h |1 +
>  3 files changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index eaad859..a362e17 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "internal-fn.h"
>  #include "tree-eh.h"
>  #include "gimple-iterator.h"
> +#include "gimple-expr.h"
>  #include "gimple-walk.h"
>  #include "cgraph.h"
>  #include "tree-cfg.h"
> @@ -6373,6 +6374,8 @@ pass_expand::execute (function *fun)
>/* We're done expanding trees to RTL.  */
>currently_expanding_to_rtl = 0;
>
> +  flush_mark_addressable_queue ();
> +
>FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (fun)->next_bb,
>   EXIT_BLOCK_PTR_FOR_FN (fun), next_bb)
>  {
> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> index 2a6ba1a..db249a3 100644
> --- a/gcc/gimple-expr.c
> +++ b/gcc/gimple-expr.c
> @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify.h"
>  #include "stor-layout.h"
>  #include "demangle.h"
> +#include "hash-set.h"
> +#include "rtl.h"
>
>  /* - Type related -  */
>
> @@ -823,6 +825,50 @@ is_gimple_mem_ref_addr (tree t)
>   || decl_address_invariant_p (TREE_OPERAND (t, 0);
>  }
>
> +/* Hold trees marked addressable during expand.  */
> +
> +static hash_set *mark_addressable_queue;
> +
> +/* Mark X as addressable or queue it up if called during expand.  */
> +
> +static void
> +mark_addressable_1 (tree x)
> +{
> +  if (!currently_expanding_to_rtl)
> +{
> +  TREE_ADDRESSABLE (x) = 1;
> +  return;
> +}
> +
> +  if (!mark_addressable_queue)
> +mark_addressable_queue = new hash_set();
> +  mark_addressable_queue->add (x);
> +}
> +
> +/* Adaptor for mark_addressable_1 for use in hash_set traversal.  */
> +
> +bool
> +mark_addressable_2 (tree const , void * ATTRIBUTE_UNUSED = NULL)
> +{
> +  mark_addressable_1 (x);
> +  return false;
> +}
> +
> +/* Mark all queued trees as addressable, and empty the queue.  To be
> +   called right after clearing CURRENTLY_EXPANDING_TO_RTL.  */
> +
> +void
> +flush_mark_addressable_queue ()
> +{
> +  gcc_assert (!currently_expanding_to_rtl);
> +  if (mark_addressable_queue)
> +{
> +  mark_addressable_queue->traverse (NULL);
> +  delete mark_addressable_queue;
> +  mark_addressable_queue = NULL;
> +}
> +}
> +
>  /* Mark X addressable.  Unlike the langhook we expect X to be in gimple
> form and we don't do any syntax checking.  */
>
> @@ -838,7 +884,7 @@ mark_addressable (tree x)
>&& TREE_CODE

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Andrew Pinski

On Wed, Oct 14, 2015 at 11:38 AM, Renato Golin  wrote:
> On 14 October 2015 at 19:21, Evgenii Stepanov  
> wrote:
>> Wait. As Jakub correctly pointed out in the other thread, there is no
>> obvious reason why there could not be a single shadow offset value
>> that would work for all 3 possible VMA settings. I suggest figuring
>> this out first.
>
> We are.

Then until that happens I think we should disable asan and tsan for
AARCH64 for GCC.

>
> cheers,
> --renato

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Renato Golin

On 14 October 2015 at 20:00, Andrew Pinski  wrote:
> Then until that happens I think we should disable asan and tsan for
> AARCH64 for GCC.

I can't comment on that, but we'll continue running the tests on our
side on both 39 and 42 VMA configurations, to make sure we don't
regress until we actually ready to go for a final solution.

For the people that weren't directly involved (or don't want to be),
we'll be letting you know when a final solution is agreed and
validated between all parties.

cheers,
--renato

Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Andrew Pinski

On Wed, Oct 14, 2015 at 12:15 PM, Renato Golin  wrote:
> On 14 October 2015 at 20:00, Andrew Pinski  wrote:
>> Then until that happens I think we should disable asan and tsan for
>> AARCH64 for GCC.
>
> I can't comment on that, but we'll continue running the tests on our
> side on both 39 and 42 VMA configurations, to make sure we don't
> regress until we actually ready to go for a final solution.

It does not work for any standard Linux distros if they support dual
node ThunderX at all.  It is completely broken and does not even make
sense to do regression testing on a broken library.

>
> For the people that weren't directly involved (or don't want to be),
> we'll be letting you know when a final solution is agreed and
> validated between all parties.
>
> cheers,
> --renato

Re: [PATCH] print help for undocumented options

2015-10-14 Thread Joseph Myers

On Wed, 14 Oct 2015, Martin Sebor wrote:

> IMO, printing the aliased option's help text makes using the output
> easier for users who find the undocumented option first, in that
> they don't then have to go look for the one that does have
> documentation, so I left that part in place.  If you or someone
> feels strongly that it shouldn't be there I'll remove it.

Well, I think it might also encourage people to use the aliases, when for 
the most part we'd rather people used the canonical names (and so made it 
easier e.g. to search for other uses of the same option).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Do not use TYPE_CANONICAL in useless_type_conversion

2015-10-14 Thread Alexandre Oliva

On Oct 14, 2015, Jan Hubicka  wrote:

>> On Oct 13, 2015, Eric Botcazou  wrote:
>> 
>> > Note that this is PR middle-end/67912.
>> 
>> Thanks.  I added this piece of information to the ChangeLog entry, and
>> checked the patch in.

> Thanks, Alexandre. That indeed looks better than my variant of the patch.
> Does it also fix the IA-64 issue?

I didn't know about the ia64 issue when I prepared the patch; I only
found the thread when the patch was fully tested.  I didn't test it on
that arch, but I'm hoping to hear from whoever reported the ia64 problem
that the patch fixed it too ;-)

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

Re: [PATCH] Fix wrong-code when folding X - (X / Y) * Y (PR tree-optimization/67953)

2015-10-14 Thread Marc Glisse


On Wed, 14 Oct 2015, Marek Polacek wrote:


Evidently, the X - (X / Y) * Y -> X % Y pattern can't change the
signedness of X from signed to unsigned, otherwise we'd generate
wrong code.  (But unsigned -> signed should be fine.)

Does anyone see a better fix than this?


I was wondering about producing:

(convert (trunc_mod @0 @1))

but didn't really think about it.

Aren't there also problems if the conversion changes the precision? I can 
imagine your patch ending in x % 0, with @1 non-zero but a narrowing cast.



Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-10-14  Marek Polacek  

PR tree-optimization/67953
* match.pd (X - (X / Y) * Y): Don't change signedness of @0.

* gcc.dg/fold-minus-6.c (fn4): Change the type of A to
unsigned.
* gcc.dg/torture/pr67953.c: New test.

diff --git gcc/match.pd gcc/match.pd
index 655c9ff..24e19a9 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -267,7 +267,8 @@ along with GCC; see the file COPYING3.  If not see
/* X - (X / Y) * Y is the same as X % Y.  */
(simplify
 (minus (convert1? @0) (convert2? (mult (trunc_div @0 @1) @1)))
- (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (type))
  (trunc_mod (convert @0) (convert @1

/* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
diff --git gcc/testsuite/gcc.dg/fold-minus-6.c 
gcc/testsuite/gcc.dg/fold-minus-6.c
index 1c22c25..1535452 100644
--- gcc/testsuite/gcc.dg/fold-minus-6.c
+++ gcc/testsuite/gcc.dg/fold-minus-6.c
@@ -20,7 +20,7 @@ fn3 (long int x)
}

int
-fn4 (int a, int b)
+fn4 (unsigned int a, int b)
{
  return a - (unsigned) ((a / b) * b);
}
diff --git gcc/testsuite/gcc.dg/torture/pr67953.c 
gcc/testsuite/gcc.dg/torture/pr67953.c
index e69de29..5ce399b 100644
--- gcc/testsuite/gcc.dg/torture/pr67953.c
+++ gcc/testsuite/gcc.dg/torture/pr67953.c
@@ -0,0 +1,42 @@
+/* PR tree-optimization/67953 */
+/* { dg-do run } */
+
+unsigned int
+fn1 (signed int a)
+{
+  return (unsigned int) a - ((a / 3) * 3);
+}
+
+unsigned int
+fn2 (signed int a)
+{
+  return a - ((a / 3) * 3);
+}
+
+unsigned int
+fn3 (int a)
+{
+  return a - (unsigned) ((a / 3) * 3);
+}
+
+signed int
+fn4 (int a)
+{
+  return (unsigned) a - (unsigned) ((a / 3) * 3);
+}
+
+signed int
+fn5 (unsigned int a)
+{
+  return (signed) a - (int) ((a / 3) * 3);
+}
+
+int
+main ()
+{
+  if (fn1 (-5) != -2
+  || fn2 (-5) != -2
+  || fn3 (-5) != -2
+  || fn4 (-5) != -2)
+__builtin_abort ();
+}

Marek



--
Marc Glisse

Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-14 Thread Bernd Schmidt

On 10/14/2015 07:43 PM, Jeff Law wrote:

Obviously some pessimization relative to current code is necessary to
fix some of the problems WRT thread safety and avoiding things like
introducing faults in code which did not previously fault.

Huh? This patch is purely an (attempt at) optimization, not something 
that fixes any problems.

However, pessimization of safe code is, err, um, bad and needs to be
avoided.

Here's an example:

  > subq$16, %rsp
[...]
  > leaq8(%rsp), %r8
  > leaq256(%rax), %rdx
cmpq256(%rax), %rcx   | cmpq256(%rax), %rsi
jne.L97   <
movq$0, 256(%rax) <
.L97: <
  > movq%rdx, %rax
  > cmovne  %r8, %rax
  > movq$0, (%rax)
[...]
  > addq$16, %rsp

In the worst case that executes six more instructions, and always causes 
unnecessary stack frame bloat. This on x86 where AFAIK it's doubtful 
whether cmov is a win at all anyway. I think this shows the approach is 
just bad, even ignoring problems like that it could allocate multiple 
scratchpads when one would suffice, or allocate one and end up not using 
it because the transformation fails.

I can't test valgrind right now, it fails to run on my machine, but I 
guess it could adapt to allow stores slightly below the stack (maybe 
warning once)? It seems like a bit of an edge case to worry about, but 
if supporting it is critical and it can't be changed to adapt to new 
optimizations, then I think we're probably better off entirely without 
this scratchpad transformation.

Alternatively I can think of a few other possible approaches which 
wouldn't require this kind of bloat:

 * add support for allocating space in the stack redzone. That could be
   interesting for the register allocator as well. Would help only
   x86_64, but that's a large fraction of gcc's userbase.
 * add support for opportunistically finding unused alignment padding
   in the existing stack frame. Less likely to work but would produce
   better results when it does.
 * on embedded targets we probably don't have to worry about valgrind,
   so do the optimal (sp - x) thing there
 * allocate a single global as the dummy target. Might be more
   expensive to load the address on some targets though.
 * at least find a way to express costs for this transformation.
   Difficult since you don't yet necessarily know if the function is
   going to have a stack frame. Hence, IMO this approach is flawed.
   (You'll still want cost estimates even when not allocating stuff in
   the normal stack frame, because generated code will still execute
   between two and four extra instructions).

Bernd

[PATCH, i386]: Fix PR 67967, ICE in i386_pe_seh_unwind_emit

2015-10-14 Thread Uros Bizjak

Attached patch fixes PR 67967, where we emit REG_CFA_EXPRESSION,
attached to aligned SSE store. This is unnecessary, and confuses SEH
targets.

2015-10-14  Uros Bizjak  

PR target/67967
* config/i386/i386.c (ix86_emit_save_reg_using_mov): Do not add
REG_CFA_EXPRESSION to aligned SSE stores.

Bootstrapped and regression tested on x86_64-linux-gnu, also tested
with a crosscompiler to x86_64-unknown-cygwin.

Committed to mainline SVN, will be committed to gcc-5 branch.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 228818)
+++ config/i386/i386.c  (working copy)
@@ -11612,6 +11612,7 @@ ix86_emit_save_reg_using_mov (machine_mode mode, u
 {
   struct machine_function *m = cfun->machine;
   rtx reg = gen_rtx_REG (mode, regno);
+  rtx unspec = NULL_RTX;
   rtx mem, addr, base, insn;
   unsigned int align;
 
@@ -11626,13 +11627,9 @@ ix86_emit_save_reg_using_mov (machine_mode mode, u
  In case INCOMING_STACK_BOUNDARY is misaligned, we have
  to emit unaligned store.  */
   if (mode == V4SFmode && align < 128)
-{
-  rtx unspec = gen_rtx_UNSPEC (mode, gen_rtvec (1, reg), UNSPEC_STOREU);
-  insn = emit_insn (gen_rtx_SET (mem, unspec));
-}
-  else
-insn = emit_insn (gen_rtx_SET (mem, reg));
+unspec = gen_rtx_UNSPEC (mode, gen_rtvec (1, reg), UNSPEC_STOREU);
 
+  insn = emit_insn (gen_rtx_SET (mem, unspec ? unspec : reg));
   RTX_FRAME_RELATED_P (insn) = 1;
 
   base = addr;
@@ -11679,7 +11676,7 @@ ix86_emit_save_reg_using_mov (machine_mode mode, u
   mem = gen_rtx_MEM (mode, addr);
   add_reg_note (insn, REG_CFA_OFFSET, gen_rtx_SET (mem, reg));
 }
-  else
+  else if (unspec)
 add_reg_note (insn, REG_CFA_EXPRESSION, gen_rtx_SET (mem, reg));
 }

RE: [PATCH, mips]: Use ROUND_UP and ROUND_DOWN macros

2015-10-14 Thread Matthew Fortune

Uros Bizjak  writes:
> Fairly trivial patch that introduces no functional changes.
> 
> * config/mips/mips.h (MIPS_STACK_ALIGN): Implement using
> ROUND_UP macro.
> * config/mips/mips.c (mips_setup_incoming_varargs): Use
> ROUND_DOWN to calculate off.
> (mips_gimplify_va_arg_expr): Use ROUND_UP to calculate rsize.
> (mips_emit_probe_stack_range): Use ROUND_DOWN to calculate
> rounded_size.
> 
> Tested by building a crosscompiler to powerpc64-linux-gnu.
> 
> OK for mainline?

OK assuming you meant mips64-linux-gnu or some MIPS triple?

Thanks,
Matthew

Re: [PATCH] c/67925 - update documentation on `inline'

2015-10-14 Thread Arkadiusz Drabczyk

On Wed, Oct 14, 2015 at 08:36:43AM -0600, Martin Sebor wrote:
> On 10/13/2015 04:47 PM, Arkadiusz Drabczyk wrote:
> >* gcc/doc/extend.texi: documentation says that functions declared
> >`inline' would not be integrated if they are called before they are
> >defined or if they are recursive. Both of these statements is now
> >false as shown in examples on Bugzilla.
> 
> It might also be worth updating the note in the subsequent
> paragraph and removing the mention of variable-length data types
> which no longer prevent inlining.

Done.  I also removed the mention of nested functions as the following
code compiled with GCC 6.0 doesn't give any warning with -O2 -Winline
and main() is the only function defined in assembler code:

#include 
#include 

inline static int foo (int a, int b)
{
printf("a == %d\n", a);
inline int square (int z) { return z * z; }

return square (a) + square (b);
}

int main(int argc, char *argv[])
{
printf("%d\n", foo(atoi(argv[1]), atoi(argv[2])));
exit(0);
}

I also removed a reference to Bugzilla from the commit message, I
don't think it's necessary.

> FWIW, the list of most -Winline warnings issued by GCC is here
> (there are two more in Ada which, AFAICT, have to do with nested
> functions):
> 
> $ grep -A1 "can never be inlined" gcc/tree-inline.c
> (...)
> --
> = G_("function %q+F can never be inlined because "
>  "it uses non-local goto");

I tested of all of these and listed them in the documentation but
wasn't able to reproduce this one.  The following code does not give
any warning with -O2 -Winline:

#include 
#include 
#include 

static jmp_buf buf;

inline static void longjmp_test(int n)
{
puts("hi");

if (n == 2)
longjmp(buf, 2);
}

int main(int argc, char *argv[])
{
printf("%d\n", setjmp(buf));
longjmp_test(atoi(argv[1]));
puts("next line in a normal execution");
exit(0);
}

Is this test case correct?  However, I get the following warning using
__builtin_longjmp() instead of longjmp():

warning: function 'longjmp_test' can never be inlined because it uses
setjmp-longjmp exception handling [-Winline]
>8--8<
* gcc/doc/extend.texi: documentation says that functions declared
`inline' would not be integrated if they are called before they are
defined, if they are recursive, if they use variable-length data types
or if they are nested.  All of these statements are now false and have
been removed. Mention of setjmp(), __builtin_return() and
__builtin_apply_args() has been added.
---
 gcc/doc/extend.texi | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 79440d3..be95cc3 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7088,21 +7088,18 @@ function are integrated into the caller, and the 
function's address is
 never used, then the function's own assembler code is never referenced.
 In this case, GCC does not actually output assembler code for the
 function, unless you specify the option @option{-fkeep-inline-functions}.
-Some calls cannot be integrated for various reasons (in particular,
-calls that precede the function's definition cannot be integrated, and
-neither can recursive calls within the definition).  If there is a
-nonintegrated call, then the function is compiled to assembler code as
-usual.  The function must also be compiled as usual if the program
-refers to its address, because that can't be inlined.
+If there is a nonintegrated call, then the function is compiled to
+assembler code as usual.  The function must also be compiled as usual if
+the program refers to its address, because that can't be inlined.
 
 @opindex Winline
 Note that certain usages in a function definition can make it unsuitable
-for inline substitution.  Among these usages are: variadic functions, use of
-@code{alloca}, use of variable-length data types (@pxref{Variable Length}),
-use of computed goto (@pxref{Labels as Values}), use of nonlocal goto,
-and nested functions (@pxref{Nested Functions}).  Using @option{-Winline}
-warns when a function marked @code{inline} could not be substituted,
-and gives the reason for the failure.
+for inline substitution.  Among these usages are: variadic functions,
+use of @code{alloca}, use of computed goto (@pxref{Labels as Values}),
+use of @code{setjmp} and use of @code{__builtin_return} or
+@code{__builtin_apply_args}.  Using @option{-Winline} warns when a
+function marked @code{inline} could not be substituted, and gives the
+reason for the failure.
 
 @cindex automatic @code{inline} for C++ member fns
 @cindex @code{inline} automatic for C++ member fns
-- 
2.3.5


-- 
Arkadiusz Drabczyk

Re: [patch] Minor adjustment to gimplify_addr_expr

2015-10-14 Thread Jeff Law


On 10/14/2015 02:57 PM, Eric Botcazou wrote:

Can you use the TMR_OFFSET macro rather than TREE_OPERAND (op0, 1)?

It also seems that you need a stronger check here.

Essentially you have to verify that
STEP * INDEX + INDEX2 + OFFSET == 0

Right?


No, it's MEM_REF, not TARGET_MEM_REF, see build_fold_addr_expr_with_type_loc.

AH, in that case, no objection, patch approved.

jeff

Re: [PATCH 8/9] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-14 Thread Richard Henderson


On 10/14/2015 08:22 PM, Richard Biener wrote:

I think you could trigger bogus CSE of dereferences of literal addresses
from different address-spaces.


Good catch.  You're spot on with that.


r~


int test(void)
{
  int __seg_fs *f = (int __seg_fs *)16;
  int __seg_gs *g = (int __seg_gs *)16;
  return *f + *g;
}

test:
movl%fs:16, %eax
addl%eax, %eax
ret

Re: [PATCH, mips]: Use ROUND_UP and ROUND_DOWN macros

2015-10-14 Thread Uros Bizjak

On Wed, Oct 14, 2015 at 11:38 PM, Matthew Fortune
 wrote:
> Uros Bizjak  writes:
>> Fairly trivial patch that introduces no functional changes.
>>
>> * config/mips/mips.h (MIPS_STACK_ALIGN): Implement using
>> ROUND_UP macro.
>> * config/mips/mips.c (mips_setup_incoming_varargs): Use
>> ROUND_DOWN to calculate off.
>> (mips_gimplify_va_arg_expr): Use ROUND_UP to calculate rsize.
>> (mips_emit_probe_stack_range): Use ROUND_DOWN to calculate
>> rounded_size.
>>
>> Tested by building a crosscompiler to powerpc64-linux-gnu.
>>
>> OK for mainline?
>
> OK assuming you meant mips64-linux-gnu or some MIPS triple?

Ah, yes. FTR, it was mips-linux-gnu.

Thanks,
Uros.

Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.

2015-10-14 Thread Peter Bergner

On Fri, 2015-10-09 at 22:19 +0200, Torvald Riegel wrote:
> On Fri, 2015-10-09 at 11:52 -0500, Peter Bergner wrote:
>> Thanks for the review!  I've attached the changes to the documentation below.
>> Is this better?
> 
> Yes, thanks!

Great, thanks!  This is committed a revision 228827.  I'm just starting
the testing of the backports to the release branches and will commit it
there too, assuming everything is clean.

Peter

Re: [PATCH 1/3] [ARM] PR63870 Add qualifiers for NEON builtins

2015-10-14 Thread Charles Baylis

On 12 October 2015 at 11:58, Alan Lawrence  wrote:
> On 07/10/15 00:59, charles.bay...@linaro.org wrote:
>>
>> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
>
> ...
>>
>>   case NEON_ARG_MEMORY:
>>   /* Check if expand failed.  */
>>   if (op[argc] == const0_rtx)
>>   {
>> -   va_end (ap);
>> return 0;
>>   }
>
>
> ...and drop the braces?

Will do.

>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 02f5dc3..448cde3 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -30117,4 +30117,5 @@ arm_sched_fusion_priority (rtx_insn *insn, int
>> max_pri,
>> *pri = tmp;
>> return;
>>   }
>> +
>>   #include "gt-arm.h"
>
>
> This looks unrelated (and is the only change to arm.c) - perhaps commit
> separately? (Note I am not a maintainer! But this looks "obvious"...)

It doesn't seem very useful. I'll drop it.

>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>> index 87c9f90..27ac4dc 100644
>> --- a/gcc/config/arm/arm.h
>> +++ b/gcc/config/arm/arm.h
>> @@ -288,6 +288,9 @@ extern void
>> (*arm_lang_output_object_attributes_hook)(void);
>>   #define TARGET_BPABI false
>>   #endif
>>
>> +#define ENDIAN_LANE_N(mode, n)  \
>> +  (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
>> +
>
>
> Given we are making changes here to how this all works on bigendian, have
> you tested armeb at all?

I tested on big endian, and it passes, except for a testsuite issue
with the *_f16 tests, which fail because they are built without the
fp16 options on big endian. This is because
check_effective_target_arm_neon_fp16_ok_nocache gets an ICE when it
attempts to compile the test program. I think those fp16 intrinsics
are in your area, do you want to take a look? :)

Thanks for the review

Charles

Re: [patch] Minor adjustment to gimplify_addr_expr

2015-10-14 Thread Eric Botcazou

> Can you use the TMR_OFFSET macro rather than TREE_OPERAND (op0, 1)?
> 
> It also seems that you need a stronger check here.
> 
> Essentially you have to verify that
> STEP * INDEX + INDEX2 + OFFSET == 0
> 
> Right?

No, it's MEM_REF, not TARGET_MEM_REF, see build_fold_addr_expr_with_type_loc.

-- 
Eric Botcazou

Re: [PATCH] New attribute to create target clones

2015-10-14 Thread Evgeny Stupachenko

Bootstrap and make check for x86 passed. No new fails.
Please ignore an empty line added to omp-low.c in the patch, the
misprint will be removed prior to a commit.

Thanks,
Evgeny

On Tue, Oct 13, 2015 at 2:35 AM, Evgeny Stupachenko  wrote:
> Hi All,
>
> Here is a new version of patch (attached).
> Bootstrap and make check are in progress (all new tests passed).
>
> New test case g++.dg/ext/mvc4.C fails with ICE, when options lower
> than "-mavx" are passed.
> However it has the same behavior if "target_clones" attribute is
> replaced by 2 corresponding "target" attributes.
> I've filed PR67946 on this:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67946
>
> Thanks,
> Evgeny
>
> ChangeLog:
>
> 2015-10-13  Evgeny Stupachenko  
> gcc/
> * Makefile.in (OBJS): Add multiple_target.o.
> * attrib.c (make_attribute): Moved from config/i386/i386.c
> * config/i386/i386.c (make_attribute): Deleted.
> * multiple_target.c (make_attribute): New.
> (create_dispatcher_calls): Ditto.
> (get_attr_len): Ditto.
> (get_attr_str): Ditto.
> (is_valid_asm_symbol): Ditto.
> (create_new_asm_name): Ditto.
> (create_target_clone): Ditto.
> (expand_target_clones): Ditto.
> (ipa_target_clone): Ditto.
> (ipa_dispatcher_calls): Ditto.
> * passes.def (pass_target_clone): Two new ipa passes.
> * tree-pass.h (make_pass_target_clone): Ditto.
>
> gcc/c-family
> * c-common.c (handle_target_clones_attribute): New.
> * (c_common_attribute_table): Add handle_target_clones_attribute.
> * (handle_always_inline_attribute): Add check on target_clones
> attribute.
> * (handle_target_attribute): Ditto.
>
> gcc/testsuite
> * gcc.dg/mvc1.c: New test for multiple targets cloning.
> * gcc.dg/mvc2.c: Ditto.
> * gcc.dg/mvc3.c: Ditto.
> * gcc.dg/mvc4.c: Ditto.
> * gcc.dg/mvc5.c: Ditto.
> * gcc.dg/mvc6.c: Ditto.
> * gcc.dg/mvc7.c: Ditto.
> * g++.dg/ext/mvc1.C: Ditto.
> * g++.dg/ext/mvc2.C: Ditto.
> * g++.dg/ext/mvc3.C: Ditto.
> * g++.dg/ext/mvc4.C: Ditto.
>
> gcc/doc
> * doc/extend.texi (target_clones): New attribute description.
>
> On Sat, Oct 10, 2015 at 12:44 AM, Evgeny Stupachenko  
> wrote:
>> On Fri, Oct 9, 2015 at 11:04 PM, Jan Hubicka  wrote:
 On Fri, Oct 9, 2015 at 9:27 PM, Jan Hubicka  wrote:
 >> >Of course it also depends what you inline into function. You can have
 >> >
 >> >bar() target(-mavx) {fancy avx code}
 >> >foobar() { .. if (avx) bar();}
 >> >foo() ctarget(-mavx,-mno-avx) {foobar();}

 "no-" targets are not supported
>>>
>>> Why not? I suppose I can use -march=x86_64 in a file compiled with 
>>> -march=core-avx2 or something like that, too.
>> Sure, you can. target(arch=x86-64) is ok. I mean exactly target(no-avx) 
>> returns:
>>
>> aaa.cpp: In function '':
>> aaa.cpp:7:5: error: No dispatcher found for no-avx
>>  int bar()
>>  ^
>>

 >> >
 >> >Now if you compile with -mavx and because ctarget takes effect only 
 >> >after inlining,
 >> >at inlining time the target attributes will match and we can edn up 
 >> >inline bar->foobar->foo.
 >> >After that we multiversion foo and drop AVX flag we will likely get 
 >> >ICE at expansion
 >> >time.
 >> But isn't that avoided by fixing up the call graph so that all calls
 >> to the affected function are going through the dispatcher?  Or is
 >> that happening too late?
 >
 > There is dispatcher only for foo that is the root of the callgarph tree.
 > When inlining we compare target attributes for match (in 
 > can_inline_edge_p).
 > We do not compare ctarget attributes.  Expanding ctarget to target early 
 > would
 > avoid need for ctarget handling.
 Currently inlining is disabled for functions with target_clone attribute:
>>>
>>> Do you also disable inlining into functions with target_clone?
>>> What I am concerned about is early inliner inlining (say) AVX code into 
>>> ctarget
>>> function because at early inlining time the target is not applied, yet.
>> Right. Now I've got your point and ICE on the test.
>> Yes the solution is to disable inline into target_clones function.
>> Or to move the pass creating clones before inline (as you suggested)
>> and leave dispatcher creator after inline.
>>
>> I like you suggestion. It fixes the ICE.
>> I'll fix the patch and retest.
>>
>> Thank you for the review,
>> Evgeny.
>>
>>
>>>
>>> Honza

Re: Move some bit and binary optimizations in simplify and match

2015-10-14 Thread Marc Glisse


On Wed, 14 Oct 2015, Richard Biener wrote:


+/* Fold (a * (1 << b)) into (a << b)  */
+(simplify
+ (mult:c @0 (convert? (lshift integer_onep@1 @2)))
+  (if (! FLOAT_TYPE_P (type)
+&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
+   (lshift @0 (convert @2

You don't need/want to convert @2 (fold-const doesn't convert, does it?),
and you don't need to check for tree_nop_conversion_p.


I think for long x and x * (long)(1u << b) you need to do because the
result for b == 33 would be different.


- that check should be with TREE_TYPE (@1)
- 1u << 33 is undefined, isn't it?

x * (int)(1ul << b), which for b=33 should yield 0, would give the 
undefined x << b so some check does seem needed indeed.


--
Marc Glisse

Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Wed, Oct 14, 2015 at 01:51:44PM +0300, Maxim Ostapenko wrote:
> Ok, got it. The first solution would require changes in libsanitizer because
> heuristic doesn't work for GCC, so perhaps new UBSan entry point should go
> upstream, right? Or this may be implemented as local patch for GCC?

No.  The heuristics relies on:
1) either it is old style float cast overflow without location
2) or it is new style float cast with location, but the location must:
   a) not have NULL filename
   b) the filename must not be ""
   c) the filename must not be "\1"
So, my proposal was to emit in GCC the old style float cast overflow if a), b) 
or
c) is true, otherwise the new style.  I have no idea what you mean by
heuristic doesn't work for GCC after that.

> BTW, I actually saw UNKNOWN_LOCATION for this expr:
> 
> volatile double var;  // this is tcc_decaration, so we have UNKNOWN_LOCATION
> for it.

This is not a complete testcase, so I wonder what exactly you are talking
about.  The above doesn't not generate any
__ubsan_handle_float_cast_overflow calls with
-fsanitize=float-cast-overflow, and
volatile double d;
int bar (void) { return d; }
has location.

Jakub

Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Maxim Ostapenko


On 14/10/15 10:48, Jakub Jelinek wrote:

On Tue, Oct 13, 2015 at 02:22:36PM +0300, Maxim Ostapenko wrote:

This is the final patch. Force libsanitizer to use an old ABI for ubsan
float cast data descriptors, because for some exprs (e.g. that type of
tcc_declaration) we can't get the right location for now. I'm not sure about
this, perhaps it should be fixed in GCC somehow.

I don't like this (neither the heuristics on the libubsan, it wouldn't be a
big deal to add a new library entrypoint).
If because of the heuristics you need to ensure that the SourceLocation is
always known, then either you check in ubsan.c whether expand_location
gives you NULL xloc.file and in that case use old style float cast overflow
(without location) - i.e. pass 0, NULL, otherwise you use new style, i.e.
pass 1,   Or arrange through some special option to emit something like
{ "", 0, 0 } instead of { NULL, 0, 0 } for the float cast case.
And, regardless of this, any progress in making sure we have fewer cases
with UNKNOWN_LOCATION on this will not hurt.  I think at this point I'd
prefer the first choice, i.e. using old style for locations without
filename, and new style otherwise.


2015-10-13  Maxim Ostapenko  

* ubsan/ubsan_handlers.cc (looksLikeFloatCastOverflowDataV1): Always
return true for now.

Index: libsanitizer/ubsan/ubsan_handlers.cc
===
--- libsanitizer/ubsan/ubsan_handlers.cc(revision 250059)
+++ libsanitizer/ubsan/ubsan_handlers.cc(working copy)
@@ -307,6 +307,9 @@
  }
  
  static bool looksLikeFloatCastOverflowDataV1(void *Data) {

+  // (TODO): propagate SourceLocation into DataDescriptor and use this
+  // heuristic than.
+  return true;
// First field is either a pointer to filename or a pointer to a
// TypeDescriptor.
u8 *FilenameOrTypeDescriptor;


Jakub



Ok, got it. The first solution would require changes in libsanitizer 
because heuristic doesn't work for GCC, so perhaps new UBSan entry point 
should go upstream, right? Or this may be implemented as local patch for 
GCC?


BTW, I actually saw UNKNOWN_LOCATION for this expr:

volatile double var;  // this is tcc_decaration, so we have 
UNKNOWN_LOCATION for it.


I wonder if we need emit __ubsan_handle_float_cast_overflow here at all.

Re: Move some bit and binary optimizations in simplify and match

2015-10-14 Thread Richard Biener

On Wed, Oct 14, 2015 at 7:39 AM, Marc Glisse  wrote:
>
> +(simplify
> + (plus (convert? @0) (convert? (xdivamulminusa @0 @1)))
> +  (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> +   && tree_nop_conversion_p (type, TREE_TYPE (@0)))
> +   (trunc_mod (convert @0) (convert @1
>
> See PR 67953.

Please drop xdivamulminusa.  It was a bad idea of mine, just add two patterns.

> +(match (abitandnotb @0 @1)
> + (bit_and:c @0 (bit_not INTEGER_CST@1)))
>
> Does that work?

No.  Please drop these helpers and instead duplicate the patterns.

>
> +/* Fold (a * (1 << b)) into (a << b)  */
> +(simplify
> + (mult:c @0 (convert? (lshift integer_onep@1 @2)))
> +  (if (! FLOAT_TYPE_P (type)
> +&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
> +   (lshift @0 (convert @2
>
> You don't need/want to convert @2 (fold-const doesn't convert, does it?),
> and you don't need to check for tree_nop_conversion_p.

I think for long x and x * (long)(1u << b) you need to do because the
result for b == 33 would be different.  Indeed you don't need the convert on @2.

Richard.

>
> --
> Marc Glisse

Re: [PATCH] Allow FSM to thread single block cases too

2015-10-14 Thread Richard Biener

On Tue, Oct 13, 2015 at 2:52 PM, Richard Biener
 wrote:
> On Tue, Oct 13, 2015 at 2:21 PM, Jeff Law  wrote:
>>
>> One of the cases that was missing in the FSM support is threading when the
>> path is a single block.  ie, a control statement's output can be statically
>> determined just by looking at PHIs in the control statement's block for one
>> or incoming edges.
>>
>> This is necessary to fix a regression if I turn off the old jump threader's
>> backedge support.  Just as important, Jan has in the past asked about a
>> trivial jump threader to be run during early optimizations.  Limiting the
>> FSM bits to this case would likely satisfy that need in the future.
>
> I think he asked for trivial forward threads though due to repeated tests.
> I hacked FRE to do this (I think), but maybe some trivial cleanup 
> opportunities
> are still left here.  Honza?

This or other related patches in the range r228731:228774 has caused a quite
big jump in SPEC CPU 2000 binary sizes (notably 176.gcc - so maybe affecting
bootstrap as well, at -O3).  Are you sure this doesn't re-introduce DOM
effectively peeling all loops once?

Richard.

> Richard.
>
>> Bootstrapped and regression tested on x86_64-linux-gnu.  Installed on the
>> trunk.
>>
>> Jeff
>>
>> commit a53bb29a1dffd329aa6235b88b0c2a830aa5a59e
>> Author: Jeff Law 
>> Date:   Tue Oct 13 06:19:20 2015 -0600
>>
>> [PATCH] Allow FSM to thread single block cases too
>>
>> * tree-ssa-threadbackward.c
>> (fsm_find_control_statement_thread_paths):
>> Allow single block jump threading paths.
>>
>> * gcc.dg/tree-ssa/ssa-thread-13.c: New test.
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index d71bcd2..caab533 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2015-10-13  Jeff Law  
>> +
>> +   * tree-ssa-threadbackward.c
>> (fsm_find_control_statement_thread_paths):
>> +   Allow single block jump threading paths.
>> +
>>  2015-10-13  Tom de Vries  
>>
>> PR tree-optimization/67476
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index 4a08f0f..acf6df5 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,7 @@
>> +2015-10-13  Jeff Law  
>> +
>> +   * gcc.dg/tree-ssa/ssa-thread-13.c: New test.
>> +
>>  2015-10-12  Jeff Law  
>>
>> * gcc.dg/tree-ssa/ssa-thread-12.c: New test.
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
>> new file mode 100644
>> index 000..5051d11
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-13.c
>> @@ -0,0 +1,70 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-vrp1-details" } */
>> +/* { dg-final { scan-tree-dump "FSM" "vrp1" } } */
>> +
>> +typedef struct rtx_def *rtx;
>> +typedef const struct rtx_def *const_rtx;
>> +enum rtx_code
>> +{
>> +  UNKNOWN, VALUE, DEBUG_EXPR, EXPR_LIST, INSN_LIST, SEQUENCE, ADDRESS,
>> +DEBUG_INSN, INSN, JUMP_INSN, CALL_INSN, BARRIER, CODE_LABEL, NOTE,
>> +COND_EXEC, PARALLEL, ASM_INPUT, ASM_OPERANDS, UNSPEC, UNSPEC_VOLATILE,
>> +ADDR_VEC, ADDR_DIFF_VEC, PREFETCH, SET, USE, CLOBBER, CALL, RETURN,
>> +EH_RETURN, TRAP_IF, CONST_INT, CONST_FIXED, CONST_DOUBLE, CONST_VECTOR,
>> +CONST_STRING, CONST, PC, REG, SCRATCH, SUBREG, STRICT_LOW_PART, CONCAT,
>> +CONCATN, MEM, LABEL_REF, SYMBOL_REF, CC0, IF_THEN_ELSE, COMPARE, PLUS,
>> +MINUS, NEG, MULT, SS_MULT, US_MULT, DIV, SS_DIV, US_DIV, MOD, UDIV,
>> UMOD,
>> +AND, IOR, XOR, NOT, ASHIFT, ROTATE, ASHIFTRT, LSHIFTRT, ROTATERT, SMIN,
>> +SMAX, UMIN, UMAX, PRE_DEC, PRE_INC, POST_DEC, POST_INC, PRE_MODIFY,
>> +POST_MODIFY, NE, EQ, GE, GT, LE, LT, GEU, GTU, LEU, LTU, UNORDERED,
>> +ORDERED, UNEQ, UNGE, UNGT, UNLE, UNLT, LTGT, SIGN_EXTEND, ZERO_EXTEND,
>> +TRUNCATE, FLOAT_EXTEND, FLOAT_TRUNCATE, FLOAT, FIX, UNSIGNED_FLOAT,
>> +UNSIGNED_FIX, FRACT_CONVERT, UNSIGNED_FRACT_CONVERT, SAT_FRACT,
>> +UNSIGNED_SAT_FRACT, ABS, SQRT, BSWAP, FFS, CLZ, CTZ, POPCOUNT, PARITY,
>> +SIGN_EXTRACT, ZERO_EXTRACT, HIGH, LO_SUM, VEC_MERGE, VEC_SELECT,
>> +VEC_CONCAT, VEC_DUPLICATE, SS_PLUS, US_PLUS, SS_MINUS, SS_NEG, US_NEG,
>> +SS_ABS, SS_ASHIFT, US_ASHIFT, US_MINUS, SS_TRUNCATE, US_TRUNCATE, FMA,
>> +VAR_LOCATION, DEBUG_IMPLICIT_PTR, ENTRY_VALUE, LAST_AND_UNUSED_RTX_CODE
>> +};
>> +union rtunion_def
>> +{
>> +  rtx rt_rtx;
>> +};
>> +typedef union rtunion_def rtunion;
>> +struct rtx_def
>> +{
>> +  __extension__ enum rtx_code code:16;
>> +  union u
>> +  {
>> +rtunion fld[1];
>> +  }
>> +  u;
>> +};
>> +
>> +unsigned int rtx_cost (rtx, enum rtx_code, unsigned char);
>> +rtx single_set_2 (const_rtx, rtx);
>> +
>> +unsigned
>> +seq_cost (const_rtx seq, unsigned char speed)
>> +{
>> +  unsigned cost = 0;
>> +  rtx set;
>> +  for (; seq; seq =

Re: Move some bit and binary optimizations in simplify and match

2015-10-14 Thread Richard Biener

On Wed, Oct 14, 2015 at 12:45 PM, Marc Glisse  wrote:
> On Wed, 14 Oct 2015, Richard Biener wrote:
>
>>> +/* Fold (a * (1 << b)) into (a << b)  */
>>> +(simplify
>>> + (mult:c @0 (convert? (lshift integer_onep@1 @2)))
>>> +  (if (! FLOAT_TYPE_P (type)
>>> +&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
>>> +   (lshift @0 (convert @2
>>>
>>> You don't need/want to convert @2 (fold-const doesn't convert, does it?),
>>> and you don't need to check for tree_nop_conversion_p.
>>
>>
>> I think for long x and x * (long)(1u << b) you need to do because the
>> result for b == 33 would be different.
>
>
> - that check should be with TREE_TYPE (@1)

of course

> - 1u << 33 is undefined, isn't it?

Is it?  I thought it were fine for unsigned.  Not sure if we should exploit this
undefinedness here.  Btw, if it were a truncating conversion then the
resulting shift could be invalid if @2 is too big (but not too big for the
wider shift).  So either way I think we should only allow nop conversions
here (as fold-const.c did).

Richard.

> x * (int)(1ul << b), which for b=33 should yield 0, would give the undefined
> x << b so some check does seem needed indeed.
>
> --
> Marc Glisse

Re: Move some bit and binary optimizations in simplify and match

2015-10-14 Thread Marc Glisse


On Wed, 14 Oct 2015, Richard Biener wrote:


On Wed, Oct 14, 2015 at 12:45 PM, Marc Glisse  wrote:

On Wed, 14 Oct 2015, Richard Biener wrote:


+/* Fold (a * (1 << b)) into (a << b)  */
+(simplify
+ (mult:c @0 (convert? (lshift integer_onep@1 @2)))
+  (if (! FLOAT_TYPE_P (type)
+&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
+   (lshift @0 (convert @2

You don't need/want to convert @2 (fold-const doesn't convert, does it?),
and you don't need to check for tree_nop_conversion_p.



I think for long x and x * (long)(1u << b) you need to do because the
result for b == 33 would be different.



- that check should be with TREE_TYPE (@1)


of course


- 1u << 33 is undefined, isn't it?


Is it?  I thought it were fine for unsigned.


Can't be, Intel thinks it is 2u while some other hardware thinks it is 0.


Not sure if we should exploit this
undefinedness here.  Btw, if it were a truncating conversion then the
resulting shift could be invalid if @2 is too big (but not too big for the
wider shift).


Yes, that was my example below.


So either way I think we should only allow nop conversions
here (as fold-const.c did).


I agree that's the safest / easiest for now.


x * (int)(1ul << b), which for b=33 should yield 0, would give the undefined
x << b so some check does seem needed indeed.


--
Marc Glisse

Re: [vec-cmp, patch 4/6] Support vector mask invariants

2015-10-14 Thread Ilya Enkovich

2015-10-14 11:49 GMT+03:00 Richard Biener :
> On Tue, Oct 13, 2015 at 4:52 PM, Ilya Enkovich  wrote:
>> I don't understand what you mean. vect_get_vec_def_for_operand has two
>> changes made.
>> 1. For boolean invariants use build_same_sized_truth_vector_type
>> instead of get_vectype_for_scalar_type in case statement produces a
>> boolean vector. This covers cases when we use invariants in
>> comparison, AND, IOR, XOR.
>
> Yes, I understand we need this special-casing to differentiate between
> the vector type
> used for boolean-typed loads/stores and the type for boolean typed constants.
> What happens if we mix them btw, like with
>
>   _Bool b = bools[i];
>   _Bool c = b || d;
>   ...
>
> ?

Here both statements should get vector of char as a vectype and we
never go VECTOR_BOOLEAN_TYPE_P way for them

>
>> 2. COND_EXPR is an exception because it has built-in boolean vector
>> result not reflected in its vecinfo. Thus I added additional operand
>> for vect_get_vec_def_for_operand to directly specify vectype for
>> vector definition in case it is a loop invariant.
>> So what do you propose to do with these changes?
>
> This is the change I don't like and don't see why we need it.  It works today
> and the comparison operands should be of appropriate type already?

Today it works because we always create vector of integer constant.
With boolean vectors it may be either integer vector or boolean vector
depending on context. Consider:

_Bool _1;
int _2;

_2 = _1 != 0 ? 0 : 1

We have two zero constants here requiring different vectypes.

Ilya

>
> Richard.
>
>> Thanks,
>> Ilya

Re: [vec-cmp, patch 2/6] Vectorization factor computation

2015-10-14 Thread Ilya Enkovich

2015-10-13 16:37 GMT+03:00 Richard Biener :
> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch handles statements with boolean result in vectorization factor 
>> computation.  For comparison its operands type is used instead of restult 
>> type to compute VF.  Other boolean statements are ignored for VF.
>>
>> Vectype for comparison is computed using type of compared values.  Computed 
>> type is propagated into other boolean operations.
>
> This feels rather ad-hoc, mixing up the existing way of computing
> vector type and VF.  I'd rather have turned the whole
> vector type computation around to the scheme working on the operands
> rather than on the lhs and then searching
> for smaller/larger types on the rhs'.
>
> I know this is a tricky function (heh, but you make it even worse...).
> And it needs a helper with knowledge about operations
> so one can compute the result vector type for an operation on its
> operands.  The seeds should be PHIs (handled like now)
> and loads, and yes, externals need special handling.
>
> Ideally we'd do things in two stages, first compute vector types in a
> less constrained manner (not forcing a single vector size)
> and then in a 2nd run promote to a common size also computing the VF to do 
> that.

This sounds like a refactoring, not a functional change, right? Also I
don't see a reason to analyze DF to compute vectypes if we promote it
to a single vector size anyway. For booleans we have to do it because
boolean vectors of the same size may have different number of
elements. What is the reason to do it for other types?

Shouldn't it be a patch independent from comparison vectorization series?

>
> Btw, I think you "mishandle" bool b = boolvar != 0;

This should be handled fine. Statement will inherit a vectype of
'boolvar' definition. If it's invariant - then yes, invariant boolean
statement case is not handled. But this is only because I supposed we
just shouldn't have such statements in a loop. If we may have them,
then using 'vector _Bool (VF)' type for that should be OK.

Ilya

>
> Richard.
>

Re: Handle CONSTRUCTOR in operand_equal_p

2015-10-14 Thread Jan Hubicka

> On October 14, 2015 6:27:02 PM GMT+02:00, Jan Hubicka  wrote:
> >Hi,
> >this patch adds the CONSTRUCTOR case discussed while back.  Only empty
> >constructors are matched, as those are only appearing in gimple
> >operand.
> >I tested that during bootstrap about 7500 matches are for empty ctors.
> >There are couple hundred for non-empty probably used on generic. 
> >
> >Bootstrapped/regtested x86_64-linux, OK?
> >
> >Honza
> >
> > * fold-const.c (operand_equal_p): Match empty constructors.
> >Index: fold-const.c
> >===
> >--- fold-const.c (revision 228735)
> >+++ fold-const.c (working copy)
> >@@ -2890,6 +2891,11 @@ operand_equal_p (const_tree arg0, const_
> > return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1,
> >0),
> > flags | OEP_ADDRESS_OF
> > | OEP_CONSTANT_ADDRESS_OF);
> >+  case CONSTRUCTOR:
> >+/* In GIMPLE empty constructors are allowed in initializers of
> >+   vector types.  */
> 
> The comment is wrong (or at least odd),
> On gimple an empty vector constructor should be folded to a VECTOR_CST.

Hmm, I tought we have a={} i.e. for clearing whole structure.
If we never have constructors appearing as gimple operands, I guess we don't
really need the code at gimple level (because at least so far ipa-icf has
separate path to compare constructors).

Honza
> 
> >+return (!vec_safe_length (CONSTRUCTOR_ELTS (arg0))
> >+&& !vec_safe_length (CONSTRUCTOR_ELTS (arg1)));
> >   default:
> > break;
> >   }
>

Re: [PATCH] c/67925 - update documentation on `inline'

2015-10-14 Thread Martin Sebor


On 10/14/2015 03:42 PM, Arkadiusz Drabczyk wrote:

On Wed, Oct 14, 2015 at 08:36:43AM -0600, Martin Sebor wrote:

On 10/13/2015 04:47 PM, Arkadiusz Drabczyk wrote:

* gcc/doc/extend.texi: documentation says that functions declared
`inline' would not be integrated if they are called before they are
defined or if they are recursive. Both of these statements is now
false as shown in examples on Bugzilla.


It might also be worth updating the note in the subsequent
paragraph and removing the mention of variable-length data types
which no longer prevent inlining.


Done.  I also removed the mention of nested functions as the following
code compiled with GCC 6.0 doesn't give any warning with -O2 -Winline
and main() is the only function defined in assembler code:


I think this is the Ada-specific warning I mentioned (see
check_inlining_for_nested_subprog in gcc/ada/gcc-interface/trans.c)
so the part about nested functions needs to stay.


  = G_("function %q+F can never be inlined because "
   "it uses non-local goto");


I tested of all of these and listed them in the documentation but
wasn't able to reproduce this one.  The following code does not give
any warning with -O2 -Winline:


The warning above is issued for non-local and computed goto. You can
find examples of both in the test suite (find gcc/testsuite/ -name
"*goto*.[cC]")



#include 
#include 
#include 

static jmp_buf buf;

inline static void longjmp_test(int n)
{
puts("hi");

if (n == 2)
longjmp(buf, 2);
}

int main(int argc, char *argv[])
{
printf("%d\n", setjmp(buf));
longjmp_test(atoi(argv[1]));
puts("next line in a normal execution");
exit(0);
}

Is this test case correct?


Not really. You can see the warning by calling setjmp in longjmp_test.


 However, I get the following warning using
__builtin_longjmp() instead of longjmp():

warning: function 'longjmp_test' can never be inlined because it uses
setjmp-longjmp exception handling [-Winline]


Right. You can see why in the comment in gcc/tree-inline.c:
/* We can't inline functions that call __builtin_longjmp at
   all.  The non-local goto machinery really requires the
   destination be in a different function.  If we allow the
   function calling __builtin_longjmp to be inlined into the
   function calling __builtin_setjmp, Things will Go Awry.  */
inline_forbidden_reason
  = G_("function %q+F can never be inlined because "
   "it uses setjmp-longjmp exception handling");


8--8<

* gcc/doc/extend.texi: documentation says that functions declared
`inline' would not be integrated if they are called before they are
defined, if they are recursive, if they use variable-length data types
or if they are nested.  All of these statements are now false and have
been removed. Mention of setjmp(), __builtin_return() and
__builtin_apply_args() has been added.


Great, thank you for taking the time to make these additional updates.
Modulo the nested functions, the patch looks good to me (someone else
will need to formally approve the final version).

Martin


---
  gcc/doc/extend.texi | 21 +
  1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 79440d3..be95cc3 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7088,21 +7088,18 @@ function are integrated into the caller, and the 
function's address is
  never used, then the function's own assembler code is never referenced.
  In this case, GCC does not actually output assembler code for the
  function, unless you specify the option @option{-fkeep-inline-functions}.
-Some calls cannot be integrated for various reasons (in particular,
-calls that precede the function's definition cannot be integrated, and
-neither can recursive calls within the definition).  If there is a
-nonintegrated call, then the function is compiled to assembler code as
-usual.  The function must also be compiled as usual if the program
-refers to its address, because that can't be inlined.
+If there is a nonintegrated call, then the function is compiled to
+assembler code as usual.  The function must also be compiled as usual if
+the program refers to its address, because that can't be inlined.

  @opindex Winline
  Note that certain usages in a function definition can make it unsuitable
-for inline substitution.  Among these usages are: variadic functions, use of
-@code{alloca}, use of variable-length data types (@pxref{Variable Length}),
-use of computed goto (@pxref{Labels as Values}), use of nonlocal goto,
-and nested functions (@pxref{Nested Functions}).  Using @option{-Winline}
-warns when a function marked @code{inline} could not be substituted,
-and gives the reason for the failure.
+for inline substitution.  Among these usages are: variadic

Re: libgomp testsuite: Remove some explicit acc_device_nvidia usage

2015-10-14 Thread Bernd Schmidt


On 10/09/2015 05:11 PM, Thomas Schwinge wrote:

On Wed, 22 Jul 2015 16:39:54 +0200, I wrote:

[...] cleanup; committed to
gomp-4_0-branch in r226072: [...]


OK for trunk?


I think all three patches here look OK.


Bernd

Re: [vec-cmp, patch 3/6] Vectorize comparison

2015-10-14 Thread Ilya Enkovich

2015-10-13 16:45 GMT+03:00 Richard Biener :
> On Thu, Oct 8, 2015 at 5:03 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch supports comparison statements vectrization basing on introduced 
>> optabs.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-10-08  Ilya Enkovich  
>>
>> * tree-vect-data-refs.c (vect_get_new_vect_var): Support 
>> vect_mask_var.
>> (vect_create_destination_var): Likewise.
>> * tree-vect-stmts.c (vectorizable_comparison): New.
>> (vect_analyze_stmt): Add vectorizable_comparison.
>> (vect_transform_stmt): Likewise.
>> * tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var.
>> (enum stmt_vec_info_type): Add comparison_vec_info_type.
>> (vectorizable_comparison): New.
>>
>>
>> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> index 3befa38..9edc663 100644
>> --- a/gcc/tree-vect-data-refs.c
>> +++ b/gcc/tree-vect-data-refs.c
>> @@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind 
>> var_kind, const char *name)
>>case vect_scalar_var:
>>  prefix = "stmp";
>>  break;
>> +  case vect_mask_var:
>> +prefix = "mask";
>> +break;
>>case vect_pointer_var:
>>  prefix = "vectp";
>>  break;
>> @@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree 
>> vectype)
>>tree type;
>>enum vect_var_kind kind;
>>
>> -  kind = vectype ? vect_simple_var : vect_scalar_var;
>> +  kind = vectype
>> +? VECTOR_BOOLEAN_TYPE_P (vectype)
>> +? vect_mask_var
>> +: vect_simple_var
>> +: vect_scalar_var;
>>type = vectype ? vectype : TREE_TYPE (scalar_dest);
>>
>>gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index 8eda8e9..6949c71 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -7525,6 +7525,211 @@ vectorizable_condition (gimple *stmt, 
>> gimple_stmt_iterator *gsi,
>>return true;
>>  }
>>
>> +/* vectorizable_comparison.
>> +
>> +   Check if STMT is comparison expression that can be vectorized.
>> +   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
>> +   comparison, put it in VEC_STMT, and insert it at GSI.
>> +
>> +   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
>> +
>> +bool
>> +vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
>> +gimple **vec_stmt, tree reduc_def,
>> +slp_tree slp_node)
>> +{
>> +  tree lhs, rhs1, rhs2;
>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> +  tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>> +  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
>> +  tree vec_compare;
>> +  tree new_temp;
>> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>> +  tree def;
>> +  enum vect_def_type dt, dts[4];
>> +  unsigned nunits;
>> +  int ncopies;
>> +  enum tree_code code;
>> +  stmt_vec_info prev_stmt_info = NULL;
>> +  int i, j;
>> +  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>> +  vec vec_oprnds0 = vNULL;
>> +  vec vec_oprnds1 = vNULL;
>> +  tree mask_type;
>> +  tree mask;
>> +
>> +  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
>> +return false;
>> +
>> +  mask_type = vectype;
>> +  nunits = TYPE_VECTOR_SUBPARTS (vectype);
>> +
>> +  if (slp_node || PURE_SLP_STMT (stmt_info))
>> +ncopies = 1;
>> +  else
>> +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
>> +
>> +  gcc_assert (ncopies >= 1);
>> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>> +return false;
>> +
>> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
>> +  && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
>> +  && reduc_def))
>> +return false;
>> +
>> +  if (STMT_VINFO_LIVE_P (stmt_info))
>> +{
>> +  if (dump_enabled_p ())
>> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +"value used after loop.\n");
>> +  return false;
>> +}
>> +
>> +  if (!is_gimple_assign (stmt))
>> +return false;
>> +
>> +  code = gimple_assign_rhs_code (stmt);
>> +
>> +  if (TREE_CODE_CLASS (code) != tcc_comparison)
>> +return false;
>> +
>> +  rhs1 = gimple_assign_rhs1 (stmt);
>> +  rhs2 = gimple_assign_rhs2 (stmt);
>> +
>> +  if (TREE_CODE (rhs1) == SSA_NAME)
>> +{
>> +  gimple *rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1);
>> +  if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo,
>> +_def_stmt, , , ))
>> +   return false;
>> +}
>> +  else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST
>> +  && TREE_CODE (rhs1) != FIXED_CST)
>> +return false;
>
> I think vect_is_simple_use_1 handles constants just fine an def_stmt
> is an output,
> you don't need to initialize it.

OK

>
>> +
>> +  if

[PATCH][ARM] PR target/67929 Tighten vfp3_const_double_for_bits checks

2015-10-14 Thread Kyrill Tkachov


Hi all,

This patch fixes the referenced PR by rewriting the vfp3_const_double_for_bits 
function in arm.c
The function is supposed to accept positive CONST_DOUBLE rtxes whose value is 
an exact power of 2
and whose log2 is between 1 and 32. That is values like 2.0, 4.0, 8.9, 16.0 
etc...

The current implementation seems to have been written under the assumption that 
exact_real_truncate returns
false if the input value is not an exact integer, whereas in fact 
exact_real_truncate returns false if the
truncation operation was not exact, which are different things. This would lead 
the function to accept any
CONST_DOUBLE that can truncate to a power of 2, such as 4.9, 16.2 etc.

In any case, I've rewritten this function and used the real_isinteger predicate 
to check if the real value
is an exact integer.

The testcase demonstrates the kind of wrong code that this patch addresses.

This bug appears on GCC 5 and 4.9 as well, but due to the recent introduction 
of CONST_DOUBLE_REAL_VALUE
this patch doesn't apply on those branches. I will soon post the backportable 
variant.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2015-10-12  Kyrylo Tkachov  

PR target/67929
* config/arm/arm.c (vfp3_const_double_for_bits): Rewrite.
* config/arm/constraints.md (Dp): Update callsite.
* config/arm/predicates.md (const_double_vcvt_power_of_two): Likewise.

2015-10-12  Kyrylo Tkachov  

PR target/67929
* gcc.target/arm/pr67929_1.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0bf1164..29dd489 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27734,25 +27734,37 @@ vfp3_const_double_for_fract_bits (rtx operand)
   return 0;
 }
 
+/* If X is a CONST_DOUBLE with a value that is a power of 2 whose
+   log2 is in [1, 32], return that log2.  Otherwise return -1.
+   This is used in the patterns for vcvt.s32.f32 floating-point to
+   fixed-point conversions.  */
+
 int
-vfp3_const_double_for_bits (rtx operand)
+vfp3_const_double_for_bits (rtx x)
 {
-  const REAL_VALUE_TYPE *r0;
+  const REAL_VALUE_TYPE *r;
 
-  if (!CONST_DOUBLE_P (operand))
-return 0;
+  if (!CONST_DOUBLE_P (x))
+return -1;
 
-  r0 = CONST_DOUBLE_REAL_VALUE (operand);
-  if (exact_real_truncate (DFmode, r0))
-{
-  HOST_WIDE_INT value = real_to_integer (r0);
-  value = value & 0x;
-  if ((value != 0) && ( (value & (value - 1)) == 0))
-	return int_log2 (value);
-}
+  r = CONST_DOUBLE_REAL_VALUE (x);
 
-  return 0;
+  if (REAL_VALUE_NEGATIVE (*r)
+  || REAL_VALUE_ISNAN (*r)
+  || REAL_VALUE_ISINF (*r)
+  || !real_isinteger (r, SFmode))
+return -1;
+
+  HOST_WIDE_INT hwint = exact_log2 (real_to_integer (r));
+
+/* The exact_log2 above will have returned -1 if this is
+   not an exact log2.  */
+  if (!IN_RANGE (hwint, 1, 32))
+return -1;
+
+  return hwint;
 }
+
 
 /* Emit a memory barrier around an atomic sequence according to MODEL.  */
 
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index e24858f..901cfe5 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -339,7 +339,8 @@
  "@internal
   In ARM/ Thumb2 a const_double which can be used with a vcvt.s32.f32 with bits operation"
   (and (match_code "const_double")
-   (match_test "TARGET_32BIT && TARGET_VFP && vfp3_const_double_for_bits (op)")))
+   (match_test "TARGET_32BIT && TARGET_VFP
+		&& vfp3_const_double_for_bits (op) > 0")))
 
 (define_register_constraint "Ts" "(arm_restrict_it) ? LO_REGS : GENERAL_REGS"
  "For arm_restrict_it the core registers @code{r0}-@code{r7}.  GENERAL_REGS otherwise.")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 08cc899..48e4ba8 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -668,7 +668,7 @@
 (define_predicate "const_double_vcvt_power_of_two"
   (and (match_code "const_double")
(match_test "TARGET_32BIT && TARGET_VFP
-   && vfp3_const_double_for_bits (op)")))
+		&& vfp3_const_double_for_bits (op) > 0")))
 
 (define_predicate "neon_struct_operand"
   (and (match_code "mem")
diff --git a/gcc/testsuite/gcc.target/arm/pr67929_1.c b/gcc/testsuite/gcc.target/arm/pr67929_1.c
new file mode 100644
index 000..14943b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr67929_1.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_vfp3_ok } */
+/* { dg-options "-O2 -fno-inline" } */
+/* { dg-add-options arm_vfp3 } */
+/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+
+int
+foo (float a)
+{
+  return a * 4.9f;
+}
+
+
+int
+main (void)
+{
+  if (foo (10.0f) != 49)
+__builtin_abort ();
+
+  return 0;
+}
\ No newline at end of file

[PATCH][AArch64] Enable fusion of AES instructions

2015-10-14 Thread Wilco Dijkstra

Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. This 
can give up to 2x
speedup on many AArch64 implementations. Also model the crypto instructions on 
Cortex-A57 according
to the Optimization Guide.

Passes regression tests.

ChangeLog:
2015-10-14  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (cortexa53_tunings): Add AES fusion.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(arch_macro_fusion_pair_p): Add support for AES fusion.
* gcc/config/aarch64/aarch64-fusion-pairs.def: Add AES_AESMC entry.
* gcc/config/arm/aarch-common.c (aarch_crypto_can_dual_issue):
Allow virtual registers before reload so early scheduling works.
* gcc/config/arm/cortex-a57.md (cortex_a57_crypto_simple): Use
correct latency and pipeline.
(cortex_a57_crypto_complex): Likewise.
(cortex_a57_crypto_xor): Likewise.
(define_bypass): Add AES bypass.


---
 gcc/config/aarch64/aarch64-fusion-pairs.def |  1 +
 gcc/config/aarch64/aarch64.c| 10 +++---
 gcc/config/arm/aarch-common.c   |  7 +--
 gcc/config/arm/cortex-a57.md| 17 +++--
 4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index 53bbef4..fea79fc 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -33,4 +33,5 @@ AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD)
 AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
+AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 230902d..96368c6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -376,7 +376,7 @@ static const struct tune_params cortexa53_tunings =
   _branch_cost,
   4, /* memmov_cost  */
   2, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
   8,   /* function_align.  */
   8,   /* jump_align.  */
@@ -398,7 +398,7 @@ static const struct tune_params cortexa57_tunings =
   _branch_cost,
   4, /* memmov_cost  */
   3, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
   8,   /* jump_align.  */
@@ -420,7 +420,7 @@ static const struct tune_params cortexa72_tunings =
   _branch_cost,
   4, /* memmov_cost  */
   3, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
| AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
   8,   /* jump_align.  */
@@ -12843,6 +12843,10 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
 }
 }
 
+  if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_AES_AESMC)
+   && aarch_crypto_can_dual_issue (prev, curr))
+return true;
+
   if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_CMP_BRANCH)
   && any_condjump_p (curr))
 {
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 5dd8222..e191ab6 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -63,8 +63,11 @@ aarch_crypto_can_dual_issue (rtx_insn *producer_insn, 
rtx_insn *consumer_insn)
   {
 unsigned int regno = REGNO (SET_DEST (producer_set));
 
-return REGNO (SET_DEST (consumer_set)) == regno
-   && REGNO (XVECEXP (consumer_src, 0, 0)) == regno;
+/* Before reload the registers are virtual, so the destination of
+   consumer_set doesn't need to match.  */
+
+return (REGNO (SET_DEST (consumer_set)) == regno || !reload_completed)
+   && REGNO (XVECEXP (consumer_src, 0, 0)) == regno;
   }
 
   return 0;
diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index a32c848..eab9d99 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -745,20 +745,20 @@
 neon_fp_sqrt_s_q, neon_fp_sqrt_d_q"))
   "ca57_cx2_block*3")
 
-(define_insn_reservation "cortex_a57_crypto_simple" 4
+(define_insn_reservation "cortex_a57_crypto_simple" 3
   (and (eq_attr "tune" "cortexa57")
(eq_attr "type" 
"crypto_aese,crypto_aesmc,crypto_sha1_fast,crypto_sha256_fast"))
-  "ca57_cx2")
+  "ca57_cx1")
 
-(define_insn_reservation "cortex_a57_crypto_complex" 7
+(define_insn_reservation "cortex_a57_crypto_complex" 6
   (and (eq_attr "tune" "cortexa57")
(eq_attr "type" "crypto_sha1_slow,crypto_sha256_slow"))
-  "ca57_cx2+(ca57_cx2_issue,ca57_cx2)")
+  "ca57_cx1*2")

Re: [gomp4] privatize internal array variables introduced by the fortran FE

2015-10-14 Thread Paul Richard Thomas

Dear Cesar,

>
> Is there any reason why only certain arrays have array descriptors? The
> arrays with descriptors don't have this problem. It's only the ones
> without descriptors that leak new internal variables that cause errors
> with default(none).
>

This is an obvious question to which there is no obvious answer. When
asked it of one of the originators of gfortran, I was told that they
tried but got into some unspecified mess.

I would add the question as to why characters and scalars do not have
descriptors as well? One day, the volunteer maintainers will have
sorted out enough of the PRs to turn to issues like this. However,
simplification of this kind is just not on the cards at present.

Cheers

Paul

Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Maxim Ostapenko


On 14/10/15 14:06, Jakub Jelinek wrote:

On Wed, Oct 14, 2015 at 01:51:44PM +0300, Maxim Ostapenko wrote:

Ok, got it. The first solution would require changes in libsanitizer because
heuristic doesn't work for GCC, so perhaps new UBSan entry point should go
upstream, right? Or this may be implemented as local patch for GCC?

No.  The heuristics relies on:
1) either it is old style float cast overflow without location
2) or it is new style float cast with location, but the location must:
a) not have NULL filename
b) the filename must not be ""
c) the filename must not be "\1"
So, my proposal was to emit in GCC the old style float cast overflow if a), b) 
or
c) is true, otherwise the new style.  I have no idea what you mean by
heuristic doesn't work for GCC after that.


I mean that there are some cases where (FilenameOrTypeDescriptor[0] + 
FilenameOrTypeDescriptor[1] < 2) is not sufficient to determine if we 
should use old style. I actually caught this on float-cast-overflow-10.c 
testcase. Here:


$ /home/max/build/master-ref/gcc/xgcc -B/home/max/build/master-ref/gcc/ 
/home/max/workspace/downloads/svn/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-10.c 
-B/home/max/build/master-ref/x86_64-unknown-linux-gnu/./libsanitizer/ 
-B/home/max/build/master-ref/x86_64-unknown-linux-gnu/./libsanitizer/ubsan/ 
-L/home/max/build/master-ref/x86_64-unknown-linux-gnu/./libsanitizer/ubsan/.libs 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 
-fsanitize=float-cast-overflow -fsanitize-recover=float-cast-overflow 
-DUSE_INT128 -DUSE_DFP -DBROKEN_DECIMAL_INT128 -lm -o 
./float-cast-overflow-10.s -S


$ cat float-cast-overflow-10.s

cvt_sc_d32:
.LFB0:
.cfi_startproc
pushq   %rbx
..
.L6:
movl%ebx, %esi
movl$.Lubsan_data0, %edi
call__ubsan_handle_float_cast_overflow
...
.Lubsan_data0:
.quad   .Lubsan_type1
.quad   .Lubsan_type0
.align 2
.type   .Lubsan_type1, @object
.size   .Lubsan_type1, 17
.Lubsan_type1:
.value  -1  // <- TypeKind
.value  32
.string "'_Decimal32'"
.align 2
.type   .Lubsan_type0, @object
.size   .Lubsan_type0, 18
.Lubsan_type0:
.value  0 // <- TypeKind
.value  7
.string "'signed char'"
.section.rodata.cst4,"aM",@progbits,4
.align 4

Here, one can see, we have FilenameOrTypeDescriptor[0]  == -1 and 
FilenameOrTypeDescriptor[1] == 0. So, we end up with wrong decision and 
have SEGV later.

BTW, I actually saw UNKNOWN_LOCATION for this expr:

volatile double var;  // this is tcc_decaration, so we have UNKNOWN_LOCATION
for it.

This is not a complete testcase, so I wonder what exactly you are talking
about.  The above doesn't not generate any
__ubsan_handle_float_cast_overflow calls with
-fsanitize=float-cast-overflow, and
volatile double d;
int bar (void) { return d; }
has location.

Jakub

Re: [PR libgomp/65437, libgomp/66518] Initialize runtime in acc_update_device, acc_update_self

2015-10-14 Thread Bernd Schmidt


On 10/09/2015 05:14 PM, Thomas Schwinge wrote:

Hi!

On Fri, 19 Jun 2015 09:47:41 +0200, I wrote:

On Tue, 5 May 2015 11:43:20 +0200, I wrote:

On Mon, 4 May 2015 10:20:14 -0400, John David Anglin  
wrote:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-42.c
-DACC_DEVICE_TYPE_hos
t=1 -DACC_MEM_SHARED=1 output pattern test, is , should match
\[[0-9a-fA-FxX]+,2
56\] is not mapped



In r224639, now (at least, and at last...) XFAILed


OK to commit?


Ok.


Bernd

Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-14 Thread Jakub Jelinek

On Wed, Oct 14, 2015 at 03:02:22PM +0300, Maxim Ostapenko wrote:
> On 14/10/15 14:06, Jakub Jelinek wrote:
> >On Wed, Oct 14, 2015 at 01:51:44PM +0300, Maxim Ostapenko wrote:
> >>Ok, got it. The first solution would require changes in libsanitizer because
> >>heuristic doesn't work for GCC, so perhaps new UBSan entry point should go
> >>upstream, right? Or this may be implemented as local patch for GCC?
> >No.  The heuristics relies on:
> >1) either it is old style float cast overflow without location
> >2) or it is new style float cast with location, but the location must:
> >a) not have NULL filename
> >b) the filename must not be ""
> >c) the filename must not be "\1"
> >So, my proposal was to emit in GCC the old style float cast overflow if a), 
> >b) or
> >c) is true, otherwise the new style.  I have no idea what you mean by
> >heuristic doesn't work for GCC after that.
> 
> I mean that there are some cases where (FilenameOrTypeDescriptor[0] +
> FilenameOrTypeDescriptor[1] < 2) is not sufficient to determine if we should
> use old style. I actually caught this on float-cast-overflow-10.c testcase.

Ah, ok, in that case the heuristics is flawed.  If they want to keep it,
they should check if MaybeFromTypeKind is either < 2 or equal to 0x1fe.
Can you report it upstream?  If that is changed, we'd need to change the
above and also add
  d) the filename must not start with "\xff\xff"
to the rules.

I think it would be better to just add a whole new entrypoint, but if they
think the heuristics is good enough, they should at least fix it up.

Jakub

Re: [PATCH, VECTOR ABI] Add attribute((simd)) to GCC.

2015-10-14 Thread Kirill Yukhin

Hello,
On 07 Oct 11:09, Jeff Law wrote:
> On 10/05/2015 07:24 AM, Joseph Myers wrote:
> >On Mon, 5 Oct 2015, Kirill Yukhin wrote:
> >
> >>To enable vectorization of loops w/ calls to math functions it is reasonable
> >>to enable parsing of attribute vector for functions unconditionally and
> >>change GlibC's header file not to use `omp declare simd', but use
> >>__attribute__((vector)) instead.
> >
> >I assume you mean __vector__, for namespace reasons.  Obviously you need
> >appropriate GCC version conditionals in the headers to use the attribute
> >only when supported.  In addition, (a) this attribute doesn't seem to be
> >documented in extend.texi, and you'll need to include documentation in
> >your GCC patch that makes this a generic extension rather than just part
> >of Cilkplus, and (b) you'll need to agree with the x86_64 ABI mailing list
> >an extension of the ABI document (as attached to
> >) to cover this attribute, and
> >update the document there.
> I'm not sure why this attribute isn't documented, but clearly that should be
> fixed.
> 
> From the GCC side, I don't see a compelling reason to keep this attribute
> conditional on Cilk+ support.   One could very easily want to use the math
> library's vector entry points independent of OpenMP or Cilk+.
> 
> I thought the ABI for this stuff was consistent across the implementations
> (that was certainly the goal).  So aside from an example of how to use the
> attribute to get calls into the vector math library, I'm not sure what's
> needed.  Essentially the attribute is just another way to ensure we exploit
> the vector library when possible.
> 
> It also seems to me that showing that example on the libmvec page would be
> advisable.

Patch in the bottom introduces new attribute called `simd'.
I've decided to introduce a new one since Cilk's `vector' attribute
generates only one version of SIMD-enabled function [1] (which one -
implementation specific).

This new attribute shouldn't be used along w/ Cilk's `vector'.
If it is used w/ `pragma omp declare simd' - it is ignored.

Bootstrapped. New tests pass.

gcc/
* c/c-parser.c (c_parser): Add simd_attr_present flag.
(c_parser_declaration_or_fndef): Call c_parser_declaration_or_fndef
if simd_attr_present is set.
(c_finish_omp_declare_simd): Handle simd_attr_present.
* omp-low.c (pass_omp_simd_clone::gate): If target allows - call
without additional conditions.
gcc/testsuite/
* c-c++-common/attr-simd.c: New test.
* c-c++-common/attr-simd-2.c: Ditto.
* c-c++-common/attr-simd-3.c: Ditto.

Is it ok for trunk?

Here is a description for new attribute:
simd
Enables creation of one or more versions that can process multiple 
arguments using
SIMD instructions from a single invocation from a SIMD loop. It is 
ultimately an alias
to `omp declare simd’ pragma, available w/o additional compiler 
switches.
It is prohibited to use the attribute along with Cilk Plus’s `vector’ 
attribute.
If the attribute is specified and `pragma omp declare simd’ presents on 
a decl, then
the attribute is ignored.

[1] - 
https://www.cilkplus.org/sites/default/files/open_specifications/Intel_Cilk_plus_lang_spec_1.2.htm#elem.attr

--
Thanks, K

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 2d24c21..b83c9d8 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -224,6 +224,9 @@ struct GTY(()) c_parser {
   /* Buffer to hold all the tokens from parsing the vector attribute for the
  SIMD-enabled functions (formerly known as elemental functions).  */
   vec  *cilk_simd_fn_tokens;
+
+  /* Designates if "simd" attribute is specified in decl.  */
+  BOOL_BITFIELD simd_attr_present : 1;
 };
 
 
@@ -1700,7 +1703,8 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
   if (declarator == NULL)
{
  if (omp_declare_simd_clauses.exists ()
- || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
+ || !vec_safe_is_empty (parser->cilk_simd_fn_tokens)
+ || parser->simd_attr_present)
c_finish_omp_declare_simd (parser, NULL_TREE, NULL_TREE,
   omp_declare_simd_clauses);
  c_parser_skip_to_end_of_block_or_statement (parser);
@@ -1796,7 +1800,8 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
  if (!d)
d = error_mark_node;
  if (omp_declare_simd_clauses.exists ()
- || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
+ || !vec_safe_is_empty (parser->cilk_simd_fn_tokens)
+ || parser->simd_attr_present)
c_finish_omp_declare_simd (parser, d, NULL_TREE,
   omp_declare_simd_clauses);
}
@@ -1809,7 +1814,8 @@ c_parser_declaration_or_fndef

1 2 >

1 - 100 of 124 matches

Mail list logo