[RFC] Add OPTGROUP_PAR

2015-10-19 Thread Tom de Vries

Hi,

this patch adds OPTGROUP_PAR.

It allows a user to see on stderr what loops are parallelized by 
pass_parallelize_loops, using -fopt-info-par:

...
$ gcc -O2 -fopt-info-par test.c -ftree-parallelize-loops=32
test.c:5:3: note: parallelized inner loop
...

This patch doesn't include any MSG_MISSED_OPTIMIZATION/MSG_NOTE messages 
yet.


Idea of the patch OK?

Any other comments?

Thanks,
- Tom
Add OPTGROUP_PAR

2015-10-19  Tom de Vries  

	* doc/invoke.texi (@item -fopt-info): Add @item par in group of
	optimizations table.
	* dumpfile.c (optgroup_options): Add OPTGROUP_PAR entry.
	* dumpfile.h (OPTGROUP_PAR): New define.
	(OPTGROUP_OTHER): Renumber.
	(OPTGROUP_ALL): Add OPTGROUP_PAR.
	* tree-parloops.c (parallelize_loops): Handle -fopt-info-par.
	(pass_data_parallelize_loops): Change optinfo_flags from OPTGROUP_LOOP
	to OPTGROUP_PAR.
---
 gcc/doc/invoke.texi |  2 ++
 gcc/dumpfile.c      |  1 +
 gcc/dumpfile.h      |  5 +++--
 gcc/tree-parloops.c | 16 ++--
 4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 54e9f12..629ee37 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7319,6 +7319,8 @@ Enable dumps from all loop optimizations.
 Enable dumps from all inlining optimizations.
 @item vec
 Enable dumps from all vectorization optimizations.
+@item par
+Enable dumps from all auto-parallelization optimizations.
 @item optall
 Enable dumps from all optimizations. This is a superset of
 the optimization groups listed above.
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index e4c4748..421d19b 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -138,6 +138,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
   {"vec", OPTGROUP_VEC},
+  {"par", OPTGROUP_PAR},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
 };
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 5f30077..52371f4 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -97,9 +97,10 @@ enum tree_dump_index
 #define OPTGROUP_LOOP	 (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE  (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OTHER   (1 << 5)   /* All other passes */
+#define OPTGROUP_PAR	 (1 << 5)   /* Auto-parallelization passes */
+#define OPTGROUP_OTHER   (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	 (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
-  | OPTGROUP_VEC | OPTGROUP_OTHER)
+			  | OPTGROUP_VEC | OPTGROUP_PAR | OPTGROUP_OTHER)
 
 /* Define a tree dump switch.  */
 struct dump_file_info
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index c7aa62c..e98c2c7 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2718,17 +2718,21 @@ parallelize_loops (void)
 
   changed = true;
   skip_loop = loop->inner;
+  const char *loop_describe = (loop->inner
+			       ? "outer"
+			       : "inner");
+  loop_loc = find_loop_location (loop);
   if (dump_file && (dump_flags & TDF_DETAILS))
   {
-	if (loop->inner)
-	  fprintf (dump_file, "parallelizing outer loop %d\n",loop->header->index);
-	else
-	  fprintf (dump_file, "parallelizing inner loop %d\n",loop->header->index);
-	loop_loc = find_loop_location (loop);
+	fprintf (dump_file, "parallelizing %s loop %d\n", loop_describe,
+		 loop->header->index);
 	if (loop_loc != UNKNOWN_LOCATION)
 	  fprintf (dump_file, "\nloop at %s:%d: ",
 		   LOCATION_FILE (loop_loc), LOCATION_LINE (loop_loc));
   }
+  if (dump_enabled_p ())
+	dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loop_loc,
+			 "parallelized %s loop\n", loop_describe);
  gen_parallel_loop (loop, &reduction_list,
			 n_threads, &niter_desc);
 }
@@ -2752,7 +2756,7 @@ const pass_data pass_data_parallelize_loops =
 {
   GIMPLE_PASS, /* type */
   "parloops", /* name */
-  OPTGROUP_LOOP, /* optinfo_flags */
+  OPTGROUP_PAR, /* optinfo_flags */
   TV_TREE_PARALLELIZE_LOOPS, /* tv_id */
   ( PROP_cfg | PROP_ssa ), /* properties_required */
   0, /* properties_provided */
-- 
1.9.1



Re: [PATCH 7/7] Libsanitizer merge from upstream r249633.

2015-10-19 Thread Jakub Jelinek
On Fri, Oct 16, 2015 at 02:29:08PM +0300, Maxim Ostapenko wrote:
> On 14/10/15 15:12, Jakub Jelinek wrote:
> >On Wed, Oct 14, 2015 at 03:02:22PM +0300, Maxim Ostapenko wrote:
> >>On 14/10/15 14:06, Jakub Jelinek wrote:
> >>>On Wed, Oct 14, 2015 at 01:51:44PM +0300, Maxim Ostapenko wrote:
> Ok, got it. The first solution would require changes in libsanitizer 
> because
> heuristic doesn't work for GCC, so perhaps a new UBSan entry point should go
> upstream, right? Or this may be implemented as local patch for GCC?
> >>>No.  The heuristics relies on:
> >>>1) either it is old style float cast overflow without location
> >>>2) or it is new style float cast with location, but the location must:
> >>>a) not have NULL filename
> >>>b) the filename must not be ""
> >>>c) the filename must not be "\1"
> >>>So, my proposal was to emit in GCC the old style float cast overflow if 
> >>>a), b) or
> >>>c) is true, otherwise the new style.  I have no idea what you mean by
> >>>heuristic doesn't work for GCC after that.
> >>I mean that there are some cases where (FilenameOrTypeDescriptor[0] +
> >>FilenameOrTypeDescriptor[1] < 2) is not sufficient to determine if we should
> >>use old style. I actually caught this on float-cast-overflow-10.c testcase.
> >Ah, ok, in that case the heuristics is flawed.  If they want to keep it,
> >they should check if MaybeFromTypeKind is either < 2 or equal to 0x1fe.
> >Can you report it upstream?  If that is changed, we'd need to change the
> >above and also add
> >   d) the filename must not start with "\xff\xff"
> >to the rules.
> >
> >I think it would be better to just add a whole new entrypoint, but if they
> >think the heuristics is good enough, they should at least fix it up.
> >
> > Jakub
> >
> 
> Done. I've realized that we could just set loc to input_location if loc ==
> UNKNOWN_LOCATION. In this case, we always would have new style. This would

While using input_location in this case (as it is invoked from the FEs)
might help sometimes, it still doesn't guarantee input_location will not
be UNKNOWN_LOCATION afterwards, or builtin location, or b), c) or d) above.

Plus there is no fix on the library side to the heuristics, which we need
anyway.

Jakub


Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Eric Botcazou
> Why is Ada fiddling with the modes? Is it only for packed structures?

Yes, in Ada packing or representation clauses are allowed to modify the type 
of components, so you can have e.g. a record type with size S1 and BLKmode and 
fields of this type with a packed version of this record type (with size S2

> I was wondering how to produce VCE conversions of aggregates with C frontend
> at all (that is getting them synthetized by the middle-end) to get non-ada
> testcases.  Storing through union is never folded to one and I don't see
> any other obvious way of getting them.  Perhaps it may be possible to get
> them via inliner on incompatible parameter and LTO, but that seems to be
> the only case I can think of right now.

That makes sense, all the machinery implementing type fiddling for the Ada 
compiler is in gigi, not in stor-layout.c for example.

> I am testing the change to compare modes and revert the two expr.c changes.
> Lets see what is Richard's opinion. The whole concept of modes on aggregate
> types is bit funny post-tree-ssa days when we do SRA. I suppose they may be
> tied to calling conventions but should no longer be needed for code quality?

Ideally it should not be tied to calling conventions either, but it is known 
that some back-ends still use it for this purpose.

-- 
Eric Botcazou


Re: [PATCH 2/7] Libsanitizer merge from upstream r249633.

2015-10-19 Thread Jakub Jelinek
On Thu, Oct 15, 2015 at 01:34:06PM +0300, Maxim Ostapenko wrote:
> Ah, right, fixing this now. Does this look better now?

Yes, it is ok now.

> 2015-10-12  Maxim Ostapenko  
> 
> config/
> 
>   * bootstrap-asan.mk: Replace ASAN_OPTIONS=detect_leaks with
>   LSAN_OPTIONS=detect_leaks.
> 
> gcc/
> 
>   * asan.c (asan_emit_stack_protection): Don't pass local stack to
>   asan_stack_malloc_[n] anymore. Check if asan_stack_malloc_[n] returned
>   NULL and use local stack then.
>   (asan_finish_file): Insert __asan_version_mismatch_check_v[n] call
>   in addition to __asan_init.
>   * sanitizer.def (BUILT_IN_ASAN_INIT): Rename to __asan_init.
>   (BUILT_IN_ASAN_VERSION_MISMATCH_CHECK): Add new builtin call.
> 
> gcc/testsuite/
> 
>   g++.dg/asan/default-options-1.C: Adjust testcase.

Jakub


Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Eric Botcazou
> Adding back the mode check is fine if all types with the same TYPE_CANONICAL
> have the same mode.  Otherwise we'd regress here.

It's true for the Ada compiler, the type fiddling machinery always resets it.

-- 
Eric Botcazou


Re: Add a pass to back-propagate use information

2015-10-19 Thread Richard Biener
On Thu, Oct 15, 2015 at 3:17 PM, Richard Sandiford
 wrote:
> This patch adds a pass that collects information that is common to all
> uses of an SSA name X and back-propagates that information up the statements
> that generate X.  The general idea is to use the information to simplify
> instructions (rather than a pure DCE) so I've simply called it
> tree-ssa-backprop.c, to go with tree-ssa-forwprop.c.
>
> At the moment the only use of the pass is to remove unnecessry sign
> operations, so that it's effectively a global version of
> fold_strip_sign_ops.  I'm hoping it could be extended in future to
> record which bits of an integer are significant.  There are probably
> other potential uses too.
>
> A later patch gets rid of fold_strip_sign_ops.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
> * doc/invoke.texi (-fdump-tree-backprop, -ftree-backprop): Document.
> * Makefile.in (OBJS): Add tree-ssa-backprop.o.
> * common.opt (ftree-backprop): New option.
> * fold-const.h (negate_mathfn_p): Declare.
> * fold-const.c (negate_mathfn_p): Make public.
> * timevar.def (TV_TREE_BACKPROP): New.
> * tree-passes.h (make_pass_backprop): Declare.
> * passes.def (pass_backprop): Add.
> * tree-ssa-backprop.c: New file.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/backprop-1.c, gcc.dg/tree-ssa/backprop-2.c,
> gcc.dg/tree-ssa/backprop-3.c, gcc.dg/tree-ssa/backprop-4.c,
> gcc.dg/tree-ssa/backprop-5.c, gcc.dg/tree-ssa/backprop-6.c: New tests.
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 783e4c9..69e669d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1445,6 +1445,7 @@ OBJS = \
> tree-switch-conversion.o \
> tree-ssa-address.o \
> tree-ssa-alias.o \
> +   tree-ssa-backprop.o \
> tree-ssa-ccp.o \
> tree-ssa-coalesce.o \
> tree-ssa-copy.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5060208..5aef625 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2364,6 +2364,10 @@ ftree-pta
>  Common Report Var(flag_tree_pta) Optimization
>  Perform function-local points-to analysis on trees.
>
> +ftree-backprop
> +Common Report Var(flag_tree_backprop) Init(1) Optimization
> +Enable backward propagation of use properties at the tree level.

Don't add new -ftree-* options; "tree" doesn't add any info for our users.  I'd
also refer to the SSA level rather than the "tree" level.  Not sure if -fbackprop
is a good name, but let's go for it.

> +
>  ftree-reassoc
>  Common Report Var(flag_tree_reassoc) Init(1) Optimization
>  Enable reassociation on tree level
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 54e9f12..fe15d08 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -343,6 +343,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fdump-tree-dse@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-phiprop@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-phiopt@r{[}-@var{n}@r{]} @gol
> +-fdump-tree-backprop@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-forwprop@r{[}-@var{n}@r{]} @gol
>  -fdump-tree-nrv -fdump-tree-vect @gol
>  -fdump-tree-sink @gol
> @@ -451,8 +452,8 @@ Objective-C and Objective-C++ Dialects}.
>  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
>  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
>  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
> --ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
> --ftree-loop-if-convert-stores -ftree-loop-im @gol
> +-ftree-dse -ftree-backprop -ftree-forwprop -ftree-fre @gol
> +-ftree-loop-if-convert -ftree-loop-if-convert-stores -ftree-loop-im @gol
>  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>  -ftree-loop-vectorize @gol
> @@ -7236,6 +7237,12 @@ name is made by appending @file{.dse} to the source 
> file name.
>  Dump each function after optimizing PHI nodes into straightline code.  The 
> file
>  name is made by appending @file{.phiopt} to the source file name.
>
> +@item backprop
> +@opindex fdump-tree-backprop
> +Dump each function after back-propagating use information up the definition
> +chain.  The file name is made by appending @file{.backprop} to the
> +source file name.
> +
>  @item forwprop
>  @opindex fdump-tree-forwprop
>  Dump each function after forward propagating single use variables.  The file
> @@ -7716,6 +7723,7 @@ compilation time.
>  -ftree-dce @gol
>  -ftree-dominator-opts @gol
>  -ftree-dse @gol
> +-ftree-backprop @gol
>  -ftree-forwprop @gol
>  -ftree-fre @gol
>  -ftree-phiprop @gol
> @@ -8658,6 +8666,13 @@ enabled by default at @option{-O2} and @option{-O3}.
>  Make partial redundancy elimination (PRE) more aggressive.  This flag is
>  enabled by default at @option{-O3}.
>
> +@item -ftree-backprop
> +@opindex ftree-backprop
> +Propagate 

Re: Move some bit and binary optimizations in simplify and match

2015-10-19 Thread Marc Glisse

+/* Fold X + (X / CST) * -CST to X % CST.  */

This one is still wrong.  It is extremely similar to X-(X/CST)*CST, and the
current version of that one in match.pd is broken; we should fix that one
first.


+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+(simplify
+ (minus (bit_and:s @0 (bit_not @1)) (bit_and:s @0 @1))
+  (if (! FLOAT_TYPE_P (type))
+   (minus (bit_xor @0 @1) @1)))

I don't understand the point of the FLOAT_TYPE_P check.

Will we also simplify (A & B) - (A & ~B) into B - (A ^ B) ?

+(simplify
+ (minus (bit_and:s @0 INTEGER_CST@2) (bit_and:s @0 INTEGER_CST@1))
+ (if (! FLOAT_TYPE_P (type)
+  && wi::eq_p (const_unop (BIT_NOT_EXPR, TREE_TYPE (type), @2), @1))

TREE_TYPE (type) ???

+  (minus (bit_xor @0 @1) @1)))

(just a random comment, not for your patch)
When we generalize this to vector, should that be:
operand_equal_p (const_unop (BIT_NOT_EXPR, type, @2), @1, OEP_ONLY_CONST)
or maybe
integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1))
?

+/* Simplify (X & ~Y) | (~X & Y) -> X ^ Y.  */
+(simplify
+ (bit_ior (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1))
+  (bit_xor @0 @1))

:c on bit_ior? It should also allow you to merge the 2 CST versions into 
one.


+ (bit_ior (bit_and:c INTEGER_CST@0 (bit_not @1)) (bit_and:c (bit_not 
INTEGER_CST@2) @1))


gcc always puts the constant last in bit_and, so
(bit_and (bit_not @1) INTEGER_CST@0)

You still have a (bit_not INTEGER_CST@2)...

-/* X & !X -> 0.  */
+/* X & !X or X & ~X -> 0.  */
 (simplify
  (bit_and:c @0 (logical_inverted_value @0))
- { build_zero_cst (type); })
+  { build_zero_cst (type); })
 /* X | !X and X ^ !X -> 1, if X is truth-valued.  */
 (for op (bit_ior bit_xor)
  (simplify

I think that was already in your other patch, and I am not really in favor 
of the indentation change (or the comment).


--
Marc Glisse


[vec-cmp, patch 7/6] Vector comparison enabling in SLP

2015-10-19 Thread Ilya Enkovich
Hi,

It appeared our testsuite doesn't have a test which would require vector
comparison support in SLP even after boolean pattern disabling.  This patch
adds such a test and allows comparisons for SLP.  Is it OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* tree-vect-slp.c (vect_build_slp_tree_1): Allow
comparison statements.
(vect_get_constant_vectors): Support boolean vector
constants.

gcc/testsuite/

2015-10-19  Ilya Enkovich  

* gcc.dg/vect/slp-cond-5.c: New test.

diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-5.c 
b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
new file mode 100644
index 000..5ade7d1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
@@ -0,0 +1,81 @@
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+
+#define N 128
+
+static inline int
+foo (int x, int y, int a, int b)
+{
+  if (x >= y && a > b)
+return a;
+  else
+return b;
+}
+
+__attribute__((noinline, noclone)) void
+bar (int * __restrict__ a, int * __restrict__ b,
+ int * __restrict__ c, int * __restrict__ d,
+ int * __restrict__ e, int w)
+{
+  int i;
+  for (i = 0; i < N/16; i++, a += 16, b += 16, c += 16, d += 16, e += 16)
+{
+  e[0] = foo (c[0], d[0], a[0] * w, b[0] * w);
+  e[1] = foo (c[1], d[1], a[1] * w, b[1] * w);
+  e[2] = foo (c[2], d[2], a[2] * w, b[2] * w);
+  e[3] = foo (c[3], d[3], a[3] * w, b[3] * w);
+  e[4] = foo (c[4], d[4], a[4] * w, b[4] * w);
+  e[5] = foo (c[5], d[5], a[5] * w, b[5] * w);
+  e[6] = foo (c[6], d[6], a[6] * w, b[6] * w);
+  e[7] = foo (c[7], d[7], a[7] * w, b[7] * w);
+  e[8] = foo (c[8], d[8], a[8] * w, b[8] * w);
+  e[9] = foo (c[9], d[9], a[9] * w, b[9] * w);
+  e[10] = foo (c[10], d[10], a[10] * w, b[10] * w);
+  e[11] = foo (c[11], d[11], a[11] * w, b[11] * w);
+  e[12] = foo (c[12], d[12], a[12] * w, b[12] * w);
+  e[13] = foo (c[13], d[13], a[13] * w, b[13] * w);
+  e[14] = foo (c[14], d[14], a[14] * w, b[14] * w);
+  e[15] = foo (c[15], d[15], a[15] * w, b[15] * w);
+}
+}
+
+
+int a[N], b[N], c[N], d[N], e[N];
+
+int main ()
+{
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+{
+  a[i] = i;
+  b[i] = 5;
+  e[i] = 0;
+
+  switch (i % 9)
+{
+case 0: asm (""); c[i] = i; d[i] = i + 1; break;
+case 1: c[i] = 0; d[i] = 0; break;
+case 2: c[i] = i + 1; d[i] = i - 1; break;
+case 3: c[i] = i; d[i] = i + 7; break;
+case 4: c[i] = i; d[i] = i; break;
+case 5: c[i] = i + 16; d[i] = i + 3; break;
+case 6: c[i] = i - 5; d[i] = i; break;
+case 7: c[i] = i; d[i] = i; break;
+case 8: c[i] = i; d[i] = i - 7; break;
+}
+}
+
+  bar (a, b, c, d, e, 2);
+  for (i = 0; i < N; i++)
+if (e[i] != ((i % 3) == 0 || i <= 5 ? 10 : 2 * i))
+  abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { i?86-*-* x86_64-*-* } } } } */
+
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 1424123..fa8291e 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -827,6 +827,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
  if (TREE_CODE_CLASS (rhs_code) != tcc_binary
  && TREE_CODE_CLASS (rhs_code) != tcc_unary
  && TREE_CODE_CLASS (rhs_code) != tcc_expression
+ && TREE_CODE_CLASS (rhs_code) != tcc_comparison
  && rhs_code != CALL_EXPR)
{
  if (dump_enabled_p ())
@@ -2596,7 +2597,14 @@ vect_get_constant_vectors (tree op, slp_tree slp_node,
   struct loop *loop;
   gimple_seq ctor_seq = NULL;
 
-  vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+  /* Check if vector type is a boolean vector.  */
+  if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+  && (VECTOR_BOOLEAN_TYPE_P (STMT_VINFO_VECTYPE (stmt_vinfo))
+ || (code == COND_EXPR && op_num < 2)))
+vector_type
+  = build_same_sized_truth_vector_type (STMT_VINFO_VECTYPE (stmt_vinfo));
+  else
+vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
   nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
   if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
@@ -2768,8 +2776,21 @@ vect_get_constant_vectors (tree op, slp_tree slp_node,
{
  if (CONSTANT_CLASS_P (op))
{
- op = fold_unary (VIEW_CONVERT_EXPR,
-  TREE_TYPE (vector_type), op);
+ if (VECTOR_BOOLEAN_TYPE_P (vector_type))
+   {
+ /* Can't use VIEW_CONVERT_EXPR for booleans because
+of possibly different sizes of scalar value and
+vector element.  */
+ if (integer_zerop (op))
+   op = build_int_cst (TREE_TYPE (vector_type), 0);
+ else 

Re: [PATCH] PR target/67995: __attribute__ ((target("arch=XXX"))) enables unsupported ISA

2015-10-19 Thread Uros Bizjak
On Sat, Oct 17, 2015 at 12:38 AM, H.J. Lu  wrote:
> When processing __attribute__ ((target("arch=XXX"))), we should clear
> the ISA bits in x_ix86_isa_flags first to avoid leaking ISA flags from
> the command line.
>
> Tested on x86-64.  OK for trunk?

OK.

Thanks,
Uros.

> Thanks.
>
> H.J.
> ---
> gcc/
>
> PR target/67995
> * config/i386/i386.c (ix86_valid_target_attribute_tree): If
> arch= is set,  clear all bits in x_ix86_isa_flags, except for
> ISA_64BIT, ABI_64, ABI_X32, and CODE16.
>
> gcc/testsuite/
>
> PR target/67995
> * gcc.target/i386/pr67995-1.c: New test.
> * gcc.target/i386/pr67995-2.c: Likewise.
> * gcc.target/i386/pr67995-3.c: Likewise.
> ---
>  gcc/config/i386/i386.c| 13 -
>  gcc/testsuite/gcc.target/i386/pr67995-1.c | 16 
>  gcc/testsuite/gcc.target/i386/pr67995-2.c | 16 
>  gcc/testsuite/gcc.target/i386/pr67995-3.c | 16 
>  4 files changed, 60 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67995-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67995-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67995-3.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index d0e1f4c..b0281c9 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -6145,7 +6145,18 @@ ix86_valid_target_attribute_tree (tree args,
>/* If we are using the default tune= or arch=, undo the string 
> assigned,
>  and use the default.  */
>if (option_strings[IX86_FUNCTION_SPECIFIC_ARCH])
> -   opts->x_ix86_arch_string = 
> option_strings[IX86_FUNCTION_SPECIFIC_ARCH];
> +   {
> + opts->x_ix86_arch_string
> +   = option_strings[IX86_FUNCTION_SPECIFIC_ARCH];
> +
> + /* If arch= is set,  clear all bits in x_ix86_isa_flags,
> +except for ISA_64BIT, ABI_64, ABI_X32, and CODE16.  */
> + opts->x_ix86_isa_flags &= (OPTION_MASK_ISA_64BIT
> +| OPTION_MASK_ABI_64
> +| OPTION_MASK_ABI_X32
> +| OPTION_MASK_CODE16);
> +
> +   }
>else if (!orig_arch_specified)
> opts->x_ix86_arch_string = NULL;
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr67995-1.c 
> b/gcc/testsuite/gcc.target/i386/pr67995-1.c
> new file mode 100644
> index 000..072b1fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67995-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=haswell" } */
> +
> +unsigned int
> +__attribute__ ((target("arch=core2")))
> +__x86_rdrand(void)
> +{
> +  unsigned int retries = 100;
> +  unsigned int val;
> +
> +  while (__builtin_ia32_rdrand32_step() == 0) /* { dg-error "needs isa 
> option" } */
> +if (--retries == 0)
> +  return 0;
> +
> +  return val;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr67995-2.c 
> b/gcc/testsuite/gcc.target/i386/pr67995-2.c
> new file mode 100644
> index 000..632bb63
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67995-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=core2" } */
> +
> +unsigned int
> +__attribute__ ((target("arch=haswell")))
> +__x86_rdrand(void)
> +{
> +  unsigned int retries = 100;
> +  unsigned int val;
> +
> +  while (__builtin_ia32_rdrand32_step() == 0)
> +if (--retries == 0)
> +  return 0;
> +
> +  return val;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr67995-3.c 
> b/gcc/testsuite/gcc.target/i386/pr67995-3.c
> new file mode 100644
> index 000..11993b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67995-3.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=core2" } */
> +
> +unsigned int
> +__attribute__ ((target("rdrnd")))
> +__x86_rdrand(void)
> +{
> +  unsigned int retries = 100;
> +  unsigned int val;
> +
> +  while (__builtin_ia32_rdrand32_step() == 0)
> +if (--retries == 0)
> +  return 0;
> +
> +  return val;
> +}
> --
> 2.4.3
>


Re: Move some bit and binary optimizations in simplify and match

2015-10-19 Thread Hurugalawadi, Naveen
Hi,

Please find attached the modified patch of duplicate patterns which were
posted in the earlier part.

Please review them and let me know if any further modifications are required.

Thanks,
Naveen

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index de45a2c..b36e2f5 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9232,26 +9232,6 @@ fold_binary_loc (location_t loc,
   return NULL_TREE;
 
 case PLUS_EXPR:
-  if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
-	{
-	  /* X + (X / CST) * -CST is X % CST.  */
-	  if (TREE_CODE (arg1) == MULT_EXPR
-	  && TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR
-	  && operand_equal_p (arg0,
-  TREE_OPERAND (TREE_OPERAND (arg1, 0), 0), 0))
-	{
-	  tree cst0 = TREE_OPERAND (TREE_OPERAND (arg1, 0), 1);
-	  tree cst1 = TREE_OPERAND (arg1, 1);
-	  tree sum = fold_binary_loc (loc, PLUS_EXPR, TREE_TYPE (cst1),
-  cst1, cst0);
-	  if (sum && integer_zerop (sum))
-		return fold_convert_loc (loc, type,
-	 fold_build2_loc (loc, TRUNC_MOD_EXPR,
-		  TREE_TYPE (arg0), arg0,
-		  cst0));
-	}
-	}
-
   /* Handle (A1 * C1) + (A2 * C2) with A1, A2 or C1, C2 being the same or
 	 one.  Make sure the type is not saturating and has the signedness of
 	 the stripped operands, as fold_plusminus_mult_expr will re-associate.
@@ -9692,28 +9672,6 @@ fold_binary_loc (location_t loc,
 			fold_convert_loc (loc, type,
 	  TREE_OPERAND (arg0, 0)));
 
-  if (! FLOAT_TYPE_P (type))
-	{
-	  /* Fold (A & ~B) - (A & B) into (A ^ B) - B, where B is
-	 any power of 2 minus 1.  */
-	  if (TREE_CODE (arg0) == BIT_AND_EXPR
-	  && TREE_CODE (arg1) == BIT_AND_EXPR
-	  && operand_equal_p (TREE_OPERAND (arg0, 0),
-  TREE_OPERAND (arg1, 0), 0))
-	{
-	  tree mask0 = TREE_OPERAND (arg0, 1);
-	  tree mask1 = TREE_OPERAND (arg1, 1);
-	  tree tem = fold_build1_loc (loc, BIT_NOT_EXPR, type, mask0);
-
-	  if (operand_equal_p (tem, mask1, 0))
-		{
-		  tem = fold_build2_loc (loc, BIT_XOR_EXPR, type,
- TREE_OPERAND (arg0, 0), mask1);
-		  return fold_build2_loc (loc, MINUS_EXPR, type, tem, mask1);
-		}
-	}
-	}
-
   /* Fold __complex__ ( x, 0 ) - __complex__ ( 0, y ) to
 	 __complex__ ( x, -y ).  This is not the same for SNaNs or if
 	 signed zeros are involved.  */
@@ -10013,28 +9971,6 @@ fold_binary_loc (location_t loc,
 arg1);
 	}
 
-  /* (X & ~Y) | (~X & Y) is X ^ Y */
-  if (TREE_CODE (arg0) == BIT_AND_EXPR
-	  && TREE_CODE (arg1) == BIT_AND_EXPR)
-{
-	  tree a0, a1, l0, l1, n0, n1;
-
-	  a0 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));
-	  a1 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
-
-	  l0 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
-	  l1 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
-	  
-	  n0 = fold_build1_loc (loc, BIT_NOT_EXPR, type, l0);
-	  n1 = fold_build1_loc (loc, BIT_NOT_EXPR, type, l1);
-	  
-	  if ((operand_equal_p (n0, a0, 0)
-	   && operand_equal_p (n1, a1, 0))
-	  || (operand_equal_p (n0, a1, 0)
-		  && operand_equal_p (n1, a0, 0)))
-	return fold_build2_loc (loc, BIT_XOR_EXPR, type, l0, n1);
-	}
-
   /* See if this can be simplified into a rotate first.  If that
 	 is unsuccessful continue in the association code.  */
   goto bit_rotate;
diff --git a/gcc/match.pd b/gcc/match.pd
index f3813d8..5ee345e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -324,6 +324,42 @@ along with GCC; see the file COPYING3.  If not see
 (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
  (pows @0 @1))
 
+/* Fold X + (X / CST) * -CST to X % CST.  */
+(simplify
+ (plus (convert? @0) (convert? (mult (trunc_div @0 @1) (negate @1
+  (if (INTEGRAL_TYPE_P (type)
+   && tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (trunc_mod (convert @0) (convert @1
+(simplify
+ (plus (convert? @0) (convert? (mult (trunc_div @0 INTEGER_CST@1) INTEGER_CST@2)))
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+   && wi::add (@1, @2) == 0)
+   (trunc_mod (convert @0) (convert @1
+
+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+(simplify
+ (minus (bit_and:s @0 (bit_not @1)) (bit_and:s @0 @1))
+  (if (! FLOAT_TYPE_P (type))
+   (minus (bit_xor @0 @1) @1)))
+(simplify
+ (minus (bit_and:s @0 INTEGER_CST@2) (bit_and:s @0 INTEGER_CST@1))
+ (if (! FLOAT_TYPE_P (type)
+  && wi::eq_p (const_unop (BIT_NOT_EXPR, TREE_TYPE (type), @2), @1))
+  (minus (bit_xor @0 @1) @1)))
+
+/* Simplify (X & ~Y) | (~X & Y) -> X ^ Y.  */
+(simplify
+ (bit_ior (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1))
+  (bit_xor @0 @1))
+(simplify
+ (bit_ior (bit_and:c @0 INTEGER_CST@2) (bit_and:c (bit_not @0) INTEGER_CST@1))
+  (if (wi::eq_p (const_unop (BIT_NOT_EXPR, TREE_TYPE (type), @2), @1))
+   (bit_xor @0 @1)))
+(simplify
+ (bit_ior (bit_and:c INTEGER_CST@0 (bit_not @1)) (bit_and:c (bit_not INTEGER_CST@2) @1))
+  (if (wi::eq_p (const_unop (BIT_NOT_EXPR, 

Re: Move some bit and binary optimizations in simplify and match

2015-10-19 Thread Hurugalawadi, Naveen
Hi,

>> That's not what Richard meant. We already have:

Done, as per the comments.

Please find attached the modified patch as per your comments.

Please review them and let me know if any further modifications are required.

Thanks,
Naveen

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index de45a2c..1e7fbb4 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9803,20 +9803,6 @@ fold_binary_loc (location_t loc,
   goto associate;
 
 case MULT_EXPR:
-  /* (-A) * (-B) -> A * B  */
-  if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1))
-	return fold_build2_loc (loc, MULT_EXPR, type,
-			fold_convert_loc (loc, type,
-	  TREE_OPERAND (arg0, 0)),
-			fold_convert_loc (loc, type,
-	  negate_expr (arg1)));
-  if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0))
-	return fold_build2_loc (loc, MULT_EXPR, type,
-			fold_convert_loc (loc, type,
-	  negate_expr (arg0)),
-			fold_convert_loc (loc, type,
-	  TREE_OPERAND (arg1, 0)));
-
   if (! FLOAT_TYPE_P (type))
 	{
 	  /* Transform x * -C into -x * C if x is easily negatable.  */
@@ -9830,16 +9816,6 @@ fold_binary_loc (location_t loc,
 		  negate_expr (arg0)),
 tem);
 
-	  /* (a * (1 << b)) is (a << b)  */
-	  if (TREE_CODE (arg1) == LSHIFT_EXPR
-	  && integer_onep (TREE_OPERAND (arg1, 0)))
-	return fold_build2_loc (loc, LSHIFT_EXPR, type, op0,
-TREE_OPERAND (arg1, 1));
-	  if (TREE_CODE (arg0) == LSHIFT_EXPR
-	  && integer_onep (TREE_OPERAND (arg0, 0)))
-	return fold_build2_loc (loc, LSHIFT_EXPR, type, op1,
-TREE_OPERAND (arg0, 1));
-
 	  /* (A + A) * C -> A * 2 * C  */
 	  if (TREE_CODE (arg0) == PLUS_EXPR
 	  && TREE_CODE (arg1) == INTEGER_CST
@@ -9882,21 +9858,6 @@ fold_binary_loc (location_t loc,
 	}
   else
 	{
-	  /* Convert (C1/X)*C2 into (C1*C2)/X.  This transformation may change
- the result for floating point types due to rounding so it is applied
- only if -fassociative-math was specify.  */
-	  if (flag_associative_math
-	  && TREE_CODE (arg0) == RDIV_EXPR
-	  && TREE_CODE (arg1) == REAL_CST
-	  && TREE_CODE (TREE_OPERAND (arg0, 0)) == REAL_CST)
-	{
-	  tree tem = const_binop (MULT_EXPR, TREE_OPERAND (arg0, 0),
-  arg1);
-	  if (tem)
-		return fold_build2_loc (loc, RDIV_EXPR, type, tem,
-TREE_OPERAND (arg0, 1));
-	}
-
   /* Strip sign operations from X in X*X, i.e. -Y*-Y -> Y*Y.  */
 	  if (operand_equal_p (arg0, arg1, 0))
 	{
@@ -10053,22 +10014,6 @@ fold_binary_loc (location_t loc,
   goto bit_rotate;
 
 case BIT_AND_EXPR:
-  /* ~X & X, (X == 0) & X, and !X & X are always zero.  */
-  if ((TREE_CODE (arg0) == BIT_NOT_EXPR
-	   || TREE_CODE (arg0) == TRUTH_NOT_EXPR
-	   || (TREE_CODE (arg0) == EQ_EXPR
-	       && integer_zerop (TREE_OPERAND (arg0, 1))))
-	  && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
-	return omit_one_operand_loc (loc, type, integer_zero_node, arg1);
-
-  /* X & ~X , X & (X == 0), and X & !X are always zero.  */
-  if ((TREE_CODE (arg1) == BIT_NOT_EXPR
-	   || TREE_CODE (arg1) == TRUTH_NOT_EXPR
-	   || (TREE_CODE (arg1) == EQ_EXPR
-	   && integer_zerop (TREE_OPERAND (arg1, 1))))
-	  && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
-	return omit_one_operand_loc (loc, type, integer_zero_node, arg0);
-
   /* Fold (X ^ 1) & 1 as (X & 1) == 0.  */
   if (TREE_CODE (arg0) == BIT_XOR_EXPR
 	  && INTEGRAL_TYPE_P (type)
diff --git a/gcc/match.pd b/gcc/match.pd
index f3813d8..04b6138 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -324,6 +324,27 @@ along with GCC; see the file COPYING3.  If not see
 (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
  (pows @0 @1))
 
+/* Fold (a * (1 << b)) into (a << b)  */
+(simplify
+ (mult:c @0 (convert? (lshift integer_onep@1 @2)))
+  (if (! FLOAT_TYPE_P (type)
+   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
+   (lshift @0 @2)))
+
+/* Fold (C1/X)*C2 into (C1*C2)/X.  */
+(simplify
+ (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2)
+  (if (flag_associative_math)
+   (with
+{ tree tem = const_binop (MULT_EXPR, type, @0, @2); }
+(if (tem)
+ (rdiv { tem; } @1)))))
+
+/* Simplify ~X & X as zero.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_not @0)))
+  { build_zero_cst (type); })
+
 /* X % Y is smaller than Y.  */
 (for cmp (lt ge)
  (simplify
@@ -543,6 +564,13 @@ along with GCC; see the file COPYING3.  If not see
 (match negate_expr_p
  VECTOR_CST
 (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
+
+/* (-A) * (-B) -> A * B  */
+(simplify
+ (mult:c (convert1? (negate @0)) (convert2? negate_expr_p@1))
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
+   (mult (convert @0) (convert (negate @1)))))
  
 /* -(A + B) -> (-B) - A.  */
 (simplify
@@ -629,6 +657,8 @@ along with GCC; see the file COPYING3.  If not see
   (truth_not @0))
 

Re: [PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

2015-10-19 Thread H.J. Lu
On Mon, Oct 19, 2015 at 4:12 AM, Uros Bizjak  wrote:
> On Mon, Oct 19, 2015 at 1:12 PM, H.J. Lu  wrote:
>> On Mon, Oct 19, 2015 at 4:05 AM, Uros Bizjak  wrote:
>>> On Fri, Oct 16, 2015 at 7:42 PM, H.J. Lu  wrote:
 Since GET_MODE_ALIGNMENT is defined by psABI and the biggest alignment
 is 4 byte for IA MCU psABI, we should use GET_MODE_BITSIZE to get
 vector natural alignment to check misaligned vector move.

 OK for trunk?

 Thanks.

 H.J.
 ---
 * config/i386/i386.c (ix86_expand_vector_move): Use
 GET_MODE_BITSIZE to get vector natural alignment.
 ---
  gcc/config/i386/i386.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index ebe2b0a..d0e1f4c 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -18650,7 +18650,9 @@ void
  ix86_expand_vector_move (machine_mode mode, rtx operands[])
  {
rtx op0 = operands[0], op1 = operands[1];
 -  unsigned int align = GET_MODE_ALIGNMENT (mode);
 +  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT since the
 + biggest alignment is 4 byte for IA MCU psABI.  */
 +  unsigned int align = GET_MODE_BITSIZE (mode);
>>>
>>> How about using TARGET_IAMCU condition here and using bitsize only for
>>> TARGET_IAMCU?
>>>
>>
>> Works for me.  Is it OK with that change?
>
> Yes.
>

This is what I checked in.

Thanks.

-- 
H.J.
---
[PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

Since GET_MODE_ALIGNMENT is defined by the psABI and the biggest alignment
is 4 bytes for the IA MCU psABI, we should use GET_MODE_BITSIZE under the
IA MCU psABI to get the vector natural alignment when checking for
misaligned vector moves.

* config/i386/i386.c (ix86_expand_vector_move): Use
GET_MODE_BITSIZE for IA MCU psABI to get vector natural
alignment.
---
 gcc/config/i386/i386.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1049455..a4f4b6f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18645,7 +18645,11 @@ void
 ix86_expand_vector_move (machine_mode mode, rtx operands[])
 {
   rtx op0 = operands[0], op1 = operands[1];
-  unsigned int align = GET_MODE_ALIGNMENT (mode);
+  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
+ psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
+  unsigned int align = (TARGET_IAMCU
+ ? GET_MODE_BITSIZE (mode)
+ : GET_MODE_ALIGNMENT (mode));

   if (push_operand (op0, VOIDmode))
 op0 = emit_move_resolve_push (mode, op0);
-- 
2.4.3


Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant

2015-10-19 Thread Richard Biener
On Fri, Oct 16, 2015 at 5:25 PM, Alan Lawrence  wrote:
> This lets the vectorizer handle some simple strides expressed using left-shift
> rather than mul, e.g. a[i << 1] (whereas previously only a[i * 2] would have
> been handled).
>
> This patch does *not* handle the general case of shifts - neither a[i << j]
> nor a[1 << i] will be handled; that would be a significantly bigger patch
> (probably duplicating or generalizing much of chrec_fold_multiply and
> chrec_fold_multiply_poly_poly in tree-chrec.c), and would probably also only
> be applicable to machines with gather-load support.
>
> Bootstrapped+check-gcc,g++,gfortran on x86_64, AArch64 and ARM, also Ada on 
> x86_64.
>
> Is this OK for trunk?
>
> gcc/ChangeLog:
>
> PR tree-optimization/65963
> * tree-scalar-evolution.c (interpret_rhs_expr): Handle some 
> LSHIFT_EXPRs
> as equivalent MULT_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-strided-shift-1.c: New.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 
> 
>  gcc/tree-scalar-evolution.c  | 18 +
>  2 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 000..b1ce2ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963.  */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> +  for (int i = 0; i < N; i++)
> +out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  check_vect ();
> +  for (int i = 0; i < 2*N; i++)
> +{
> +  in[i] = i;
> +  __asm__ volatile ("" : : : "memory");
> +}
> +  loop ();
> +  __asm__ volatile ("" : : : "memory");
> +  for (int i = 0; i < N; i++)
> +{
> +  if (out[i] != i*2 + 7)
> +   abort ();
> +}
> +  return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 
> "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..e478b0e 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1831,12 +1831,30 @@ interpret_rhs_expr (struct loop *loop, gimple 
> *at_stmt,
>break;
>
>  case MULT_EXPR:
> +case LSHIFT_EXPR:
> +  /* Handle A << B as A * (1 << B).  */
>chrec1 = analyze_scalar_evolution (loop, rhs1);
>chrec2 = analyze_scalar_evolution (loop, rhs2);
>chrec1 = chrec_convert (type, chrec1, at_stmt);
>chrec2 = chrec_convert (type, chrec2, at_stmt);
>chrec1 = instantiate_parameters (loop, chrec1);
>chrec2 = instantiate_parameters (loop, chrec2);
> +  if (code == LSHIFT_EXPR)
> +   {
> + /* Do the shift in the larger size, as in e.g. (long) << (int)32,
> +we must do 1<<32 as a long or we'd overflow.  */

Err, you should always do the shift in the type of rhs1.  You should also
avoid the chrec_convert of rhs2 above for shifts.  I think globbing
shifts and multiplies together doesn't make the code any clearer.

Richard.

> + tree type = TREE_TYPE (chrec2);
> + if (TYPE_PRECISION (TREE_TYPE (chrec1)) > TYPE_PRECISION (type))
> +   type = TREE_TYPE (chrec1);
> + if (TYPE_PRECISION (type) == 0)
> +   {
> + res = chrec_dont_know;
> + break;
> +   }
> + chrec2 = fold_build2 (LSHIFT_EXPR, type,
> +   build_int_cst (type, 1),
> +   chrec2);
> +   }
>res = chrec_fold_multiply (type, chrec1, chrec2);
>break;
>
> --
> 1.9.1
>


[mask conversion, patch 1/2] Add pattern for mask conversions

2015-10-19 Thread Ilya Enkovich
Hi,

This patch adds a vectorization pattern which detects cases where a mask 
conversion is needed and inserts it.  This is done for all statements which 
may consume a mask.  Some additional changes were made to support MASK_LOAD 
with a pattern and to allow scalar mode for the vectype of a pattern stmt.  
It is applied on top of all the other boolean vector series.  Does it look OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* optabs.c (expand_binop_directly): Allow scalar mode for
vec_pack_trunc_optab.
* tree-vect-loop.c (vect_determine_vectorization_factor): Skip
boolean vector producers from pattern sequence when computing VF.
* tree-vect-patterns.c (vect_vect_recog_func_ptrs) Add
vect_recog_mask_conversion_pattern.
(search_type_for_mask): Choose the smallest
type if different size types are mixed.
(build_mask_conversion): New.
(vect_recog_mask_conversion_pattern): New.
(vect_pattern_recog_1): Allow scalar mode for boolean vectype.
* tree-vect-stmts.c (vectorizable_mask_load_store): Support masked
load with pattern.
(vectorizable_conversion): Support boolean vectors.
(free_stmt_vec_info): Allow patterns for statements with no lhs.
* tree-vectorizer.h (NUM_PATTERNS): Increase to 14.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index 83f4be3..8d61d33 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -1055,7 +1055,8 @@ expand_binop_directly (machine_mode mode, optab binoptab,
   /* The mode of the result is different then the mode of the
 arguments.  */
   tmp_mode = insn_data[(int) icode].operand[0].mode;
-  if (GET_MODE_NUNITS (tmp_mode) != 2 * GET_MODE_NUNITS (mode))
+  if (VECTOR_MODE_P (mode)
+ && GET_MODE_NUNITS (tmp_mode) != 2 * GET_MODE_NUNITS (mode))
{
  delete_insns_since (last);
  return NULL_RTX;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 14804b3..e388533 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -497,6 +497,17 @@ vect_determine_vectorization_factor (loop_vec_info 
loop_vinfo)
}
 }
 
+ /* Boolean vectors don't affect VF.  */
+ if (VECTOR_BOOLEAN_TYPE_P (vectype))
+   {
+ if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+   {
+ pattern_def_seq = NULL;
+ gsi_next ();
+   }
+ continue;
+   }
+
  /* The vectorization factor is according to the smallest
 scalar type (or the largest vector size, but we only
 support one vector size per loop).  */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index a737129..34b1ea6 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -76,6 +76,7 @@ static gimple *vect_recog_mult_pattern (vec *,
 static gimple *vect_recog_mixed_size_cond_pattern (vec *,
  tree *, tree *);
 static gimple *vect_recog_bool_pattern (vec *, tree *, tree *);
+static gimple *vect_recog_mask_conversion_pattern (vec *, tree *, 
tree *);
 static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
vect_recog_widen_mult_pattern,
vect_recog_widen_sum_pattern,
@@ -89,7 +90,8 @@ static vect_recog_func_ptr 
vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
vect_recog_divmod_pattern,
vect_recog_mult_pattern,
vect_recog_mixed_size_cond_pattern,
-   vect_recog_bool_pattern};
+   vect_recog_bool_pattern,
+   vect_recog_mask_conversion_pattern};
 
 static inline void
 append_pattern_def_seq (stmt_vec_info stmt_info, gimple *stmt)
@@ -3180,7 +3182,7 @@ search_type_for_mask (tree var, vec_info *vinfo)
   enum vect_def_type dt;
   tree rhs1;
   enum tree_code rhs_code;
-  tree res = NULL;
+  tree res = NULL, res2;
 
   if (TREE_CODE (var) != SSA_NAME)
 return NULL;
@@ -3213,13 +3215,26 @@ search_type_for_mask (tree var, vec_info *vinfo)
 case BIT_AND_EXPR:
 case BIT_IOR_EXPR:
 case BIT_XOR_EXPR:
-  if (!(res = search_type_for_mask (rhs1, vinfo)))
-   res = search_type_for_mask (gimple_assign_rhs2 (def_stmt), vinfo);
+  res = search_type_for_mask (rhs1, vinfo);
+  res2 = search_type_for_mask (gimple_assign_rhs2 (def_stmt), vinfo);
+  if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
+   res = res2;
   break;
 
 default:
   if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
{
+ tree comp_vectype, mask_type;
+
+ comp_vectype = get_vectype_for_scalar_type (TREE_TYPE (rhs1));
+ if (comp_vectype == NULL_TREE)
+   return NULL;
+
+ mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
+ if (!mask_type
+ || !expand_vec_cmp_expr_p (comp_vectype, mask_type))
+   return NULL;
+
  if (TREE_CODE (TREE_TYPE (rhs1)) != 

Re: [PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

2015-10-19 Thread Uros Bizjak
On Fri, Oct 16, 2015 at 7:42 PM, H.J. Lu  wrote:
> Since GET_MODE_ALIGNMENT is defined by psABI and the biggest alignment
> is 4 byte for IA MCU psABI, we should use GET_MODE_BITSIZE to get
> vector natural alignment to check misaligned vector move.
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> * config/i386/i386.c (ix86_expand_vector_move): Use
> GET_MODE_BITSIZE to get vector natural alignment.
> ---
>  gcc/config/i386/i386.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ebe2b0a..d0e1f4c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -18650,7 +18650,9 @@ void
>  ix86_expand_vector_move (machine_mode mode, rtx operands[])
>  {
>rtx op0 = operands[0], op1 = operands[1];
> -  unsigned int align = GET_MODE_ALIGNMENT (mode);
> +  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT since the
> + biggest alignment is 4 byte for IA MCU psABI.  */
> +  unsigned int align = GET_MODE_BITSIZE (mode);

How about using TARGET_IAMCU condition here and using bitsize only for
TARGET_IAMCU?

Uros.

>if (push_operand (op0, VOIDmode))
>  op0 = emit_move_resolve_push (mode, op0);
> --
> 2.4.3
>


Re: [PATCH, rs6000] Pass --secure-plt to the linker

2015-10-19 Thread Alan Modra
On Thu, Oct 15, 2015 at 06:50:50PM +0100, Szabolcs Nagy wrote:
> A powerpc toolchain built with (or without) --enable-secureplt
> currently creates a binary that uses bss plt if
> 
> (1) any of the linked PIC objects have bss plt relocs
> (2) or all the linked objects are non-PIC or have no relocs,
> 
> because this is the binutils linker behaviour.
> 
> This patch passes --secure-plt to the linker which makes the linker
> warn in case (1) and produce a binary with secure plt in case (2).

The idea is OK I think, but

> @@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
>  %{R*} \
>  %(link_shlib) \
>  %{!T*: %(link_start) } \
> +%{!static: %(link_secure_plt_default)} \
>  %(link_os)"

this change needs to be conditional on !mbss-plt too.
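
A sketch of that adjustment (untested; the spec-string name is taken from the
quoted hunk) would nest the new fragment inside a !mbss-plt test as well:

```
%{!static: %{!mbss-plt: %(link_secure_plt_default)}} \
```

so that an explicit -mbss-plt on the command line suppresses passing
--secure-plt to the linker.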

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

2015-10-19 Thread H.J. Lu
On Mon, Oct 19, 2015 at 4:05 AM, Uros Bizjak  wrote:
> On Fri, Oct 16, 2015 at 7:42 PM, H.J. Lu  wrote:
>> Since GET_MODE_ALIGNMENT is defined by psABI and the biggest alignment
>> is 4 byte for IA MCU psABI, we should use GET_MODE_BITSIZE to get
>> vector natural alignment to check misaligned vector move.
>>
>> OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> ---
>> * config/i386/i386.c (ix86_expand_vector_move): Use
>> GET_MODE_BITSIZE to get vector natural alignment.
>> ---
>>  gcc/config/i386/i386.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index ebe2b0a..d0e1f4c 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -18650,7 +18650,9 @@ void
>>  ix86_expand_vector_move (machine_mode mode, rtx operands[])
>>  {
>>rtx op0 = operands[0], op1 = operands[1];
>> -  unsigned int align = GET_MODE_ALIGNMENT (mode);
>> +  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT since the
>> + biggest alignment is 4 byte for IA MCU psABI.  */
>> +  unsigned int align = GET_MODE_BITSIZE (mode);
>
> How about using TARGET_IAMCU condition here and using bitsize only for
> TARGET_IAMCU?
>

Works for me.  Is it OK with that change?


-- 
H.J.


Re: [PATCH] PR middle-end/68002: introduce -fkeep-static-functions

2015-10-19 Thread H.J. Lu
On Mon, Oct 19, 2015 at 2:18 AM, Richard Biener
 wrote:
> On Sat, Oct 17, 2015 at 5:17 PM, VandeVondele  Joost
>  wrote:
>> In some cases (e.g. coverage testing) it is useful to emit code for static 
>> functions even if they are never used, which currently is not possible at 
>> -O1 and above. The following patch introduces a flag for this, which 
>> basically triggers the same code that keeps those functions alive at -O0. 
>> Thanks to Marc Glisse for replying at gcc-help and for suggesting where to 
>> look.
>>
>> Bootstrapped and regtested on x86_64-unknown-linux-gnu
>>
>> OK for trunk ?
>
> Ok.

I checked in this as an obvious fix.

-- 
H.J.
---
Index: ChangeLog
===
--- ChangeLog (revision 228967)
+++ ChangeLog (working copy)
@@ -1,5 +1,9 @@
 2015-10-19  H.J. Lu  

+ * doc/invoke.texi: Replace @optindex with @opindex.
+
+2015-10-19  H.J. Lu  
+
  PR target/67995
  * config/i386/i386.c (ix86_valid_target_attribute_tree): If
  arch= is set,  clear all bits in x_ix86_isa_flags, except for
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 228967)
+++ doc/invoke.texi (working copy)
@@ -8014,7 +8014,7 @@ of its callers.  This switch does not af
 inline functions into the object file.

 @item -fkeep-static-functions
-@optindex fkeep-static-functions
+@opindex fkeep-static-functions
 Emit @code{static} functions into the object file, even if the function
 is never used.


[mask conversion, patch 2/2, i386] Add pack/unpack patterns for scalar masks

2015-10-19 Thread Ilya Enkovich
Hi,

This patch adds patterns to be used for vector mask pack/unpack on AVX512.
Bootstrapped and tested on x86_64-unknown-linux-gnu.  Does it look OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* config/i386/sse.md (HALFMASKMODE): New attribute.
(DOUBLEMASKMODE): New attribute.
(vec_pack_trunc_qi): New.
(vec_pack_trunc_): New.
(vec_unpacku_lo_hi): New.
(vec_unpacku_lo_si): New.
(vec_unpacku_lo_di): New.
(vec_unpacku_hi_hi): New.
(vec_unpacku_hi_): New.

gcc/testsuite/

2015-10-19  Ilya Enkovich  

* gcc.target/i386/mask-pack.c: New test.
* gcc.target/i386/mask-unpack.c: New test.


diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 452629f..ed0eedc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -799,6 +799,14 @@
   [(V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t")
(V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")])
 
+;; Half mask mode for unpacks
+(define_mode_attr HALFMASKMODE
+  [(DI "SI") (SI "HI")])
+
+;; Double mask mode for packs
+(define_mode_attr DOUBLEMASKMODE
+  [(HI "SI") (SI "DI")])
+
 
 ;; Include define_subst patterns for instructions with mask
 (include "subst.md")
@@ -11578,6 +11586,23 @@
   DONE;
 })
 
+(define_expand "vec_pack_trunc_qi"
+  [(set (match_operand:HI 0 ("register_operand"))
+(ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 1 
("register_operand")))
+   (const_int 8))
+(zero_extend:HI (match_operand:QI 2 ("register_operand")]
+  "TARGET_AVX512F")
+
+(define_expand "vec_pack_trunc_"
+  [(set (match_operand: 0 ("register_operand"))
+(ior: (ashift: 
(zero_extend: (match_operand:SWI24 1 ("register_operand")))
+   (match_dup 3))
+(zero_extend: (match_operand:SWI24 2 
("register_operand")]
+  "TARGET_AVX512BW"
+{
+  operands[3] = GEN_INT (GET_MODE_BITSIZE (mode));
+})
+
 (define_insn "_packsswb"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x")
(vec_concat:VI1_AVX512
@@ -13474,12 +13499,42 @@
   "TARGET_SSE2"
   "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;")
 
+(define_expand "vec_unpacku_lo_hi"
+  [(set (match_operand:QI 0 "register_operand")
+(subreg:QI (match_operand:HI 1 "register_operand") 0))]
+  "TARGET_AVX512DQ")
+
+(define_expand "vec_unpacku_lo_si"
+  [(set (match_operand:HI 0 "register_operand")
+(subreg:HI (match_operand:SI 1 "register_operand") 0))]
+  "TARGET_AVX512F")
+
+(define_expand "vec_unpacku_lo_di"
+  [(set (match_operand:SI 0 "register_operand")
+(subreg:SI (match_operand:DI 1 "register_operand") 0))]
+  "TARGET_AVX512BW")
+
 (define_expand "vec_unpacku_hi_"
   [(match_operand: 0 "register_operand")
(match_operand:VI124_AVX2_24_AVX512F_1_AVX512BW 1 "register_operand")]
   "TARGET_SSE2"
   "ix86_expand_sse_unpack (operands[0], operands[1], true, true); DONE;")
 
+(define_expand "vec_unpacku_hi_hi"
+  [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
+(lshiftrt:HI (match_operand:HI 1 "register_operand")
+ (const_int 8)))]
+  "TARGET_AVX512F")
+
+(define_expand "vec_unpacku_hi_"
+  [(set (subreg:SWI48x (match_operand: 0 "register_operand") 0)
+(lshiftrt:SWI48x (match_operand:SWI48x 1 "register_operand")
+ (match_dup 2)))]
+  "TARGET_AVX512BW"
+{
+  operands[2] = GEN_INT (GET_MODE_BITSIZE (mode));
+})
+
 ;
 ;;
 ;; Miscellaneous
diff --git a/gcc/testsuite/gcc.target/i386/mask-pack.c 
b/gcc/testsuite/gcc.target/i386/mask-pack.c
new file mode 100644
index 000..0b564ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-pack.c
@@ -0,0 +1,100 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O3 -fopenmp-simd -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 10 "vect" } } */
+/* { dg-final { scan-assembler-not "maskmov" } } */
+
+#define LENGTH 1000
+
+long l1[LENGTH], l2[LENGTH];
+int i1[LENGTH], i2[LENGTH];
+short s1[LENGTH], s2[LENGTH];
+char c1[LENGTH], c2[LENGTH];
+double d1[LENGTH], d2[LENGTH];
+
+int test1 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+if (l1[i] > l2[i])
+  i1[i] = 1;
+}
+
+int test2 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+if (i1[i] > i2[i])
+  s1[i] = 1;
+}
+
+int test3 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+if (s1[i] > s2[i])
+  c1[i] = 1;
+}
+
+int test4 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+if (d1[i] > d2[i])
+  c1[i] = 1;
+}
+
+int test5 (int n)
+{
+  int i;
+  #pragma omp simd safelen(16)
+  for (i = 0; i < LENGTH; i++)
+i1[i] = 

Re: [c++-delayed-folding] First stab at convert_to_integer

2015-10-19 Thread Marek Polacek
On Fri, Oct 16, 2015 at 02:07:51PM -1000, Jason Merrill wrote:
> On 10/16/2015 07:35 AM, Marek Polacek wrote:
> >>This code path seems to be for pushing a conversion down into a binary
> >>expression.  We shouldn't do this at all when we aren't folding.
> >
> >I tend to agree, but this case is tricky.  What's this code about is
> >e.g. for
> >
> >int
> >fn (long p, long o)
> >{
> >   return p + o;
> >}
> >
> >we want to narrow the operation and do the addition on unsigned ints and then
> >convert to int.  We do it here because we're still missing the
> >promotion/demotion pass on GIMPLE (PR45397 / PR47477).  Disabling this
> >optimization here would regress a few testcases, so I kept the code as it 
> >was.
> >Thoughts?
> 
> That makes sense, but please add a comment referring to one of those PRs and
> also add a note to the PR about this place.  OK with that change.
 
Done.  But I can't seem to commit the patch to the c++-delayed-folding
branch; is that somehow restricted?  I'm getting:

svn: E170001: Commit failed (details follow):
svn: E170001: Authorization failed
svn: E170001: Your commit message was left in a temporary file:
svn: E170001:'/home/marek/svn/c++-delayed-folding/svn-commit.tmp'

and I've checked out the branch using
svn co svn://mpola...@gcc.gnu.org/svn/gcc/branches/c++-delayed-folding/

> >Moreover, there are some places in the C++ FE where we still call
> >convert_to_integer and not convert_to_integer_nofold -- should they be
> >changed to the _nofold variant?
> 
> Not in build_base_path; its arithmetic is compiler generated and should
> really be delayed until genericize anyway.  Likewise for
> get_delta_difference.
> 
> I think the call in finish_omp_clauses could change.

All right, I'll submit a separate patch.  Thanks,

Marek


Re: Remove fold_strip_sign_ops

2015-10-19 Thread Richard Biener
On Thu, Oct 15, 2015 at 3:28 PM, Richard Sandiford
 wrote:
> This patch deletes fold_strip_sign_ops in favour of the tree-ssa-backprop.c
> pass.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?

Ok once the pass goes in.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * fold-const.h (fold_strip_sign_ops): Delete.
> * fold-const.c (fold_strip_sign_ops): Likewise.
> (fold_unary_loc, fold_binary_loc): Remove calls to it.
> * builtins.c (fold_builtin_cos, fold_builtin_cosh)
> (fold_builtin_ccos): Delete.
> (fold_builtin_pow): Don't call fold_strip_sign_ops.
> (fold_builtin_hypot, fold_builtin_copysign): Likewise.
> Remove fndecl argument.
> (fold_builtin_1): Update calls accordingly.  Handle constant
> cos, cosh, ccos and ccosh here.
>
> gcc/testsuite/
> * gcc.dg/builtins-86.c: XFAIL.
> * gcc.dg/torture/builtin-symmetric-1.c: Don't run at -O0.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index b4ac535..1e4ec35 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -160,8 +160,6 @@ static rtx expand_builtin_fabs (tree, rtx, rtx);
>  static rtx expand_builtin_signbit (tree, rtx);
>  static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
>  static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
> -static tree fold_builtin_cos (location_t, tree, tree, tree);
> -static tree fold_builtin_cosh (location_t, tree, tree, tree);
>  static tree fold_builtin_tan (tree, tree);
>  static tree fold_builtin_trunc (location_t, tree, tree);
>  static tree fold_builtin_floor (location_t, tree, tree);
> @@ -7688,77 +7686,6 @@ fold_builtin_cproj (location_t loc, tree arg, tree 
> type)
>return NULL_TREE;
>  }
>
> -/* Fold function call to builtin cos, cosf, or cosl with argument ARG.
> -   TYPE is the type of the return value.  Return NULL_TREE if no
> -   simplification can be made.  */
> -
> -static tree
> -fold_builtin_cos (location_t loc,
> - tree arg, tree type, tree fndecl)
> -{
> -  tree res, narg;
> -
> -  if (!validate_arg (arg, REAL_TYPE))
> -return NULL_TREE;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((res = do_mpfr_arg1 (arg, type, mpfr_cos, NULL, NULL, 0)))
> -return res;
> -
> -  /* Optimize cos(-x) into cos (x).  */
> -  if ((narg = fold_strip_sign_ops (arg)))
> -return build_call_expr_loc (loc, fndecl, 1, narg);
> -
> -  return NULL_TREE;
> -}
> -
> -/* Fold function call to builtin cosh, coshf, or coshl with argument ARG.
> -   Return NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_cosh (location_t loc, tree arg, tree type, tree fndecl)
> -{
> -  if (validate_arg (arg, REAL_TYPE))
> -{
> -  tree res, narg;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((res = do_mpfr_arg1 (arg, type, mpfr_cosh, NULL, NULL, 0)))
> -   return res;
> -
> -  /* Optimize cosh(-x) into cosh (x).  */
> -  if ((narg = fold_strip_sign_ops (arg)))
> -   return build_call_expr_loc (loc, fndecl, 1, narg);
> -}
> -
> -  return NULL_TREE;
> -}
> -
> -/* Fold function call to builtin ccos (or ccosh if HYPER is TRUE) with
> -   argument ARG.  TYPE is the type of the return value.  Return
> -   NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_ccos (location_t loc, tree arg, tree type, tree fndecl,
> -  bool hyper)
> -{
> -  if (validate_arg (arg, COMPLEX_TYPE)
> -  && TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) == REAL_TYPE)
> -{
> -  tree tmp;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((tmp = do_mpc_arg1 (arg, type, (hyper ? mpc_cosh : mpc_cos
> -   return tmp;
> -
> -  /* Optimize fn(-x) into fn(x).  */
> -  if ((tmp = fold_strip_sign_ops (arg)))
> -   return build_call_expr_loc (loc, fndecl, 1, tmp);
> -}
> -
> -  return NULL_TREE;
> -}
> -
>  /* Fold function call to builtin tan, tanf, or tanl with argument ARG.
> Return NULL_TREE if no simplification can be made.  */
>
> @@ -8174,10 +8101,9 @@ fold_builtin_bswap (tree fndecl, tree arg)
> NULL_TREE if no simplification can be made.  */
>
>  static tree
> -fold_builtin_hypot (location_t loc, tree fndecl,
> -   tree arg0, tree arg1, tree type)
> +fold_builtin_hypot (location_t loc, tree arg0, tree arg1, tree type)
>  {
> -  tree res, narg0, narg1;
> +  tree res;
>
>if (!validate_arg (arg0, REAL_TYPE)
>|| !validate_arg (arg1, REAL_TYPE))
> @@ -8187,16 +8113,6 @@ fold_builtin_hypot (location_t loc, tree fndecl,
>if ((res = do_mpfr_arg2 (arg0, arg1, type, mpfr_hypot)))
>  return res;
>
> -  /* If either argument to hypot has a negate or abs, strip that off.
> - E.g. hypot(-x,fabs(y)) -> hypot(x,y).  */
> -  narg0 = fold_strip_sign_ops (arg0);
> -  narg1 = 

Re: [PATCH 7/9] ENABLE_CHECKING refactoring: middle-end, LTO FE

2015-10-19 Thread Bernd Schmidt

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
-
-#ifdef ENABLE_CHECKING
-  verify_flow_info ();
-#endif
+ checking_verify_flow_info ();


This looks misindented.


-#ifdef ENABLE_CHECKING
cgraph_edge *e;
gcc_checking_assert (
!(e = caller->get_edge (call_stmt)) || e->speculative);
-#endif


While you're here, that would look nicer as
 gcc_checking_assert (!(e = caller->get_edge (call_stmt))
  || e->speculative);


-#ifdef ENABLE_CHECKING
-  if (check_same_comdat_groups)
+  if (CHECKING_P && check_same_comdat_groups)


flag_checking


-#ifdef ENABLE_CHECKING
-  struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g->bb);
-#endif
+  struct df_rd_bb_info *bb_info = flag_checking ? DF_RD_BB_INFO (g->bb)
+   : NULL;


I think there's no need to make that conditional; that's a bit too ugly.


+  if (CHECKING_P)
+   sparseset_set_bit (active_defs_check, regno);



+  if (CHECKING_P)
+sparseset_clear (active_defs_check);


> -#ifdef ENABLE_CHECKING
> -  active_defs_check = sparseset_alloc (max_reg_num ());
> -#endif

> +  if (CHECKING_P)
> +active_defs_check = sparseset_alloc (max_reg_num ());

> +  if (CHECKING_P)
> +sparseset_free (active_defs_check);

flag_checking. Lots of other occurrences; I'll mention some but not all, 
but please fix them for consistency.



  void
  sem_item_optimizer::verify_classes (void)
  {
-#if ENABLE_CHECKING
+  if (!flag_checking)
+return;
+


I'm not entirely sure whether you want to wrap this into a 
checking_verify_classes instead, so that it remains easily callable from 
the debugger.



+ if (flag_checking)
+   {
+ for (symtab_node *n = node->same_comdat_group;
+  n != node;
+  n = n->same_comdat_group)
+   /* If at least one of same comdat group functions is external,
+  all of them have to be, otherwise it is a front-end bug.  */
+   gcc_assert (DECL_EXTERNAL (n->decl));
+   }


Unnecessary set of braces.


diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 2986f57..941a829 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -1591,7 +1591,7 @@ lra_assign (void)
bitmap_initialize (_spilled_pseudos, _obstack);
create_live_range_start_chains ();
setup_live_pseudos_and_spill_after_risky_transforms (_spilled_pseudos);
-#ifdef ENABLE_CHECKING
+#if CHECKING_P
if (!flag_ipa_ra)
  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0


Seems inconsistent - use flag_checking and no #if?  It looks like the problem 
you're trying to solve is that a structure field exists only with 
checking; I think that field could just be made available unconditionally - 
the struct is huge anyway.


As mentioned in the other mail, I see no value changing the #ifdefs to 
#ifs here or elsewhere in the patch.



-  check_rtl (false);
-#endif
+  if (flag_checking)
+check_rtl (/*final_p=*/false);


Lose the /*final_p=*/.


-#ifdef ENABLE_CHECKING
+#if CHECKING_P
  gcc_assert (!bitmap_bit_p (output, DECL_UID (node->decl)));
  bitmap_set_bit (output, DECL_UID (node->decl));
  #endif


Not entirely clear why this isn't using flag_checking.


  tree t = (*trees)[i];
-#ifdef ENABLE_CHECKING
- if (TYPE_P (t))
+ if (CHECKING_P && TYPE_P (t))
verify_type (t);
-#endif


flag_checking


@@ -14108,7 +14102,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
default:
break;
case OMP_CLAUSE_MAP:
-#ifdef ENABLE_CHECKING
+#if CHECKING_P
/* First check what we're prepared to handle in the following.  */
switch (OMP_CLAUSE_MAP_KIND (c))
  {


Here too...


-#ifdef ENABLE_CHECKING
-static void
+static void DEBUG_FUNCTION
  verify_curr_properties (function *fn, void *data)


Hmm, I noticed a few cases where we lost the DEBUG_FUNCTION annotation 
and was going to comment that this one is odd - but don't we actually 
want to keep DEBUG_FUNCTION annotations for the others as well so that 
they don't get inlined everywhere and eliminated?



+ if (flag_checking)
+   {
+ FOR_EACH_EDGE (e, ei, bb->preds)
+   gcc_assert (!bitmap_bit_p (tovisit, e->src->index)
+   || (e->flags & EDGE_DFS_BACK));
+   }


Unnecessary braces.


+  if (CHECKING_P)
+{
+  for (; argno < PP_NL_ARGMAX; argno++)
+   gcc_assert (!formatters[argno]);
+}


Here too. Use flag_checking.


+  if (CHECKING_P && mode != VOIDmode)


flag_checking.


-#ifdef ENABLE_CHECKING
  static void
  validate_value_data (struct value_data *vd)
  {
+  if (!flag_checking)
+return;


Same thought as before, it might be better to have this check in the 
callers for easier use from the debugger.



-#endif
-

+


Don't change the whitespace.

Re: [PATCH 1/9] ENABLE_CHECKING refactoring

2015-10-19 Thread Bernd Schmidt

On 10/18/2015 08:17 AM, Mikhail Maltsev wrote:

On 10/12/2015 11:57 PM, Jeff Law wrote:

-#ifdef ENABLE_CHECKING
+#if CHECKING_P


I fail to see the point of this change.

I'm guessing (and Mikhail, please correct me if I'm wrong), but I think he's
trying to get away from ENABLE_CHECKING and instead use a macro which is
always defined to a value.

Yes, exactly. Such a macro is better because it can be used both for conditional
compilation (if needed) and in ordinary if conditions (unlike ENABLE_CHECKING).


But for normal C conditions the patches end up using flag_checking, so 
the CHECKING_P macro buys us nothing over ENABLE_CHECKING. A change like 
this is just churn: changing things without making forward progress, and 
every change like that will cause someone else grief when they have to 
adjust their own out-of-tree patches. (It's something I think we've been 
doing too much lately, and others have complained to me about this issue 
as well).


I'm ok with pretty much all of the rest of the changes (some minor 
comments to follow), so if you could eliminate CHECKING_P I'd be likely 
to approve them.



Bernd


Re: [PATCH] Use GET_MODE_BITSIZE to get vector natural alignment

2015-10-19 Thread Uros Bizjak
On Mon, Oct 19, 2015 at 1:12 PM, H.J. Lu  wrote:
> On Mon, Oct 19, 2015 at 4:05 AM, Uros Bizjak  wrote:
>> On Fri, Oct 16, 2015 at 7:42 PM, H.J. Lu  wrote:
>>> Since GET_MODE_ALIGNMENT is defined by psABI and the biggest alignment
>>> is 4 byte for IA MCU psABI, we should use GET_MODE_BITSIZE to get
>>> vector natural alignment to check misaligned vector move.
>>>
>>> OK for trunk?
>>>
>>> Thanks.
>>>
>>> H.J.
>>> ---
>>> * config/i386/i386.c (ix86_expand_vector_move): Use
>>> GET_MODE_BITSIZE to get vector natural alignment.
>>> ---
>>>  gcc/config/i386/i386.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index ebe2b0a..d0e1f4c 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -18650,7 +18650,9 @@ void
>>>  ix86_expand_vector_move (machine_mode mode, rtx operands[])
>>>  {
>>>rtx op0 = operands[0], op1 = operands[1];
>>> -  unsigned int align = GET_MODE_ALIGNMENT (mode);
>>> +  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT since the
>>> + biggest alignment is 4 byte for IA MCU psABI.  */
>>> +  unsigned int align = GET_MODE_BITSIZE (mode);
>>
>> How about using TARGET_IAMCU condition here and using bitsize only for
>> TARGET_IAMCU?
>>
>
> Works for me.  Is it OK with that change?

Yes.

Thanks,
Uros.
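The effect of the agreed change can be sketched outside GCC. This is an illustrative model only, not i386.c code; the helper name is made up, and only the idea of gating the bit-size fallback on the IA MCU target comes from the thread:

```python
# Illustrative sketch (not GCC code): on IA MCU the psABI caps alignment
# at 4 bytes, so GET_MODE_ALIGNMENT of a vector mode understates its
# natural alignment.  Using the mode's bit size instead restores the
# value the misaligned-vector-move check needs.
def natural_alignment_bits(mode_bitsize, mode_alignment_bits, target_iamcu):
    """Alignment (in bits) to use when checking for misaligned vector moves."""
    return mode_bitsize if target_iamcu else mode_alignment_bits

# A 128-bit vector whose psABI alignment is only 32 bits on IA MCU:
print(natural_alignment_bits(128, 32, True))    # uses the mode bit size
print(natural_alignment_bits(128, 128, False))  # other targets unchanged
```

With a 128-bit vector mode whose psABI alignment is 32 bits, the IA MCU path yields 128, matching the GET_MODE_BITSIZE behaviour of the patch.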


[mask-vec_cond, patch 3/2] SLP support

2015-10-19 Thread Ilya Enkovich
Hi,

This patch adds missing support for cond_expr with no embedded comparison in 
SLP.  No new test added because vec cmp SLP test becomes (due to changes in 
bool patterns by the first patch) a regression test for this patch.  Does it 
look OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* tree-vect-slp.c (vect_get_and_check_slp_defs): Allow
cond_expr with no embedded comparison.
(vect_build_slp_tree_1): Likewise.


diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index fa8291e..48311dd 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -257,7 +257,8 @@ vect_get_and_check_slp_defs (vec_info *vinfo,
 {
   enum tree_code code = gimple_assign_rhs_code (stmt);
   number_of_oprnds = gimple_num_ops (stmt) - 1;
-  if (gimple_assign_rhs_code (stmt) == COND_EXPR)
+  if (gimple_assign_rhs_code (stmt) == COND_EXPR
+ && COMPARISON_CLASS_P (gimple_assign_rhs1 (stmt)))
{
  first_op_cond = true;
  commutative = true;
@@ -482,7 +483,6 @@ vect_build_slp_tree_1 (vec_info *vinfo,
   machine_mode vec_mode;
   HOST_WIDE_INT dummy;
   gimple *first_load = NULL, *prev_first_load = NULL;
-  tree cond;
 
   /* For every stmt in NODE find its def stmt/s.  */
   FOR_EACH_VEC_ELT (stmts, i, stmt)
@@ -527,24 +527,6 @@ vect_build_slp_tree_1 (vec_info *vinfo,
  return false;
}
 
-   if (is_gimple_assign (stmt)
-  && gimple_assign_rhs_code (stmt) == COND_EXPR
-   && (cond = gimple_assign_rhs1 (stmt))
-   && !COMPARISON_CLASS_P (cond))
-{
-  if (dump_enabled_p ())
-{
-  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, 
-  "Build SLP failed: condition is not "
-  "comparison ");
-  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
-  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-}
- /* Fatal mismatch.  */
- matches[0] = false;
-  return false;
-}
-
   scalar_type = vect_get_smallest_scalar_type (stmt, , );
   vectype = get_vectype_for_scalar_type (scalar_type);
   if (!vectype)


Re: [PATCH V3][GCC] Algorithmic optimization in match and simplify

2015-10-19 Thread Richard Biener
On Thu, Oct 15, 2015 at 3:50 PM, Christophe Lyon
 wrote:
> On 9 October 2015 at 18:11, James Greenhalgh  wrote:
>> On Thu, Oct 08, 2015 at 01:29:34PM +0100, Richard Biener wrote:
>>> > Thanks again for the comments Richard!
>>> >
>>> > A new algorithmic optimisation:
>>> >
>>> > ((X inner_op C0) outer_op C1)
>>> > With X being a tree where value_range has reasoned certain bits to always 
>>> > be
>>> > zero throughout its computed value range, we will call this the zero_mask,
>>> > and with inner_op = {|,^}, outer_op = {|,^} and inner_op != outer_op.
>>> > if (inner_op == '^') C0 &= ~C1;
>>> > if ((C0 & ~zero_mask) == 0) then emit (X outer_op (C0 outer_op C1)
>>> > if ((C1 & ~zero_mask) == 0) then emit (X inner_op (C0 outer_op C1)
>>> >
>>> > And extended '(X & C2) << C1 into (X << C1) & (C2 << C1)' and
>>> > '(X & C2) >> C1 into (X >> C1) & (C2 >> C1)' to also accept the bitwise or
>>> > and xor operators:
>>> > '(X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)' and
>>> > '(X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1)'.
>>> >
>>> > The second transformation enables more applications of the first. Also 
>>> > some
>>> > targets may benefit from delaying shift operations. I am aware that such 
>>> > an
>>> > optimization, in combination with one or more optimizations that cause the
>>> > reverse transformation, may lead to an infinite loop. Though such behavior
>>> > has not been detected during regression testing and bootstrapping on
>>> > aarch64.
>>> >
>>> > gcc/ChangeLog:
>>> >
>>> > 2015-10-05 Andre Vieira 
>>> >
>>> > * match.pd: Added a new pattern
>>> > ((X inner_op C0) outer_op C1)
>>> > and expanded existing one
>>> > (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>> >
>>> > gcc/testsuite/ChangeLog:
>>> >
>>> > 2015-10-05 Andre Vieira 
>>> >
>>> > Hale Wang 
>>> >
>>> > * gcc.dg/tree-ssa/forwprop-33.c: New test.
>>>
>>> Ok.
>>>
>>> Thanks,
>>> Richard.
>>>
>>
>> As Andre does not have commit rights, I've committed this on his behalf as
>> revision 228661. Please watch for any fallout over the weekend.
>>
>
> Since this commit I'm seeing:
> FAIL: gcc.target/arm/xor-and.c scan-assembler orr
> on most arm targets.
>
> See: 
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/228661/report-build-info.html
>
> Since that's already a few days old, I suspect you are already aware of that?

Please file a bugreport.

Thanks,
Richard.

> Christophe.
>
>
>> Andre, please check your ChangeLog format in future. In the end I
>> committed this:
>>
>> gcc/ChangeLog
>>
>> 2015-10-09  Andre Vieira  
>>
>> * match.pd: ((X inner_op C0) outer_op C1) New pattern.
>> ((X & C2) << C1): Expand to...
>> (X {&,^,|} C2 << C1): ...This.
>> ((X & C2) >> C1): Expand to...
>> (X {&,^,|} C2 >> C1): ...This.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2015-10-09  Andre Vieira  
>> Hale Wang  
>>
>> * gcc.dg/tree-ssa/forwprop-33.c: New.
>>
>> Thanks,
>> James
>>
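The algorithmic optimisation quoted above lends itself to an exhaustive sanity check over small bit widths. The following is purely illustrative (not part of the patch or of match.pd); names follow the description in the thread:

```python
from itertools import product

# Exhaustive check of the rewrites described above.  zero_mask is the set
# of bits known to be zero in X; inner_op/outer_op range over {|, ^} with
# inner_op != outer_op, and for inner_op == '^' C0 is first masked with
# ~C1, exactly as in the description.
OPS = {"|": lambda a, b: a | b, "^": lambda a, b: a ^ b}

def check_rewrites(width=4):
    full = (1 << width) - 1
    for inner, outer in (("|", "^"), ("^", "|")):
        for zero_mask, c0, c1 in product(range(full + 1), repeat=3):
            if inner == "^":
                c0 &= ~c1 & full
            combined = OPS[outer](c0, c1)
            for x in range(full + 1):
                if x & zero_mask:        # X's known-zero bits must be zero
                    continue
                lhs = OPS[outer](OPS[inner](x, c0), c1)
                if (c0 & ~zero_mask & full) == 0:
                    assert lhs == OPS[outer](x, combined)
                if (c1 & ~zero_mask & full) == 0:
                    assert lhs == OPS[inner](x, combined)
    return True

def check_shift_distribution(width=4):
    # The second transformation: (X op C2) << C1 == (X << C1) op (C2 << C1)
    # for op in {&, ^, |}, and likewise for logical >>.
    full = (1 << width) - 1
    for f in (lambda a, b: a & b, lambda a, b: a ^ b, lambda a, b: a | b):
        for x, c2 in product(range(full + 1), repeat=2):
            for sh in range(width):
                assert (f(x, c2) << sh) & full == f(x << sh, c2 << sh) & full
                assert f(x, c2) >> sh == f(x >> sh, c2 >> sh)
    return True

check_rewrites()
check_shift_distribution()
```

Both checks pass silently for 4-bit values, which covers every combination of zero_mask, C0, C1 and X at that width.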


Re: Add simple sign-stripping cases to match.pd

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 12:48 PM, Richard Sandiford
 wrote:
> Richard Sandiford  writes:
>> Richard Sandiford  writes:
>>> Marc Glisse  writes:
 On Thu, 15 Oct 2015, Richard Sandiford wrote:

> This patch makes sure that, for every simplification that uses
> fold_strip_sign_ops, there are associated match.pd rules for the
> leaf sign ops, i.e. abs, negate and copysign.  A follow-on patch
> will add a pass to handle more complex cases.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
>* match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
>and x*x in cases where the operands are sign ops.  Extend these
>rules to handle copysign as a sign op (including for cos, cosh
>and pow, which already treated negate and abs as sign ops).
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 83c48cd..4331df6 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
> (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
> (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
> (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI 
> BUILT_IN_CEXPIL)
> +(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
> +(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH 
> BUILT_IN_CCOSHL)
> +(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT 
> BUILT_IN_HYPOTL)
> +(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF
> + BUILT_IN_COPYSIGN
> + BUILT_IN_COPYSIGNL)
>
> /* Simplifications of operations with one constant operand and
>simplifications to constants or single values.  */
> @@ -321,7 +327,69 @@ along with GCC; see the file COPYING3.  If not see
>(pows (op @0) REAL_CST@1)
>(with { HOST_WIDE_INT n; }
>>> > (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
> - (pows @0 @1))
> + (pows @0 @1)
> + /* Strip negate and abs from both operands of hypot.  */
> + (for hypots (HYPOT)
> +  (simplify
> +   (hypots (op @0) @1)
> +   (hypots @0 @1))
> +  (simplify
> +   (hypots @0 (op @1))
> +   (hypots @0 @1)))

 Out of curiosity, would hypots:c have worked? (it is probably not worth
 gratuitously swapping the operands to save 3 lines though)
>>>
>>> Yeah, I think I'd prefer to keep it like it is if that's OK.
>>>
> + /* copysign(-x, y) and copysign(abs(x), y) -> copysign(x, y).  */
> + (for copysigns (COPYSIGN)
> +  (simplify
> +   (copysigns (op @0) @1)
> +   (copysigns @0 @1)))
> + /* -x*-x and abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
> + (simplify
> +  (mult (op @0) (op @1))
> +  (mult @0 @0)))

 Typo @1 -> @0 ?
>>>
>>> Argh!  Thanks for catching that.  Wonder how many proof-reads that
>>> escaped :-(
>>>
 This will partially duplicate Naveen's patch "Move some bit and binary
 optimizations in simplify and match".
>>>
>>> OK.  Should I just limit it to the abs case?
>>>
> +/* copysign(x,y)*copysign(x,y) -> x*x.  */
> +(for copysigns (COPYSIGN)
> + (simplify
> +  (mult (copysigns @0 @1) (copysigns @0 @1))

 (mult (copysigns@2 @0 @1) @2)
 ? Or is there some reason not to rely on CSE? (I don't think copysign has
 any errno issue)
>>>
>>> No, simply didn't know about that trick.  I'll use it for the
>>> (mult (op @0) (op @0)) case as well.
>>
>> Here's the updated patch.  I've kept the (mult (negate@1 @0) @1)
>> pattern for now, but can limit it to abs as necessary when
>> Naveen's patch goes in.
>>
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>>
>> Thanks,
>> Richard
>
> Er...
>
> gcc/
> * match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
> and x*x in cases where the operands are sign ops.  Extend these
> rules to handle copysign as a sign op (including for cos, cosh
> and pow, which already treated negate and abs as sign ops).

Ok.

Thanks,
Richard.

> diff --git a/gcc/match.pd b/gcc/match.pd
> index f3813d8..d677e69 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
>  (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
>  (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
>  (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
> +(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
> +(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH BUILT_IN_CCOSHL)
> +(define_operator_list HYPOT 
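The identities these match.pd rules encode are easy to spot-check numerically. This is an illustrative check, independent of match.pd, using Python's math module as a stand-in for the C library functions:

```python
import math

# Spot-check of the identities behind the patch: hypot and x*x ignore the
# signs of their operands, pow with an even integer power ignores the sign
# of its base, and copysign discards the sign of its first argument - so a
# leaf negate/abs (or copysign) feeding them can be stripped.
def check_sign_identities():
    xs = [-2.5, -1.0, -0.0, 0.0, 1.5, 3.0]
    ys = [-4.0, -0.0, 2.0]
    for x in xs:
        assert (-x) * (-x) == x * x          # -x*-x -> x*x
        assert abs(x) * abs(x) == x * x      # abs(x)*abs(x) -> x*x
        assert (-x) ** 4 == x ** 4           # pow((op x), even-n) -> pow(x, n)
        for y in ys:
            assert math.hypot(-x, y) == math.hypot(x, y)
            assert math.hypot(x, -y) == math.hypot(x, y)
            assert math.hypot(abs(x), y) == math.hypot(x, y)
            assert math.copysign(-x, y) == math.copysign(x, y)
            assert math.copysign(abs(x), y) == math.copysign(x, y)
            # copysign(x,y)*copysign(x,y) -> x*x
            assert math.copysign(x, y) * math.copysign(x, y) == x * x
    return True

check_sign_identities()
```

The signed-zero cases are included deliberately: copysign and hypot are exactly the functions for which stripping negate/abs is safe even for -0.0.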

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Richard Biener
On Sun, Oct 18, 2015 at 7:14 PM, Jan Hubicka  wrote:
>>
>> Adding back the mode check is fine if all types with the same TYPE_CANONICAL 
>> have the same mode.  Otherwise we'd regress here.  I thought we do for
>>
>> Struct x { int i; };
>> Typedef y x __attribute__((packed));
>>
>> And then doing
>>
>> X x;
>> Y y;
>> X = y;
>
> Do you have any idea how to turn this into a testcase? I don't think we can
> add a packed attribute to a typedef. Even in
> gimple_canonical_types_compatible_p
>   /* Can't be the same type if they have different mode.  */
>   if (TYPE_MODE (t1) != TYPE_MODE (t2))
> return false;
> (which IMO may be wrong WRT -mavx flags where modes of same types may be 
> different
> in different TUs)

Ok, so the following works:

struct x { int i; };
typedef struct x y __attribute__((aligned(1)));

void foo (void)
{
  struct x X;
  y Y;
  X = Y;
}

but we use SImode for y as well even though its alignment is just one byte ...

Not sure what happens on strict-align targets for this, and I'm not sure how
this can _not_ be a problem.  Consider

void bar (struct x);

and

bar (Y);

or using y *Y and X = *Y or bar (*Y).

> Therefore I would say that TYPE_CANONICAL determines the mode, modulo the fact
> that an incomplete variant of a complete type will have VOIDmode instead of
> the complete type's mode (in the non-LTO case).  That is why I allow mode
> changes for casts from complete to incomplete types.

Incomplete types have VOIDmode, right?

> In longer run I think that every query to useless_type_conversion_p that
> contains incomplete types is a confused query.  useless_type_conversion_p is
> about operations on the value and there are no operations for incomplete type
> (and function types).  I know that ipa-icf-gimple and the following code in
> gimplify-stmt checks this frequently:
>   /* The FEs may end up building ADDR_EXPRs early on a decl with
>  an incomplete type.  Re-build ADDR_EXPRs in canonical form
>  here.  */
>   if (!types_compatible_p (TREE_TYPE (op0), TREE_TYPE (TREE_TYPE (expr
> *expr_p = build_fold_addr_expr (op0);
> Taking the address of an incomplete type or a function, naturally, makes
> sense.  We may want to check something else here, like simply
>TREE_TYPE (op0) != TREE_TYPE (TREE_TYPE (expr))
> and once ipa-icf is cleaned up, start sanity-checking in
> useless_type_conversion_p that we use it to force equality only on types
> that do have values.
>
> We can also trip it when checking TYPE_METHOD_BASETYPE, which may be
> incomplete.  This is in the code checking useless_type_conversion_p on
> functions, which I think are confused queries anyway - we need the ABI
> matcher; I am looking into that.

Ok, so given we seem to be fine in practice with TYPE_MODE (type) ==
TYPE_MODE (TYPE_CANONICAL (type))
(whether that's a bug or not ...) I'm fine with re-instantiating the
mode check for
aggregate types.  Please do that with

Index: gcc/gimple-expr.c
===
--- gcc/gimple-expr.c   (revision 228963)
+++ gcc/gimple-expr.c   (working copy)
@@ -89,8 +89,7 @@ useless_type_conversion_p (tree outer_ty

   /* Changes in machine mode are never useless conversions unless we
  deal with aggregate types in which case we defer to later checks.  */
-  if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type)
-  && !AGGREGATE_TYPE_P (inner_type))
+  if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type))
 return false;

   /* If both the inner and outer types are integral types, then the

Can we assume equal sizes when modes are non-BLKmode then?  Thus

@@ -270,10 +269,9 @@ useless_type_conversion_p (tree outer_ty
  use the types in move operations.  */
   else if (AGGREGATE_TYPE_P (inner_type)
   && TREE_CODE (inner_type) == TREE_CODE (outer_type))
-return (!TYPE_SIZE (outer_type)
-   || (TYPE_SIZE (inner_type)
-   && operand_equal_p (TYPE_SIZE (inner_type),
-   TYPE_SIZE (outer_type), 0)));
+return (TYPE_MODE (outer_type) != BLKmode
+   || operand_equal_p (TYPE_SIZE (inner_type),
+   TYPE_SIZE (outer_type), 0));

   else if (TREE_CODE (inner_type) == OFFSET_TYPE
   && TREE_CODE (outer_type) == OFFSET_TYPE)

?  Hoping for VOIDmode incomplete case.

Richard.

> Honza
>>
>> Richard.
>>
>>
>> >Honza
>> >>
>> >> --
>> >> Eric Botcazou
>>


Re: Move some bit and binary optimizations in simplify and match

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 1:14 PM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
>>> That's not what Richard meant. We already have:
>
> Done. As per the comments.
>
> Please find attached the modified patch as per your comments.
>
> Please review them and let me know if any further modifications are required.

This patch is ok when bootstrapped / tested and with a proper changelog entry.

Thanks,
Richard.

> Thanks,
> Naveen


Re: Add a pass to back-propagate use information

2015-10-19 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Oct 15, 2015 at 3:17 PM, Richard Sandiford
>  wrote:
>> This patch adds a pass that collects information that is common to all
>> uses of an SSA name X and back-propagates that information up the statements
>> that generate X.  The general idea is to use the information to simplify
>> instructions (rather than a pure DCE) so I've simply called it
>> tree-ssa-backprop.c, to go with tree-ssa-forwprop.c.
>>
>> At the moment the only use of the pass is to remove unnecessary sign
>> operations, so that it's effectively a global version of
>> fold_strip_sign_ops.  I'm hoping it could be extended in future to
>> record which bits of an integer are significant.  There are probably
>> other potential uses too.
>>
>> A later patch gets rid of fold_strip_sign_ops.
>>
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>> OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>> * doc/invoke.texi (-fdump-tree-backprop, -ftree-backprop): Document.
>> * Makefile.in (OBJS): Add tree-ssa-backprop.o.
>> * common.opt (ftree-backprop): New option.
>> * fold-const.h (negate_mathfn_p): Declare.
>> * fold-const.c (negate_mathfn_p): Make public.
>> * timevar.def (TV_TREE_BACKPROP): New.
>> * tree-passes.h (make_pass_backprop): Declare.
>> * passes.def (pass_backprop): Add.
>> * tree-ssa-backprop.c: New file.
>>
>> gcc/testsuite/
>> * gcc.dg/tree-ssa/backprop-1.c, gcc.dg/tree-ssa/backprop-2.c,
>> gcc.dg/tree-ssa/backprop-3.c, gcc.dg/tree-ssa/backprop-4.c,
>> gcc.dg/tree-ssa/backprop-5.c, gcc.dg/tree-ssa/backprop-6.c: New 
>> tests.
>>
>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>> index 783e4c9..69e669d 100644
>> --- a/gcc/Makefile.in
>> +++ b/gcc/Makefile.in
>> @@ -1445,6 +1445,7 @@ OBJS = \
>> tree-switch-conversion.o \
>> tree-ssa-address.o \
>> tree-ssa-alias.o \
>> +   tree-ssa-backprop.o \
>> tree-ssa-ccp.o \
>> tree-ssa-coalesce.o \
>> tree-ssa-copy.o \
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 5060208..5aef625 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -2364,6 +2364,10 @@ ftree-pta
>>  Common Report Var(flag_tree_pta) Optimization
>>  Perform function-local points-to analysis on trees.
>>
>> +ftree-backprop
>> +Common Report Var(flag_tree_backprop) Init(1) Optimization
>> +Enable backward propagation of use properties at the tree level.
>
> Don't add new -ftree-* options; "tree" doesn't add any info for our users.  I'd
> also refer to SSA level rather than "tree" level.  Not sure if -fbackprop
> is good, but let's go for it.

OK.

>> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
>> index de45a2c..7f00e72 100644
>> --- a/gcc/fold-const.c
>> +++ b/gcc/fold-const.c
>> @@ -319,7 +318,7 @@ fold_overflow_warning (const char* gmsgid, enum
> warn_strict_overflow_code wc)
>>  /* Return true if the built-in mathematical function specified by CODE
>> is odd, i.e. -f(x) == f(-x).  */
>>
>> -static bool
>> +bool
>>  negate_mathfn_p (enum built_in_function code)
>>  {
>>switch (code)
>
> Belongs more to builtins.[ch] if exported.

The long-term plan is to abstract away whether it's a built-in function
or an internal function, in which case I hope to have a single predicate
that handles both.  I'm not sure where the code should go after that change.
Maybe a new file?

>> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
>> index ee74dc8..4d5b24b 100644
>> --- a/gcc/fold-const.h
>> +++ b/gcc/fold-const.h
>> @@ -173,6 +173,7 @@ extern tree sign_bit_p (tree, const_tree);
>>  extern tree exact_inverse (tree, tree);
>>  extern tree const_unop (enum tree_code, tree, tree);
>>  extern tree const_binop (enum tree_code, tree, tree, tree);
>> +extern bool negate_mathfn_p (enum built_in_function);
>>
>>  /* Return OFF converted to a pointer offset type suitable as offset for
>> POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index dc3f44c..36d2b3b 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -159,6 +159,7 @@ along with GCC; see the file COPYING3.  If not see
>>/* After CCP we rewrite no longer addressed locals into SSA
>>  form if possible.  */
>>NEXT_PASS (pass_complete_unrolli);
>> +  NEXT_PASS (pass_backprop);
>>NEXT_PASS (pass_phiprop);
>>NEXT_PASS (pass_forwprop);
>
> Any reason to not put this later?  I was thinking before reassoc.

I think we're relying on FRE to notice the redundancy in the
builtins-*.c tests, once this pass has converted the version
with redundant sign ops to make it look like the version without.
reassoc is likely to be too late.

I also thought it should go before rather than after some instance
of forwprop because the pass might expose more forward folding
opportunities.  E.g. if the sign of A = -B * B 
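The back-propagation idea described above can be illustrated with a toy model. Everything here is hypothetical and far simpler than the actual tree-ssa-backprop.c pass: a tiny dictionary stands in for SSA defs/uses, and only one property (sign irrelevance) is propagated:

```python
# Toy model of the back-propagation idea: for each SSA name, look at all
# of its uses; if every use ignores the sign of the value (here: abs,
# hypot, or squaring), a defining negation can be stripped.  This is a
# deliberately tiny sketch, not the GCC pass.
SIGN_INSENSITIVE_USES = {"abs", "hypot", "square"}

def strip_redundant_sign_ops(defs, uses):
    """defs: name -> (op, operands); uses: name -> list of consuming ops."""
    sign_irrelevant = {
        name for name, consumers in uses.items()
        if consumers and all(op in SIGN_INSENSITIVE_USES for op in consumers)
    }
    simplified = {}
    for name, (op, operands) in defs.items():
        if op == "negate" and name in sign_irrelevant:
            simplified[name] = ("copy", operands)   # t = -b  ==>  t = b
        else:
            simplified[name] = (op, operands)
    return simplified

# t = -b; r = t * t  becomes  t = b; r = t * t, which a later FRE/DCE
# (or forwprop) can then clean up against the negation-free version.
defs = {"t": ("negate", ["b"]), "r": ("square", ["t"])}
uses = {"t": ["square"], "r": []}
simplified = strip_redundant_sign_ops(defs, uses)
```

The point of running it before forwprop, as discussed above, is visible even in the toy: stripping the negate is what exposes the follow-on folding opportunity.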

Re: [PATCH] PR middle-end/68002: introduce -fkeep-static-functions

2015-10-19 Thread Richard Biener
On Sat, Oct 17, 2015 at 5:17 PM, VandeVondele  Joost
 wrote:
> In some cases (e.g. coverage testing) it is useful to emit code for static 
> functions even if they are never used, which currently is not possible at -O1 
> and above. The following patch introduces a flag for this, which basically 
> triggers the same code that keeps those functions alive at -O0. Thanks to 
> Marc Glisse for replying at gcc-help and for suggesting where to look.
>
> Bootstrapped and regtested on x86_64-unknown-linux-gnu
>
> OK for trunk ?

Ok.

Thanks,
Richard.

> Joost


Re: Add simple sign-stripping cases to match.pd

2015-10-19 Thread Richard Sandiford
Richard Sandiford  writes:
> Richard Sandiford  writes:
>> Marc Glisse  writes:
>>> On Thu, 15 Oct 2015, Richard Sandiford wrote:
>>>
 This patch makes sure that, for every simplification that uses
 fold_strip_sign_ops, there are associated match.pd rules for the
 leaf sign ops, i.e. abs, negate and copysign.  A follow-on patch
 will add a pass to handle more complex cases.

 Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
 OK to install?

 Thanks,
 Richard


 gcc/
* match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
and x*x in cases where the operands are sign ops.  Extend these
rules to handle copysign as a sign op (including for cos, cosh
and pow, which already treated negate and abs as sign ops).

 diff --git a/gcc/match.pd b/gcc/match.pd
 index 83c48cd..4331df6 100644
 --- a/gcc/match.pd
 +++ b/gcc/match.pd
 @@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
 +(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
 +(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH 
 BUILT_IN_CCOSHL)
 +(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT 
 BUILT_IN_HYPOTL)
 +(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF
 + BUILT_IN_COPYSIGN
 + BUILT_IN_COPYSIGNL)

 /* Simplifications of operations with one constant operand and
simplifications to constants or single values.  */
 @@ -321,7 +327,69 @@ along with GCC; see the file COPYING3.  If not see
(pows (op @0) REAL_CST@1)
(with { HOST_WIDE_INT n; }
 (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
 - (pows @0 @1))
 + (pows @0 @1)
 + /* Strip negate and abs from both operands of hypot.  */
 + (for hypots (HYPOT)
 +  (simplify
 +   (hypots (op @0) @1)
 +   (hypots @0 @1))
 +  (simplify
 +   (hypots @0 (op @1))
 +   (hypots @0 @1)))
>>>
>>> Out of curiosity, would hypots:c have worked? (it is probably not worth 
>>> gratuitously swapping the operands to save 3 lines though)
>>
>> Yeah, I think I'd prefer to keep it like it is if that's OK.
>>
 + /* copysign(-x, y) and copysign(abs(x), y) -> copysign(x, y).  */
 + (for copysigns (COPYSIGN)
 +  (simplify
 +   (copysigns (op @0) @1)
 +   (copysigns @0 @1)))
 + /* -x*-x and abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
 + (simplify
 +  (mult (op @0) (op @1))
 +  (mult @0 @0)))
>>>
>>> Typo @1 -> @0 ?
>>
>> Argh!  Thanks for catching that.  Wonder how many proof-reads that
>> escaped :-(
>>
>>> This will partially duplicate Naveen's patch "Move some bit and binary 
>>> optimizations in simplify and match".
>>
>> OK.  Should I just limit it to the abs case?
>>
 +/* copysign(x,y)*copysign(x,y) -> x*x.  */
 +(for copysigns (COPYSIGN)
 + (simplify
 +  (mult (copysigns @0 @1) (copysigns @0 @1))
>>>
>>> (mult (copysigns@2 @0 @1) @2)
>>> ? Or is there some reason not to rely on CSE? (I don't think copysign has 
>>> any errno issue)
>>
>> No, simply didn't know about that trick.  I'll use it for the
>> (mult (op @0) (op @0)) case as well.
>
> Here's the updated patch.  I've kept the (mult (negate@1 @0) @1)
> pattern for now, but can limit it to abs as necessary when
> Naveen's patch goes in.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>
> Thanks,
> Richard

Er...

gcc/
* match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
and x*x in cases where the operands are sign ops.  Extend these
rules to handle copysign as a sign op (including for cos, cosh
and pow, which already treated negate and abs as sign ops).

diff --git a/gcc/match.pd b/gcc/match.pd
index f3813d8..d677e69 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
+(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
+(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH BUILT_IN_CCOSHL)
+(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT BUILT_IN_HYPOTL)
+(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF
+  BUILT_IN_COPYSIGN
+  BUILT_IN_COPYSIGNL)
 
 /* Simplifications of operations with one constant 

Re: [gomp4.1] map clause parsing improvements

2015-10-19 Thread Jakub Jelinek
On Mon, Oct 19, 2015 at 12:20:23PM +0200, Thomas Schwinge wrote:
> > @@ -77,7 +79,21 @@ enum gomp_map_kind
> >  /* ..., and copy from device.  */
> >  GOMP_MAP_FORCE_FROM =  (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
> >  /* ..., and copy to and from device.  */
> > -GOMP_MAP_FORCE_TOFROM =(GOMP_MAP_FLAG_FORCE | 
> > GOMP_MAP_TOFROM)
> > +GOMP_MAP_FORCE_TOFROM =(GOMP_MAP_FLAG_FORCE | 
> > GOMP_MAP_TOFROM),
> > +/* If not already present, allocate.  And unconditionally copy to
> > +   device.  */
> > +GOMP_MAP_ALWAYS_TO =   (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TO),
> > +/* If not already present, allocate.  And unconditionally copy from
> > +   device.  */
> > +GOMP_MAP_ALWAYS_FROM = (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_FROM),
> > +/* If not already present, allocate.  And unconditionally copy to and 
> > from
> > +   device.  */
> > +GOMP_MAP_ALWAYS_TOFROM =   (GOMP_MAP_FLAG_ALWAYS | 
> > GOMP_MAP_TOFROM),
> > +/* OpenMP 4.1 alias for forced deallocation.  */
> > +GOMP_MAP_DELETE =  GOMP_MAP_FORCE_DEALLOC,
> 
> To avoid confusion about two different identifiers naming the same
> functionality, I'd prefer to avoid such aliases ("GOMP_MAP_DELETE =
> GOMP_MAP_FORCE_DEALLOC"), and instead just rename GOMP_MAP_FORCE_DEALLOC
> to GOMP_MAP_DELETE, if that's the name you prefer.

If you are ok with removing GOMP_MAP_FORCE_DEALLOC and just use
GOMP_MAP_DELETE, that is ok by me, just post a patch.

> By the way, looking at GCC 6 libgomp compatibility regarding
> OpenACC/nvptx offloading for executables compiled with GCC 5, for the
> legacy entry point libgomp/oacc-parallel.c:GOACC_parallel only supports
> host-fallback execution, which doesn't pay attention to data clause at
> all (sizes and kinds formal parameters), so you're free to renumber
> GOMP_MAP_* if/where that makes sense.
> 
> > +/* Decrement usage count and deallocate if zero.  */
> > +GOMP_MAP_RELEASE = (GOMP_MAP_FLAG_ALWAYS
> > +| GOMP_MAP_FORCE_DEALLOC)
> >};
> 
> I have not yet read the OpenMP 4.1/4.5 standard, but it's not obvious to
> me here how the GOMP_MAP_FLAG_ALWAYS flag relates to the OpenMP release
> clause (GOMP_MAP_RELEASE here)?  Shouldn't GOMP_MAP_RELEASE be
> "(GOMP_MAP_FLAG_SPECIAL_1 | 3)" or similar?

It isn't related to always, but always really is something that affects
solely the data movement (i.e. to, from, tofrom), and while it can be
specified elsewhere, it makes no difference.  Wasting one bit just for that
is something we don't have the luxury for, which is why I've started using
that bit for other OpenMP stuff (it acts there like GOMP_MAP_FLAG_SPECIAL_2
to some extent).  It is not just release, but also the struct mapping etc.
I'll still need to make further changes, because the rules for mapping
structure element pointer/reference based array sections and structure
element references have changed again.

Some changes in the enum can of course still be done until, say, mid stage3,
but at least for OpenMP 4.0 we should keep backwards compatibility (so
whatever we've already used in GCC 4.9/5 should keep working).

Jakub
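The bit-packing being discussed can be sketched with made-up values. The real numbers live in include/gomp-constants.h; the bit assignments below are assumptions for illustration only, chosen just to show the aliasing and the flag reuse:

```python
# Hypothetical bit assignments (NOT the real gomp-constants.h values),
# illustrating the two points above: GOMP_MAP_DELETE is literally the
# same value as GOMP_MAP_FORCE_DEALLOC, while GOMP_MAP_RELEASE reuses
# the ALWAYS bit on top of FORCE_DEALLOC instead of spending a fresh bit.
GOMP_MAP_TO            = 0x01
GOMP_MAP_FROM          = 0x02
GOMP_MAP_TOFROM        = GOMP_MAP_TO | GOMP_MAP_FROM
GOMP_MAP_FLAG_FORCE    = 0x08                          # hypothetical
GOMP_MAP_FLAG_ALWAYS   = 0x10                          # hypothetical
GOMP_MAP_FORCE_DEALLOC = GOMP_MAP_FLAG_FORCE | 0x04    # hypothetical

GOMP_MAP_ALWAYS_TO     = GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TO
GOMP_MAP_ALWAYS_TOFROM = GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TOFROM
GOMP_MAP_DELETE        = GOMP_MAP_FORCE_DEALLOC        # alias, same value
GOMP_MAP_RELEASE       = GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_FORCE_DEALLOC
```

The ALWAYS bit only means something for data movement (to/from/tofrom), which is exactly what lets RELEASE reuse it on a deallocation to stay distinct from DELETE without a dedicated flag bit.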


Re: Add simple sign-stripping cases to match.pd

2015-10-19 Thread Richard Sandiford
Richard Sandiford  writes:
> Marc Glisse  writes:
>> On Thu, 15 Oct 2015, Richard Sandiford wrote:
>>
>>> This patch makes sure that, for every simplification that uses
>>> fold_strip_sign_ops, there are associated match.pd rules for the
>>> leaf sign ops, i.e. abs, negate and copysign.  A follow-on patch
>>> will add a pass to handle more complex cases.
>>>
>>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>>> OK to install?
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> gcc/
>>> * match.pd: Add rules to simplify ccos, ccosh, hypot, copysign
>>> and x*x in cases where the operands are sign ops.  Extend these
>>> rules to handle copysign as a sign op (including for cos, cosh
>>> and pow, which already treated negate and abs as sign ops).
>>>
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 83c48cd..4331df6 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -61,6 +61,12 @@ along with GCC; see the file COPYING3.  If not see
>>> (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
>>> (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
>>> (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
>>> +(define_operator_list CCOS BUILT_IN_CCOSF BUILT_IN_CCOS BUILT_IN_CCOSL)
>>> +(define_operator_list CCOSH BUILT_IN_CCOSHF BUILT_IN_CCOSH BUILT_IN_CCOSHL)
>>> +(define_operator_list HYPOT BUILT_IN_HYPOTF BUILT_IN_HYPOT BUILT_IN_HYPOTL)
>>> +(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF
>>> +  BUILT_IN_COPYSIGN
>>> +  BUILT_IN_COPYSIGNL)
>>>
>>> /* Simplifications of operations with one constant operand and
>>>simplifications to constants or single values.  */
>>> @@ -321,7 +327,69 @@ along with GCC; see the file COPYING3.  If not see
>>>(pows (op @0) REAL_CST@1)
>>>(with { HOST_WIDE_INT n; }
>>> (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
>>> - (pows @0 @1))
>>> + (pows @0 @1)
>>> + /* Strip negate and abs from both operands of hypot.  */
>>> + (for hypots (HYPOT)
>>> +  (simplify
>>> +   (hypots (op @0) @1)
>>> +   (hypots @0 @1))
>>> +  (simplify
>>> +   (hypots @0 (op @1))
>>> +   (hypots @0 @1)))
>>
>> Out of curiosity, would hypots:c have worked? (it is probably not worth 
>> gratuitously swapping the operands to save 3 lines though)
>
> Yeah, I think I'd prefer to keep it like it is if that's OK.
>
>>> + /* copysign(-x, y) and copysign(abs(x), y) -> copysign(x, y).  */
>>> + (for copysigns (COPYSIGN)
>>> +  (simplify
>>> +   (copysigns (op @0) @1)
>>> +   (copysigns @0 @1)))
>>> + /* -x*-x and abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
>>> + (simplify
>>> +  (mult (op @0) (op @1))
>>> +  (mult @0 @0)))
>>
>> Typo @1 -> @0 ?
>
> Argh!  Thanks for catching that.  Wonder how many proof-reads that
> escaped :-(
>
>> This will partially duplicate Naveen's patch "Move some bit and binary 
>> optimizations in simplify and match".
>
> OK.  Should I just limit it to the abs case?
>
>>> +/* copysign(x,y)*copysign(x,y) -> x*x.  */
>>> +(for copysigns (COPYSIGN)
>>> + (simplify
>>> +  (mult (copysigns @0 @1) (copysigns @0 @1))
>>
>> (mult (copysigns@2 @0 @1) @2)
>> ? Or is there some reason not to rely on CSE? (I don't think copysign has 
>> any errno issue)
>
> No, simply didn't know about that trick.  I'll use it for the
> (mult (op @0) (op @0)) case as well.

Here's the updated patch.  I've kept the (mult (negate@1 @0) @1)
pattern for now, but can limit it to abs as necessary when
Naveen's patch goes in.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.

Thanks,
Richard



[gomp4, committed] Don't parallelize loops in oacc routine

2015-10-19 Thread Tom de Vries

Hi,

this patch prevents parloops from trying to parallelize loops in an oacc 
routine.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Don't parallelize loops in oacc routine

2015-10-19  Tom de Vries  

	* tree-parloops.c (parallelize_loops): Do not parallelize loops in
	offloaded functions.
---
 gcc/tree-parloops.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index b2c2e6e..cef1b52 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -3191,6 +3191,11 @@ parallelize_loops (bool oacc_kernels_p)
   /* Do not parallelize loops in the functions created by parallelization.  */
   if (parallelized_function_p (cfun->decl))
 return false;
+
+  /* Do not parallelize loops in offloaded functions.  */
+  if (get_oacc_fn_attrib (cfun->decl) != NULL)
+return false;
+
   if (cfun->has_nonlocal_label)
 return false;
 
-- 
1.9.1



Re: [patch 2/6] scalar-storage-order merge: C front-end

2015-10-19 Thread Eric Botcazou
> > +  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
> > +error ("scalar_storage_order is not supported");
> 
> You might want to consider indicating why it's not supported.  Not that
> I expect folks to be using this on a pdp11 :-)

Done, I added "because endianness is not uniform".

> > -  /* For [y], return x+y */
> > -  if (TREE_CODE (arg) == ARRAY_REF)
> > -   {
> > - tree op0 = TREE_OPERAND (arg, 0);
> > - if (!c_mark_addressable (op0))
> > -   return error_mark_node;
> > -   }
> 
> Do we still get a proper diagnostic for [y] where x isn't something we
> can mark addressable?

Yes, c_mark_addressable is invoked on 'arg' later, and the function looks into
the prefix of an ARRAY_REF.

> No real objections, assuming that [y] diagnostics is still handled
> correctly somewhere.

OK, thanks.

-- 
Eric Botcazou


Re: [RFC] Add OPTGROUP_PAR

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 9:27 AM, Tom de Vries  wrote:
> Hi,
>
> this patch adds OPTGROUP_PAR.
>
> It allows a user to see on stderr what loops are parallelized by
> pass_parallelize_loops, using -fopt-info-par:
> ...
> $ gcc -O2 -fopt-info-par test.c -ftree-parallelize-loops=32
> test.c:5:3: note: parallelized inner loop
> ...
>
> This patch doesn't include any MSG_MISSED_OPTIMIZATION/MSG_NOTE messages
> yet.
>
> Idea of the patch OK?
>
> Any other comments?

Ok.

> Thanks,
> - Tom


[hsa] Fix ICE in build_outer_var_ref within GPUKERNEL

2015-10-19 Thread Martin Jambor
Hi,

the following patch fixes a segfault when building an outer var ref that
would be in GPUKERNEL context when lowering.  In that case, we need to
use the outer context of the GPUKERNEL container.  Committed to the
branch.

Thanks,

Martin


2015-10-19  Martin Jambor  

* omp-low.c (build_outer_var_ref): If outer ctx is GPUKERNEL, use its
outer ctx.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 383f34a..5234a11 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1186,7 +1186,16 @@ build_outer_var_ref (tree var, omp_context *ctx)
x = var;
 }
   else if (ctx->outer)
-x = lookup_decl (var, ctx->outer);
+{
+  omp_context *outer = ctx->outer;
+  if (gimple_code (outer->stmt) == GIMPLE_OMP_GPUKERNEL)
+   {
+ outer = outer->outer;
+ gcc_assert (outer
+ && gimple_code (outer->stmt) != GIMPLE_OMP_GPUKERNEL);
+   }
+   x = lookup_decl (var, outer);
+}
   else if (is_reference (var))
 /* This can happen with orphaned constructs.  If var is reference, it is
possible it is shared and as such valid.  */


Re: [RFC VTV] Fix VTV for targets that have section anchors.

2015-10-19 Thread Ramana Radhakrishnan
On Tue, Oct 13, 2015 at 1:53 PM, Ramana Radhakrishnan
 wrote:
>
>
>
> On 12/10/15 21:44, Jeff Law wrote:
>> On 10/09/2015 03:17 AM, Ramana Radhakrishnan wrote:
>>> This started as a Friday afternoon project ...
>>>
>>> It turned out enabling VTV for AArch64 and ARM was a matter of fixing
>>> PR67868 which essentially comes from building libvtv with section
>>> anchors turned on. The problem was that the flow of control from
>>> output_object_block through to switch_section did not have the same
>>> special casing for the vtable section that exists in
>>> assemble_variable.
>> That's some ugly code.  You might consider factoring that code into a 
>> function and just calling it from both places.  Your version doesn't seem to 
>> handle PECOFF, so I'd probably refactor from assemble_variable.
>>
>
> I was a bit lazy as I couldn't immediately think of a target that would want 
> PECOFF, section anchors and VTV. That combination seems to be quite rare;
> anyway, point taken on the refactor.
>
> Ok if no regressions ?

Ping.

Ramana

>
>>>
>>> However both these failures also occur on x86_64 - so I'm content to
>>> declare victory on AArch64 as far as basic enablement goes.
>> Cool.
>>
>>>
>>> 1. Are the generic changes to varasm.c ok ? 2. Can we take the
>>> AArch64 support in now, given this amount of testing ? Marcus /
>>> Caroline ? 3. Any suggestions / helpful debug hints for VTV debugging
>>> (other than turning VTV_DEBUG on and inspecting trace) ?
>> I think that with refactoring they'd be good to go.  No opinions on the 
>> AArch64 specific question -- call for the AArch64 maintainers.
>>
>> Good to see someone hacking on vtv.  It's in my queue to look at as well.
>
> Yeah figuring out more about vtv is also in my background queue.
>
> regards
> Ramana
>
> PR other/67868
>
> * varasm.c (assemble_variable): Move special vtv handling to..
> (handle_vtv_comdat_sections): .. here. New function.
> (output_object_block): Handle vtv sections.
>
> libvtv/Changelog
>
> * configure.tgt: Support aarch64 and arm.


Re: [PATCH, committed] PR other/65800. Fix crash in gengtype's internal debug debug dump

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 1:46 AM, Mikhail Maltsev  wrote:
> Hi!
>
> gengtype has an option '-d' which allows dumping its internal state. I
> planned
> to use it in order to create some kind of list of all data which GCC stores in
> garbage-collected memory.
>
> Unfortunately this option was broken. The attached patch fixes it. Because it
> only affects gengtype's internal debugging option (and is also rather small),
> I think it's OK to commit it without approval (as obvious).
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.

Ok.

> --
> Regards,
> Mikhail Maltsev
>
> gcc/ChangeLog:
>
> 2015-10-18  Mikhail Maltsev  
>
> PR other/65800
> * gengtype.c (dump_type): Handle TYPE_UNDEFINED correctly.


Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-10-19 Thread Szabolcs Nagy

On 14/10/15 10:55, Szabolcs Nagy wrote:

On 30/09/15 20:23, Andreas Krebbel wrote:

On 09/30/2015 06:21 PM, Szabolcs Nagy wrote:

On 30/09/15 14:47, Bernd Schmidt wrote:

On 09/17/2015 11:15 AM, Szabolcs Nagy wrote:

ping 2.

this patch is needed for working visibility ("protected")
attribute for extern data on targets using default_binds_local_p_2.
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html


I hesitate to review this one since I don't think I understand the
issues on the various affected arches well enough. It looks like Jakub
had some input on the earlier changes, maybe he could take a look? Or
maybe rth knows best. Adding Ccs.

It would help to have examples of code generation demonstrating the
problem and how you would solve it. Input from the s390 maintainers
whether this is correct for their port would also be appreciated.


We are having the same problem on S/390. I think the GCC change is correct for 
S/390 as well.

-Andreas-



I think the approvals of the arm and aarch64 maintainers
are needed to apply this fix for PR target/66912.

(only s390, arm and aarch64 use this predicate.)



I was told this needs global maintainer approval.
Adding Jakub and rth back to cc.





Consider the TU:

__attribute__((visibility("protected"))) int n;

int f () { return n; }

If 'n' "binds_local", then gcc -O -fpic -S gives:

  .text
  .align  2
  .global f
  .arch armv8-a+fp+simd
  .type   f, %function
f:
  adrpx0, n
  ldr w0, [x0, #:lo12:n]
  ret
  .size   f, .-f
  .protected  n
  .comm   n,4,4

so 'n' is a direct reference, not accessed through
the GOT ('n' will be in the .bss of the dso).
This is the current behavior.

If I remove the protected visibility attribute,
then the access goes through the GOT:

  .text
  .align  2
  .global f
  .arch armv8-a+fp+simd
  .type   f, %function
f:
  adrpx0, _GLOBAL_OFFSET_TABLE_
  ldr x0, [x0, #:gotpage_lo15:n]
  ldr w0, [x0]
  ret
  .size   f, .-f
  .comm   n,4,4

Protected visibility means the definition cannot
be overridden by another module, but it should
still allow extern references.

If the main module references such an object, then
(as an implementation detail) it may use a copy
relocation against it, which places 'n' in the
main module; the dynamic linker should make
sure that references to 'n' point there.

This is only possible if references to 'n' go
through the GOT (i.e. it should not be "binds_local").



Re: [gomp4.1] map clause parsing improvements

2015-10-19 Thread Thomas Schwinge
Hi!

On Thu, 11 Jun 2015 14:14:20 +0200, Jakub Jelinek  wrote:
> On Tue, Jun 09, 2015 at 09:36:08PM +0300, Ilya Verbin wrote:
> > On Wed, Apr 29, 2015 at 14:06:44 +0200, Jakub Jelinek wrote:
> > > [...] The draft requires only alloc or to
> > > (or always, variants) for enter data and only from or delete (or always,
> > > variants) for exit data, so in theory it is possible to figure that from
> > > the call without extra args, but not so for update - enter data is 
> > > supposed
> > > to increment reference counts, exit data decrement. [...]
> > 
> > TR3.pdf also says about 'release' map-type for exit data, but it is not
> > described in the document.
> 
> So, I've committed a patch to add parsing release map-kind, and fix up or add
> verification in C/C++ FE what map-kinds are used.
> 
> Furthermore, it seems the OpenMP 4.1 always modifier is something completely
> unrelated to the OpenACC force flag, in OpenMP 4.1 everything is reference
> count based, and always seems to make a difference only for from/to/tofrom,
> where it says that the copying is done unconditionally; thus the patch uses
> a different bit for that.

Aha, I see.  (The poor OpenACC/OpenMP users, having to remember so many
small yet intricate details...)

> include/
>   * gomp-constants.h (GOMP_MAP_FLAG_ALWAYS): Define.
>   (enum gomp_map_kind): Add GOMP_MAP_ALWAYS_TO, GOMP_MAP_ALWAYS_FROM,
>   GOMP_MAP_ALWAYS_TOFROM, GOMP_MAP_DELETE, GOMP_MAP_RELEASE.

> --- include/gomp-constants.h.jj   2015-05-21 11:12:09.0 +0200
> +++ include/gomp-constants.h  2015-06-11 11:24:32.041654947 +0200
> @@ -41,6 +41,8 @@
>  #define GOMP_MAP_FLAG_SPECIAL_1  (1 << 3)
>  #define GOMP_MAP_FLAG_SPECIAL(GOMP_MAP_FLAG_SPECIAL_1 \
>| GOMP_MAP_FLAG_SPECIAL_0)
> +/* OpenMP always flag.  */
> +#define GOMP_MAP_FLAG_ALWAYS (1 << 6)
>  /* Flag to force a specific behavior (or else, trigger a run-time error).  */
>  #define GOMP_MAP_FLAG_FORCE  (1 << 7)
>  
> @@ -77,7 +79,21 @@ enum gomp_map_kind
>  /* ..., and copy from device.  */
>  GOMP_MAP_FORCE_FROM =(GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
>  /* ..., and copy to and from device.  */
> -GOMP_MAP_FORCE_TOFROM =  (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM)
> +GOMP_MAP_FORCE_TOFROM =  (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
> +/* If not already present, allocate.  And unconditionally copy to
> +   device.  */
> +GOMP_MAP_ALWAYS_TO = (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_TO),
> +/* If not already present, allocate.  And unconditionally copy from
> +   device.  */
> +GOMP_MAP_ALWAYS_FROM =   (GOMP_MAP_FLAG_ALWAYS | GOMP_MAP_FROM),
> +/* If not already present, allocate.  And unconditionally copy to and 
> from
> +   device.  */
> +GOMP_MAP_ALWAYS_TOFROM = (GOMP_MAP_FLAG_ALWAYS | 
> GOMP_MAP_TOFROM),
> +/* OpenMP 4.1 alias for forced deallocation.  */
> +GOMP_MAP_DELETE =GOMP_MAP_FORCE_DEALLOC,

To avoid confusion about two different identifiers naming the same
functionality, I'd prefer to avoid such aliases ("GOMP_MAP_DELETE =
GOMP_MAP_FORCE_DEALLOC"), and instead just rename GOMP_MAP_FORCE_DEALLOC
to GOMP_MAP_DELETE, if that's the name you prefer.

By the way, looking at GCC 6 libgomp compatibility regarding
OpenACC/nvptx offloading for executables compiled with GCC 5, for the
legacy entry point libgomp/oacc-parallel.c:GOACC_parallel only supports
host-fallback execution, which doesn't pay attention to data clause at
all (sizes and kinds formal parameters), so you're free to renumber
GOMP_MAP_* if/where that makes sense.

> +/* Decrement usage count and deallocate if zero.  */
> +GOMP_MAP_RELEASE =   (GOMP_MAP_FLAG_ALWAYS
> +  | GOMP_MAP_FORCE_DEALLOC)
>};

I have not yet read the OpenMP 4.1/4.5 standard, but it's not obvious to
me here how the GOMP_MAP_FLAG_ALWAYS flag relates to the OpenMP release
clause (GOMP_MAP_RELEASE here)?  Shouldn't GOMP_MAP_RELEASE be
"(GOMP_MAP_FLAG_SPECIAL_1 | 3)" or similar?


Grüße
 Thomas




[PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Kyrill Tkachov

Hi all,

This second patch teaches simplify_binary_operation to return the dereferenced
constants from the constant pool in the binary expression if other 
simplifications failed.

This, combined with the 1/2 patch for aarch64
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01744.html) allow for:

int
foo (float a)
{
  return a * 32.0f;
}

to generate the code:
foo:
fcvtzs  w0, s0, #5
ret

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
(fix:SI (mult:SF (reg:SF 32 v0 [ a ])
(const_double:SF 3.2e+1 [0x0.8p+6]))))

whereas before it would not try to use the const_double directly
but rather its constant pool reference.

I've seen this patch trigger once in 453.gromacs from SPEC2006 on aarch64 where 
it
ended up eliminating a floating-point multiplication and a load from a constant 
pool.
There were no other changes, so I reckon this is pretty low impact.

Bootstrapped and tested on aarch64, arm, x86_64.
CC'ing Eric as this is an RTL optimisation and Segher as this is something that
has an effect through combine.

Ok for trunk?

Thanks,
Kyrill

2015-10-19  Kyrylo Tkachov  

* simplify-rtx.c (simplify_binary_operation): If either operand was
a constant pool reference use them if all other simplifications failed.

2015-10-19  Kyrylo Tkachov  

* gcc.target/aarch64/fmul_fcvt_1.c: Add multiply-by-32 cases.
commit f941a03f6ca5dcc0d509490d0e0ec39cefed714b
Author: Kyrylo Tkachov 
Date:   Mon Oct 12 17:12:34 2015 +0100

[simplify-rtx] Use constants from pool when simplifying binops

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 5ea5522..519850a 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2001,7 +2001,17 @@ simplify_binary_operation (enum rtx_code code, machine_mode mode,
   tem = simplify_const_binary_operation (code, mode, trueop0, trueop1);
   if (tem)
 return tem;
-  return simplify_binary_operation_1 (code, mode, op0, op1, trueop0, trueop1);
+  tem = simplify_binary_operation_1 (code, mode, op0, op1, trueop0, trueop1);
+
+  if (tem)
+return tem;
+
+  /* If the above steps did not result in a simplification and op0 or op1
+ were constant pool references, use the referenced constants directly.  */
+  if (trueop0 != op0 || trueop1 != op1)
+return simplify_gen_binary (code, mode, trueop0, trueop1);
+
+  return NULL_RTX;
 }
 
 /* Subroutine of simplify_binary_operation.  Simplify a binary operation
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
index 5af8290..354f2be 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
@@ -83,6 +83,17 @@ FUNC_DEFD (16)
 /* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], d\[0-9\]*.*#4" } } */
 /* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], d\[0-9\]*.*#4" } } */
 
+FUNC_DEFS (32)
+FUNC_DEFD (32)
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], s\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzs\tx\[0-9\], d\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], d\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], s\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], s\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzu\tx\[0-9\], d\[0-9\]*.*#5" } } */
+/* { dg-final { scan-assembler "fcvtzu\tw\[0-9\], d\[0-9\]*.*#5" } } */
+
 
 #define FUNC_TESTS(__a, __b)	\
 do\
@@ -120,10 +131,12 @@ main (void)
   FUNC_TESTS (4, i);
   FUNC_TESTS (8, i);
   FUNC_TESTS (16, i);
+  FUNC_TESTS (32, i);
 
   FUNC_TESTD (4, i);
   FUNC_TESTD (8, i);
   FUNC_TESTD (16, i);
+  FUNC_TESTD (32, i);
 }
   return 0;
 }


[PATCH v10][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-10-19 Thread Benedikt Huber
This tenth revision of the patch:
 * Removes unnecessary enum.

Ok for check in.


Benedikt Huber (1):
  2015-10-19  Benedikt Huber  
Philipp Tomsich  

 gcc/ChangeLog  |  20 
 gcc/config/aarch64/aarch64-builtins.c  | 115 +
 gcc/config/aarch64/aarch64-protos.h|   4 +
 gcc/config/aarch64/aarch64-simd.md |  27 +
 gcc/config/aarch64/aarch64-tuning-flags.def|   1 +
 gcc/config/aarch64/aarch64.c   | 107 ++-
 gcc/config/aarch64/aarch64.md  |   3 +
 gcc/config/aarch64/aarch64.opt |   5 +
 gcc/doc/invoke.texi|  12 +++
 gcc/testsuite/gcc.target/aarch64/rsqrt_1.c | 111 
 .../gcc.target/aarch64/rsqrt_asm_check_1.c |  25 +
 .../gcc.target/aarch64/rsqrt_asm_check_common.h|  42 
 .../aarch64/rsqrt_asm_check_negative_1.c   |  12 +++
 13 files changed, 482 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c

-- 
1.9.1



[PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation

2015-10-19 Thread Kyrill Tkachov

Hi all,

The fcvtzs and fcvtzu instructions have a form where they convert to a 
fixed-point form with a specified number of
fractional bits. In practice this has the effect of multiplying the floating-point
argument by 2^N (where N is the number of fractional bits)
and then converting the result to integer. We can exploit that behaviour during
combine to eliminate a floating-point multiplication
by an FP immediate that is a power of 2 i.e. 4.0, 8.0, 16.0 etc.
For example for code:
int
sffoo1 (float a)
{
  return a * 4.0f;
}

we currently generate:
sffoo1:
fmovs1, 4.0e+0
fmuls0, s0, s1
fcvtzs  w0, s0
ret

with this patch we can generate:
sffoo1:
fcvtzs  w0, s0, #2
ret

We already perform the analogous combination for the arm target (see the 
*combine_vcvtf2i pattern in config/arm/vfp.md)
However, this patch also implements the fcvtzu form i.e. the unsigned_fix form 
as well as the vector forms.

However, not everything is rosy. The code:
int
foo (float a)
{
  return a * 32.0f;
}

will not trigger the optimisation because 32.0f is stored in the constant pool 
and due to a deficiency in
simplify-rtx.c the simplification doesn't get through. I have a patch to fix 
that as part 2/2.

Also, for code:
int
foo (float a)
{
  return a * 2.0f;
}

This gets folded early on as a + a and thus emits an fadd instruction followed 
by a fcvtzs.
Nothing we can do about that (in this patch at least).

I've seen this trigger once in 453.povray in SPEC2006 and one other time in 
435.gromacs after
patch 2/2 is applied. I've heard this can also trigger in codec-like codebases 
and I did see it
trigger a few times in ffmpeg.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-19  Kyrylo Tkachov  

* config/aarch64/aarch64.md
 (*aarch64_fcvt2_mult): New pattern.
* config/aarch64/aarch64-simd.md
 (*aarch64_fcvt2_mult): Likewise.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
(aarch64_fpconst_pow_of_2): New function.
(aarch64_vec_fpconst_pow_of_2): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
prototype.
(aarch64_vec_fpconst_pow_of_2): Likewise.
* config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
(aarch64_fp_vec_pow2): Likewise.

2015-10-19  Kyrylo Tkachov  

* gcc.target/aarch64/fmul_fcvt_1.c: New test.
* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
commit a13a5967a1f94744776d616ca84d5512b24bf546
Author: Kyrylo Tkachov 
Date:   Thu Oct 8 15:17:47 2015 +0100

[AArch64] Add fmul+fcvt optimisation

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a8ac8d3..309dcfb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -294,12 +294,14 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+int aarch64_fpconst_pow_of_2 (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
+int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
 const char *aarch64_output_move_struct (rtx *operands);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6a2ab61..3d2c496 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1654,6 +1654,27 @@ (define_insn "l2"
   [(set_attr "type" "neon_fp_to_int_")]
 )
 
+(define_insn "*aarch64_fcvt2_mult"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(FIXUORS: (unspec:
+			   [(mult:VDQF
+	 (match_operand:VDQF 1 "register_operand" "w")
+	 (match_operand:VDQF 2 "aarch64_fp_vec_pow2" ""))]
+			   UNSPEC_FRINTZ)))]
+  "TARGET_SIMD
+   && IN_RANGE (aarch64_vec_fpconst_pow_of_2 (operands[2]), 1,
+		GET_MODE_BITSIZE (GET_MODE_INNER (mode)))"
+  {
+int fbits = aarch64_vec_fpconst_pow_of_2 (operands[2]);
+char buf[64];
+sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits);
+output_asm_insn (buf, operands);
+return "";
+  }
+  [(set_attr "type" "neon_fp_to_int_")]
+)
+
+
 (define_expand "2"
   [(set (match_operand: 0 "register_operand")
 	(FIXUORS: (unspec:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2ec76a5..9b76746 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6808,6 +6808,19 @@ cost_plus:
 	  else
 	*cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
 	}
+
+  /* We can combine fmul by a power of 2 

Re: [PATCH, rs6000] Pass --secure-plt to the linker

2015-10-19 Thread Szabolcs Nagy

On 19/10/15 12:12, Alan Modra wrote:

On Thu, Oct 15, 2015 at 06:50:50PM +0100, Szabolcs Nagy wrote:

A powerpc toolchain built with (or without) --enable-secureplt
currently creates a binary that uses bss plt if

(1) any of the linked PIC objects have bss plt relocs
(2) or all the linked objects are non-PIC or have no relocs,

because this is the binutils linker behaviour.

This patch passes --secure-plt to the linker which makes the linker
warn in case (1) and produce a binary with secure plt in case (2).


The idea is OK I think, but


@@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
  %{R*} \
  %(link_shlib) \
  %{!T*: %(link_start) } \
+%{!static: %(link_secure_plt_default)} \
  %(link_os)"


this change needs to be conditional on !mbss-plt too.



OK, will change that.

If -msecure-plt and -mbss-plt are supposed to affect
linking too (not just code gen), then shall I add
%{msecure-plt: --secure-plt} too?



Re: [PATCH 1/9] ENABLE_CHECKING refactoring

2015-10-19 Thread Mikhail Maltsev
On 10/19/2015 02:13 PM, Bernd Schmidt wrote:
> But for normal C conditions the patches end up using flag_checking, so
> the CHECKING_P macro buys us nothing over ENABLE_CHECKING.
Presumably 'if (CHECKING_P)' can be used for performance-critical parts
(in this case the condition will be DCE-d) and also for those parts of
the compiler which we want to decouple from 'options.h'.
IIRC, Jeff's idea was to get rid of 'ENABLE_CHECKING' completely and use
either 'flag_checking' or 'CHECKING_P'. But I don't know what the
consensus on it is (I would like to hear Jeff's and Richard's opinions).
Of course it will be easy for me to adjust the patch to whatever the
final decision will be.

-- 
Regards,
Mikhail Maltsev


[PATCH] Move cproj simplification to match.pd

2015-10-19 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-19  Richard Biener  

* gimple-fold.c (gimple_phi_nonnegative_warnv_p): New function.
(gimple_stmt_nonnegative_warnv_p): Use it.
* match.pd (CPROJ): New operator list.
(cproj (complex ...)): Move simplifications from ...
* builtins.c (fold_builtin_cproj): ... here.

* gcc.dg/torture/builtin-cproj-1.c: Skip for -O0.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 228877)
+++ gcc/gimple-fold.c   (working copy)
@@ -6224,6 +6224,24 @@ gimple_call_nonnegative_warnv_p (gimple
strict_overflow_p, depth);
 }
 
+/* Return true if return value of call STMT is known to be non-negative.
+   If the return value is based on the assumption that signed overflow is
+   undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
+   *STRICT_OVERFLOW_P.  DEPTH is the current nesting depth of the query.  */
+
+static bool
+gimple_phi_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
+   int depth)
+{
+  for (unsigned i = 0; i < gimple_phi_num_args (stmt); ++i)
+{
+  tree arg = gimple_phi_arg_def (stmt, i);
+  if (!tree_single_nonnegative_warnv_p (arg, strict_overflow_p, depth + 1))
+   return false;
+}
+  return true;
+}
+
 /* Return true if STMT is known to compute a non-negative value.
If the return value is based on the assumption that signed overflow is
undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
@@ -6241,6 +6259,9 @@ gimple_stmt_nonnegative_warnv_p (gimple
 case GIMPLE_CALL:
   return gimple_call_nonnegative_warnv_p (stmt, strict_overflow_p,
  depth);
+case GIMPLE_PHI:
+  return gimple_phi_nonnegative_warnv_p (stmt, strict_overflow_p,
+depth);
 default:
   return false;
 }
Index: gcc/match.pd
===
--- gcc/match.pd(revision 228877)
+++ gcc/match.pd(working copy)
@@ -61,6 +61,7 @@ (define_operator_list COS BUILT_IN_COSF
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
+(define_operator_list CPROJ BUILT_IN_CPROJF BUILT_IN_CPROJ BUILT_IN_CPROJL)
 
 /* Simplifications of operations with one constant operand and
simplifications to constants or single values.  */
@@ -2361,6 +2362,32 @@ (define_operator_list CEXPI BUILT_IN_CEX
(cbrts (pows tree_expr_nonnegative_p@0 @1))
(pows @0 (mult @1 { build_real_truncate (type, dconst_third ()); })
 
+/* If the real part is inf and the imag part is known to be
+   nonnegative, return (inf + 0i).  */
+(simplify
+ (CPROJ (complex REAL_CST@0 tree_expr_nonnegative_p@1))
+ (if (real_isinf (TREE_REAL_CST_PTR (@0)))
+  (with
+{
+  REAL_VALUE_TYPE rinf;
+  real_inf ();
+}
+   { build_complex (type, build_real (TREE_TYPE (type), rinf),
+   build_zero_cst (TREE_TYPE (type))); })))
+/* If the imag part is inf, return (inf+I*copysign(0,imag)).  */
+(simplify
+ (CPROJ (complex @0 REAL_CST@1))
+ (if (real_isinf (TREE_REAL_CST_PTR (@1)))
+  (with
+{
+  REAL_VALUE_TYPE rinf, rzero = dconst0;
+  real_inf (&rinf);
+  rzero.sign = TREE_REAL_CST_PTR (@1)->sign;
+}
+   { build_complex (type, build_real (TREE_TYPE (type), rinf),
+   build_real (TREE_TYPE (type), rzero)); })))
+
+
 /* Narrowing of arithmetic and logical operations. 
 
These are conceptually similar to the transformations performed for
Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 228877)
+++ gcc/builtins.c  (working copy)
@@ -7657,33 +7657,6 @@ fold_builtin_cproj (location_t loc, tree
   else
return arg;
 }
-  else if (TREE_CODE (arg) == COMPLEX_EXPR)
-{
-  tree real = TREE_OPERAND (arg, 0);
-  tree imag = TREE_OPERAND (arg, 1);
-
-  STRIP_NOPS (real);
-  STRIP_NOPS (imag);
-  
-  /* If the real part is inf and the imag part is known to be
-nonnegative, return (inf + 0i).  Remember side-effects are
-possible in the imag part.  */
-  if (TREE_CODE (real) == REAL_CST
- && real_isinf (TREE_REAL_CST_PTR (real))
- && tree_expr_nonnegative_p (imag))
-   return omit_one_operand_loc (loc, type,
-build_complex_cproj (type, false),
-arg);
-  
-  /* If the imag part is inf, return (inf+I*copysign(0,imag)).
-Remember side-effects are possible in the real part.  */
-  if (TREE_CODE (imag) == REAL_CST
-

[PATCH] 2015-10-19 Benedikt Huber <benedikt.hu...@theobroma-systems.com> Philipp Tomsich <philipp.toms...@theobroma-systems.com>

2015-10-19 Thread Benedikt Huber
* config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
* config/aarch64/aarch64-protos.h: Declare.
* config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and
frsqrts.
* config/aarch64/aarch64-tuning-flags.def: Added recip_sqrt.
* config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code 
when
applicable.
* config/aarch64/aarch64.md: Added enum entries.
* config/aarch64/aarch64.opt: Added option -mlow-precision-recip-sqrt.
* testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h: Common macros 
for
assembly checks.
* testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c: Make sure
frsqrts and frsqrte are not emitted.
* testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c: Make sure frsqrts 
and
frsqrte are emitted.
* testsuite/gcc.target/aarch64/rsqrt_1.c: Functional tests for rsqrt.

Signed-off-by: Philipp Tomsich 
---
 gcc/ChangeLog  |  20 
 gcc/config/aarch64/aarch64-builtins.c  | 115 +
 gcc/config/aarch64/aarch64-protos.h|   4 +
 gcc/config/aarch64/aarch64-simd.md |  27 +
 gcc/config/aarch64/aarch64-tuning-flags.def|   1 +
 gcc/config/aarch64/aarch64.c   | 107 ++-
 gcc/config/aarch64/aarch64.md  |   3 +
 gcc/config/aarch64/aarch64.opt |   5 +
 gcc/doc/invoke.texi|  12 +++
 gcc/testsuite/gcc.target/aarch64/rsqrt_1.c | 111 
 .../gcc.target/aarch64/rsqrt_asm_check_1.c |  25 +
 .../gcc.target/aarch64/rsqrt_asm_check_common.h|  42 
 .../aarch64/rsqrt_asm_check_negative_1.c   |  12 +++
 13 files changed, 482 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f39753d..596c9c3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2015-10-19  Benedikt Huber  
+   Philipp Tomsich  
+
+   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
+   * config/aarch64/aarch64-protos.h: Declare.
+   * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and
+   frsqrts.
+   * config/aarch64/aarch64-tuning-flags.def: Added recip_sqrt.
+   * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code 
when
+   applicable.
+   * config/aarch64/aarch64.md: Added enum entries.
+   * config/aarch64/aarch64.opt: Added option -mlow-precision-recip-sqrt.
+   * testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h: Common macros 
for
+   assembly checks.
+   * testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c: Make sure
+   frsqrts and frsqrte are not emitted.
+   * testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c: Make sure frsqrts 
and
+   frsqrte are emitted.
+   * testsuite/gcc.target/aarch64/rsqrt_1.c: Functional tests for rsqrt.
+
 2015-10-16  Trevor Saunders  
 
* lra-constraints.c (add_next_usage_insn): Change argument type
diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index a1998ed..6b4208f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -324,6 +324,11 @@ enum aarch64_builtins
   AARCH64_BUILTIN_GET_FPSR,
   AARCH64_BUILTIN_SET_FPSR,
 
+  AARCH64_BUILTIN_RSQRT_DF,
+  AARCH64_BUILTIN_RSQRT_SF,
+  AARCH64_BUILTIN_RSQRT_V2DF,
+  AARCH64_BUILTIN_RSQRT_V2SF,
+  AARCH64_BUILTIN_RSQRT_V4SF,
   AARCH64_SIMD_BUILTIN_BASE,
   AARCH64_SIMD_BUILTIN_LANE_CHECK,
 #include "aarch64-simd-builtins.def"
@@ -822,6 +827,46 @@ aarch64_init_crc32_builtins ()
 }
 }
 
+/* Add builtins for reciprocal square root.  */
+
+void
+aarch64_init_builtin_rsqrt (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+
+  tree V2SF_type_node = build_vector_type (float_type_node, 2);
+  tree V2DF_type_node = build_vector_type (double_type_node, 2);
+  tree V4SF_type_node = build_vector_type (float_type_node, 4);
+
+  struct builtin_decls_data
+  {
+tree type_node;
+const char *builtin_name;
+int function_code;
+  };
+
+  builtin_decls_data bdda[] =
+  {
+{ double_type_node, "__builtin_aarch64_rsqrt_df", AARCH64_BUILTIN_RSQRT_DF 
},
+{ float_type_node, "__builtin_aarch64_rsqrt_sf", AARCH64_BUILTIN_RSQRT_SF 
},
+{ V2DF_type_node, "__builtin_aarch64_rsqrt_v2df", 
AARCH64_BUILTIN_RSQRT_V2DF },
+{ 

Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Segher Boessenkool
Hi Kyrill,

On Mon, Oct 19, 2015 at 02:57:54PM +0100, Kyrill Tkachov wrote:
> because combine now successfully tries to match:
> (set (reg/i:SI 0 x0)
> (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
> (const_double:SF 3.2e+1 [0x0.8p+6]))))
> 
> whereas before it would not try to use the const_double directly
> but rather its constant pool reference.

What happens if the constant pool reference is actually the better
code, do we still generate that?


Segher


Move cabs simplifications to match.pd

2015-10-19 Thread Richard Sandiford
The fold code also expanded cabs(x+yi) to fsqrt(x*x+y*y) when optimising
for speed.  tree-ssa-math-opts.c has this transformation too, but unlike
the fold code, it first checks whether the target implements the sqrt
optab.  The patch simply removes the fold code and keeps the
tree-ssa-math-opts.c logic the same.

gcc.dg/lto/20110201-1_0.c was relying on us replacing cabs
with fsqrt even on targets where fsqrt is itself a library call.
The discussion leading up to that patch suggested that we only
want to test the fold on targets with a square root instruction,
so it would be OK to skip the test on other targets:

https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01961.html
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg02036.html

The patch does that using the sqrt_insn effective target.

It's possible that removing the tree folds renders the LTO trick
unnecessary, but since the test was originally for an ICE, it seems
better to leave it as-is.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
20110201-1_0.c passes on all three.  OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_cabs): Delete.
(fold_builtin_1): Update accordingly.  Handle constant arguments here.
* match.pd: Add rules previously handled by fold_builtin_cabs.

gcc/testsuite/
* gcc.dg/lto/20110201-1_0.c: Restrict to sqrt_insn targets.
Add associated options for arm*-*-*.
(sqrt): Remove dummy definition.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1e4ec35..8f87fd9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7539,82 +7539,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree arg)
   return NULL_TREE;
 }
 
-/* Fold call to builtin cabs, cabsf or cabsl with argument ARG.  TYPE is the
-   return type.  Return NULL_TREE if no simplification can be made.  */
-
-static tree
-fold_builtin_cabs (location_t loc, tree arg, tree type, tree fndecl)
-{
-  tree res;
-
-  if (!validate_arg (arg, COMPLEX_TYPE)
-  || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE)
-return NULL_TREE;
-
-  /* Calculate the result when the argument is a constant.  */
-  if (TREE_CODE (arg) == COMPLEX_CST
-  && (res = do_mpfr_arg2 (TREE_REALPART (arg), TREE_IMAGPART (arg),
- type, mpfr_hypot)))
-return res;
-
-  if (TREE_CODE (arg) == COMPLEX_EXPR)
-{
-  tree real = TREE_OPERAND (arg, 0);
-  tree imag = TREE_OPERAND (arg, 1);
-
-  /* If either part is zero, cabs is fabs of the other.  */
-  if (real_zerop (real))
-   return fold_build1_loc (loc, ABS_EXPR, type, imag);
-  if (real_zerop (imag))
-   return fold_build1_loc (loc, ABS_EXPR, type, real);
-
-  /* cabs(x+xi) -> fabs(x)*sqrt(2).  */
-  if (flag_unsafe_math_optimizations
- && operand_equal_p (real, imag, OEP_PURE_SAME))
-{
- STRIP_NOPS (real);
- return fold_build2_loc (loc, MULT_EXPR, type,
- fold_build1_loc (loc, ABS_EXPR, type, real),
- build_real_truncate (type, dconst_sqrt2 ()));
-   }
-}
-
-  /* Optimize cabs(-z) and cabs(conj(z)) as cabs(z).  */
-  if (TREE_CODE (arg) == NEGATE_EXPR
-  || TREE_CODE (arg) == CONJ_EXPR)
-return build_call_expr_loc (loc, fndecl, 1, TREE_OPERAND (arg, 0));
-
-  /* Don't do this when optimizing for size.  */
-  if (flag_unsafe_math_optimizations
-  && optimize && optimize_function_for_speed_p (cfun))
-{
-  tree sqrtfn = mathfn_built_in (type, BUILT_IN_SQRT);
-
-  if (sqrtfn != NULL_TREE)
-   {
- tree rpart, ipart, result;
-
- arg = builtin_save_expr (arg);
-
- rpart = fold_build1_loc (loc, REALPART_EXPR, type, arg);
- ipart = fold_build1_loc (loc, IMAGPART_EXPR, type, arg);
-
- rpart = builtin_save_expr (rpart);
- ipart = builtin_save_expr (ipart);
-
- result = fold_build2_loc (loc, PLUS_EXPR, type,
-   fold_build2_loc (loc, MULT_EXPR, type,
-rpart, rpart),
-   fold_build2_loc (loc, MULT_EXPR, type,
-ipart, ipart));
-
- return build_call_expr_loc (loc, sqrtfn, 1, result);
-   }
-}
-
-  return NULL_TREE;
-}
-
 /* Build a complex (inf +- 0i) for the result of cproj.  TYPE is the
complex tree type of the result.  If NEG is true, the imaginary
zero is negative.  */
@@ -9683,7 +9607,11 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
 break;
 
 CASE_FLT_FN (BUILT_IN_CABS):
-  return fold_builtin_cabs (loc, arg0, type, fndecl);
+  if (TREE_CODE (arg0) == COMPLEX_CST
+ && TREE_CODE (TREE_TYPE (TREE_TYPE (arg0))) == REAL_TYPE)
+return do_mpfr_arg2 (TREE_REALPART (arg0), TREE_IMAGPART (arg0),
+type, mpfr_hypot);
+  break;
 
 CASE_FLT_FN (BUILT_IN_CARG):
   return 

Re: Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))

2015-10-19 Thread Michael Matz
Hi,

On Fri, 16 Oct 2015, David Malcolm wrote:

> This fixes much of the bloat seen for influence.i when sending ranges 
> through for every token.

Yeah, I think that's on the right track.

> This was with 8 bits allocated for packed ranges (which is probably 
> excessive, but it makes debugging easier).

Probably in the end it should be done similar to how column bits are dealt 
with, start with a reasonably low number (5 bits?) and increase if 
necessary and budget allows (budget being column+range < N bits && range < 
8 bits, or so; so that range can't consume all of the column bits).


Ciao,
Michael.


Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Bernd Schmidt

On 10/19/2015 03:57 PM, Kyrill Tkachov wrote:

This second patch teaches simplify_binary_operation to return the
dereferenced
constants from the constant pool in the binary expression if other
simplifications failed.

This, combined with the 1/2 patch for aarch64
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01744.html) allow for:

int
foo (float a)
{
   return a * 32.0f;
}

to generate the code:
foo:
 fcvtzs  w0, s0, #5
 ret

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
 (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
 (const_double:SF 3.2e+1 [0x0.8p+6]))))

whereas before it would not try to use the const_double directly
but rather its constant pool reference.


The only way I could see a problem with that is if there are circumstances 
where the memory variant would simplify further. That doesn't seem 
highly likely, so...



 * simplify-rtx.c (simplify_binary_operation): If either operand was
 a constant pool reference use them if all other simplifications
failed.


Ok.


Bernd



Re: [gomp4.1] map clause parsing improvements

2015-10-19 Thread Thomas Schwinge
Hi!

On Mon, 19 Oct 2015 12:34:08 +0200, Jakub Jelinek  wrote:
> On Mon, Oct 19, 2015 at 12:20:23PM +0200, Thomas Schwinge wrote:
> > > +/* Decrement usage count and deallocate if zero.  */
> > > +GOMP_MAP_RELEASE =   (GOMP_MAP_FLAG_ALWAYS
> > > +  | GOMP_MAP_FORCE_DEALLOC)
> > >};
> > 
> > I have not yet read the OpenMP 4.1/4.5 standard, but it's not obvious to
> > me here how the GOMP_MAP_FLAG_ALWAYS flag relates to the OpenMP release
> > clause (GOMP_MAP_RELEASE here)?  Shouldn't GOMP_MAP_RELEASE be
> > "(GOMP_MAP_FLAG_SPECIAL_1 | 3)" or similar?
> 
> It isn't related to always, but always really is something that affects
> solely the data movement (i.e. to, from, tofrom), and while it can be
> specified elsewhere, it makes no difference.  Wasting one bit just for that
> is something we don't have the luxury for, which is why I've started using
> that bit for other OpenMP stuff (it acts there like GOMP_MAP_FLAG_SPECIAL_2
> to some extent).  It is not just release, but also the struct mapping etc.
> I'll still need to make further changes, because the rules for mapping
> structure element pointer/reference based array sections and structure
> element references have changed again.

Hmm, I do think we should allow the luxury to use its own bit for
GOMP_MAP_FLAG_ALWAYS -- we can extend the interface later, should we
really find uses for the other two remaining bits -- or if not using a
separate bit, at least make sure that GOMP_MAP_FLAG_ALWAYS is not used as
a flag.  See, for example, the following occasions where
GOMP_MAP_FLAG_ALWAYS is used as a flag: these conditionals will also be
matched for GOMP_MAP_STRUCT, GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION, and
GOMP_MAP_RELEASE.  I have not analyzed whether that is erroneous or not,
but it surely is confusing?

$ < gcc/gimplify.c grep -C3 GOMP_MAP_FLAG_ALWAYS
  struct_map_to_clause->put (decl, *list_p);
  list_p = &OMP_CLAUSE_CHAIN (*list_p);
  flags = GOVD_MAP | GOVD_EXPLICIT;
  if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
flags |= GOVD_SEEN;
  goto do_add_decl;
}
--
  tree *sc = NULL, *pt = NULL;
  if (!ptr && TREE_CODE (*osc) == TREE_LIST)
osc = &TREE_PURPOSE (*osc);
  if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
n->value |= GOVD_SEEN;
  offset_int o1, o2;
  if (offset)
--
  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
  if ((ctx->region_type & ORT_TARGET) != 0
  && !(n->value & GOVD_SEEN)
  && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
{
  remove = true;

I'd suggest turning GOMP_MAP_FLAG_ALWAYS into GOMP_MAP_FLAG_SPECIAL_2,
and then provide a GOMP_MAP_ALWAYS_P that evaluates to true just for the
three "always,to", "always,from", and "always,tofrom" cases.


Regards,
 Thomas




Re: Split out some tests from builtins-20.c

2015-10-19 Thread Jeff Law

On 10/15/2015 07:18 AM, Richard Sandiford wrote:

Stripping unnecessary sign ops at the gimple level means that we're
no longer able to optimise:

   if (cos(y<10 ? -fabs(x) : tan(x<20 ? -x : -fabs(y)))
   != cos(y<10 ? x : tan(x<20 ? x : y)))
 link_error ();

because we're currently not able to fold away the equality in:

int
f1 (double x, double y)
{
   double z1 = __builtin_cos(y<10 ? x : __builtin_tan(x<20 ? x : y));
   double z2 = __builtin_cos(y<10 ? x : __builtin_tan(x<20 ? x : y));
   return z1 == z2;
}

The missed fold is being tracked as PR 67975.  This patch splits the
test out into a separate file so that we can XFAIL it until the PR
is fixed.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/testsuite/
* gcc.dg/builtins-20.c: Move some tests to...
* gcc.dg/builtins-86.c: ...this new file.
Yes.  I just went through this in a totally unrelated area.  I'd much 
rather have the test split out and xfailed.


jeff



Re: [PATCH v10][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-10-19 Thread Bernd Schmidt

On 01/04/1970 01:02 AM, Benedikt Huber wrote:

This tenth revision of the patch:
  * Removes unnecessary enum.


Please fix your clock.


Bernd



Re: Add a pass to back-propagate use information

2015-10-19 Thread Richard Biener
On Mon, Oct 19, 2015 at 2:38 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Oct 15, 2015 at 3:17 PM, Richard Sandiford
>>  wrote:
>>> This patch adds a pass that collects information that is common to all
>>> uses of an SSA name X and back-propagates that information up the statements
>>> that generate X.  The general idea is to use the information to simplify
>>> instructions (rather than a pure DCE) so I've simply called it
>>> tree-ssa-backprop.c, to go with tree-ssa-forwprop.c.
>>>
>>> At the moment the only use of the pass is to remove unnecessry sign
>>> operations, so that it's effectively a global version of
>>> fold_strip_sign_ops.  I'm hoping it could be extended in future to
>>> record which bits of an integer are significant.  There are probably
>>> other potential uses too.
>>>
>>> A later patch gets rid of fold_strip_sign_ops.
>>>
>>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>>> OK to install?
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> gcc/
>>> * doc/invoke.texi (-fdump-tree-backprop, -ftree-backprop): Document.
>>> * Makefile.in (OBJS): Add tree-ssa-backprop.o.
>>> * common.opt (ftree-backprop): New option.
>>> * fold-const.h (negate_mathfn_p): Declare.
>>> * fold-const.c (negate_mathfn_p): Make public.
>>> * timevar.def (TV_TREE_BACKPROP): New.
>>> * tree-passes.h (make_pass_backprop): Declare.
>>> * passes.def (pass_backprop): Add.
>>> * tree-ssa-backprop.c: New file.
>>>
>>> gcc/testsuite/
>>> * gcc.dg/tree-ssa/backprop-1.c, gcc.dg/tree-ssa/backprop-2.c,
>>> gcc.dg/tree-ssa/backprop-3.c, gcc.dg/tree-ssa/backprop-4.c,
>>> gcc.dg/tree-ssa/backprop-5.c, gcc.dg/tree-ssa/backprop-6.c: New 
>>> tests.
>>>
>>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>>> index 783e4c9..69e669d 100644
>>> --- a/gcc/Makefile.in
>>> +++ b/gcc/Makefile.in
>>> @@ -1445,6 +1445,7 @@ OBJS = \
>>> tree-switch-conversion.o \
>>> tree-ssa-address.o \
>>> tree-ssa-alias.o \
>>> +   tree-ssa-backprop.o \
>>> tree-ssa-ccp.o \
>>> tree-ssa-coalesce.o \
>>> tree-ssa-copy.o \
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index 5060208..5aef625 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -2364,6 +2364,10 @@ ftree-pta
>>>  Common Report Var(flag_tree_pta) Optimization
>>>  Perform function-local points-to analysis on trees.
>>>
>>> +ftree-backprop
>>> +Common Report Var(flag_tree_backprop) Init(1) Optimization
>>> +Enable backward propagation of use properties at the tree level.
>>
>> Don't add new -ftree-*; "tree" doesn't add any info for our users.  I'd
>> also refer to SSA level rather than "tree" level.  Not sure if -fbackprop
>> is good, but let's go for it.
>
> OK.
>
>>> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
>>> index de45a2c..7f00e72 100644
>>> --- a/gcc/fold-const.c
>>> +++ b/gcc/fold-const.c
>>> @@ -319,7 +318,7 @@ fold_overflow_warning (const char* gmsgid, enum
>> warn_strict_overflow_code wc)
>>>  /* Return true if the built-in mathematical function specified by CODE
>>> is odd, i.e. -f(x) == f(-x).  */
>>>
>>> -static bool
>>> +bool
>>>  negate_mathfn_p (enum built_in_function code)
>>>  {
>>>switch (code)
>>
>> Belongs more to builtins.[ch] if exported.
>
> The long-term plan is to abstract away whether it's a built-in function
> or an internal function, in which case I hope to have a single predicate
> that handles both.  I'm not sure where the code should go after that change.
> Maybe a new file?

Hmm, we'll see.  So just leave it in fold-const.c for now.

>>> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
>>> index ee74dc8..4d5b24b 100644
>>> --- a/gcc/fold-const.h
>>> +++ b/gcc/fold-const.h
>>> @@ -173,6 +173,7 @@ extern tree sign_bit_p (tree, const_tree);
>>>  extern tree exact_inverse (tree, tree);
>>>  extern tree const_unop (enum tree_code, tree, tree);
>>>  extern tree const_binop (enum tree_code, tree, tree, tree);
>>> +extern bool negate_mathfn_p (enum built_in_function);
>>>
>>>  /* Return OFF converted to a pointer offset type suitable as offset for
>>> POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index dc3f44c..36d2b3b 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -159,6 +159,7 @@ along with GCC; see the file COPYING3.  If not see
>>>/* After CCP we rewrite no longer addressed locals into SSA
>>>  form if possible.  */
>>>NEXT_PASS (pass_complete_unrolli);
>>> +  NEXT_PASS (pass_backprop);
>>>NEXT_PASS (pass_phiprop);
>>>NEXT_PASS (pass_forwprop);
>>
>> Any reason to not put this later?  I was thinking before reassoc.
>
> I think we're relying on FRE to notice the redundancy in the
> builtins-*.c tests, once this pass has converted the version
> 

Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Kyrill Tkachov


On 19/10/15 15:31, Segher Boessenkool wrote:

Hi Kyrill,


Hi Segher,



On Mon, Oct 19, 2015 at 02:57:54PM +0100, Kyrill Tkachov wrote:

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
 (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
 (const_double:SF 3.2e+1 [0x0.8p+6]))))

whereas before it would not try to use the const_double directly
but rather its constant pool reference.

What happens if the constant pool reference is actually the better
code, do we still generate that?


In that case I think the previous calls to simplify_const_binary_operation and
simplify_binary_operation_1 should have returned a non-NULL rtx.

Kyrill





Segher





Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-19 Thread Kyrill Tkachov

Hi Bernd,

On 19/10/15 15:31, Bernd Schmidt wrote:

On 10/19/2015 03:57 PM, Kyrill Tkachov wrote:

This second patch teaches simplify_binary_operation to return the
dereferenced
constants from the constant pool in the binary expression if other
simplifications failed.

This, combined with the 1/2 patch for aarch64
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01744.html) allow for:

int
foo (float a)
{
   return a * 32.0f;
}

to generate the code:
foo:
 fcvtzs  w0, s0, #5
 ret

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
 (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
 (const_double:SF 3.2e+1 [0x0.8p+6]))))

whereas before it would not try to use the const_double directly
but rather its constant pool reference.


The only way I could see a problem with that is if there are circumstances where 
the memory variant would simplify further. That doesn't seem highly likely, 
so...



If that were the case, I'd expect the earlier call to 
simplify_binary_operation_1 to have returned a non-NULL rtx,
and the code in this patch would not come into play.


 * simplify-rtx.c (simplify_binary_operation): If either operand was
 a constant pool reference use them if all other simplifications
failed.


Ok.


Thanks,
I'll commit it when the first (aarch64-specific) patch is approved.

Kyrill




Bernd





Re: Add a pass to back-propagate use information

2015-10-19 Thread Richard Sandiford
Richard Biener  writes:
 diff --git a/gcc/fold-const.c b/gcc/fold-const.c
 index de45a2c..7f00e72 100644
 --- a/gcc/fold-const.c
 +++ b/gcc/fold-const.c
 @@ -319,7 +318,7 @@ fold_overflow_warning (const char* gmsgid, enum
>>> warn_strict_overflow_code wc)
  /* Return true if the built-in mathematical function specified by CODE
 is odd, i.e. -f(x) == f(-x).  */

 -static bool
 +bool
  negate_mathfn_p (enum built_in_function code)
  {
switch (code)
>>>
>>> Belongs more to builtins.[ch] if exported.
>>
>> The long-term plan is to abstract away whether it's a built-in function
>> or an internal function, in which case I hope to have a single predicate
>> that handles both.  I'm not sure where the code should go after that change.
>> Maybe a new file?
>
> Hmm, we'll see.  So just leave it in fold-const.c for now.

OK, thanks.

 diff --git a/gcc/fold-const.h b/gcc/fold-const.h
 index ee74dc8..4d5b24b 100644
 --- a/gcc/fold-const.h
 +++ b/gcc/fold-const.h
 @@ -173,6 +173,7 @@ extern tree sign_bit_p (tree, const_tree);
  extern tree exact_inverse (tree, tree);
  extern tree const_unop (enum tree_code, tree, tree);
  extern tree const_binop (enum tree_code, tree, tree, tree);
 +extern bool negate_mathfn_p (enum built_in_function);

  /* Return OFF converted to a pointer offset type suitable as offset for
 POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
 diff --git a/gcc/passes.def b/gcc/passes.def
 index dc3f44c..36d2b3b 100644
 --- a/gcc/passes.def
 +++ b/gcc/passes.def
 @@ -159,6 +159,7 @@ along with GCC; see the file COPYING3.  If not see
/* After CCP we rewrite no longer addressed locals into SSA
  form if possible.  */
NEXT_PASS (pass_complete_unrolli);
 +  NEXT_PASS (pass_backprop);
NEXT_PASS (pass_phiprop);
NEXT_PASS (pass_forwprop);
>>>
>>> Any reason to not put this later?  I was thinking before reassoc.
>>
>> I think we're relying on FRE to notice the redundancy in the
>> builtins-*.c tests, once this pass has converted the version
>> with redundant sign ops to make it look like the version without.
>> reassoc is likely to be too late.
>
> There is PRE after reassoc (run as FRE at -O1).

Ah, OK.  It looks like that also runs after sincos though, whereas
I think we want an FRE between this pass and sincos.

>> I also thought it should go before rather than after some instance
>> of forwprop because the pass might expose more forward folding
>> opportunities.  E.g. if the sign of A = -B * B doesn't matter,
>> we'll end up with A = B * B, which might be foldable with uses of A.
>> It seems less likely that forwprop would expose more backprop
>> opportunities.
>
> Indeed.  I was asking because backprop runs after inlining but
> with nearly no effective scalar cleanup after it to cleanup after
> inlining.
>
> In principle it only depends on some kind of DCE (DSE?) to avoid
> false uses, right?

Yeah, that sounds right.  Complex control-flow leading up to uses
shouldn't be a problem, but phantom uses would be a blocker.
>
> It's probably ok where you put it, I just wanted to get an idea of your
> reasoning.
>
>>
 +/* Make INFO describe all uses of RHS in ASSIGN.  */
 +
 +void
 +backprop::process_assign_use (gassign *assign, tree rhs, usage_info *info)
 +{
 +  tree lhs = gimple_assign_lhs (assign);
 +  switch (gimple_assign_rhs_code (assign))
 +{
 +case ABS_EXPR:
 +  /* The sign of the input doesn't matter.  */
 +  info->flags.ignore_sign = true;
 +  break;
 +
 +case COND_EXPR:
 +  /* For A = B ? C : D, propagate information about all uses of A
 +to B and C.  */
 +  if (rhs != gimple_assign_rhs1 (assign))
 +   if (const usage_info *lhs_info = lookup_operand (lhs))
>>>
>>> Use && instead of nested if
>>
>> That means introducing an extra level of braces just for something
>> that that isn't needed by the first statement, i.e.:
>>
>> {
>>   const usage_info *lhs_info;
>>   if (rhs != gimple_assign_rhs1 (assign)
>>   && (lhs_info = lookup_operand (lhs)))
>> *info = *lhs_info;
>>   break;
>> }
>>
>> There also used to be a strong preference for not embedding assignments
>> in && and || conditions.
>>
>> If there had been some other set-up for the lookup_operand call, we
>> would have had:
>>
>>   if (rhs != gimple_assign_rhs1 (assign))
>> {
>>   ...
>>   if (const usage_info *lhs_info = lookup_operand (lhs))
>> ..
>> }
>>
>> and presumably that would have been OK.  So if the original really isn't,
>> acceptable, I'd rather write it as:
>>
>>   if (rhs != gimple_assign_rhs1 (assign))
>> {
>>   const usage_info *lhs_info = lookup_operand (lhs);
>>   if 

Re: Fix prototype for print_insn in rtl.h

2015-10-19 Thread Jeff Law

On 10/15/2015 10:28 AM, Andrew MacLeod wrote:

On 10/13/2015 11:32 AM, Jeff Law wrote:

On 10/13/2015 02:21 AM, Nikolai Bozhenov wrote:

2015-10-13  Nikolai Bozhenov

 * gcc/rtl.h (print_insn): fix prototype

Installed on the trunk after bootstrap & regression test.

jeff


Sorry, a little late to the party.. but why is print_insn even in
rtl.h?  it seems that sched-vis.c is the only thing that uses it...

Then let's move it to sched-int.h, unless there's some good reason not to.

jeff


Re: [mask-vec_cond, patch 3/2] SLP support

2015-10-19 Thread Jeff Law

On 10/19/2015 05:21 AM, Ilya Enkovich wrote:

Hi,

This patch adds missing support for cond_expr with no embedded comparison in 
SLP.  No new test added because vec cmp SLP test becomes (due to changes in 
bool patterns by the first patch) a regression test for this patch.  Does it 
look OK?

Thanks,
Ilya
--
gcc/

2015-10-19  Ilya Enkovich  

* tree-vect-slp.c (vect_get_and_check_slp_defs): Allow
cond_expr with no embedded comparison.
(vect_build_slp_tree_1): Likewise.
Is it even valid gimple to have a COND_EXPR that is anything other than 
a conditional?


From looking at gimplify_cond_expr, it looks like we could have a 
SSA_NAME that's a bool as the conditional.  Presumably we're allowing a 
vector of bools as the conditional once we hit the vectorizer, which 
seems fairly natural.


OK.  Please install when the prerequisites are installed.

Thanks,
jeff



[c++-delayed-folding] Introduce convert_to_pointer_nofold

2015-10-19 Thread Marek Polacek
This patch introduces convert_to_pointer_nofold; a variant that only folds
CONSTANT_CLASS_P expressions.  In the C++ FE, convert_to_pointer was only used
in cp_convert_to_pointer which is only used in cp_convert.  Instead of
introducing many _nofold variants, I just made cp_convert_to_pointer use
convert_to_pointer_nofold.

Bootstrapped/regtested on x86_64-linux, ok for branch?

diff --git gcc/convert.c gcc/convert.c
index 1ce8099..79b4138 100644
--- gcc/convert.c
+++ gcc/convert.c
@@ -39,10 +39,11 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Convert EXPR to some pointer or reference type TYPE.
EXPR must be pointer, reference, integer, enumeral, or literal zero;
-   in other cases error is called.  */
+   in other cases error is called.  If FOLD_P is true, try to fold the
+   expression.  */
 
-tree
-convert_to_pointer (tree type, tree expr)
+static tree
+convert_to_pointer_1 (tree type, tree expr, bool fold_p)
 {
   location_t loc = EXPR_LOCATION (expr);
   if (TREE_TYPE (expr) == type)
@@ -58,10 +59,21 @@ convert_to_pointer (tree type, tree expr)
addr_space_t to_as = TYPE_ADDR_SPACE (TREE_TYPE (type));
addr_space_t from_as = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (expr)));
 
-   if (to_as == from_as)
- return fold_build1_loc (loc, NOP_EXPR, type, expr);
+   if (fold_p)
+ {
+   if (to_as == from_as)
+ return fold_build1_loc (loc, NOP_EXPR, type, expr);
+   else
+ return fold_build1_loc (loc, ADDR_SPACE_CONVERT_EXPR, type,
+ expr);
+ }
else
- return fold_build1_loc (loc, ADDR_SPACE_CONVERT_EXPR, type, expr);
+ {
+   if (to_as == from_as)
+ return build1_loc (loc, NOP_EXPR, type, expr);
+   else
+ return build1_loc (loc, ADDR_SPACE_CONVERT_EXPR, type, expr);
+ }
   }
 
 case INTEGER_TYPE:
@@ -75,20 +87,43 @@ convert_to_pointer (tree type, tree expr)
unsigned int pprec = TYPE_PRECISION (type);
unsigned int eprec = TYPE_PRECISION (TREE_TYPE (expr));
 
-   if (eprec != pprec)
- expr = fold_build1_loc (loc, NOP_EXPR,
- lang_hooks.types.type_for_size (pprec, 0),
- expr);
+   if (eprec != pprec)
+ {
+   tree totype = lang_hooks.types.type_for_size (pprec, 0);
+   if (fold_p)
+ expr = fold_build1_loc (loc, NOP_EXPR, totype, expr);
+   else
+ expr = build1_loc (loc, NOP_EXPR, totype, expr);
+ }
   }
 
-  return fold_build1_loc (loc, CONVERT_EXPR, type, expr);
+  if (fold_p)
+   return fold_build1_loc (loc, CONVERT_EXPR, type, expr);
+  return build1_loc (loc, CONVERT_EXPR, type, expr);
 
 default:
   error ("cannot convert to a pointer type");
-  return convert_to_pointer (type, integer_zero_node);
+  return convert_to_pointer_1 (type, integer_zero_node, fold_p);
 }
 }
 
+/* A wrapper around convert_to_pointer_1 that always folds the
+   expression.  */
+
+tree
+convert_to_pointer (tree type, tree expr)
+{
+  return convert_to_pointer_1 (type, expr, true);
+}
+
+/* A wrapper around convert_to_pointer_1 that only folds the
+   expression if it is CONSTANT_CLASS_P.  */
+
+tree
+convert_to_pointer_nofold (tree type, tree expr)
+{
+  return convert_to_pointer_1 (type, expr, CONSTANT_CLASS_P (expr));
+}
 
 /* Convert EXPR to some floating-point type TYPE.
 
diff --git gcc/convert.h gcc/convert.h
index ac78f95..24fa6bf 100644
--- gcc/convert.h
+++ gcc/convert.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 extern tree convert_to_integer (tree, tree);
 extern tree convert_to_integer_nofold (tree, tree);
 extern tree convert_to_pointer (tree, tree);
+extern tree convert_to_pointer_nofold (tree, tree);
 extern tree convert_to_real (tree, tree);
 extern tree convert_to_fixed (tree, tree);
 extern tree convert_to_complex (tree, tree);
diff --git gcc/cp/cvt.c gcc/cp/cvt.c
index 0a30270..cb73bb7 100644
--- gcc/cp/cvt.c
+++ gcc/cp/cvt.c
@@ -241,7 +241,7 @@ cp_convert_to_pointer (tree type, tree expr, tsubst_flags_t complain)
   gcc_assert (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (expr)))
  == GET_MODE_SIZE (TYPE_MODE (type)));
 
-  return convert_to_pointer (type, expr);
+  return convert_to_pointer_nofold (type, expr);
 }
 
   if (type_unknown_p (expr))

Marek


[c++-delayed-folding] Use convert_to_integer_nofold

2015-10-19 Thread Marek Polacek
As discussed in the other thread.  This is a patch I intend to commit to the
branch when I can actually commit stuff.

diff --git gcc/cp/semantics.c gcc/cp/semantics.c
index 9a8caa7..d15d6f9 100644
--- gcc/cp/semantics.c
+++ gcc/cp/semantics.c
@@ -5577,7 +5577,8 @@ finish_omp_clauses (tree clauses)
  if (OMP_CLAUSE_SCHEDULE_KIND (c)
  == OMP_CLAUSE_SCHEDULE_CILKFOR)
{
- t = convert_to_integer (long_integer_type_node, t);
+ t = convert_to_integer_nofold (long_integer_type_node,
+t);
  if (t == error_mark_node)
{
  remove = true;

Marek


Re: config header file reduction patch checked in.

2015-10-19 Thread Andrew MacLeod

On 10/18/2015 05:31 AM, Iain Sandoe wrote:

Hi Andrew,

On 16 Oct 2015, at 20:49, Andrew MacLeod wrote:


On 10/12/2015 04:04 AM, Jeff Law wrote:

On 10/08/2015 07:37 AM, Andrew MacLeod wrote:

On 10/07/2015 06:02 PM, Jeff Law wrote:

I'm slightly concerned about the darwin, windows and solaris bits.  The former 
primarily because Darwin has been a general source of pain, and in the others 
because I'm not sure the cross testing will exercise that code terribly much.

I'll go ahead and approve all the config/ bits.  Please be on the lookout for 
any fallout.

I'll try and get into more of the other patches tomorrow.



OK, I've checked in the config changes.  I rebuilt all the cross compilers for 
the 200+ targets, and they still build... as well as bootstrapping on 
x86_64-pc-linux-gnu with no regressions.

So, if anyone runs into a native build issue, you can either add the required 
header back in, or back out the change for your port, and I'll look into why 
something happened.  The only thing I can imagine is files that have 
conditional compilation based on a macro that is only ever defined on a native 
build command line or headers.  It's unlikely... but possible.

I've applied the following to fix Darwin native bootstrap.
AFAICT (from reading the other thread on the re-ordering tools) putting the 
diagnostics header at the end of the list is the right thing to do.

FWIW,
a) of course, Darwin exercises ObjC/ObjC++ in *both* NeXT and GNU mode - so 
those are pretty well covered by this too.

b) darwin folks will usually do their best to test any patch that you think is 
specifically risky - but you need to ask, because we have (very) limited 
resources in time and hardware ;-) ...

thanks for tidying things up!
(I, for one, think that improving the separation of things is worth a small 
amount of pain along the way).

cheers,
Iain

gcc/

+2015-10-18  Iain Sandoe  
+
+   * config/darwin-driver.h: Adjust includes to add diagnostic-core.
+



Interesting that none of the cross builds needs diagnostic-core.h. I see 
it used in 7 different targets.  It must be something defined on the native 
build command line that causes it to be needed.


Anyway, thanks for fixing it.

BTW, that should be darwin-driver.c, not .h, in the changelog, right?

Andrew


Index: gcc/config/darwin-driver.c
===
--- gcc/config/darwin-driver.c  (revision 228938)
+++ gcc/config/darwin-driver.c  (working copy)
@@ -23,6 +23,7 @@
  #include "coretypes.h"
  #include "tm.h"
  #include "opts.h"
+#include "diagnostic-core.h"
  
  #ifndef CROSS_DIRECTORY_STRUCTURE

  #include 





[gomp4] Merge gomp-4_1-branch r224607 (2015-06-18) into gomp-4_0-branch

2015-10-19 Thread Thomas Schwinge
Hi!

I have recently merged trunk r228776 (2015-10-13) into gomp-4_0-branch,
which is the trunk revision before Jakub's big "Merge from
gomp-4_1-branch to trunk",
.
Instead of attempting to merge that one in one go -- that is, to avoid
having to deal with a ton of merge conflicts at once, and to allow for
easier understanding of individual changes/regressions -- in the
following I'll gradually merge individual "blocks" of all the
gomp-4_1-branch changes into gomp-4_0-branch.  Committed to
gomp-4_0-branch in r228972:

commit 3931662876141de5c18d0c5e02c156eef5286bee
Merge: fdc2c87 2b9f218
Author: tschwinge 
Date:   Mon Oct 19 15:38:31 2015 +

svn merge -r 222404:224607 
svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_1-branch


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228972 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße,
 Thomas


signature.asc
Description: PGP signature


[gomp4.5] Add checking that OpenMP loop iterators aren't referenced in the bounds/step expressions

2015-10-19 Thread Jakub Jelinek
Hi!

In 4.0 and earlier, there was just a restriction that the lb, b and
incr expressions in the canonical loop syntax (lower/upper bounds and step)
can't change their values during the loop, but in OpenMP 4.5 we have an even
stronger restriction: the iterators may not be referenced there at all.  So
even if you ignore the value, multiply it by or AND it with 0, subtract it
from itself, etc., it is still invalid.  That means the compiler can easily
diagnose invalid loops.

2015-10-19  Jakub Jelinek  

gcc/
* tree.h (OMP_FOR_ORIG_DECLS): Use OMP_LOOP_CHECK instead of
OMP_FOR_CHECK.  Remove comment.
* tree.def (OMP_SIMD, CILK_SIMD, CILK_FOR, OMP_DISTRIBUTE,
OMP_TASKLOOP, OACC_LOOP): Add OMP_FOR_ORIG_DECLS argument.
gcc/c-family/
* c-common.h (c_omp_check_loop_iv, c_omp_check_loop_iv_exprs): New
prototypes.
* c-omp.c (c_finish_omp_for): Store OMP_FOR_ORIG_DECLS always.
Don't call add_stmt here.
(struct c_omp_check_loop_iv_data): New type.
(c_omp_check_loop_iv_r, c_omp_check_loop_iv,
c_omp_check_loop_iv_exprs): New functions.
gcc/c/
* c-parser.c (c_parser_omp_for_loop): Call c_omp_check_loop_iv.
Call add_stmt here.
gcc/cp/
* cp-tree.h (finish_omp_for): Add ORIG_INITS argument.
* parser.c (cp_parser_omp_for_loop_init): Add ORIG_INIT argument,
initialize it.
(cp_parser_omp_for_loop): Compute orig_inits, pass it's address
to finish_omp_for.
* pt.c (tsubst_expr): Use OMP_FOR_ORIG_DECLS for all
OpenMP/OpenACC/Cilk+ looping constructs.  Adjust finish_omp_for
caller.
* semantics.c (handle_omp_for_class_iterator): Add ORIG_DECLS
argument.  Call c_omp_check_loop_iv_exprs on cond.
(finish_omp_for): Add ORIG_INITS argument.  Call
c_omp_check_loop_iv_exprs on ORIG_INITS elements.  Adjust
handle_omp_for_class_iterator caller.  Call c_omp_check_loop_iv.
Call add_stmt.
gcc/testsuite/
* c-c++-common/gomp/pr67521.c: Add dg-error directives.
* gcc.dg/gomp/loop-1.c: New test.
* g++.dg/gomp/pr38639.C (foo): Adjust dg-error.
(bar): Remove dg-message.
* g++.dg/gomp/loop-1.C: New test.
* g++.dg/gomp/loop-2.C: New test.
* g++.dg/gomp/loop-3.C: New test.

--- gcc/tree.h.jj   2015-10-14 10:24:55.0 +0200
+++ gcc/tree.h  2015-10-19 12:01:11.390680056 +0200
@@ -1264,8 +1264,7 @@ extern void protected_set_expr_location
 #define OMP_FOR_COND(NODE)TREE_OPERAND (OMP_LOOP_CHECK (NODE), 3)
 #define OMP_FOR_INCR(NODE)TREE_OPERAND (OMP_LOOP_CHECK (NODE), 4)
 #define OMP_FOR_PRE_BODY(NODE)TREE_OPERAND (OMP_LOOP_CHECK (NODE), 5)
-/* Note that this is only available for OMP_FOR, hence OMP_FOR_CHECK.  */
-#define OMP_FOR_ORIG_DECLS(NODE)   TREE_OPERAND (OMP_FOR_CHECK (NODE), 6)
+#define OMP_FOR_ORIG_DECLS(NODE)   TREE_OPERAND (OMP_LOOP_CHECK (NODE), 6)
 
 #define OMP_SECTIONS_BODY(NODE)TREE_OPERAND (OMP_SECTIONS_CHECK (NODE), 0)
 #define OMP_SECTIONS_CLAUSES(NODE) TREE_OPERAND (OMP_SECTIONS_CHECK (NODE), 1)
--- gcc/tree.def.jj 2015-10-14 10:25:43.0 +0200
+++ gcc/tree.def2015-10-19 12:00:50.282982246 +0200
@@ -1101,28 +1101,28 @@ DEFTREECODE (OMP_TASK, "omp_task", tcc_s
 DEFTREECODE (OMP_FOR, "omp_for", tcc_statement, 7)
 
 /* OpenMP - #pragma omp simd [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (OMP_SIMD, "omp_simd", tcc_statement, 7)
 
 /* Cilk Plus - #pragma simd [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_SIMD, "cilk_simd", tcc_statement, 7)
 
 /* Cilk Plus - _Cilk_for (..)
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (CILK_FOR, "cilk_for", tcc_statement, 7)
 
 /* OpenMP - #pragma omp distribute [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (OMP_DISTRIBUTE, "omp_distribute", tcc_statement, 7)
 
 /* OpenMP - #pragma omp taskloop [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (OMP_TASKLOOP, "omp_taskloop", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (OMP_TASKLOOP, "omp_taskloop", tcc_statement, 7)
 
 /* OpenMP - #pragma acc loop [clause1 ... clauseN]
-   Operands like operands 1-6 of OMP_FOR.  */
-DEFTREECODE (OACC_LOOP, "oacc_loop", tcc_statement, 6)
+   Operands like for OMP_FOR.  */
+DEFTREECODE (OACC_LOOP, "oacc_loop", tcc_statement, 7)
 
 /* OpenMP - #pragma omp teams [clause1 ... clauseN]
Operand 0: OMP_TEAMS_BODY: Teams body.
--- gcc/c-family/c-common.h.jj  

Re: [Patch] Add OPT_Wattributes to ignored attributes on template args

2015-10-19 Thread Ryan Mansfield

Ping:

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg02256.html

Regards,

Ryan Mansfield

On 15-09-29 04:21 PM, Ryan Mansfield wrote:

Hi,

In canonicalize_type_argument, attributes are being discarded with a
warning.  Should the warning be emitted under OPT_Wattributes?

2015-09-29  Ryan Mansfield  

 * pt.c (canonicalize_type_argument): Use OPT_Wattributes in
warning.


Re: [PATCH] Move cproj simplification to match.pd

2015-10-19 Thread Christophe Lyon
On 19 October 2015 at 15:54, Richard Biener  wrote:
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>

Hi Richard,

This patch causes ICEs when building newlib for arm and aarch64:
In file included from
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdlib.h:11:0,
 from
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:13:
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:
In function '_mktm_r':
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:28:9:
internal compiler error: Segmentation fault
 _DEFUN (_mktm_r, (tim_p, res, is_gmtime),
0xa90205 crash_signal
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:353
0x7b3b0c tree_class_check
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3055
0x7b3b0c tree_single_nonnegative_warnv_p(tree_node*, bool*, int)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13025
0x814053 gimple_phi_nonnegative_warnv_p

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6239
0x814053 gimple_stmt_nonnegative_warnv_p(gimple*, bool*, int)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6264
0x7b5c94 tree_expr_nonnegative_p(tree_node*)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13325
0xe2f657 gimple_simplify_108

/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:5116
0xe3060d gimple_simplify_TRUNC_MOD_EXPR

/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:24762
0xe0809b gimple_simplify

/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:34389
0xe08c2b gimple_resimplify2(gimple**, code_helper*, tree_node*,
tree_node**, tree_node* (*)(tree_node*))

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:193
0xe17600 gimple_simplify(gimple*, code_helper*, tree_node**, gimple**,
tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:762
0x81c694 fold_stmt_1

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:3605
0xad0f6c replace_uses_by(tree_node*, tree_node*)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1835
0xad1a2f gimple_merge_blocks

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1921
0x67d325 merge_blocks(basic_block_def*, basic_block_def*)

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:776
0xae06da cleanup_tree_cfg_bb

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:654
0xae1118 cleanup_tree_cfg_1

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:686
0xae1118 cleanup_tree_cfg_noloop

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:738
0xae1118 cleanup_tree_cfg()

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:793
0x9c5c94 execute_function_todo

/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/passes.c:1920
Please submit a full bug report,

This happens for instance with GCC configured
--target arm-none-eabi
--with-cpu cortex-a9

You can download logs of a failed build from
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-build/trunk/228970/build.html

Sorry, I'm out of the office for one week, so I can't produce further details.

Christophe


> Richard.
>
> 2015-10-19  Richard Biener  
>
> * gimple-fold.c (gimple_phi_nonnegative_warnv_p): New function.
> (gimple_stmt_nonnegative_warnv_p): Use it.
> * match.pd (CPROJ): New operator list.
> (cproj (complex ...)): Move simplifications from ...
> * builtins.c (fold_builtin_cproj): ... here.
>
> * gcc.dg/torture/builtin-cproj-1.c: Skip for -O0.
>
> Index: gcc/gimple-fold.c
> ===
> --- gcc/gimple-fold.c   (revision 228877)
> +++ gcc/gimple-fold.c   (working copy)
> @@ -6224,6 +6224,24 @@ gimple_call_nonnegative_warnv_p (gimple
> strict_overflow_p, depth);
>  }
>
> +/* Return true if return value of call STMT is known to be non-negative.
> +   If the return value is based on the assumption that signed overflow is
> +   undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
> +   *STRICT_OVERFLOW_P.  DEPTH is the current nesting depth of the query.  */
> +
> +static bool
> +gimple_phi_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
> +   int depth)
> +{
> +  for (unsigned i = 0; i < gimple_phi_num_args 

Re: [c++-delayed-folding] First stab at convert_to_integer

2015-10-19 Thread Jason Merrill

On 10/19/2015 02:31 AM, Marek Polacek wrote:

On Fri, Oct 16, 2015 at 02:07:51PM -1000, Jason Merrill wrote:

On 10/16/2015 07:35 AM, Marek Polacek wrote:

This code path seems to be for pushing a conversion down into a binary
expression.  We shouldn't do this at all when we aren't folding.


I tend to agree, but this case is tricky.  What this code is about is,
e.g., for

int
fn (long p, long o)
{
   return p + o;
}

we want to narrow the operation and do the addition on unsigned ints and then
convert to int.  We do it here because we're still missing the
promotion/demotion pass on GIMPLE (PR45397 / PR47477).  Disabling this
optimization here would regress a few testcases, so I kept the code as it was.
Thoughts?


That makes sense, but please add a comment referring to one of those PRs and
also add a note to the PR about this place.  OK with that change.


Done.  But I can't seem to commit the patch to the c++-delayed-folding
branch; is that somehow restricted?  I'm getting:

svn: E170001: Commit failed (details follow):
svn: E170001: Authorization failed
svn: E170001: Your commit message was left in a temporary file:
svn: E170001:'/home/marek/svn/c++-delayed-folding/svn-commit.tmp'

and I've checked out the branch using
svn co svn://mpola...@gcc.gnu.org/svn/gcc/branches/c++-delayed-folding/


You need to use svn+ssh:// rather than svn:// if you want to be able to 
commit.  From svnwrite.html:


It is also possible to convert an existing SVN tree to use SSH by using 
svn switch --relocate:


svn switch --relocate svn://gcc.gnu.org/svn/gcc svn+ssh://usern...@gcc.gnu.org/svn/gcc


Jason



PING: [PATCH] X86: Optimize access to globals in PIE with copy reloc

2015-10-19 Thread H.J. Lu
PING.


-- Forwarded message --
From: H.J. Lu 
Date: Wed, Jul 1, 2015 at 5:11 AM
Subject: [PATCH] X86: Optimize access to globals in PIE with copy reloc
To: gcc-patches@gcc.gnu.org


Normally, with PIE, GCC accesses globals that are extern to the module
using GOT.  This is two instructions, one to get the address of the global
from GOT and the other to get the value.  Examples:

---
extern int a_glob;
int
main ()
{
  return a_glob;
}
---

With PIE, the generated code accesses global via GOT using two memory
loads:

movqa_glob@GOTPCREL(%rip), %rax
movl(%rax), %eax

for 64-bit or

movla_glob@GOT(%ecx), %eax
movl(%eax), %eax

for 32-bit.

Some experiments on Google and SPEC CPU benchmarks show that the extra
instruction affects performance by 1% to 5%.

Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that
the global will be defined in the executable.  For globals that are
truly extern (come from shared objects), the linker will create copy
relocations and have them defined in the executable.  Result is that
no global access needs to go through GOT and hence improves performance.
We can generate

movla_glob(%rip), %eax

for 64-bit and

movla_glob@GOTOFF(%eax), %eax

for 32-bit.  This optimization only applies to undefined non-weak
non-TLS global data.  Undefined weak global or TLS data access still
must go through GOT.

This patch reverts legitimate_pic_address_disp_p change made in revision
218397, which only applies to x86-64.  Instead, this patch updates
targetm.binds_local_p to indicate if undefined non-weak non-TLS global
data is defined locally in PIE.  It also introduces a new target hook,
binds_tls_local_p to distinguish TLS variable from non-TLS variable.  By
default, binds_tls_local_p is the same as binds_local_p which assumes
a TLS variable.

This patch checks if 32-bit and 64-bit linkers support PIE with copy
reloc at configure time.  64-bit linker is enabled in binutils 2.25
and 32-bit linker is enabled in binutils 2.26.  This optimization
is enabled only if the linker support is available.

Since copy relocation in PIE is incompatible with DSO created by
-Wl,-Bsymbolic, this patch also adds a new option, -fsymbolic, which
controls how references to global symbols are bound.  The -fsymbolic
option binds references to global symbols to the local definitions
and external references globally.  It avoids copy relocations in PIE
and optimizes global symbol references in shared library created
by -Wl,-Bsymbolic.

gcc/

PR target/65846
PR target/65886
* configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ...
(HAVE_LD_X86_64_PIE_COPYRELOC): This.
(HAVE_LD_386_PIE_COPYRELOC): New.   Defined to 1 if Linux/ia32
linker supports PIE with copy reloc.
* output.h (default_binds_tls_local_p): New.
(default_binds_local_p_3): Add 2 bool arguments.
* target.def (binds_tls_local_p): New target hook.
* varasm.c (decl_default_tls_model): Replace targetm.binds_local_p
with targetm.binds_tls_local_p.
(default_binds_local_p_3): Add a bool argument to indicate TLS
variable and a bool argument to indicate if an undefined non-TLS
non-weak data is local.  Double check TLS variable.  If an
undefined non-TLS non-weak data is local, treat it as defined
locally.
(default_binds_local_p): Pass true and false to
default_binds_local_p_3.
(default_binds_local_p_2): Likewise.
(default_binds_local_p_1): Likewise.
(default_binds_tls_local_p): New.
* config.in: Regenerated.
* configure: Likewise.
* doc/tm.texi: Likewise.
* config/i386/i386.c (legitimate_pic_address_disp_p): Don't
check HAVE_LD_PIE_COPYRELOC here.
(ix86_binds_local): New.
(ix86_binds_tls_local_p): Likewise.
(ix86_binds_local_p): Use it.
(TARGET_BINDS_TLS_LOCAL_P): New.
* doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook.

gcc/testsuite/

PR target/65846
PR target/65886
* gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32.
* gcc.target/i386/pie-copyrelocs-2.c: Likewise.
* gcc.target/i386/pie-copyrelocs-3.c: Likewise.
* gcc.target/i386/pie-copyrelocs-4.c: Likewise.
* gcc.target/i386/pr32219-9.c: Likewise.
* gcc.target/i386/pr32219-10.c: New file.
* gcc.target/i386/pr65886-1.c: Likewise.
* gcc.target/i386/pr65886-2.c: Likewise.
* gcc.target/i386/pr65886-3.c: Likewise.
* gcc.target/i386/pr65886-4.c: Likewise.
* gcc.target/i386/pr65886-5.c: Likewise.

* lib/target-supports.exp (check_effective_target_pie_copyreloc):
Check HAVE_LD_X86_64_PIE_COPYRELOC and HAVE_LD_386_PIE_COPYRELOC
instead of 

[PATCH][AArch64] Fix insn types

2015-10-19 Thread Evandro Menezes

The type assigned to some insn definitions was seemingly not correct:

 * "movi %d0, %1" was of type "fmov"
 * "fmov %s0, wzr" was of type "fconstd"
 * "mov %0, {-1,1}" were of type "csel"

This patch changes their types to:

 * "movi %d0, %1" to type "neon_move"
 * "fmov %s0, wzr" to type "f_mcr"
 * "mov %0, {-1,1}" to type "mov_imm"

Please commit if it's alright.

Thank you,

--
Evandro Menezes

From 7e7057bf65befca9ff24ab2401bc2ce84a48c23a Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 19 Oct 2015 15:19:35 -0500
Subject: [PATCH] [AArch64] Fix insn types

The type assigned to some insn definitions was not correct.

gcc/
	* config/aarch64/aarch64.md
	(*movdi_aarch64): Change the type of "movi %d0, %1" to "neon_move".
	(*movtf_aarch64): Change the type of "fmov %s0, wzr" to "f_mcr".
	(*cmov_insn): Change the types of "mov %0, {-1,1}" to "mov_imm".
	(*cmovsi_insn_uxtw): Idem.
---
 gcc/config/aarch64/aarch64.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 208f58f..5b7f2fd 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1130,7 +1130,7 @@
ldrh\\t%w0, %1
strh\\t%w1, %0
mov\\t%w0, %w1"
-  [(set_attr "type" "neon_from_gp,neon_to_gp,fmov,\
+  [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\
  f_loads,f_stores,load1,store1,mov_reg")
(set_attr "simd" "yes,yes,yes,*,*,*,*,*")
(set_attr "fp"   "*,*,*,yes,yes,*,*,*")]
@@ -1193,7 +1193,7 @@
ldp\\t%0, %H0, %1
stp\\t%1, %H1, %0
stp\\txzr, xzr, %0"
-  [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,fconstd,\
+  [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\
  f_loadd,f_stored,load2,store2,store2")
(set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
(set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*,*")
@@ -2984,7 +2984,7 @@
csinc\\t%0, %4, zr, %M1
mov\\t%0, -1
mov\\t%0, 1"
-  [(set_attr "type" "csel")]
+  [(set_attr "type" "csel, csel, csel, csel, csel, mov_imm, mov_imm")]
 )
 
 ;; zero_extend version of above
@@ -3007,7 +3007,7 @@
csinc\\t%w0, %w4, wzr, %M1
mov\\t%w0, -1
mov\\t%w0, 1"
-  [(set_attr "type" "csel")]
+  [(set_attr "type" "csel, csel, csel, csel, csel, mov_imm, mov_imm")]
 )
 
 (define_insn "*cmovdi_insn_uxtw"
-- 
2.1.0.243.g30d45f7





Re: [PATCH] fortran/68019 -- Remove an assert() that prevents error message

2015-10-19 Thread Steve Kargl
Thanks.  Patch committed to both 5-branch and trunk.

-- 
steve

On Mon, Oct 19, 2015 at 10:02:09PM +0200, Paul Richard Thomas wrote:
> Hi Steve,
> 
> Yes, this is OK for trunk. I suggest that it is so obvious that it
> should go into 5 branch as well.
> 
> Cheers
> 
> Paul
> 
> On 19 October 2015 at 21:13, Steve Kargl
>  wrote:
> > The attached patch removes an assert() that prevents gfortran from
> > issuing an error message.  Built and tested on x86_64-*-freebsd.
> > Although probably an "obviously correct" patch, OK to commit?
> >
> > 2015-10-19  Steven G. Kargl  
> >
> > PR fortran/68019
> > * decl.c (add_init_expr_to_sym): Remove an assert() to allow an 
> > error
> > message to be issued.
> >
> > 2015-10-19  Steven G. Kargl  
> >
> > PR fortran/68019
> > * gfortran.dg/pr68019.f90: new test.
> >
> > --
> > Steve
> 
> 
> 
> -- 
> Outside of a dog, a book is a man's best friend. Inside of a dog it's
> too dark to read.
> 
> Groucho Marx

-- 
Steve


Re: [PATCH] fortran/68019 -- Remove an assert() that prevents error message

2015-10-19 Thread Paul Richard Thomas
Hi Steve,

Yes, this is OK for trunk. I suggest that it is so obvious that it
should go into 5 branch as well.

Cheers

Paul

On 19 October 2015 at 21:13, Steve Kargl
 wrote:
> The attached patch removes an assert() that prevents gfortran from
> issuing an error message.  Built and tested on x86_64-*-freebsd.
> Although probably an "obviously correct" patch, OK to commit?
>
> 2015-10-19  Steven G. Kargl  
>
> PR fortran/68019
> * decl.c (add_init_expr_to_sym): Remove an assert() to allow an error
> message to be issued.
>
> 2015-10-19  Steven G. Kargl  
>
> PR fortran/68019
> * gfortran.dg/pr68019.f90: new test.
>
> --
> Steve



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


Re: New power of 2 hash policy

2015-10-19 Thread François Dumont
Is this one OK?

François


On 28/09/2015 21:16, François Dumont wrote:
> On 25/09/2015 15:28, Jonathan Wakely wrote:
>> @@ -501,6 +503,129 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> mutable std::size_t_M_next_resize;
>>>   };
>>>
>>> +  /// Range hashing function considering that second args is a power
>>> of 2.
>> Does this mean "assuming" not "considering"?
> I assume yes.
>
>>> +  struct _Mask_range_hashing
>>> +  {
>>> +typedef std::size_t first_argument_type;
>>> +typedef std::size_t second_argument_type;
>>> +typedef std::size_t result_type;
>>> +
>>> +result_type
>>> +operator()(first_argument_type __num,
>>> +   second_argument_type __den) const noexcept
>>> +{ return __num & (__den - 1); }
>>> +  };
>>> +
>>> +
>>> +  /// Helper type to compute next power of 2.
>>> +  template
>>> +struct _NextPower2
>>> +{
>>> +  static std::size_t
>>> +  _Get(std::size_t __n)
>>> +  {
>>> +std::size_t __next = _NextPower2<(_N >> 1)>::_Get(__n);
>>> +return __next |= __next >> _N;
>>> +  }
>>> +};
>>> +
>>> +  template<>
>>> +struct _NextPower2<1>
>>> +{
>>> +  static std::size_t
>>> +  _Get(std::size_t __n)
>>> +  { return __n |= __n >> 1; }
>>> +};
>> This doesn't seem to return the next power of 2, it returns one less.
>>
>> _NextPower2<32>::_Get(2) returns 3, but 2 is already a power of 2.
>> _NextPower2<32>::_Get(3) returns 3, but the next power of 2 is 4.
>
> Yes, the name is bad; that is just part of the algorithm you copy/pasted
> below.  I'll revise the implementation to have _NextPower2 do the whole
> algorithm.
>
>>
>> I don't think this needs to be a recursive template, it can simply be
>> a function, can't it?
> I wanted code to adapt to any sizeof(std::size_t) without relying on
> some preprocessor checks. As you pointed out additional >> 32 on 32 bits
> or >> 64 on 64 bits wouldn't hurt, but the recursive template just makes
> sure that we don't do useless operations.
>
>>
>>> +  /// Rehash policy providing power of 2 bucket numbers. Ease modulo
>>> +  /// operations.
>>> +  struct _Power2_rehash_policy
>>> +  {
>>> +using __has_load_factor = std::true_type;
>>> +
>>> +_Power2_rehash_policy(float __z = 1.0) noexcept
>>> +: _M_max_load_factor(__z), _M_next_resize(0) { }
>>> +
>>> +float
>>> +max_load_factor() const noexcept
>>> +{ return _M_max_load_factor; }
>>> +
>>> +// Return a bucket size no smaller than n (as long as n is not
>>> above the
>>> +// highest power of 2).
>> This says "no smaller than n" but it actually seems to guarantee
>> "greater than n" because _NextPower2<>::_Get(n)+1 is 2n when n is a
>> power of two.
> Yes, but this function calls _NextPower2<>::_Get(n - 1) + 1; there
> is a minus one, which makes this comment valid, as shown by the newly
> introduced test.
>
>>> +std::size_t
>>> +_M_next_bkt(std::size_t __n) const
>>> +{
>>> +  constexpr auto __max_bkt
>>> += (std::size_t(1) << (sizeof(std::size_t) * 8 - 1));
>>> +
>>> +  std::size_t __res
>>> += _NextPower2<((sizeof(std::size_t) * 8) >> 1)>::_Get(--__n) + 1;
>> You wouldn't need to add one to the result if the template actually
>> returned a power of two!
>>
>>> +  if (__res == 0)
>>> +__res = __max_bkt;
>>> +
>>> +  if (__res == __max_bkt)
>>> +// Set next resize to the max value so that we never try to
>>> rehash again
>>> +// as we already reach the biggest possible bucket number.
>>> +// Note that it might result in max_load_factor not being
>>> respected.
>>> +_M_next_resize = std::size_t(0) - 1;
>>> +  else
>>> +_M_next_resize
>>> +  = __builtin_floor(__res * (long double)_M_max_load_factor);
>>> +
>>> +  return __res;
>>> +}
>> What are the requirements for this function, "no smaller than n" or
>> "greater than n"?
> 'No smaller than n', as stated in the comment.  However, for big n it is
> not possible, even in the prime-number-based implementation.  So I played
> with _M_next_resize to make sure that _M_next_bkt won't be called again
> once the max bucket number has been reached.
>
>
>> If "no smaller than n" is correct then the algorithm you want is
>> "round up to nearest power of 2", which you can find here (I wrote
>> this earlier this year for some reason I can't remember now):
>>
>> https://gitlab.com/redistd/redistd/blob/master/include/redi/bits.h
>>
>> The non-recursive version is only a valid constexpr function in C++14,
>> but since you don't need a constexpr function you could just that,
>> extended to handle 64-bit:
>>
>>  std::size_t
>>  clp2(std::size_t n)
>>  {
>>std::uint_least64_t x = n;
>>// Algorithm from Hacker's Delight, Figure 3-3.
>>x = x - 1;
>>x = x | (x >> 1);
>>x = x | (x >> 2);
>>x = x | (x >> 4);
>>x = x | (x >> 8);
>>x = x | (x >>16);
>>x = x | (x >>32);
>>return x + 1;
>>  }
>>
>> We could avoid the last shift when sizeof(size_t) == 32, I don't know
>> if the 

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-19 Thread Jan Hubicka
Richard,
I missed your reply earlier today.
> > Therefore I would say that TYPE_CANONICAL determines mode, modulo the fact
> > that an
> > incomplete variant of a complete type will have VOIDmode instead of the complete
> > type's mode (during non-LTO).  That is why I allow mode changes for casts 
> > from
> > complete to incomplete.
> 
> Incomplete types have VOIDmode, right?

Yes
> 
> > In longer run I think that every query to useless_type_conversion_p that
> > contains incomplete types is a confused query.  useless_type_conversion_p is
> > about operations on the value and there are no operations for incomplete 
> > type
> > (and function types).  I know that ipa-icf-gimple and the following code in
> > gimplify-stmt checks this frequently:
> >   /* The FEs may end up building ADDR_EXPRs early on a decl with
> >  an incomplete type.  Re-build ADDR_EXPRs in canonical form
> >  here.  */
> >   if (!types_compatible_p (TREE_TYPE (op0), TREE_TYPE (TREE_TYPE 
> > (expr
> > *expr_p = build_fold_addr_expr (op0);
> > Taking the address of an incomplete type or of a function, naturally, makes
> > sense.  We may want to check something else here, like simply
> >   TREE_TYPE (op0) != TREE_TYPE (TREE_TYPE (expr))
> > and once ipa-icf is cleaned up start sanity checking in
> > useless_type_conversion that we use it to force equality only on types that
> > do have values.
> >
> > We can also trip it when checking TYPE_METHOD_BASETYPE, which may be
> > incomplete.  This is in the code checking useless_type_conversion on
> > functions, which I think are confused queries anyway - we need the ABI
> > matcher, I am looking into that.
> 
> Ok, so given we seem to be fine in practice with TYPE_MODE (type) ==
> TYPE_MODE (TYPE_CANONICAL (type))

With the exception of incomplete variants of a type.  Then TYPE_CANONICAL may
be complete and !VOIDmode.
But sure, I believe we ought to chase away the calls to useless_type_conversion
when one of the types is incomplete.
> (whether that's a bug or not ...) I'm fine with re-instantiating the
> mode check for
> aggregate types.  Please do that with
> 
> Index: gcc/gimple-expr.c
> ===
> --- gcc/gimple-expr.c   (revision 228963)
> +++ gcc/gimple-expr.c   (working copy)
> @@ -89,8 +89,7 @@ useless_type_conversion_p (tree outer_ty
> 
>/* Changes in machine mode are never useless conversions unless we
>   deal with aggregate types in which case we defer to later checks.  */
> -  if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type)
> -  && !AGGREGATE_TYPE_P (inner_type))
> +  if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type))
>  return false;

OK, that is the variant of the patch I had at the beginning.  I will test it.
> 
>/* If both the inner and outer types are integral types, then the
> 
> Can we assess equal sizes when modes are non-BLKmode then?  Thus
> 
> @@ -270,10 +269,9 @@ useless_type_conversion_p (tree outer_ty
>   use the types in move operations.  */
>else if (AGGREGATE_TYPE_P (inner_type)
>&& TREE_CODE (inner_type) == TREE_CODE (outer_type))
> -return (!TYPE_SIZE (outer_type)
> -   || (TYPE_SIZE (inner_type)
> -   && operand_equal_p (TYPE_SIZE (inner_type),
> -   TYPE_SIZE (outer_type), 0)));
> +return (TYPE_MODE (outer_type) != BLKmode
> +   || operand_equal_p (TYPE_SIZE (inner_type),
> +   TYPE_SIZE (outer_type), 0));
> 
>else if (TREE_CODE (inner_type) == OFFSET_TYPE
>&& TREE_CODE (outer_type) == OFFSET_TYPE)
> 
> ?  Hoping for VOIDmode in the incomplete case.
I don't see why this would be a problem either.  I am going to start testing
this variant.

Honza
> 
> Richard.
> 
> > Honza
> >>
> >> Richard.
> >>
> >>
> >> >Honza
> >> >>
> >> >> --
> >> >> Eric Botcazou
> >>


Re: [PATCH] Correctly fill up cgraph_node::local.versionable flag.

2015-10-19 Thread Martin Liška

On 10/17/2015 09:49 PM, Jan Hubicka wrote:

Hello.

I've been working on the HSA branch, where we have a cloning pass running with
all optimization levels.  The patch makes the computation of
cgraph_node::local.versionability independent of IPA CP and uses the flag to
verify that a function can be cloned.

The patch bootstraps on x86_64-linux-pc and survives the test suite.

Ready for trunk?
Thanks,
Martin



From d17b51257d5e01ab6bd9a018b08f8ed6fd39c029 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 8 Oct 2015 17:57:30 +0200
Subject: [PATCH 1/3] Correctly fill up cgraph_node::local.versionable flag.

gcc/ChangeLog:

2015-10-15  Martin Liska  

* cgraphclones.c (cgraph_node::create_virtual_clone):
Verify cgraph_node.local.versionable instead of calling
tree_versionable_function_p.
* ipa-cp.c (determine_versionability): Save the information
to ipa_node_params summary.
(ipcp_versionable_function_p): Use it.
(ipcp_propagate_stage): Pass IPA_NODE_REF to a called function.
(ipcp_generate_summary): Do not compute cgraph_node
versionability.


If you want to use the flag at WPA time, you need to stream it for LTO.
I suppose this only passed testing because we have no testcases checking that
ipa-cp happens with LTO.


Hi.

The flag is already streamed.




* ipa-inline-analysis.c (inline_generate_summary): Compute
visibility for all cgraph nodes.
* ipa-prop.c (ipa_node_params_t::duplicate): Duplicate
ipa_node_params::versionability.
* ipa-prop.h (struct ipa_node_params): Declare it.
---
  gcc/cgraphclones.c|  2 +-
  gcc/ipa-cp.c  | 15 ++-
  gcc/ipa-inline-analysis.c |  4 
  gcc/ipa-prop.c|  1 +
  gcc/ipa-prop.h|  2 ++
  5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index e51431c..5c04dc4 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -570,7 +570,7 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
char *name;

if (!in_lto_p)
-gcc_checking_assert (tree_versionable_function_p (old_decl));
+gcc_checking_assert (local.versionable);


Then you should be able to drop in_lto_p here.


Yes, it works.

I'm sending final version of the patch which I'm going to install to trunk.

Thanks for the review,
Martin



OK with that change.

Honza



From 83a16ac2716baf7418d581398769e78bbde6abd0 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 8 Oct 2015 17:57:30 +0200
Subject: [PATCH] Correctly fill up cgraph_node::local.versionable flag.

gcc/ChangeLog:

2015-10-15  Martin Liska  

	* cgraphclones.c (cgraph_node::create_virtual_clone):
	Verify cgraph_node.local.versionable instead of calling
	tree_versionable_function_p.
	* ipa-cp.c (determine_versionability): Save the information
	to ipa_node_params summary.
	(ipcp_versionable_function_p): Use it.
	(ipcp_propagate_stage): Pass IPA_NODE_REF to a called function.
	(ipcp_generate_summary): Do not compute cgraph_node
	versionability.
	* ipa-inline-analysis.c (inline_generate_summary): Compute
	visibility for all cgraph nodes.
	* ipa-prop.c (ipa_node_params_t::duplicate): Duplicate
	ipa_node_params::versionability.
	* ipa-prop.h (struct ipa_node_params): Declare it.
---
 gcc/cgraphclones.c|  4 +---
 gcc/ipa-cp.c  | 15 ++-
 gcc/ipa-inline-analysis.c |  4 
 gcc/ipa-prop.c|  1 +
 gcc/ipa-prop.h|  2 ++
 5 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index e51431c..f243f6f 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -569,9 +569,7 @@ cgraph_node::create_virtual_clone (vec<cgraph_edge *> redirect_callers,
   ipa_replace_map *map;
   char *name;
 
-  if (!in_lto_p)
-gcc_checking_assert (tree_versionable_function_p (old_decl));
-
+  gcc_checking_assert (local.versionable);
   gcc_assert (local.can_change_signature || !args_to_skip);
 
   /* Make a new FUNCTION_DECL tree node */
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index d9d81f1..ef93b20 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -514,7 +514,8 @@ print_all_lattices (FILE * f, bool dump_sources, bool dump_benefits)
with NODE.  */
 
 static void
-determine_versionability (struct cgraph_node *node)
+determine_versionability (struct cgraph_node *node,
+			  struct ipa_node_params *info)
 {
   const char *reason = NULL;
 
@@ -546,7 +547,7 @@ determine_versionability (struct cgraph_node *node)
 fprintf (dump_file, "Function %s/%i is not versionable, reason: %s.\n",
 	 node->name (), node->order, reason);
 
-  node->local.versionable = (reason == NULL);
+  info->versionable = (reason == NULL);
 }
 
 /* Return true if it is at all technically possible to create clones of a
@@ -555,7 +556,7 @@ determine_versionability (struct cgraph_node *node)
 static bool
 ipcp_versionable_function_p 

Re: [c++-delayed-folding] First stab at convert_to_integer

2015-10-19 Thread Marek Polacek
On Mon, Oct 19, 2015 at 09:59:03AM -1000, Jason Merrill wrote:
> >Done.  But I can't seem to commit the patch to the c++-delayed-folding
> >branch; is that somehow restricted?  I'm getting:
> >
> >svn: E170001: Commit failed (details follow):
> >svn: E170001: Authorization failed
> >svn: E170001: Your commit message was left in a temporary file:
> >svn: E170001:'/home/marek/svn/c++-delayed-folding/svn-commit.tmp'
> >
> >and I've checked out the branch using
> >svn co svn://mpola...@gcc.gnu.org/svn/gcc/branches/c++-delayed-folding/
> 
> You need to use svn+ssh:// rather than svn:// if you want to be able to
> commit.  From svnwrite.html:
> 
> It is also possible to convert an existing SVN tree to use SSH by using svn
> switch --relocate:
> 
> svn switch --relocate svn://gcc.gnu.org/svn/gcc
> svn+ssh://usern...@gcc.gnu.org/svn/gcc

Oh my, thanks.  Committed now.

Marek


[PATCH] Refactoring sese.h and graphite-poly.h

2015-10-19 Thread Aditya Kumar
Renamed scop->region to scop->scop_info.
Removed conversion constructors for sese_l and dr_info.
Removed macros.

No functional change intended.  Passes regtest and bootstrap.

gcc/ChangeLog:

2015-10-19  Aditya Kumar  
* graphite-poly.h (struct dr_info): Removed conversion constructor.
(struct scop): Renamed scop::region to scop::scop_info.
(scop_set_region): Same.
(SCOP_REGION): Removed.
(SCOP_CONTEXT): Removed.
(POLY_SCOP_P): Removed.
* graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user):
Rename scop->region to scop->scop_info.
(add_parameters_to_ivs_params): Same.
(graphite_regenerate_ast_isl): Same.
* graphite-poly.c (new_scop): Same.
(free_scop): Same.
(print_scop_params): Same.
* graphite-scop-detection.c (scop_detection::remove_subscops): Same.
(scop_detection::remove_intersecting_scops): Use pointer to sese_l.
(dot_all_scops_1): Rename scop->region to scop->scop_info.
(scop_detection::nb_pbbs_in_loops): Same.
(find_scop_parameters): Same.
(try_generate_gimple_bb): Same.
(gather_bbs::before_dom_children): Same.
(gather_bbs::after_dom_children): Same.
(build_scops): Same.
* graphite-sese-to-poly.c (build_scop_scattering): Same.
(extract_affine_chrec): Same.
(extract_affine): Same.
(set_scop_parameter_dim): Same.
(build_loop_iteration_domains): Same.
(create_pw_aff_from_tree): Same.
(add_param_constraints): Same.
(build_scop_iteration_domain): Same.
(build_scop_drs): Same.
(analyze_drs_in_stmts): Same.
(insert_out_of_ssa_copy_on_edge): Same.
(rewrite_close_phi_out_of_ssa): Same.
(rewrite_reductions_out_of_ssa): Same.
(handle_scalar_deps_crossing_scop_limits): Same.
(rewrite_cross_bb_scalar_deps): Same.
(rewrite_cross_bb_scalar_deps_out_of_ssa): Same.
(build_poly_scop): Same.
(build_alias_set): Use pointer to dr_info.
* graphite.c (print_graphite_scop_statistics): Same.
(graphite_transform_loops): Same.
* sese.h (struct sese_l): Remove conversion constructor.



---
 gcc/graphite-isl-ast-to-gimple.c |  8 
 gcc/graphite-poly.c  |  8 
 gcc/graphite-poly.h  | 14 ++
 gcc/graphite-scop-detection.c| 34 
 gcc/graphite-sese-to-poly.c  | 42 
 gcc/graphite.c   | 10 +-
 gcc/sese.h   |  3 ---
 7 files changed, 53 insertions(+), 66 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 2f2e2ba..7f99bce 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -786,10 +786,10 @@ translate_isl_ast_node_user (__isl_keep isl_ast_node 
*node,
   iv_map.create (nb_loops);
   iv_map.safe_grow_cleared (nb_loops);
 
-  build_iv_mapping (iv_map, gbb, user_expr, ip, pbb->scop->region->region);
+  build_iv_mapping (iv_map, gbb, user_expr, ip, pbb->scop->scop_info->region);
   isl_ast_expr_free (user_expr);
   next_e = copy_bb_and_scalar_dependences (GBB_BB (gbb),
-  pbb->scop->region, next_e,
+  pbb->scop->scop_info, next_e,
   iv_map,
   _regenerate_error);
   iv_map.release ();
@@ -909,7 +909,7 @@ print_isl_ast_node (FILE *file, __isl_keep isl_ast_node 
*node,
 static void
add_parameters_to_ivs_params (scop_p scop, ivs_params &ip)
 {
-  sese_info_p region = scop->region;
+  sese_info_p region = scop->scop_info;
   unsigned nb_parameters = isl_set_dim (scop->param_context, isl_dim_param);
   gcc_assert (nb_parameters == SESE_PARAMS (region).length ());
   unsigned i;
@@ -1144,7 +1144,7 @@ bool
 graphite_regenerate_ast_isl (scop_p scop)
 {
   loop_p context_loop;
-  sese_info_p region = scop->region;
+  sese_info_p region = scop->scop_info;
   ifsese if_region = NULL;
   isl_ast_node *root_node;
   ivs_params ip;
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 0d1dc63..eb76f05 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -306,7 +306,7 @@ new_scop (edge entry, edge exit)
   scop->may_waw_no_source = NULL;
   scop_set_region (scop, region);
   scop->pbbs.create (3);
-  POLY_SCOP_P (scop) = false;
+  scop->poly_scop_p = false;
   scop->drs.create (3);
 
   return scop;
@@ -321,7 +321,7 @@ free_scop (scop_p scop)
   poly_bb_p pbb;
 
   remove_gbbs_in_scop (scop);
-  free_sese_info (SCOP_REGION (scop));
+  free_sese_info (scop->scop_info);
 
   FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
 free_poly_bb (pbb);
@@ -475,13 +475,13 @@ print_pbb (FILE *file, poly_bb_p pbb)
 void
 print_scop_params (FILE *file, scop_p scop)
 {
-  if (SESE_PARAMS 

Re: [patch] header file re-ordering.

2015-10-19 Thread Jeff Law

On 10/14/2015 08:05 AM, Andrew MacLeod wrote:

On 10/12/2015 04:04 AM, Jeff Law wrote:

Oh, you must be looking at the original combined patch?

Possibly :-)



fold-const.h is indirectly included by cp-tree.h, which gets it from
including c-common.h.  Here is some of the output from show-headers on
objc-act.c (indentation represents levels of inclusion; the number in
parentheses is the number of times that include has been seen so far in
the file's include list).  As you can see, we include ansidecl.h a lot
:-)  Most of the time there isn't much we can do about those sorts of
things:

cp-tree.h
 tm.h  (2)
 hard-reg-set.h
 function.h  (1)
 c-common.h
   splay-tree.h
 ansidecl.h  (4)
   cpplib.h
 symtab.h  (2)
 line-map.h  (2)
   alias.h
   tree.h  (2)
   fold-const.h
   diagnostic-core.h  (1)
 bversion.h

I guess it could be a useful addition to show-headers to specify a
header file you are looking for and have it show you where it comes from
if it's included...
Yea.  Though I think it's probably easy enough to get it from the 
current output.




In any case, there is some indirection here because none of the front-end
files were flattened that much.
And I think that's probably some source of the confusion on my part.  I 
thought we'd flattened the front-end .h files too.  So I didn't look 
deeply into the .h files to see if they were doing something undesirable 
behind my back.




Incidentally, you may notice this is the second time tree.h is
included.  The first occurrence of tree.h is included directly by
objc-act.c, but it needs to be left because something between that and
cp-tree.h needs tree.h to compile.  This sort of thing is resolved by
using the re-order tool, but I did not run that tool on most of the objc
and objcp files as they have some complex conditionals in their include
list:
#include "tree.h"
#include "stringpool.h"
#include "stor-layout.h"
#include "attribs.h"

#ifdef OBJCPLUS
#include "cp/cp-tree.h"
#else
#include "c/c-tree.h"
#include "c/c-lang.h"
#endif

#include "c-family/c-objc.h"
#include "langhooks.h"

It's beyond the scope of the reorder tool to deal with re-positioning
this automatically... and it happens so rarely I didn't even look into it.
So they are not optimal as far as ordering goes.

Understood.  This unholy sharing had me concerned as well.


So you need not worry about that.  It builds fine.
OK.  I think the major source of confusion was the lack of flattening 
for the front-ends.  I'll go back to it with that in mind and probably 
start using the tools when I get a WTF moment.







I'm slightly concerned about the darwin, windows and solaris bits. The
former primarily because Darwin has been a general source of pain, and
in the others because I'm not sure the cross testing will exercise
that code terribly much.


It's easy enough to NOT do this for any of those files if we're too
worried about them.  It's also easy to revert a single file if it
appears to be an issue.  That's why I wanted to run as many of these
on the compile farm natively as I could... but alas, PowerPC was the
only thing the farm really offered me.



I'll go ahead and approve all the config/ bits.  Please be on the
lookout for any fallout.


even darwin, windows and solaris? :-)
Yup.  The changes are straightforward enough that if there's fallout (and 
to some degree I expect minor fallout from native builds) it can be 
easily fixed.


Jeff


Re: [PATCH, rs6000] Pass --secure-plt to the linker

2015-10-19 Thread Szabolcs Nagy

On 19/10/15 14:04, Szabolcs Nagy wrote:

On 19/10/15 12:12, Alan Modra wrote:

On Thu, Oct 15, 2015 at 06:50:50PM +0100, Szabolcs Nagy wrote:

A powerpc toolchain built with (or without) --enable-secureplt
currently creates a binary that uses bss plt if

(1) any of the linked PIC objects have bss plt relocs
(2) or all the linked objects are non-PIC or have no relocs,

because this is the binutils linker behaviour.

This patch passes --secure-plt to the linker which makes the linker
warn in case (1) and produce a binary with secure plt in case (2).


The idea is OK I think, but


@@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
  %{R*} \
  %(link_shlib) \
  %{!T*: %(link_start) } \
+%{!static: %(link_secure_plt_default)} \
  %(link_os)"


this change needs to be conditional on !mbss-plt too.



OK, will change that.

if -msecure-plt and -mbss-plt are supposed to affect
linking too (not just code gen) then shall i add
%{msecure-plt: --secure-plt} too?



I added !mbss-plt only for now as a mix of -msecure-plt
and -mbss-plt options do not cancel each other in gcc,
the patch only changes behaviour for a secureplt toolchain.

OK to commit?

diff --git a/gcc/config/rs6000/secureplt.h b/gcc/config/rs6000/secureplt.h
index b463463..77edf2a 100644
--- a/gcc/config/rs6000/secureplt.h
+++ b/gcc/config/rs6000/secureplt.h
@@ -18,3 +18,4 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #define CC1_SECURE_PLT_DEFAULT_SPEC "-msecure-plt"
+#define LINK_SECURE_PLT_DEFAULT_SPEC "--secure-plt"
diff --git a/gcc/config/rs6000/sysv4.h b/gcc/config/rs6000/sysv4.h
index 7b2f9bd..93499e8 100644
--- a/gcc/config/rs6000/sysv4.h
+++ b/gcc/config/rs6000/sysv4.h
@@ -537,6 +537,9 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 #ifndef CC1_SECURE_PLT_DEFAULT_SPEC
 #define CC1_SECURE_PLT_DEFAULT_SPEC ""
 #endif
+#ifndef LINK_SECURE_PLT_DEFAULT_SPEC
+#define LINK_SECURE_PLT_DEFAULT_SPEC ""
+#endif
 
 /* Pass -G xxx to the compiler.  */
 #undef CC1_SPEC
@@ -574,6 +577,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 %{R*} \
 %(link_shlib) \
 %{!T*: %(link_start) } \
+%{!static: %{!mbss-plt: %(link_secure_plt_default)}} \
 %(link_os)"
 
 /* Shared libraries are not default.  */
@@ -889,6 +893,7 @@ ncrtn.o%s"
   { "link_os_openbsd",		LINK_OS_OPENBSD_SPEC },			\
   { "link_os_default",		LINK_OS_DEFAULT_SPEC },			\
   { "cc1_secure_plt_default",	CC1_SECURE_PLT_DEFAULT_SPEC },		\
+  { "link_secure_plt_default",	LINK_SECURE_PLT_DEFAULT_SPEC },		\
   { "cpp_os_ads",		CPP_OS_ADS_SPEC },			\
   { "cpp_os_yellowknife",	CPP_OS_YELLOWKNIFE_SPEC },		\
   { "cpp_os_mvme",		CPP_OS_MVME_SPEC },			\


[PATCH] fortran/68019 -- Remove an assert() that prevents error message

2015-10-19 Thread Steve Kargl
The attached patch removes an assert() that prevents gfortran from
issuing an error message.  Built and tested on x86_64-*-freebsd.
Although probably an "obviously correct" patch, OK to commit?

2015-10-19  Steven G. Kargl  

PR fortran/68019
* decl.c (add_init_expr_to_sym): Remove an assert() to allow an error
message to be issued.

2015-10-19  Steven G. Kargl  

PR fortran/68019
* gfortran.dg/pr68019.f90: New test.

-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 228974)
+++ gcc/fortran/decl.c	(working copy)
@@ -1486,7 +1486,6 @@ add_init_expr_to_sym (const char *name, 
			 " with scalar", &sym->declared_at);
 	  return false;
 	}
-	  gcc_assert (sym->as->rank == init->rank);
 
 	  /* Shape should be present, we get an initialization expression.  */
 	  gcc_assert (init->shape);
Index: gcc/testsuite/gfortran.dg/pr68019.f90
===
--- gcc/testsuite/gfortran.dg/pr68019.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/pr68019.f90	(working copy)
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! Original code from Gerhard Steinmetz
+! Gerhard dot Steinmetz for fortran at t-online dot de
+! PR fortran/68019
+!
+program p
+   integer :: i
+   type t
+  integer :: n
+   end type
+   type(t), parameter :: vec(*) = [(t(i), i = 1, 4)]
+   type(t), parameter :: arr(*) = reshape(vec, [2, 2])   ! { dg-error "ranks 1 and 2 in assignment" }
+end


PING: [PATCH] PR target/67215: -fno-plt needs improvements for x86

2015-10-19 Thread H.J. Lu
-- Forwarded message --
From: H.J. Lu 
Date: Wed, Sep 9, 2015 at 3:02 PM
Subject: [PATCH] PR target/67215: -fno-plt needs improvements for x86
To: gcc-patches@gcc.gnu.org


prepare_call_address in calls.c is the wrong place to handle -fno-plt.
We shouldn't force the function address into a register and hope that a load
of the function address via GOT plus an indirect call via register will be
folded into an indirect call via GOT, which doesn't always happen.  Also, the
non-PIC case can only be handled in the backend.  Instead, the backend should
expand an external function call into an indirect call via GOT for -fno-plt.

This patch reverts -fno-plt in prepare_call_address and handles it in
ix86_expand_call.  Other backends may need similar changes to support
-fno-plt.  Alternatively, we can introduce a target hook to indicate
whether an external function should be called via register for -fno-plt
so that the i386 backend can disable it in prepare_call_address.

gcc/

PR target/67215
* calls.c (prepare_call_address): Don't handle -fno-plt here.
* config/i386/i386.c (ix86_expand_call): Generate indirect call
via GOT for -fno-plt.  Support indirect call via GOT for x32.
* config/i386/predicates.md (sibcall_memory_operand): Allow
GOT memory operand.

gcc/testsuite/

PR target/67215
* gcc.target/i386/pr67215-1.c: New test.
* gcc.target/i386/pr67215-2.c: Likewise.
* gcc.target/i386/pr67215-3.c: Likewise.
---
 gcc/calls.c   | 12 --
 gcc/config/i386/i386.c| 71 ---
 gcc/config/i386/predicates.md |  7 ++-
 gcc/testsuite/gcc.target/i386/pr67215-1.c | 20 +
 gcc/testsuite/gcc.target/i386/pr67215-2.c | 20 +
 gcc/testsuite/gcc.target/i386/pr67215-3.c | 13 ++
 6 files changed, 114 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67215-3.c

diff --git a/gcc/calls.c b/gcc/calls.c
index 026cb53..22c65cd 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -203,18 +203,6 @@ prepare_call_address (tree fndecl_or_type, rtx
funexp, rtx static_chain_value,
   && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
  ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
  : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic
-  && fndecl_or_type
-  && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
-  && (!flag_plt
-  || lookup_attribute ("noplt", DECL_ATTRIBUTES (fndecl_or_type)))
-  && !targetm.binds_local_p (fndecl_or_type))
-{
-  /* This is done only for PIC code.  There is no easy interface
to force the
-function address into GOT for non-PIC case.  non-PIC case needs to be
-handled specially by the backend.  */
-  funexp = force_reg (Pmode, funexp);
-}
   else if (! sibcallp)
 {
   if (!NO_FUNCTION_CSE && optimize && ! flag_no_function_cse)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d78f4e7..b9299d4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25649,21 +25649,54 @@ ix86_expand_call (rtx retval, rtx fnaddr,
rtx callarg1,
   /* Static functions and indirect calls don't need the pic
register.  Also,
 check if PLT was explicitly avoided via no-plt or "noplt"
attribute, making
 it an indirect call.  */
+  rtx addr = XEXP (fnaddr, 0);
   if (flag_pic
- && (!TARGET_64BIT
- || (ix86_cmodel == CM_LARGE_PIC
- && DEFAULT_ABI != MS_ABI))
- && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
- && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
- && flag_plt
- && (SYMBOL_REF_DECL ((XEXP (fnaddr, 0))) == NULL_TREE
- || !lookup_attribute ("noplt",
-DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP (fnaddr, 0))
+ && GET_CODE (addr) == SYMBOL_REF
+ && !SYMBOL_REF_LOCAL_P (addr))
{
- use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
- if (ix86_use_pseudo_pic_reg ())
-   emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
-   pic_offset_table_rtx);
+ if (flag_plt
+ && (SYMBOL_REF_DECL (addr) == NULL_TREE
+ || !lookup_attribute ("noplt",
+   DECL_ATTRIBUTES
(SYMBOL_REF_DECL (addr)
+   {
+ if (!TARGET_64BIT
+ || (ix86_cmodel == CM_LARGE_PIC
+ && DEFAULT_ABI != MS_ABI))
+   {
+ use_reg (&use, gen_rtx_REG (Pmode,
+ REAL_PIC_OFFSET_TABLE_REGNUM));
+ if (ix86_use_pseudo_pic_reg ())
+   emit_move_insn (gen_rtx_REG 

Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-10-19 Thread Richard Henderson

On 07/22/2015 07:01 AM, Szabolcs Nagy wrote:

2015-07-22  Szabolcs Nagy

PR target/66912
* varasm.c (default_binds_local_p_2): Turn on extern_protected_data.

gcc/testsuite/ChangeLog:

2015-07-22  Szabolcs Nagy

PR target/66912
* gcc.target/aarch64/pr66912.c: New.
* gcc.target/arm/pr66912.c: New.


Ok.


r~


Re: [PATCH] mn10300: Use the STC bb-reorder algorithm at -Os

2015-10-19 Thread Jeff Law

On 10/16/2015 06:53 AM, Segher Boessenkool wrote:

For mn10300, STC still gives better results for optimise-for-size than
"simple" does.  So use STC at -Os as well.

Is this okay for trunk?


Segher


2015-10-16  Segher Boessenkool  

* common/config/mn10300/mn10300-common.c
(mn10300_option_optimization_table) :
Use REORDER_BLOCKS_ALGORITHM_STC at -Os and up.


OK.
jeff



Re: [PATCH, libiberty] Fix PR63758 by using the _NSGetEnviron() API on Darwin.

2015-10-19 Thread Mike Stump
On Oct 18, 2015, at 3:42 AM, Iain Sandoe  wrote:
 This seems likely to break cross-compilers to Darwin that do not have
 the system libraries available.  I guess I don't care about that if
 you don't.
>>> 
>>> I do care about it, but I'm not visualising the case...
>>> 
>>> AFAICS, when built as a host component for a cross to Darwin from 
>>> non-Darwin, environ would be declared as **environ as usual.
>>> 
>>> If an implementation includes a compiler targeting Darwin that defines 
>>> __APPLE__ but doesn't provide _NSGetEnviron in its libc, then isn't it 
>>> broken anyway?
>> 
>> I'm talking about the case of building a cross-compiler where the
>> system libraries are not available.  This is sometimes done as a first
>> step toward building a full cross-compiler.
> 
> I've applied the patch since it solves an immediate problem (and has been 
> requested).
> 
> Right now, the only case that I can think of when there's a Darwin-hosted 
> statically-linked user-space executable is in bringing up the system itself, 
> in which case one has to build non-standard crts and a statically-linkable 
> libc.  Last time I did this was on 10.5 with the darwinbuild stuff, not sure 
> it's even feasible on modern Darwin which is built with a different compiler.
> 
> It's possible that making the Darwin case conditional on ! 
> defined(__STATIC__) might be sufficient to guard that, but I need to think of 
> some way to test it.

So, I see two different things here.  One is a build of the darwin open source 
kernel.  I’ve never done that, though I knew people who did.  I don’t play in 
this space, so I don’t know how much of a rat hole it is, or if it is even 
possible anymore.  Really, it should just be a matter of dropping a new gcc 
into the official open source darwin build infrastructure and hitting build.  
If it didn’t just build before, then it might be a time sink to make it work, 
I just don’t know.

The other is, it is theoretically nice to be able to build up an entire gcc 
tool chain for a mac, starting from a linux box.  I usually don’t do this, but, 
I do a subset, which is a cc1 with no headers and no link or assembly support 
that fails to build, but works far enough to get past cc1.  This isn’t handy 
for users, but for a developer, I like to do this from time to time.

I don’t see the case that Ian is concerned about.  Either, they have Apple’s 
library, and it does include this routine, or, someone is making a replacement 
OS, and then will need to now provide that routine, if they did not before.  
Partial builds without a library are fine, but without a library, you can’t 
link anything (other than -r) anyway, so I’m not sure it matters that it would 
fail to link, as it failed before anyway (for example, printf would not be 
found either).

Kernel builds are special, and they are one of the few things that build 
static (as does the dyld program).  To test either, I’d recommend either not 
worrying about it, life is short, or, if you do care enough, you just gotta 
roll up your sleeves, as they truly are special.

Re: [PATCH, sh][v3] musl support for sh

2015-10-19 Thread Szabolcs Nagy

On 17/10/15 02:14, Oleg Endo wrote:

On Fri, 2015-10-16 at 17:06 +0100, Szabolcs Nagy wrote:

Revision of
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01636.html

The musl dynamic linker name is /lib/ld-musl-sh{-nofpu}{-fdpic}.so.1

New in this revision:

Add -fdpic to the name, will be useful with the pending sh2 FDPIC support.

2015-10-16  Gregor Richards  
Szabolcs Nagy  

* config/sh/linux.h (MUSL_DYNAMIC_LINKER): Define.
(MUSL_DYNAMIC_LINKER_E, MUSL_DYNAMIC_LINKER_FP): Define.




+#if TARGET_CPU_DEFAULT & (MASK_HARD_SH2A_DOUBLE | MASK_SH4)
+/* "-nofpu" if any nofpu option is specified.  */
+#define MUSL_DYNAMIC_LINKER_FP \
+  "%{m1|m2|m2a-nofpu|m3|m4-nofpu|m4-100-nofpu|m4-200-nofpu|m4-300-nofpu|" \
+  "m4-340|m4-400|m4-500|m4al|m5-32media-nofpu|m5-64media-nofpu|" \
+  "m5-compact-nofpu:-nofpu}"
+#else
+/* "-nofpu" if none of the hard fpu options are specified.  */
+#define MUSL_DYNAMIC_LINKER_FP \
+  
"%{m2a|m4|m4-100|m4-200|m4-300|m4a|m5-32media|m5-64media|m5-compact:;:-nofpu}"
+#endif


SH5 has been declared obsolete.  Please do not add any new SH5 related
things.  In this case, drop the m5-* thingies.



Removed m5*.
OK to commit with this?

diff --git a/gcc/config/sh/linux.h b/gcc/config/sh/linux.h
index 0f5d614..61cf777 100644
--- a/gcc/config/sh/linux.h
+++ b/gcc/config/sh/linux.h
@@ -43,6 +43,27 @@ along with GCC; see the file COPYING3.  If not see
 
 #define TARGET_ASM_FILE_END file_end_indicate_exec_stack
 
+#if TARGET_ENDIAN_DEFAULT == MASK_LITTLE_ENDIAN
+#define MUSL_DYNAMIC_LINKER_E "%{mb:eb}"
+#else
+#define MUSL_DYNAMIC_LINKER_E "%{!ml:eb}"
+#endif
+
+#if TARGET_CPU_DEFAULT & (MASK_HARD_SH2A_DOUBLE | MASK_SH4)
+/* "-nofpu" if any nofpu option is specified.  */
+#define MUSL_DYNAMIC_LINKER_FP \
+  "%{m1|m2|m2a-nofpu|m3|m4-nofpu|m4-100-nofpu|m4-200-nofpu|m4-300-nofpu|" \
+  "m4-340|m4-400|m4-500|m4al:-nofpu}"
+#else
+/* "-nofpu" if none of the hard fpu options are specified.  */
+#define MUSL_DYNAMIC_LINKER_FP "%{m2a|m4|m4-100|m4-200|m4-300|m4a:;:-nofpu}"
+#endif
+
+#undef MUSL_DYNAMIC_LINKER
+#define MUSL_DYNAMIC_LINKER \
+  "/lib/ld-musl-sh" MUSL_DYNAMIC_LINKER_E MUSL_DYNAMIC_LINKER_FP \
+  "%{mfdpic:-fdpic}.so.1"
+
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.2"
 
 #undef SUBTARGET_LINK_EMUL_SUFFIX


Re: [gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-10-19 Thread Ilya Verbin
On Thu, Oct 15, 2015 at 16:01:56 +0200, Jakub Jelinek wrote:
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> >  
> > +  if (flags & GOMP_TARGET_FLAG_NOWAIT)
> > +{
> > +  gomp_create_target_task (devicep, fn_addr, mapnum, hostaddrs, sizes,
> > +  kinds, flags, depend);
> > +  return;
> > +}
> 
> But this is not ok.  You need to do this far earlier, already before the
> if (depend != NULL) code in GOMP_target_41.  And, I think you should just
> not pass fn_addr, but fn itself.
> 
> > @@ -1636,34 +1657,58 @@ void
> >  gomp_target_task_fn (void *data)
> >  {
> >struct gomp_target_task *ttask = (struct gomp_target_task *) data;
> > +  struct gomp_device_descr *devicep = ttask->devicep;
> > +
> >if (ttask->fn != NULL)
> >  {
> > -  /* GOMP_target_41 */
> > +  if (devicep == NULL
> > + || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> > +   {
> > + /* FIXME: Save host fn addr into gomp_target_task?  */
> > + gomp_target_fallback_firstprivate (NULL, ttask->mapnum,
> 
> If you pass above fn instead of fn_addr, ttask->fn is what you want
> to pass to gomp_target_fallback_firstprivate here and remove the FIXME.
> 
> > +ttask->hostaddrs, ttask->sizes,
> > +ttask->kinds);
> > + return;
> > +   }
> > +
> > +  struct target_mem_desc *tgt_vars
> > +   = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
> > +ttask->sizes, ttask->kinds, true,
> > +GOMP_MAP_VARS_TARGET);
> > +  devicep->async_run_func (devicep->target_id, ttask->fn,
> > +  (void *) tgt_vars->tgt_start, data);
> 
> You need to void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
> first obviously, and pass fn_addr.
> 
> > +
> > +  /* FIXME: TMP example of checking for completion.
> > +Alternatively the plugin can set some completion flag in ttask.  */
> > +  while (!devicep->async_is_completed_func (devicep->target_id, data))
> > +   {
> > + fprintf (stderr, "-");
> > + usleep (10);
> > +   }
> 
> This obviously doesn't belong here.
> 
> >if (device->capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> > diff --git a/libgomp/testsuite/libgomp.c/target-tmp.c 
> > b/libgomp/testsuite/libgomp.c/target-tmp.c
> > new file mode 100644
> > index 000..23a739c
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 +1,40 @@
> > +#include 
> > +#include 
> > +
> > +#pragma omp declare target
> > +void foo (int n)
> > +{
> > +  printf ("Start tgt %d\n", n);
> > +  usleep (500);
> 
> 5s is too long.  Not to mention that not sure if PTX can do printf
> and especially usleep.
> 
> > diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
> > b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > index 26ac6fe..c843710 100644
> > --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> > +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> ...
> > +/* Set of asynchronously running target tasks.  */
> > +static std::set *async_tasks;
> > +
> >  /* Thread-safe registration of the main image.  */
> >  static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
> >  
> > +/* Mutex for protecting async_tasks.  */
> > +static pthread_mutex_t async_tasks_lock = PTHREAD_MUTEX_INITIALIZER;
> > +
> >  static VarDesc vd_host2tgt = {
> >{ 1, 1 },  /* dst, src */
> >{ 1, 0 },  /* in, out  */
> > @@ -156,6 +163,8 @@ init (void)
> >  
> >  out:
> >address_table = new ImgDevAddrMap;
> > +  async_tasks = new std::set;
> > +  pthread_mutex_init (&async_tasks_lock, NULL);
> 
> PTHREAD_MUTEX_INITIALIZER should already initialize the lock.
> But, do you really need async_tasks and the lock?  Better store
> something into some plugin's owned field in target_task struct and
> let the plugin callback be passed address of that field rather than the
> whole target_task?

So, here is what I have for now.  Attached target-29.c testcase works fine with
MIC emul, however I don't know how to (and where) properly check for completion
of async execution on target.  And, similarly, where to do unmapping after that?
Do we need a callback from plugin to libgomp (as far as I understood, PTX
runtime supports this, but HSA doesn't), or libgomp will just check for
ttask->is_completed in task.c?

 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9c8b1fb..e707c80 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -430,6 +430,7 @@ struct gomp_target_task
   size_t *sizes;
   unsigned short *kinds;
   unsigned int flags;
+  bool is_completed;
   void *hostaddrs[];
 };
 
@@ -877,6 +878,7 @@ struct gomp_device_descr
   void *(*host2dev_func) (int, void *, const void *, size_t);
   void *(*dev2dev_func) (int, void *, const void *, size_t);
   void 

Re: [PATCH] Fix partial template specialization syntax in wide-int.h

2015-10-19 Thread H.J. Lu
On Mon, Jul 20, 2015 at 12:15 AM, Mikhail Maltsev  wrote:
> On 07/17/2015 07:46 PM, Mike Stump wrote:
>> On Jul 17, 2015, at 2:28 AM, Mikhail Maltsev  wrote:
>>> The following code (reduced from wide-int.h) is rejected by Intel C++
>>> Compiler (EDG-based):
>>
>> So, could you test this with the top of the tree compiler and file a bug
>> report against g++ for it, if it seems to not work right.  If that bug report
>> is rejected, then I’d say file a bug report against clang and EDG.
>
> In addition to usual bootstrap+regtest, I also checked that build succeeds 
> with
> GCC 4.3.6 (IIRC, this is now the minimal required version) as well as with
> recent GCC snapshot used as stage 0. Committed as r225993.
> I also filed this bugreport: 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66941
>
>>> I think that the warning is correct, and "template <>" should not be used
>>> here. The attached patch should fix this issue. Bootstrapped and regtested
>>> on x86_64-linux. OK for trunk?
>>
>> Ok.  Does this need to go into the gcc-5 release branch as well?  If so, ok
>> there too.  Thanks.
> I think there is no need for it.

It is also needed for gcc-5. I am backporting it now.


-- 
H.J.


[PATCH] c++/67913, 67927 - fix new expression with wrong number of elements

2015-10-19 Thread Martin Sebor

This is a patch for two C++ bugs:

  67913 - new expression with negative size not diagnosed
  67927 - array new expression with excessive number of elements
  not diagnosed

The C++ front end rejects a subset of array declarators with negative
bounds or with bounds in excess of some implementation defined maximum
(roughly SIZE_MAX / 2), but it does so inconsistently.  For example,
it silently accepts expressions such as

new int [-1][2];

but invokes operator new to allocate SIZE_MAX bytes.  When operator
new succeeds (as is the case on Linux with memory overcommitment
enabled), the new expression returns a valid pointer to some
unknown amount of storage less than SIZE_MAX.  Accessing the memory
at some non-zero offset then causes a SIGSEGV.

Similarly, GCC accepts the following expression with the same result:

new int [SIZE_MAX][2];

C++14 makes it clear that such expressions are ill-formed and must
be rejected.  This patch adds checks that consistently reject all
new expressions with negative array bounds or with bounds in excess
of the maximum.

While I raised these bugs as separate issues, I decided to group the
two sets of changes together since they both touch the same function
in similar ways, and hopefully doing so will also make them easier
to review.

I've tested the patch by bootstrapping C/C++ and running the test
suites (including libstdc++) on x86_64 with no regressions.

During the development of the changes I found a few basic mistakes
in my code only after running libstdc++ tests.  To make it possible
to uncover them sooner in the future, I added another test that
isn't directly related to the problem: new45.C.

I also found a minor problem in the GCC regression test suite where
the g++.dg/other/new-size-type.C test for PR 36741 tried to check
that the 'new char[~static_cast(0)]' expression was accepted
without a warning.  The complaint in the PR was about the wording of
the warning, not about the validity of the expression (the submitter
agreed that a correctly worded diagnostic would be appropriate).
I changed the test to expect a meaningful error message.

Once this patch is approved and committed, a follow-up patch should
document the implementation-defined maximum in the manual.

Martin
gcc/cp/ChangeLog

2015-10-19  Martin Sebor  

	PR c++/67913
	PR c++/67927
	* call.c (build_operator_new_call): Do not assume size_check
	is non-null, analogously to the top half of the function.
	* init.c (build_new_1): Detect and diagnose array sizes in
	excess of the maximum of roughly SIZE_MAX / 2.
	Insert a runtime check only for arrays with a non-constant size.
	(build_new): Detect and diagnose negative array sizes.

gcc/testsuite/ChangeLog

2015-10-19  Martin Sebor  

	* init/new45.C: New test to verify that operator new is invoked
	with or without overhead for a cookie.

	PR c++/67927
	* init/new44.C: New test for placement new expressions for arrays
	with excessive number of elements.

	PR c++/67913
	* init/new43.C: New test for placement new expressions for arrays
	with negative number of elements.

	* other/new-size-type.C: Expect array new expression with
	an excessive number of elements to be rejected.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 367d42b..3f76198 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4228,10 +4228,12 @@ build_operator_new_call (tree fnname, vec **args,
 	 {
 	   /* Update the total size.  */
 	   *size = size_binop (PLUS_EXPR, original_size, *cookie_size);
+	   if (size_check)
+	 {
 	   /* Set to (size_t)-1 if the size check fails.  */
-	   gcc_assert (size_check != NULL_TREE);
 	   *size = fold_build3 (COND_EXPR, sizetype, size_check,
 *size, TYPE_MAX_VALUE (sizetype));
+	 }
 	   /* Update the argument list to reflect the adjusted size.  */
 	   (**args)[0] = *size;
 	 }
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 1ed8f6c..3db512d 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2272,7 +2272,11 @@ throw_bad_array_new_length (void)
 /* Generate code for a new-expression, including calling the "operator
new" function, initializing the object, and, if an exception occurs
during construction, cleaning up.  The arguments are as for
-   build_raw_new_expr.  This may change PLACEMENT and INIT.  */
+   build_raw_new_expr.  This may change PLACEMENT and INIT.
+   TYPE is the type of the object being constructed, possibly an array
+   of NELTS elements when NELTS is non-null (in "new T[NELTS]", T may
+   be an array of the form U[inner], with the whole expression being
+   "new U[NELTS][inner]").  */

 static tree
 build_new_1 (vec **placement, tree type, tree nelts,
@@ -2292,13 +2296,16 @@ build_new_1 (vec **placement, tree type, tree nelts,
  type.)  */
   tree pointer_type;
   tree non_const_pointer_type;
+  /* The most significant array bound in int[OUTER_NELTS][inner].  */
   tree outer_nelts = NULL_TREE;
-  /* For arrays, 

[PATCH] [AArch64] Distinct costs for sign and zero extension

2015-10-19 Thread Evandro Menezes
Some micro-architectures may favor one of sign or zero extension over 
the other in the base plus extended register offset addressing mode.


This patch separates the member "register_extend" of the structure
"cpu_addrcost_table" into two, one for sign extension and the other for
zero extension.


Please, commit if it's alright.

Thank you,

--
Evandro Menezes

>From 2efc8994abfbab65d04009fa1c0a8900804c23bb Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Tue, 8 Sep 2015 15:15:56 -0500
Subject: [PATCH] [AArch64] Distinct costs for sign and zero extension

gcc/
	* config/aarch64/aarch64.c (generic_addrcost_table,
	cortexa57_addrcost_table, xgene1_addrcost_table): Infer values for sign
	and zero register extension.
	* config/aarch64/aarch64-protos.h (cpu_addrcost_table): Split member
	for register extension into sign and zero register extension.
---
 gcc/config/aarch64/aarch64-protos.h |  3 ++-
 gcc/config/aarch64/aarch64.c| 16 +++-
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index baaf1bd..3c46222 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -134,7 +134,8 @@ struct cpu_addrcost_table
   const int pre_modify;
   const int post_modify;
   const int register_offset;
-  const int register_extend;
+  const int register_sextend;
+  const int register_zextend;
   const int imm_offset;
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aba5b56..47dbe74 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -190,7 +190,8 @@ static const struct cpu_addrcost_table generic_addrcost_table =
   0, /* pre_modify  */
   0, /* post_modify  */
   0, /* register_offset  */
-  0, /* register_extend  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
   0 /* imm_offset  */
 };
 
@@ -205,7 +206,8 @@ static const struct cpu_addrcost_table cortexa57_addrcost_table =
   0, /* pre_modify  */
   0, /* post_modify  */
   0, /* register_offset  */
-  0, /* register_extend  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
   0, /* imm_offset  */
 };
 
@@ -220,7 +222,8 @@ static const struct cpu_addrcost_table xgene1_addrcost_table =
   1, /* pre_modify  */
   0, /* post_modify  */
   0, /* register_offset  */
-  1, /* register_extend  */
+  1, /* register_sextend  */
+  1, /* register_zextend  */
   0, /* imm_offset  */
 };
 
@@ -5508,9 +5511,12 @@ aarch64_address_cost (rtx x,
 	cost += addr_cost->register_offset;
 	break;
 
-  case ADDRESS_REG_UXTW:
   case ADDRESS_REG_SXTW:
-	cost += addr_cost->register_extend;
+	cost += addr_cost->register_sextend;
+	break;
+
+  case ADDRESS_REG_UXTW:
+	cost += addr_cost->register_zextend;
 	break;
 
   default:
-- 
2.1.0.243.g30d45f7



Re: [PATCH][AArch64] Replace insn to zero up DF register

2015-10-19 Thread Andrew Pinski
On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes  wrote:
> In the existing targets, it seems that it's always faster to zero up a DF
> register with "movi %d0, #0" instead of "fmov %d0, xzr".

I think for ThunderX 1, this change will not make a difference.  So I
am neutral on this change.

Thanks,
Andrew

>
> This patch modifies the respective pattern.
>
> Please, commit if it's alright.
>
> Thank you,
>
> --
> Evandro Menezes
>


Re: [PATCH][AArch64] Replace insn to zero up DF register

2015-10-19 Thread Andrew Pinski
On Tue, Oct 20, 2015 at 7:51 AM, Andrew Pinski  wrote:
> On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes  
> wrote:
>> In the existing targets, it seems that it's always faster to zero up a DF
>> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>
> I think for ThunderX 1, this change will not make a difference.  So I
> am neutral on this change.

Actually, depending on how fmov is decoded in our pipeline, this change
might actually be worse.  Currently fmov with an immediate is 1 cycle
while movi is 2 cycles.  Let me double-check internally how it is
decoded and whether it is 1 cycle or 2.

Thanks,
Andrew

>
> Thanks,
> Andrew
>
>>
>> This patch modifies the respective pattern.
>>
>> Please, commit if it's alright.
>>
>> Thank you,
>>
>> --
>> Evandro Menezes
>>


PING^2: [PATCH] Limit alignment on error_mark_node variable

2015-10-19 Thread H.J. Lu
PING

On Wed, Sep 30, 2015 at 9:13 AM, H.J. Lu  wrote:
> PING
>
> On Fri, Jul 10, 2015 at 5:19 AM, H.J. Lu  wrote:
>> On Thu, Jul 09, 2015 at 03:57:31PM +0200, Richard Biener wrote:
>>> On Thu, Jul 9, 2015 at 1:08 PM, H.J. Lu  wrote:
>>> > On Thu, Jul 9, 2015 at 2:54 AM, Richard Biener
>>> >  wrote:
>>> >> On Thu, Jul 9, 2015 at 11:52 AM, H.J. Lu  wrote:
>>> >>> On Thu, Jul 09, 2015 at 10:16:38AM +0200, Richard Biener wrote:
>>>  On Wed, Jul 8, 2015 at 5:32 PM, H.J. Lu  wrote:
>>>  > There is no need to try different alignment on variable of
>>>  > error_mark_node.
>>>  >
>>>  > OK for trunk if there is no regression?
>>> 
>>>  Can't we avoid calling align_variable on error_mark_node type decls
>>>  completely?  That is, punt earlier when we try to emit it.
>>> 
>>> >>>
>>> >>> How about this?  OK for trunk?
>>> >>
>>> >> Heh, you now get the obvious question why we can't simply avoid
>>> >> adding the varpool node in the first place ;)
>>> >>
>>> >
>>> > When it was first added to varpool, its type was OK:
>>> >
>>> > (gdb) bt
>>> > #0  varpool_node::get_create (decl=)
>>> > at /export/gnu/import/git/sources/gcc/gcc/varpool.c:150
>>> > #1  0x00e1c3e8 in rest_of_decl_compilation (
>>> > decl=, top_level=1, at_end=0)
>>> > at /export/gnu/import/git/sources/gcc/gcc/passes.c:271
>>> > #2  0x00731d39 in finish_decl (decl=,
>>> > init_loc=0, init=, origtype=, asmspec_tree=>> > 0x0>)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4863
>>> > #3  0x0078d1ed in c_parser_declaration_or_fndef (
>>> > parser=0x715050a8, fndef_ok=false, static_assert_ok=true,
>>> > empty_ok=true, nested=false, start_attr_ok=true,
>>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1855
>>> > #4  0x0078c234 in c_parser_external_declaration 
>>> > (parser=0x715050a8)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1435
>>> > #5  0x0078be45 in c_parser_translation_unit 
>>> > (parser=0x715050a8)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1322
>>> > #6  0x007b3271 in c_parse_file ()
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:15440
>>> > #7  0x0081cb97 in c_common_parse_file ()
>>> > at /export/gnu/import/git/sources/gcc/gcc/c-family/c-opts.c:1059
>>> > #8  0x00f27662 in compile_file ()
>>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:543
>>> > ---Type  to continue, or q  to quit---
>>> > #9  0x00f29baa in do_compile ()
>>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:2041
>>> > #10 0x00f29df9 in toplev::main (this=0x7fffdc90, argc=17,
>>> > argv=0x7fffdd98)
>>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:2142
>>> > #11 0x017d8228 in main (argc=17, argv=0x7fffdd98)
>>> > at /export/gnu/import/git/sources/gcc/gcc/main.c:39
>>> >
>>> > Later, it was turned into error_mark_node:
>>> >
>>> > Old value = 
>>> > New value = 
>>> > finish_decl (decl=, init_loc=0, init=>> > 0x0>,
>>> > origtype=, asmspec_tree=)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4802
>>> > 4802  if (TREE_USED (type))
>>> > (gdb) bt
>>> > #0  finish_decl (decl=, init_loc=0,
>>> > init=, origtype=, asmspec_tree=)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4802
>>> > #1  0x0078d1ed in c_parser_declaration_or_fndef (
>>> > parser=0x715050a8, fndef_ok=false, static_assert_ok=true,
>>> > empty_ok=true, nested=true, start_attr_ok=true,
>>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1855
>>> > #2  0x00792a23 in c_parser_compound_statement_nostart (
>>> > parser=0x715050a8)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:4621
>>> > #3  0x00792688 in c_parser_compound_statement 
>>> > (parser=0x715050a8)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:4532
>>> > #4  0x0078d5a3 in c_parser_declaration_or_fndef (
>>> > parser=0x715050a8, fndef_ok=true, static_assert_ok=true,
>>> > empty_ok=true, nested=false, start_attr_ok=true,
>>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1965
>>> > #5  0x0078c234 in c_parser_external_declaration 
>>> > (parser=0x715050a8)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1435
>>> > #6  0x0078be45 in c_parser_translation_unit 
>>> > (parser=0x715050a8)
>>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1322
>>> > #7  0x007b3271 in 

[PATCH,committed] PR fortran/67900 -- Check for NULL pointer

2015-10-19 Thread Steve Kargl
Committed as 'obvious' after Mikael's comment in PR audit trail.

2015-10-19  Steven G. Kargl  

* resolve.c (gfc_verify_binding_labels): Check for NULL pointer.

2015-10-19  Steven G. Kargl  

* gfortran.dg/pr67900.f90: New tests.

-- 
Steve
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c	(revision 229002)
+++ gcc/fortran/resolve.c	(working copy)
@@ -10800,7 +10800,7 @@ gfc_verify_binding_labels (gfc_symbol *s
   sym->binding_label = NULL;
 
 }
-  else if (sym->attr.flavor == FL_VARIABLE
+  else if (sym->attr.flavor == FL_VARIABLE && module 
 	   && (strcmp (module, gsym->mod_name) != 0
 	   || strcmp (sym->name, gsym->sym_name) != 0))
 {
Index: gcc/testsuite/gfortran.dg/pr67900.f90
===
--- gcc/testsuite/gfortran.dg/pr67900.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/pr67900.f90	(working copy)
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! PR fortran/67900
+! Original code contributed by Giorgian Borca-Tasciuc
+! giorgianb at gmail dot com
+! 
+program main
+   implicit none
+   interface f
+  function f_real(x)
+ real, bind(c) :: x
+ real :: f_real
+  end function f_real
+
+  function f_integer(x)
+ integer, bind(c) :: x
+ integer :: f_integer
+  end function f_integer
+   end interface f
+end program main


Re: [RFC VTV] Fix VTV for targets that have section anchors.

2015-10-19 Thread Caroline Tice
Sending this again, as the mailer daemon rejected it last time

It looks good to me, but I can only approve the files that go into libvtv.

In (belated) response to your earlier question about debugging vtv
problems, there's a fair amount of useful info for debugging in the
User's Guide, off the wiki page (https://gcc.gnu.org/wiki/vtv).   If
you've already read that and have further questions, let me know...


-- Caroline
cmt...@google.com


On Mon, Oct 19, 2015 at 4:39 PM, Caroline Tice  wrote:
> It looks good to me, but I can only approve the files that go into libvtv.
>
> In (belated) response to your earlier question about debugging vtv problems,
> there's a fair amount of useful info for debugging in the User's Guide, off
> the wiki page (https://gcc.gnu.org/wiki/vtv).   If you've already read that
> and have further questions, let me know...
>
>
> -- Caroline
> cmt...@google.com
>
> On Mon, Oct 19, 2015 at 1:54 AM, Ramana Radhakrishnan
>  wrote:
>>
>> On Tue, Oct 13, 2015 at 1:53 PM, Ramana Radhakrishnan
>>  wrote:
>> >
>> >
>> >
>> > On 12/10/15 21:44, Jeff Law wrote:
>> >> On 10/09/2015 03:17 AM, Ramana Radhakrishnan wrote:
>> >>> This started as a Friday afternoon project ...
>> >>>
>> >>> It turned out enabling VTV for AArch64 and ARM was a matter of fixing
>> >>> PR67868 which essentially comes from building libvtv with section
>> >>> anchors turned on. The problem was that the flow of control from
>> >>> output_object_block through to switch_section did not have the same
>> >>> special casing for the vtable section that exists in
>> >>> assemble_variable.
>> >> That's some ugly code.  You might consider factoring that code into a
>> >> function and just calling it from both places.  Your version doesn't seem 
>> >> to
>> >> handle PECOFF, so I'd probably refactor from assemble_variable.
>> >>
>> >
>> > I was a bit lazy as I couldn't immediately think of a target that would
>> > want PECOFF, section anchors and VTV. That combination seems to be quite
>> > rare, anyway point taken on the refactor.
>> >
>> > Ok if no regressions ?
>>
>> Ping.
>>
>> Ramana
>>
>> >
>> >>>
>> >>> However both these failures also occur on x86_64 - so I'm content to
>> >>> declare victory on AArch64 as far as basic enablement goes.
>> >> Cool.
>> >>
>> >>>
>> >>> 1. Are the generic changes to varasm.c ok ? 2. Can we take the
>> >>> AArch64 support in now, given this amount of testing ? Marcus /
>> >>> Caroline ? 3. Any suggestions / helpful debug hints for VTV debugging
>> >>> (other than turning VTV_DEBUG on and inspecting trace) ?
>> >> I think that with refactoring they'd be good to go.  No opinions on the
>> >> AArch64 specific question -- call for the AArch64 maintainers.
>> >>
>> >> Good to see someone hacking on vtv.  It's in my queue to look at as
>> >> well.
>> >
>> > Yeah figuring out more about vtv is also in my background queue.
>> >
>> > regards
>> > Ramana
>> >
>> > PR other/67868
>> >
>> > * varasm.c (assemble_variable): Move special vtv handling to..
>> > (handle_vtv_comdat_sections): .. here. New function.
>> > (output_object_block): Handle vtv sections.
>> >
>> > libvtv/Changelog
>> >
>> > * configure.tgt: Support aarch64 and arm.
>
>


[PATCH][AArch64] Replace insn to zero up DF register

2015-10-19 Thread Evandro Menezes
In the existing targets, it seems that it's always faster to zero up a 
DF register with "movi %d0, #0" instead of "fmov %d0, xzr".


This patch modifies the respective pattern.

Please, commit if it's alright.

Thank you,

--
Evandro Menezes

>From 429b1d70a7eca76c96250fec6ec5269a7a661a4c Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 19 Oct 2015 18:31:48 -0500
Subject: [PATCH] [AArch64] Replace insn to zero up DF register

gcc/
	* config/aarch64/aarch64.md
	(*movdf_aarch64): Add "movi %d0, #0" to zero up DF register.
---
 gcc/config/aarch64/aarch64.md | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5b7f2fd..5f00686 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1156,21 +1156,22 @@
 )
 
 (define_insn "*movdf_aarch64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
-	(match_operand:DF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=w,?r,w,w,w  ,w,m,r,m ,r")
+	(match_operand:DF 1 "general_operand"  "?r, w,w,Y,Ufc,m,w,m,rY,r"))]
   "TARGET_FLOAT && (register_operand (operands[0], DFmode)
 || aarch64_reg_or_fp_zero (operands[1], DFmode))"
   "@
fmov\\t%d0, %x1
fmov\\t%x0, %d1
fmov\\t%d0, %d1
+   movi\\t%d0, #0
fmov\\t%d0, %1
ldr\\t%d0, %1
str\\t%d1, %0
ldr\\t%x0, %1
str\\t%x1, %0
mov\\t%x0, %x1"
-  [(set_attr "type" "f_mcr,f_mrc,fmov,fconstd,\
+  [(set_attr "type" "f_mcr,f_mrc,fmov,neon_move,fconstd,\
  f_loadd,f_stored,load1,store1,mov_reg")]
 )
 
-- 
2.1.0.243.g30d45f7



Re: [PATCH v2] PR rtl-optimization/66790: uninitialized registers handling in REE

2015-10-19 Thread Pierre-Marie de Rodat

On 10/14/2015 09:41 AM, Bernd Schmidt wrote:

This one is OK with minor changes. I ran some tests with it, and the mir
sets look good this time. Code generation still seems unaffected by it
on all my example code (which is as expected).


Thank you very much for your help on this and for double-checking!


+  /* Ignoring artificial defs is intentionnal: these often pretend
that some


"intentional".


Fixed.


Please remove the commented out code and then also the unnecessary
braces. In general we avoid commented out code in gcc, but when doing
it, #if 0 is generally a better method.


Ok, this is removed.


+  const rtx reg = XEXP (src, 0);


Drop the const maybe? It doesn't seem to add much and the idiom is to
just use rtx.


Done.


Subject: [PATCH 2/2] DF_LIVE: make clobbers cancel effect of previous
GENs in
  the same BBs

gcc/ChangeLog:

* df-problems.c (df_live_bb_local_compute): Clear GEN bits for
DF_REF_MUST_CLOBBER references.


This one is probably ok too; I still want to experiment with it a little.


Sure, I only committed the attached updated first patch.

--
Pierre-Marie de Rodat
>From 0275c4a20a4b9daaefbbddd5306e9214e7d5d673 Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat 
Date: Sat, 18 Jul 2015 13:10:45 +0200
Subject: [PATCH] REE: fix uninitialized registers handling

gcc/ChangeLog:

	PR rtl-optimization/66790
	* df.h (DF_MIR): New macro.
	(DF_LAST_PROBLEM_PLUS1): Update to be past DF_MIR
	(DF_MIR_INFO_BB): New macro.
	(DF_MIR_IN, DF_MIR_OUT): New macros.
	(struct df_mir_bb_info): New.
	(df_mir): New macro.
	(df_mir_add_problem, df_mir_simulate_one_insn): New forward
	declarations.
	(df_mir_get_bb_info): New.
	* df-problems.c (struct df_mir_problem_data): New.
	(df_mir_free_bb_info, df_mir_alloc, df_mir_reset,
	df_mir_bb_local_compute, df_mir_local_compute, df_mir_init,
	df_mir_confluence_0, df_mir_confluence_n,
	df_mir_transfer_function, df_mir_free, df_mir_top_dump,
	df_mir_bottom_dump, df_mir_verify_solution_start,
	df_mir_verify_solution_end): New.
	(problem_MIR): New.
	(df_mir_add_problem, df_mir_simulate_one_insn): New.
	* timevar.def (TV_DF_MIR): New.
	* ree.c: Include bitmap.h
	(add_removable_extension): Add an INIT_REGS parameter.  Use it
	to skip zero-extensions that may get an uninitialized register.
	(find_removable_extensions): Compute must-initialized registers
	using the MIR dataflow problem. Update the call to
	add_removable_extension.
	(find_and_remove_re): Call df_mir_add_problem.

gcc/testsuite/ChangeLog:

	* gnat.dg/opt50.adb: New test.
	* gnat.dg/opt50_pkg.adb: New helper.
	* gnat.dg/opt50_pkg.ads: New helper.
---
 gcc/ChangeLog   |  30 +++
 gcc/df-problems.c   | 403 
 gcc/df.h|  34 ++-
 gcc/ree.c   |  62 --
 gcc/testsuite/ChangeLog |   6 +
 gcc/testsuite/gnat.dg/opt50.adb |  23 ++
 gcc/testsuite/gnat.dg/opt50_pkg.adb |  48 +
 gcc/testsuite/gnat.dg/opt50_pkg.ads |  12 ++
 gcc/timevar.def |   1 +
 9 files changed, 605 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gnat.dg/opt50.adb
 create mode 100644 gcc/testsuite/gnat.dg/opt50_pkg.adb
 create mode 100644 gcc/testsuite/gnat.dg/opt50_pkg.ads

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6255b76..4279557 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,33 @@
+2015-10-19  Pierre-Marie de Rodat  
+
+	PR rtl-optimization/66790
+	* df.h (DF_MIR): New macro.
+	(DF_LAST_PROBLEM_PLUS1): Update to be past DF_MIR
+	(DF_MIR_INFO_BB): New macro.
+	(DF_MIR_IN, DF_MIR_OUT): New macros.
+	(struct df_mir_bb_info): New.
+	(df_mir): New macro.
+	(df_mir_add_problem, df_mir_simulate_one_insn): New forward
+	declarations.
+	(df_mir_get_bb_info): New.
+	* df-problems.c (struct df_mir_problem_data): New.
+	(df_mir_free_bb_info, df_mir_alloc, df_mir_reset,
+	df_mir_bb_local_compute, df_mir_local_compute, df_mir_init,
+	df_mir_confluence_0, df_mir_confluence_n,
+	df_mir_transfer_function, df_mir_free, df_mir_top_dump,
+	df_mir_bottom_dump, df_mir_verify_solution_start,
+	df_mir_verify_solution_end): New.
+	(problem_MIR): New.
+	(df_mir_add_problem, df_mir_simulate_one_insn): New.
+	* timevar.def (TV_DF_MIR): New.
+	* ree.c: Include bitmap.h
+	(add_removable_extension): Add an INIT_REGS parameter.  Use it
+	to skip zero-extensions that may get an uninitialized register.
+	(find_removable_extensions): Compute must-initialized registers
+	using the MIR dataflow problem. Update the call to
+	add_removable_extension.
+	(find_and_remove_re): Call df_mir_add_problem.
+
 2015-10-19  Segher Boessenkool  
 
 	* common/config/mn10300/mn10300-common.c
diff --git a/gcc/df-problems.c b/gcc/df-problems.c
index 153732a..331fd87 100644
--- a/gcc/df-problems.c
+++ b/gcc/df-problems.c
@@ -1849,6 +1849,409 @@ df_live_verify_transfer_functions (void)
 }
 
 

[gomp4] loop cleanup

2015-10-19 Thread Nathan Sidwell

I've committed this to gomp4.

1) small cleanup combining the bodies of two identical conditionals.

2) replace and move the OpenACC thread numbering expanders to be nearer the now 
sole user.


nathan

2015-10-19  Nathan Sidwell  

	* omp-low.c (scan_omp_for): Combine OpenACC conditional.
	(expand_oacc_get_num_threads, expand_oacc_get_thread_num): Delete.
	(oacc_thread_numbers): New.
	(oacc_xform_loop): Correct comment.  Use oacc_thread_numbers.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 229002)
+++ gcc/omp-low.c	(working copy)
@@ -2911,11 +2911,7 @@ scan_omp_for (gomp_for *stmt, omp_contex
 			"argument not permitted on %<%s%> clause in"
 			" OpenACC %", check);
 	  }
-}
 
-  if (is_gimple_omp_oacc (stmt))
-{
-  omp_context *tgt = enclosing_target_ctx (ctx);
   if (tgt && is_oacc_kernels (tgt))
 	{
 	  /* Strip out reductions, as they are not  handled yet.  */
@@ -5131,80 +5127,6 @@ is_atomic_compatible_reduction (tree var
 }
 
 
-/* Find the total number of threads used by a region partitioned by
-   GWV_BITS.  Setup code required for the calculation is added to SEQ.  Note
-   that this is currently used from both OMP-lowering and OMP-expansion phases,
-   and uses builtins specific to NVidia PTX: this will need refactoring into a
-   generic interface when support for other targets is added.   */
-
-static tree
-expand_oacc_get_num_threads (gimple_seq *seq, int gwv_bits)
-{
-  tree res = build_int_cst (unsigned_type_node, 1);
-  unsigned ix;
-
-  for (ix = GOMP_DIM_GANG; ix != GOMP_DIM_MAX; ix++)
-if (GOMP_DIM_MASK(ix) & gwv_bits)
-  {
-	tree arg = build_int_cst (integer_type_node, ix);
-	tree count = create_tmp_var (integer_type_node);
-	gimple *call = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, arg);
-	
-	gimple_call_set_lhs (call, count);
-	gimple_seq_add_stmt (seq, call);
-	res = fold_build2 (MULT_EXPR, integer_type_node, res, count);
-  }
-  
-  return res;
-}
-
-/* Find the current thread number to use within a region partitioned by
-   GWV_BITS.  Setup code required for the calculation is added to SEQ.  See
-   note for expand_oacc_get_num_threads above re: builtin usage.  */
-
-static tree
-expand_oacc_get_thread_num (gimple_seq *seq, int gwv_bits)
-{
-  tree res = NULL_TREE;
-  unsigned ix;
-
-  /* Start at gang level, and examine relevant dimension indices.  */
-  for (ix = GOMP_DIM_GANG; ix != GOMP_DIM_MAX; ix++)
-if (GOMP_DIM_MASK (ix) & gwv_bits)
-  {
-	tree arg = build_int_cst (unsigned_type_node, ix);
-
-	if (res)
-	  {
-	/* We had an outer index, so scale that by the size of
-	   this dimension.  */
-	tree n = create_tmp_var (integer_type_node);
-	gimple *call
-	  = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, arg);
-	
-	gimple_call_set_lhs (call, n);
-	gimple_seq_add_stmt (seq, call);
-	res = fold_build2 (MULT_EXPR, integer_type_node, res, n);
-	  }
-
-	/* Determine index in this dimension.  */
-	tree id = create_tmp_var (integer_type_node);
-	gimple *call = gimple_build_call_internal (IFN_GOACC_DIM_POS, 1, arg);
-	
-	gimple_call_set_lhs (call, id);
-	gimple_seq_add_stmt (seq, call);
-	if (res)
-	  res = fold_build2 (PLUS_EXPR, integer_type_node, res, id);
-	else
-	  res = id;
-  }
-
-  if (res == NULL_TREE)
-res = build_int_cst (integer_type_node, 0);
-
-  return res;
-}
-
 /* Lower the OpenACC reductions of CLAUSES for compute axis LEVEL
(which might be a placeholder).  INNER is true if this is an inner
axis of a multi-axis loop.  FORK and JOIN are (optional) fork and
@@ -16904,16 +16826,63 @@ make_pass_late_lower_omp (gcc::context *
   return new pass_late_lower_omp (ctxt);
 }
 
+/* Find the number of threads (POS = false), or thread number (POS =
+   true) for an OpenACC region partitioned as MASK.  Setup code
+   required for the calculation is added to SEQ.  */
+
+static tree
+oacc_thread_numbers (bool pos, int mask, gimple_seq *seq)
+{
+  tree res = pos ? NULL_TREE :  build_int_cst (unsigned_type_node, 1);
+  unsigned ix;
+
+  /* Start at gang level, and examine relevant dimension indices.  */
+  for (ix = GOMP_DIM_GANG; ix != GOMP_DIM_MAX; ix++)
+if (GOMP_DIM_MASK (ix) & mask)
+  {
+	tree arg = build_int_cst (unsigned_type_node, ix);
+
+	if (res)
+	  {
+	/* We had an outer index, so scale that by the size of
+	   this dimension.  */
+	tree n = create_tmp_var (integer_type_node);
+	gimple *call
+	  = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, arg);
+	
+	gimple_call_set_lhs (call, n);
+	gimple_seq_add_stmt (seq, call);
+	res = fold_build2 (MULT_EXPR, integer_type_node, res, n);
+	  }
+	if (pos)
+	  {
+	/* Determine index in this dimension.  */
+	tree id = create_tmp_var (integer_type_node);
+	gimple *call = gimple_build_call_internal
+	  (IFN_GOACC_DIM_POS, 1, arg);
+
+	gimple_call_set_lhs (call, 

Re: [PATCH] [AArch64] Distinct costs for sign and zero extension

2015-10-19 Thread Andrew Pinski
On Tue, Oct 20, 2015 at 7:10 AM, Evandro Menezes  wrote:
> Some micro-architectures may favor one of sign or zero extension over the
> other in the base plus extended register offset addressing mode.

Yes, I was going to create the same patch, as ThunderX is one of those
micro-architectures.

Thanks,
Andrew

>
> This patch separates the member "register_extend" of the structure
> "cpu_addrcost_table" into two, one for sign and the other zero extension.
>
> Please, commit if it's alright.
>
> Thank you,
>
> --
> Evandro Menezes
>
