[PATCH] ARC: configure script to allow non uclibc based triplets

2016-05-19 Thread Vineet Gupta
gcc/
2016-05-20  Vineet Gupta 

* config.gcc: Remove uclibc from arc target spec

Signed-off-by: Vineet Gupta 
---
 gcc/config.gcc | 2 +-
 libgcc/config.host | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 9ca5c6ed71d8..f88d1dfa23df 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -889,7 +889,7 @@ arc*-*-elf*)
big*)   
tm_defines="DRIVER_ENDIAN_SELF_SPECS=\\\"%{!EL:%{!mlittle-endian:-mbig-endian}}\\\" ${tm_defines}"
esac
;;
-arc*-*-linux-uclibc*)
+arc*-*-linux*)
extra_headers="arc-simd.h"
tm_file="arc/arc-arch.h dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file}"
tmake_file="${tmake_file} arc/t-arc-uClibc arc/t-arc"
diff --git a/libgcc/config.host b/libgcc/config.host
index e7683898f82b..d3c9c71bb042 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -322,7 +322,7 @@ arc*-*-elf*)
tmake_file="arc/t-arc-newlib arc/t-arc"
tmake_file="arc/t-arc-newlib arc/t-arc"
extra_parts="crti.o crtn.o crtend.o crtbegin.o crtendS.o crtbeginS.o libgmon.a crtg.o crtgend.o crttls_r25.o crttls_r30.o"
;;
-arc*-*-linux-uclibc*)
+arc*-*-linux*)
tmake_file="${tmake_file} t-slibgcc-libgcc t-slibgcc-nolc-override arc/t-arc700-uClibc arc/t-arc"
extra_parts="crti.o crtn.o crtend.o crtbegin.o crtendS.o crtbeginS.o libgmon.a crtg.o crtgend.o"
;;
-- 
2.5.0



Re: [C++ PATCH] PR c++/69855

2016-05-19 Thread Ville Voutilainen
On 19 May 2016 at 19:40, Jason Merrill  wrote:
> On 05/05/2016 09:11 AM, Ville Voutilainen wrote:
>>
>> On 5 May 2016 at 13:36, Paolo Carlini  wrote:
>>>
>>> .. minor nit: the new testcase has a number of trailing blank lines.
>>
>>
>> New patch attached. :)
>
>
> Sorry for the delay.
>
> Please use ".diff" for patches so that they are properly recognized as
> text/x-patch.
>
> The patch looks good, but it could use a comment explaining what it's doing.

I'll add some comments into it.

> Any thoughts on doing something similar for extern variable declarations?

Ah, we diagnose local extern variable declarations that clash with
previous declarations,
but we don't diagnose cases where a subsequent declaration clashes
with a previous
local extern declaration. I'll take a look.


[PATCH] Fix PR tree-optimization/71179

2016-05-19 Thread Kugan Vivekanandarajah
Hi,

We don't allow integer vector types here. Likewise, I am also
disallowing floating-point vector types when transforming repeated
addition into multiplication.

This can be relaxed. I will send a separate patch to allow integer and
floating point vectorization later.

Bootstrapped and regression tested on x86-64-linux-gnu with no new regressions.

Is this OK for trunk?

Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-05-20  Kugan Vivekanandarajah  

* gcc.dg/tree-ssa/pr71179.c: New test.

gcc/ChangeLog:

2016-05-20  Kugan Vivekanandarajah  

* tree-ssa-reassoc.c (transform_add_to_multiply): Disallow float
VECTOR type.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71179.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71179.c
index e69de29..885c643 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71179.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71179.c
@@ -0,0 +1,10 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math" } */
+
+typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
+
+__m128 foo (__m128 a)
+{
+  return a + a;
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 3b5f36b..0c25a8c 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1769,7 +1769,8 @@ transform_add_to_multiply (gimple *stmt, vec<operand_entry *> *ops)
   bool changed = false;
 
   if (!INTEGRAL_TYPE_P (TREE_TYPE ((*ops)[0]->op))
-  && !flag_unsafe_math_optimizations)
+  && (!SCALAR_FLOAT_TYPE_P (TREE_TYPE ((*ops)[0]->op))
+ || !flag_unsafe_math_optimizations))
 return false;
 
   /* Look for repeated operands.  */


Re: [PATCH v2] gcov: Runtime configurable destination output

2016-05-19 Thread Jeff Law

On 05/19/2016 05:14 PM, Nathan Sidwell wrote:

On 05/19/16 15:25, Jeff Law wrote:

On 05/19/2016 12:40 PM, Aaron Conole wrote:



I'm happy to report that I did send in some FSF paperwork this week.
Hopefully it is on record now, but even if it isn't I live a train ride
away from the FSF headquarters so I'd be happy to take the time to make
sure it's all signed correctly.





Also note that Aaron works for Red Hat and should be covered by our
existing
assignments.


Yeah, Aaron hit me with that clue bat privately -- I'd just grepped his
name and not noticed the email!  I thought we were all settled, sorry if
that wasn't clear.

No worries.  I didn't realize it either at first :-)

jeff


Re: [PATCH] Fix PR tree-optimization/71170

2016-05-19 Thread Kugan Vivekanandarajah
Hi Richard,

> I think it should have the same rank as op or op + 1 which is the current
> behavior.  Sth else doesn't work correctly here I think, like inserting the
> multiplication not near the definition of op.
>
> Well, the whole "clever insertion" logic is simply flawed.

What I meant to say was that the simple logic we have now wouldn't
work. "Clever logic" is knowing exactly where it is needed and
inserting there.  I think that's what you are suggesting below, in a
simple-to-implement way.

> I'd say that ideally we would delay inserting the multiplication to
> rewrite_expr_tree time.  For example by adding a ops->stmt_to_insert
> member.
>

Here is an implementation based on the above. Bootstrap on x86-linux-gnu
is OK; regression testing is ongoing.

Thanks,
Kugan

gcc/ChangeLog:

2016-05-20  Kugan Vivekanandarajah  

* tree-ssa-reassoc.c (struct operand_entry): Add field stmt_to_insert.
(add_to_ops_vec): Add stmt_to_insert.
(add_repeat_to_ops_vec): Init stmt_to_insert.
(transform_add_to_multiply): Remove mult_stmt insertion and add it
to ops vector.
(get_ops): Init stmt_to_insert.
(maybe_optimize_range_tests): Likewise.
(rewrite_expr_tree): Insert  stmt_to_insert before use stmt.
(rewrite_expr_tree_parallel): Likewise.
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 3b5f36b..69441ce 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -195,6 +195,7 @@ struct operand_entry
   int id;
   tree op;
   unsigned int count;
+  gimple *stmt_to_insert;
 };
 
static object_allocator<operand_entry> operand_entry_pool
@@ -553,7 +554,7 @@ sort_by_operand_rank (const void *pa, const void *pb)
 /* Add an operand entry to *OPS for the tree operand OP.  */
 
 static void
-add_to_ops_vec (vec<operand_entry *> *ops, tree op)
+add_to_ops_vec (vec<operand_entry *> *ops, tree op, gimple *stmt_to_insert = NULL)
 {
   operand_entry *oe = operand_entry_pool.allocate ();
 
@@ -561,6 +562,7 @@ add_to_ops_vec (vec<operand_entry *> *ops, tree op)
   oe->rank = get_rank (op);
   oe->id = next_operand_entry_id++;
   oe->count = 1;
+  oe->stmt_to_insert = stmt_to_insert;
   ops->safe_push (oe);
 }
 
@@ -577,6 +579,7 @@ add_repeat_to_ops_vec (vec<operand_entry *> *ops, tree op,
   oe->rank = get_rank (op);
   oe->id = next_operand_entry_id++;
   oe->count = repeat;
+  oe->stmt_to_insert = NULL;
   ops->safe_push (oe);
 
   reassociate_stats.pows_encountered++;
@@ -1810,21 +1813,12 @@ transform_add_to_multiply (gimple *stmt, vec<operand_entry *> *ops)
ops->unordered_remove (i);
   tree tmp = make_ssa_name (TREE_TYPE (op));
   tree cst = build_int_cst (integer_type_node, count);
-  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
   gassign *mul_stmt
= gimple_build_assign (tmp, MULT_EXPR,
   op, fold_convert (TREE_TYPE (op), cst));
-  if (gimple_code (def_stmt) == GIMPLE_NOP
- || gimple_bb (stmt) != gimple_bb (def_stmt))
-   {
- gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
- gimple_set_uid (mul_stmt, gimple_uid (stmt));
- gsi_insert_before (&gsi, mul_stmt, GSI_NEW_STMT);
-   }
-  else
-   insert_stmt_after (mul_stmt, def_stmt);
+  gimple_set_uid (mul_stmt, gimple_uid (stmt));
   gimple_set_visited (mul_stmt, true);
-  add_to_ops_vec (ops, tmp);
+  add_to_ops_vec (ops, tmp, mul_stmt);
   changed = true;
 }
 
@@ -3224,6 +3218,7 @@ get_ops (tree var, enum tree_code code, vec<operand_entry *> *ops,
oe->rank = code;
oe->id = 0;
oe->count = 1;
+   oe->stmt_to_insert = NULL;
ops->safe_push (oe);
   }
   return true;
@@ -3464,6 +3459,7 @@ maybe_optimize_range_tests (gimple *stmt)
  oe->rank = code;
  oe->id = 0;
  oe->count = 1;
+ oe->stmt_to_insert = NULL;
  ops.safe_push (oe);
  bb_ent.last_idx++;
}
@@ -3501,6 +3497,7 @@ maybe_optimize_range_tests (gimple *stmt)
 is.  */
  oe->id = bb->index;
  oe->count = 1;
+ oe->stmt_to_insert = NULL;
  ops.safe_push (oe);
  bb_ent.op = NULL;
  bb_ent.last_idx++;
@@ -3798,6 +3795,19 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
   oe1 = ops[opindex];
   oe2 = ops[opindex + 1];
 
+  /* If the stmt that defines operand has to be inserted, insert it
+before the use.  */
+  if (oe1->stmt_to_insert)
+   {
+ gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+ gsi_insert_before (&gsi, oe1->stmt_to_insert, GSI_NEW_STMT);
+   }
+  if (oe2->stmt_to_insert)
+   {
+ gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+ gsi_insert_before (&gsi, oe2->stmt_to_insert, GSI_NEW_STMT);
+   }
+
   if (rhs1 != oe1->op || rhs2 != oe2->op)
{
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
@@ -3855,6 +3865,14 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
   /* Rewrite the next operator.  */
   oe = ops[opindex];
 
+  /* 

Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-19 Thread Nathan Sidwell

On 05/19/16 14:42, Cesar Philippidis wrote:


+  "operands[2] = make_safe_from (operands[2], operands[0]);"



Please use { ... } rather than "" for readability. Ok  with that change.

nathan


PR71206: inconsistent types after match.pd transformation

2016-05-19 Thread Marc Glisse

Hello,

this was bootstrapped and regtested on powerpc64le-unknown-linux-gnu.

2016-05-20  Marc Glisse  

gcc/
* match.pd ((X ^ Y) ^ (X ^ Z)): Convert the arguments.

gcc/testsuite/
* gcc.dg/tree-ssa/pr71206.c: New testcase.

--
Marc Glisse

Index: gcc/match.pd
===
--- gcc/match.pd	(revision 236489)
+++ gcc/match.pd	(working copy)
@@ -733,21 +733,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
(if (single_use (@5) && single_use (@6))
 (op @3 (convert @2))
 (if (single_use (@3) && single_use (@4))
  (op (convert @1) @5))
 /* (X ^ Y) ^ (X ^ Z) -> Y ^ Z  */
 (simplify
  (bit_xor (convert1? (bit_xor:c @0 @1)) (convert2? (bit_xor:c @0 @2)))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@1))
   && tree_nop_conversion_p (type, TREE_TYPE (@2)))
-  (convert (bit_xor @1 @2))))
+  (bit_xor (convert @1) (convert @2))))
 
 (simplify
  (abs (abs@1 @0))
  @1)
 (simplify
  (abs (negate @0))
  (abs @0))
 (simplify
  (abs tree_expr_nonnegative_p@0)
  @0)
Index: gcc/testsuite/gcc.dg/tree-ssa/71206.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/71206.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/71206.c	(working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+int f(int d, unsigned b) {
+int i2 = b ^ 1;
+int i4 = d ^ 1;
+return i2 ^ i4;
+}


Re: [PATCH v2] gcov: Runtime configurable destination output

2016-05-19 Thread Nathan Sidwell

On 05/19/16 14:40, Aaron Conole wrote:

Nathan Sidwell  writes:



+FILE *__gcov_error_file = NULL;


Unless I'm missing something, isn't this only accessed from this file?
(So could be static with a non-underbarred name)


Ack.


I have a vague memory that perhaps the __gcov_error_file is seen from other 
dynamic objects, and one of them gets to open/close it?  I think the closing 
function needs to reset it to NULL though?  (In case it's reactivated before the 
process exits)





And this protection here makes me wonder what happens if one is
IN_GCOV_TOOL. Does it pay attention to GCOV_ERROR_FILE?  That would
seem incorrect, and thus the above should be changed so that stderr is
unconditionally used when IN_GCOV_TOOL?


You are correct.  I will fix it.


thanks.


+static void
+gcov_error_exit(void)
+{
+  if (__gcov_error_file && __gcov_error_file != stderr)
+{


Braces are not needed here.


Unless of course my speculation about setting it to NULL is right.

nathan


Re: [PATCH v2] gcov: Runtime configurable destination output

2016-05-19 Thread Nathan Sidwell

On 05/19/16 15:25, Jeff Law wrote:

On 05/19/2016 12:40 PM, Aaron Conole wrote:



I'm happy to report that I did send in some FSF paperwork this week.
Hopefully it is on record now, but even if it isn't I live a train ride
away from the FSF headquarters so I'd be happy to take the time to make
sure it's all signed correctly.





Also note that Aaron works for Red Hat and should be covered by our existing
assignments.


Yeah, Aaron hit me with that clue bat privately -- I'd just grepped his name 
and not noticed the email!  I thought we were all settled, sorry if that wasn't 
clear.


nathan


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-19 Thread Nathan Sidwell

On 05/18/16 23:42, Cesar Philippidis wrote:


+(define_expand "sincossf3"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
+   UNSPEC_COS))
+   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
+(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+{
+  emit_insn (gen_sinsf2 (operands[1], operands[2]));
+  emit_insn (gen_cossf2 (operands[0], operands[2]));
+
+  DONE;
+})


Why the emit_insn code?  that seems to be replicating the RTL
representation -- you're saying the same thing twice.


You've not answered this.  I always find it confusing when insn expansions say 
the same thing in the RTL representation as the C fragment is generating.



My intent was to verify that I got the sin and cos arguments right,
i.e., make sure that this sincos expansion didn't mix up sin(x) with
cos(x). I guess I can create a test that uses vprintf and scans


That's a good idea, but I think you need a much clearer testcase. It'd be 
better to check both the sin and cos outputs (and that's going to be 
'interesting', making sure only one path gets to be seen by the sincos optimizer 
-- but not difficult).


nathan


Re: [PATCH] Drop excess size used for run time allocated stack variables.

2016-05-19 Thread Jeff Law

On 05/03/2016 08:17 AM, Dominik Vogt wrote:

Version two of the patch including a test case.

On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:

On 04/29/2016 04:12 PM, Dominik Vogt wrote:

The attached patch removes excess stack space allocation with
alloca in some situations.  Please check the commit message in the
patch for details.



However, I would strongly recommend some tests, even if they are
target specific.  You can always copy pr36728-1 into the s390x
directory and look at size of the generated stack.  Simliarly for
pr50938 for x86.


However, x86 uses the "else" branch in round_push, i.e. it uses
"virtual_preferred_stack_boundary_rtx" to calculate the number of
bytes to add for stack alignment.  That value is unknown at the
time round_push is called, so the test case fails on such targets,
and I've no idea how to fix this properly.

I'm not going to be able to complete a review of this today.  As I dig 
into the history of this code I came across pr34548, pr47353, & pr46894 
(and their associated discussions on the lists) that we'll need to 
verify we don't break.  This is a bit of a mess and I think the code 
needs some TLC before we start hacking it up further.


Let's start with clean up of dead code:

 /* We will need to ensure that the address we return is aligned to
 REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
 always know its final value at this point in the compilation (it
 might depend on the size of the outgoing parameter lists, for
 example), so we must align the value to be returned in that case.
 (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
 STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
 We must also do an alignment operation on the returned value if
 the stack pointer alignment is less strict than REQUIRED_ALIGN.

 If we have to align, we must leave space in SIZE for the hole
 that might result from the alignment operation.  */

  must_align = (crtl->preferred_stack_boundary < required_align);
  if (must_align)
{
  if (required_align > PREFERRED_STACK_BOUNDARY)
extra_align = PREFERRED_STACK_BOUNDARY;
  else if (required_align > STACK_BOUNDARY)
extra_align = STACK_BOUNDARY;
  else
extra_align = BITS_PER_UNIT;
}

  /* ??? STACK_POINTER_OFFSET is always defined now.  */
#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
  must_align = true;
  extra_align = BITS_PER_UNIT;
#endif

If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So 
all the code above I think collapses to:


  must_align = true;
  extra_align = BITS_PER_UNIT

And the only other assignment to must_align assigns it the value "true".
There are two conditionals on must_align that look like


if (must_align)
  {
CODE;
  }

We should remove the conditional and pull CODE out an indentation level. 
 And remove all remnants of must_align.


I don't think that changes your patch in any way.  Hopefully it makes 
the whole function somewhat easier to grok.


Thoughts?

jeff



Re: [PATCH 2/3] Implement CALL_EXPR_MUST_TAIL_CALL

2016-05-19 Thread Jeff Law

On 05/17/2016 04:01 PM, David Malcolm wrote:

This patch implements support for marking CALL_EXPRs
as being mandatory for tail-call-optimization. expand_call
tries harder to perform the optimization on such CALL_EXPRs,
and issues an error if it fails.

Currently this flag isn't accessible from any frontend,
so the patch uses a plugin for testing the functionality.

Successfully bootstrapped on x86_64-pc-linux-gnu;
adds 8 PASS results to gcc.sum.

OK for trunk?

gcc/ChangeLog:
* calls.c (maybe_complain_about_tail_call): New function.
(initialize_argument_information): Call
maybe_complain_about_tail_call when clearing *may_tailcall.
(can_implement_as_sibling_call_p): Call
maybe_complain_about_tail_call when returning false.
(expand_call): Read CALL_EXPR_MUST_TAIL_CALL and, if set,
ensure try_tail_call is set.  Call maybe_complain_about_tail_call
if tail-call optimization fails.
* cfgexpand.c (expand_call_stmt): Initialize
CALL_EXPR_MUST_TAIL_CALL from gimple_call_must_tail_p.
* gimple-pretty-print.c (dump_gimple_call): Dump
gimple_call_must_tail_p.
* gimple.c (gimple_build_call_from_tree): Call
gimple_call_set_must_tail with the value of
CALL_EXPR_MUST_TAIL_CALL.
* gimple.h (enum gf_mask): Add GF_CALL_MUST_TAIL_CALL.
(gimple_call_set_must_tail): New function.
(gimple_call_must_tail_p): New function.
* print-tree.c (print_node): Update printing of TREE_STATIC
to reflect its use for CALL_EXPR_MUST_TAIL_CALL.
* tree-core.h (struct tree_base): Add MUST_TAIL_CALL to the
trailing comment listing applicable flags.
* tree.h (CALL_EXPR_MUST_TAIL_CALL): New macro.

It's actually simpler than it looks -- most of the changes are just 
getting better diagnostics when tail call fails, which I wholeheartedly 
support.


OK for the trunk.


Thanks,
Jeff



Re: Revert gcc r227962

2016-05-19 Thread JonY
On 5/20/2016 02:11, Jeff Law wrote:
> So if we make this change (revert 227962), my understanding is that
> cygwin bootstraps will fail because they won't find kernel32 and perhaps
> other libraries.
> 
> Jeff
> 

I'll need to double check with trunk but gcc-5.3.0 built OK without it.
The other alternative is to search /usr/lib before w32api.






Re: [ping][patch,openacc] use firstprivate pointers for subarrays in c and c++

2016-05-19 Thread Cesar Philippidis
Ping.

Cesar

On 05/10/2016 01:29 PM, Cesar Philippidis wrote:
> Pointers are special in OpenACC. Depending on the context, they can
> either be treated as a "scalar" or as special firstprivate pointer. This
> is in contrast to OpenMP target pointers, which are always treated as
> firstprivate pointers if I'm not mistaken. The difference between a
> firstprivate scalar and pointer is that the contents of a firstprivate
> scalar are preserved on the accelerator, whereas a firstprivate pointer
> gets remapped by the runtime to point to an address in the device's
> address space. Here are the rules for pointers that I worked out with
> the ACC technical committee.
> 
>   1) pointers used in subarrays shall be treated as firstprivate
>  pointers
> 
>   2) all other pointers are scalars
> 
> There is an exception to 2) when a pointer appears inside a data region.
> E.g.
> 
>   #pragma acc data copy (ptr[0:100])
>   {
> #pragma acc parallel loop
> for (i = ...)
>   ptr[i] = ...
>   }
> 
> Here the compiler should detect that ptr is nested inside an acc data
> region, and add an implicit present(ptr[0:100]) clause to it, and not
> present(ptr). Note that the implicit data clause rule only applies to
> lexically scoped offloaded regions inside acc data regions. E.g.
> 
>   foo (int *ptr)
>   {
> ...
> #pragma acc parallel loop
> for (i = ...)
>ptr[i] = ...
>   }
> 
>   bar ()
>   {
> ...
> #pragma acc data copy (ptr[0:100])
> {
>   foo (ptr);
> }
>   }
> 
> will result in an implicit firstprivate(ptr) clause of the scalar
> variety, not a firstprivate_pointer(ptr).
> 
> The attached patch updates gcc to implement this behavior. Currently,
> gcc treats all pointers as scalars in OpenACC. So, if you have a
> subarray involving a data mapping pcopy(p[5:10]), the gimplifier would
> translate this clause into
> 
>   map(tofrom:*(p + 3) [len: 5]) map(alloc:p [pointer assign, bias: 3])
> 
> The alloc pointer map is a problem, especially with subarrays, because
> it effectively breaks all of the acc_* library functions involving
> subarrays. This patch changes that alloc map clause into a
> map(firstprivate:c [pointer assign, bias: 3]).
> 
> This patch also corrects behavior of the acc_* library functions when
> they deal with shared-memory targets. Before, most of those libraries
> were incorrectly trying to create data maps for shared-memory targets.
> The new behavior is to propagate the the host pointers where applicable,
> bypassing the data map altogether.
> 
> Since I had to change so many existing compiler test cases, I also took
> the liberty to make some of warning and error messages generated by the
> c and c++ front ends more specific to OpenACC. In the c++ front end, I
> went one step further and added preliminary checking for duplicate
> reference-typed data mappings. This check is not that exhaustive though,
> but I did include a test case for OpenMP.
> 
> It should be noted that I still need to update the behavior of subarray
> pointers in fortran. I'm just waiting until for the OpenMP 4.5 fortran
> changes to land in trunk first.
> 
> Is this patch OK for trunk?
> 
> Cesar
> 



Re: [PATCH, rs6000] Add support for int versions of vec_addec

2016-05-19 Thread Segher Boessenkool
On Thu, May 19, 2016 at 04:16:53PM -0500, Bill Seurer wrote:
> 2016-05-19  Bill Seurer  
> 
>   * config/rs6000/rs6000-builtin.def (vec_addec): Change vec_addec to a
>   special case builtin.
>   * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
>   support for ALTIVEC_BUILTIN_VEC_ADDEC.
>   * config/rs6000/rs6000.c (altivec_init_builtins): Add definition
>   for __builtin_vec_addec.
> 
> [gcc/testsuite]
> 
> 2016-05-19  Bill Seurer  
> 
>   * gcc.target/powerpc/vec-addec.c: New test.
>   * gcc.target/powerpc/vec-addec-int128.c: New test.

Okay for trunk.  A few trivial formatting nits left...

> +  /* All 3 arguments must be vectors of (signed or unsigned) (int or
> + __int128) and the types must match.  */
> +  if (arg0_type != arg1_type || arg1_type != arg2_type)
> + goto bad; 
> +  if (TREE_CODE (arg0_type) != VECTOR_TYPE)
> + goto bad; 
> +
> +  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
> + {
> +   /* For {un}signed ints, 

Trailing spaces (this last line and the two "bad"s).

> + arg0 = save_expr(arg0);
> + arg1 = save_expr(arg1);

Missing space before the (.

> + tree const1 = build_int_cstu(TREE_TYPE (arg0_type), 1);

And here.

> +   case TImode: 

Trailing space.

Thanks,


Segher


Re: [PATCH 3/3] function: Restructure *logue insertion

2016-05-19 Thread Segher Boessenkool
On Thu, May 19, 2016 at 04:00:22PM -0600, Jeff Law wrote:
> > * function.c (make_epilogue_seq): Remove epilogue_end parameter.
> > (thread_prologue_and_epilogue_insns): Remove bb_flags.  Restructure
> > code.  Ignore sibcalls on EDGE_IGNORE edges.
> > * shrink-wrap.c (handle_simple_exit): New function.  Set EDGE_IGNORE
> > on edges for sibcalls that run without prologue.  The rest of the
> > function is combined from...
> > (fix_fake_fallthrough_edge): ... this, and ...
> > (try_shrink_wrapping): ... a part of this.  Remove the bb_with
> > function argument, make it a local variable.
> For some reason I found this patch awful to walk through.  In
> retrospect, it might have been better to break this down further.  Not
> because it's conceptually difficult to follow, but because the diffs
> themselves are difficult to read.

Yeah, I should have realised that because the changelog was hard to write.

> I kept slicing out hunks when I could pair up the original code to its 
> new functional equivalent and hunks which were just "fluff" and kept 
> iterating until there was nothing left that seemed unreasonable.
> 
> OK for the trunk, but please watch closely for any fallout.

Thanks, and I will!


Segher


Re: [PATCH 3/3] function: Restructure *logue insertion

2016-05-19 Thread Jeff Law

On 05/16/2016 07:09 PM, Segher Boessenkool wrote:

This patch restructures how the prologues/epilogues are inserted.  Sibcalls
that run without prologue are now handled in shrink-wrap.c; it communicates
what is already handled by setting the EDGE_IGNORE flag.  The
try_shrink_wrapping function then doesn't need to be passed the bb_flags
anymore.

Tested like the previous two patches; is this okay for trunk?


Segher


2016-05-16  Segher Boessenkool  

* function.c (make_epilogue_seq): Remove epilogue_end parameter.
(thread_prologue_and_epilogue_insns): Remove bb_flags.  Restructure
code.  Ignore sibcalls on EDGE_IGNORE edges.
* shrink-wrap.c (handle_simple_exit): New function.  Set EDGE_IGNORE
on edges for sibcalls that run without prologue.  The rest of the
function is combined from...
(fix_fake_fallthrough_edge): ... this, and ...
(try_shrink_wrapping): ... a part of this.  Remove the bb_with
function argument, make it a local variable.

For some reason I found this patch awful to walk through.  In
retrospect, it might have been better to break this down further.  Not
because it's conceptually difficult to follow, but because the diffs
themselves are difficult to read.


I kept slicing out hunks when I could pair up the original code to its 
new functional equivalent and hunks which were just "fluff" and kept 
iterating until there was nothing left that seemed unreasonable.


OK for the trunk, but please watch closely for any fallout.

jeff



Re: [PATCH, libstdc++] Add missing atomic-builtins argument to experimental/memory_resource/1.cc

2016-05-19 Thread Jonathan Wakely
On 19 May 2016 at 14:05, Thomas Preudhomme wrote:
> Hi Jonathan,
>
> The dg-require-atomic-builtins in experimental/memory_resource/1.cc does not
> currently work as intended because it is missing its argument. This patch 
> fixes
> that.

Oops.

> ChangeLog entry is as follows:
>
> *** libstdc++-v3/ChangeLog ***
>
> 2016-05-18  Thomas Preud'homme  
>
> * testsuite/experimental/memory_resource/1.cc: Add required argument
> to dg-require-atomic-builtins.
>
>
> diff --git a/libstdc++-v3/testsuite/experimental/memory_resource/1.cc b/libstdc++-v3/testsuite/experimental/memory_resource/1.cc
> index 22d4e0d966f9c83bb1341137d62c834b441c08f3..08c02e5e31b287cb678c12f499a985899e612748 100644
> --- a/libstdc++-v3/testsuite/experimental/memory_resource/1.cc
> +++ b/libstdc++-v3/testsuite/experimental/memory_resource/1.cc
> @@ -1,5 +1,5 @@
>  // { dg-options "-std=gnu++14" }
> -// { dg-require-atomic-builtins }
> +// { dg-require-atomic-builtins "" }
>
>  // Copyright (C) 2015-2016 Free Software Foundation, Inc.
>  //
>
>
> Is this ok for trunk?

OK, thanks.


Re: [PATCH, ARM, 3/3] Add multilib support for bare-metal ARM architectures

2016-05-19 Thread Jasmin J.

Hi!

Ping!

Attached is a rebased version of my patch due to commit
  33ac16c8cc870229a6a08cd7037275b01e7a0b9d

*** gcc/ChangeLog ***

2016-04-19  Thomas Preud'homme  
Jasmin Jessich 

 * config.gcc: Handle bare-metal multilibs in --with-multilib-list
 option.
 * config/arm/t-baremetal: New file.
 * configure.ac: Add comment for ARM in --with-multilib-list option.
 * configure: Add comment for ARM in --with-multilib-list option.

BR
Jasmin

***

On 03/04/2016 01:19 AM, Jasmin J. wrote:

Hi all!


As to the need to modify Makefile.in and
configure.ac, this is because the patch aims to let control to the user
as to what multilib should be built.

As Ramana asked in his answer to my first version of the patch: Why?
The GCC mechanism to forward this to the t-* makefile is "TM_MULTILIB_CONFIG"
(as far as I understand it).  It is not necessary to introduce a new
variable to configure and Makefile.

Ramana mentioned also:

... as well as comments up top to explain what multilibs are being
built.


Additionally, the error message "You cannot use any of ..." didn't print the
right text in every case.

Attached is an improved version of this patch:
- it uses TM_MULTILIB_CONFIG
- fixed the error message "You cannot use any of ..."
- made the error message "Error:  not supported." more clear
- added a FSF copyright header to t-baremetal file and described what is
   built there
- commented out armv8-m.base and armv8-m.main, because this is currently not
   available in GCC mainline and gcc 5.3.0 release, but will be added soon
   (I guess)

Ramana mentioned in another message a test of the new options:
- I did test it with "test_arm_none_eabi.sh"; procedure taken from this
   message: https://gcc.gnu.org/ml/gcc-patches/2013-10/msg00659.html
- The result is in "test_result.txt".
(both files attached also)

My copyright assignment number: 1059920

Please note, that the patch
   "[PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs"
   from 12/16/2015 12:58 PM
needs to be applied before my new version of this patch.

BR
Jasmin

**

On 12/16/2015 01:04 PM, Thomas Preud'homme wrote:

Hi Ramana,

As suggested in your initial answer to this thread, we updated the multilib
patch provided in ARM's embedded branch to be up-to-date with regards to
supported CPUs in GCC. As to the need to modify Makefile.in and
configure.ac, this is because the patch aims to let control to the user
as to what multilib should be built. To this effect, it takes a list of
architecture at configure time and that list needs to be passed down to
t-baremetal Makefile to set the multilib variables appropriately.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2015-12-15  Thomas Preud'homme  

 * Makefile.in (with_multilib_list): New variables substituted by
 configure.
 * config.gcc: Handle bare-metal multilibs in --with-multilib-list
 option.
 * config/arm/t-baremetal: New file.
 * configure.ac (with_multilib_list): New AC_SUBST.
 * configure: Regenerate.
 * doc/install.texi (--with-multilib-list): Update description for
 arm*-*-* targets to mention bare-metal multilibs.


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 1f698798aa2df3f44d6b3a478bb4bf48e9fa7372..18b790afa114aa7580be0662d3ac9ffbc94e919d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -546,6 +546,7 @@ lang_opt_files=@lang_opt_files@ $(srcdir)/c-family/c.opt $(srcdir)/common.opt
  lang_specs_files=@lang_specs_files@
  lang_tree_files=@lang_tree_files@
  target_cpu_default=@target_cpu_default@
+with_multilib_list=@with_multilib_list@
  OBJC_BOEHM_GC=@objc_boehm_gc@
  extra_modes_file=@extra_modes_file@
  extra_opt_files=@extra_opt_files@
diff --git a/gcc/config.gcc b/gcc/config.gcc
index af948b5e203f6b4f53dfca38e9d02d060d00c97b..d8098ed3cefacd00cb10590db1ec86d48e9fcdbc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3787,15 +3787,25 @@ case "${target}" in
default)
;;
*)
-	echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
-	exit 1
+   for arm_multilib in ${arm_multilibs}; do
+   case ${arm_multilib} in
+			armv6-m | armv7-m | armv7e-m | armv7-r | armv8-m.base | armv8-m.main)
+				tmake_profile_file="arm/t-baremetal"
+   ;;
+   *)
+   echo "Error: 

Re: [PATCH 1/3] Introduce can_implement_as_sibling_call_p

2016-05-19 Thread Jeff Law

On 05/17/2016 04:01 PM, David Malcolm wrote:

This patch moves part of the logic for determining if tail
call optimizations are possible to a new helper function.

There are no functional changes.

expand_call is 1300 lines long, so there's arguably a
case for doing this on its own, but this change also
enables the followup patch.

The patch changes the logic from a big "if" with joined
|| clauses:

  if (first_problem ()
  ||second_problem ()
  /* ...etc... */
  ||final_problem ())
 try_tail_call = 0;

to a series of separate tests:

  if (first_problem ())
return false;
  if (second_problem ())
return false;
  /* ...etc... */
  if (final_problem ())
return false;

I think the latter form has several advantages over the former:
- IMHO it's easier to read
- it makes it easy to put breakpoints on individual causes of failure
- it makes it easy to put specific error messages on individual causes
  of failure (as done in the followup patch).

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* calls.c (expand_call): Move "Rest of purposes for tail call
optimizations to fail" to...
(can_implement_as_sibling_call_p): ...this new function, and
split into multiple "if" statements.
This is good in and of itself as a refactoring/cleanup change.  That 
code has grown quite a list of "don't tailcall when ..." cases.


OK for the trunk.

jeff



Re: [PATCH, rs6000] Add support for int versions of vec_addec

2016-05-19 Thread Bill Seurer
Here is an updated patch addressing all of Segher's comments:

This patch adds support for the signed and unsigned int versions of the
vec_addec altivec builtins from the Power Architecture 64-Bit ELF V2 ABI
OpenPOWER ABI for Linux Supplement (16 July 2015, Version 1.1). Many of
the builtins are still missing, and this is part of a series of patches
to add them.

There are no instructions for the int versions of vec_addec, so the
output code is built from other built-ins that do have instructions;
in this case, the following.

vec_addec (va, vb, carryv) == vec_or (vec_addc (va, vb),
vec_addc(vec_add(va, vb),
 vec_and (carryv, 0x1)))

The new test cases are executable tests which verify that the generated
code produces expected values. C macros were used so that the same
test case could be used for both the signed and unsigned versions. An
extra executable test case is also included to ensure that the modified
support for the __int128 versions of vec_addec is not broken. The same
test case could not be used for both int and __int128 because of some
differences in loading and storing the vectors.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu and
powerpc64-unknown-linux-gnu with no regressions. Is this ok for trunk?

[gcc]

2016-05-19  Bill Seurer  

* config/rs6000/rs6000-builtin.def (vec_addec): Change vec_addec to a
special case builtin.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
support for ALTIVEC_BUILTIN_VEC_ADDEC.
* config/rs6000/rs6000.c (altivec_init_builtins): Add definition
for __builtin_vec_addec.

[gcc/testsuite]

2016-05-19  Bill Seurer  

* gcc.target/powerpc/vec-addec.c: New test.
* gcc.target/powerpc/vec-addec-int128.c: New test.


Index: /home/seurer/gcc/gcc-checkin/gcc/config/rs6000/rs6000-builtin.def
===
--- /home/seurer/gcc/gcc-checkin/gcc/config/rs6000/rs6000-builtin.def	(revision 236484)
+++ /home/seurer/gcc/gcc-checkin/gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -991,7 +991,6 @@ BU_ALTIVEC_X (VEC_EXT_V4SF, "vec_ext_v4sf", CO
before we get to the point about classifying the builtin type.  */
 
 /* 3 argument Altivec overloaded builtins.  */
-BU_ALTIVEC_OVERLOAD_3 (ADDEC, "addec")
 BU_ALTIVEC_OVERLOAD_3 (MADD,   "madd")
 BU_ALTIVEC_OVERLOAD_3 (MADDS,  "madds")
 BU_ALTIVEC_OVERLOAD_3 (MLADD,  "mladd")
@@ -1177,6 +1176,7 @@ BU_ALTIVEC_OVERLOAD_P (VCMPGE_P,   "vcmpge_p")
 
 /* Overloaded Altivec builtins that are handled as special cases.  */
 BU_ALTIVEC_OVERLOAD_X (ADDE,  "adde")
+BU_ALTIVEC_OVERLOAD_X (ADDEC, "addec")
 BU_ALTIVEC_OVERLOAD_X (CTF,   "ctf")
 BU_ALTIVEC_OVERLOAD_X (CTS,   "cts")
 BU_ALTIVEC_OVERLOAD_X (CTU,   "ctu")
Index: /home/seurer/gcc/gcc-checkin/gcc/config/rs6000/rs6000-c.c
===
--- /home/seurer/gcc/gcc-checkin/gcc/config/rs6000/rs6000-c.c	(revision 236484)
+++ /home/seurer/gcc/gcc-checkin/gcc/config/rs6000/rs6000-c.c   (working copy)
@@ -4661,6 +4661,86 @@ assignment for unaligned loads and stores");
}
 }
 
+  if (fcode == ALTIVEC_BUILTIN_VEC_ADDEC)
+{
+  /* vec_addec needs to be special cased because there is no instruction
+   for the {un}signed int version.  */
+  if (nargs != 3)
+   {
+ error ("vec_addec only accepts 3 arguments");
+ return error_mark_node;
+   }
+
+  tree arg0 = (*arglist)[0];
+  tree arg0_type = TREE_TYPE (arg0);
+  tree arg1 = (*arglist)[1];
+  tree arg1_type = TREE_TYPE (arg1);
+  tree arg2 = (*arglist)[2];
+  tree arg2_type = TREE_TYPE (arg2);
+
+  /* All 3 arguments must be vectors of (signed or unsigned) (int or
+   __int128) and the types must match.  */
+  if (arg0_type != arg1_type || arg1_type != arg2_type)
+   goto bad; 
+  if (TREE_CODE (arg0_type) != VECTOR_TYPE)
+   goto bad; 
+
+  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+   {
+ /* For {un}signed ints, 
+ vec_addec (va, vb, carryv) ==
+   vec_or (vec_addc (va, vb),
+   vec_addc(vec_add(va, vb),
+vec_and (carryv, 0x1))).  */
+ case SImode:
+   {
+   /* Use save_expr to ensure that operands used more than once
+   that may have side effects (like calls) are only evaluated
+   once.  */
+   arg0 = save_expr(arg0);
+   arg1 = save_expr(arg1);
+   vec *params = make_tree_vector ();
+   vec_safe_push (params, arg0);
+   vec_safe_push (params, arg1);
+   tree addc_builtin = 

Re: C++ PATCH to improve dump_decl (PR c++/71075)

2016-05-19 Thread Jason Merrill
OK.

On Thu, May 19, 2016 at 4:03 PM, Marek Polacek  wrote:
> On Thu, May 19, 2016 at 07:55:05PM +0200, Marek Polacek wrote:
>> On Thu, May 19, 2016 at 12:55:18PM -0400, Jason Merrill wrote:
>> > Well, constants aren't declarations.  Why are we calling dump_decl here?
>>
>> Eh, not sure what I was thinking...
>>
>> We call dump_decl because unify_template_argument_mismatch has %qD:
>> inform (input_location,
>> "  template argument %qE does not match %qD", arg, parm);
>> so I suspect we want to print an expression even for PARM:
>>
>> Untested, but I think it should test fine.  Ok if testing passes?
>
> Which it did.
>
> Marek


Re: [PATCH, vec-tails 01/10] New compiler options

2016-05-19 Thread Joseph Myers
On Thu, 19 May 2016, Ilya Enkovich wrote:

> Hi,
> 
> This patch introduces new options used for loop epilogues vectorization.

Any patch adding a new option should update invoke.texi (both the summary 
list of options, and adding documentation for the new option).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C++ PATCH to improve dump_decl (PR c++/71075)

2016-05-19 Thread Marek Polacek
On Thu, May 19, 2016 at 07:55:05PM +0200, Marek Polacek wrote:
> On Thu, May 19, 2016 at 12:55:18PM -0400, Jason Merrill wrote:
> > Well, constants aren't declarations.  Why are we calling dump_decl here?
> 
> Eh, not sure what I was thinking...
> 
> We call dump_decl because unify_template_argument_mismatch has %qD:
> inform (input_location,
> "  template argument %qE does not match %qD", arg, parm);
> so I suspect we want to print an expression even for PARM:
> 
> Untested, but I think it should test fine.  Ok if testing passes?

Which it did.

Marek


[PATCH, vec-tails 07/10] Support loop epilogue combining

2016-05-19 Thread Ilya Enkovich
Hi,

This patch introduces support for loop epilogue combining.  This includes
support in cost estimation and all changes required to mask the
vectorized loop.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* dbgcnt.def (vect_tail_combine): New.
* params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
epilogue combined with loop body.
(vect_do_peeling_for_loop_bound): Likewise.
* tree-vect-loop.c Include alias.h and dbgcnt.h.
(vect_estimate_min_profitable_iters): Add ret_min_profitable_combine_niters
arg, compute number of iterations for which loop epilogue combining is
profitable.
(vect_generate_tmps_on_preheader): Support combined epilogue.
(vect_gen_ivs_for_masking): New.
(vect_get_mask_index_for_elems): New.
(vect_get_mask_index_for_type): New.
(vect_gen_loop_masks): New.
(vect_mask_reduction_stmt): New.
(vect_mask_mask_load_store_stmt): New.
(vect_mask_load_store_stmt): New.
(vect_combine_loop_epilogue): New.
(vect_transform_loop): Support combined epilogue.


diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 78ddcc2..73c2966 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -192,4 +192,5 @@ DEBUG_COUNTER (treepre_insert)
 DEBUG_COUNTER (tree_sra)
 DEBUG_COUNTER (vect_loop)
 DEBUG_COUNTER (vect_slp)
+DEBUG_COUNTER (vect_tail_combine)
 DEBUG_COUNTER (dom_unreachable_edges)
diff --git a/gcc/params.def b/gcc/params.def
index 62a1e40..98d6c5a 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1220,6 +1220,11 @@ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
  "Maximum number of may-defs visited when devirtualizing "
  "speculatively", 50, 0, 0)
 
+DEFPARAM (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD,
+ "vect-cost-increase-combine-threshold",
+ "Cost increase threshold to mask main loop for epilogue.",
+ 10, 0, 300)
+
 /*
 
 Local variables:
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f275933..c5bdeb9 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4000,6 +4000,9 @@ vect_get_new_ssa_name (tree type, enum vect_var_kind var_kind, const char *name)
   case vect_scalar_var:
 prefix = "stmp";
 break;
+  case vect_mask_var:
+prefix = "mask";
+break;
   case vect_pointer_var:
 prefix = "vectp";
 break;
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index fab5879..b3c0668 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1195,6 +1195,7 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
   int first_guard_probability = 2 * REG_BR_PROB_BASE / 3;
   int second_guard_probability = 2 * REG_BR_PROB_BASE / 3;
   int probability_of_second_loop;
+  bool skip_second_after_first = false;
 
   if (!slpeel_can_duplicate_loop_p (loop, e))
 return NULL;
@@ -1393,7 +1394,11 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
 {
   loop_vec_info loop_vinfo = loop_vec_info_for_loop (loop);
   tree scalar_loop_iters = LOOP_VINFO_NITERSM1 (loop_vinfo);
-  unsigned limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
+  unsigned limit = 0;
+  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+   skip_second_after_first = true;
+  else
+   limit = LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1;
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
limit = limit + 1;
   if (check_profitability
@@ -1464,11 +1469,20 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
   bb_between_loops = new_exit_bb;
   bb_after_second_loop = split_edge (single_exit (second_loop));
 
-  pre_condition =
-   fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
-  skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
-  bb_after_second_loop, bb_before_first_loop,
-				  inverse_probability (second_guard_probability));
+  if (skip_second_after_first)
+/* We can just redirect edge from bb_between_loops to
+   bb_after_second_loop but we have many code assuming
+   we have a guard after the first loop.  So just make
+   always taken condtion.  */
+pre_condition = fold_build2 (EQ_EXPR, boolean_type_node, integer_zero_node,
+integer_zero_node);
+  else
+pre_condition =
+  fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
+  skip_e
+= slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
+bb_after_second_loop, bb_before_first_loop,
+inverse_probability (second_guard_probability));
   scale_loop_profile 

[PATCH, vec-tails 09/10] Print more info about vectorized loop

2016-05-19 Thread Ilya Enkovich
Hi,

This patch extends dumps for vectorized loops to provide more info
about them and also to report the vector size used.  This is to be used
for tests.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vect-loop.c (vect_transform_loop): Print more info
about vectorized loop and specify used vector size.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 7075f29..5572cbb 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8088,12 +8088,43 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 
   if (dump_enabled_p ())
 {
-  dump_printf_loc (MSG_NOTE, vect_location,
-  "LOOP VECTORIZED\n");
-  if (loop->inner)
-   dump_printf_loc (MSG_NOTE, vect_location,
-"OUTER LOOP VECTORIZED\n");
-  dump_printf (MSG_NOTE, "\n");
+  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+   {
+ if (LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo))
+   {
+ gcc_assert (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo));
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "LOOP EPILOGUE VECTORIZED AND MASKED (VS=%d)\n",
+  current_vector_size);
+   }
+ else
+   dump_printf_loc (MSG_NOTE, vect_location,
+"LOOP EPILOGUE VECTORIZED (VS=%d)\n",
+current_vector_size);
+   }
+  else
+   {
+ if (LOOP_VINFO_NEED_MASKING (loop_vinfo))
+   dump_printf_loc (MSG_NOTE, vect_location,
+"LOW TRIP COUNT LOOP VECTORIZED AND MASKED "
+"(VS=%d)\n", current_vector_size);
+ else
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "LOOP VECTORIZED (VS=%d)\n",
+  current_vector_size);
+ if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+   dump_printf_loc (MSG_NOTE, vect_location,
+"LOOP EPILOGUE COMBINED (VS=%d)\n",
+current_vector_size);
+   }
+
+ if (loop->inner)
+   dump_printf_loc (MSG_NOTE, vect_location,
+"OUTER LOOP VECTORIZED (VS=%d)\n",
+current_vector_size);
+ dump_printf (MSG_NOTE, "\n");
+   }
 }
 
   /* Free SLP instances here because otherwise stmt reference counting


[PATCH, vec-tails 08/10] Support loop epilogue masking and low trip count loop vectorization

2016-05-19 Thread Ilya Enkovich
Hi,

This patch enables vectorization of loop epilogues and low trip count
loops using masking.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* dbgcnt.def (vect_tail_mask): New.
* tree-vect-loop.c (vect_analyze_loop_2): Support masked loop
epilogues and low trip count loops.
(vect_get_known_peeling_cost): Ignore scalar epilogue cost for
loops we are going to mask.
(vect_estimate_min_profitable_iters): Support masked loop
epilogues and low trip count loops.
* tree-vectorizer.c (vectorize_loops): Add a message for a case
when loop epilogue can't be vectorized.


diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 73c2966..5aad1d7 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -193,4 +193,5 @@ DEBUG_COUNTER (tree_sra)
 DEBUG_COUNTER (vect_loop)
 DEBUG_COUNTER (vect_slp)
 DEBUG_COUNTER (vect_tail_combine)
+DEBUG_COUNTER (vect_tail_mask)
 DEBUG_COUNTER (dom_unreachable_edges)
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 1a80c42..7075f29 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2199,7 +2199,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool 
)
   int saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   HOST_WIDE_INT estimated_niter;
   unsigned th;
-  int min_scalar_loop_bound;
+  int min_scalar_loop_bound = 0;
 
   /* Check the SLP opportunities in the loop, analyze and build SLP trees.  */
   ok = vect_analyze_slp (loop_vinfo, n_stmts);
@@ -2224,6 +2224,30 @@ start_over:
   unsigned vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   gcc_assert (vectorization_factor != 0);
 
+  /* For now we mask loop epilogue using the same VF since it was used
+ for cost estimations and it should be easier for reduction
+ optimization.  */
+  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+  && LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo)
+  && LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo) != (int)vectorization_factor)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not vectorized: VF for loop epilogue doesn't "
+"match original loop VF.\n");
+  return false;
+}
+
+  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+  && !LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo)
+  && LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo) <= (int)vectorization_factor)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not vectorized: VF for loop epilogue is too small\n");
+  return false;
+}
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
 "vectorization_factor = %d, niters = "
@@ -2237,11 +2261,29 @@ start_over:
   || (max_niter != -1
  && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
 {
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not vectorized: iteration count smaller than "
-"vectorization factor.\n");
-  return false;
+  /* Allow low trip count for loop epilogue we want to mask.  */
+  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+ && LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo))
+   ;
+  /* Allow low trip count for non-epilogue loops if flag is enabled.  */
+  else if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+  && flag_tree_vectorize_short_loops)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"iteration count is small, masking is "
+"required for chosen vectorization factor.\n");
+
+ LOOP_VINFO_NEED_MASKING (loop_vinfo) = true;
+   }
+  else
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"not vectorized: iteration count smaller than "
+"vectorization factor.\n");
+ return false;
+   }
 }
 
   /* Analyze the alignment of the data-refs in the loop.
@@ -2282,6 +2324,16 @@ start_over:
   return false;
 }
 
+  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = true;
+  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+  && LOOP_VINFO_ORIG_MASK_EPILOGUE (loop_vinfo))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"vectorizing loop epilogue with masking.\n");
+  LOOP_VINFO_NEED_MASKING (loop_vinfo) = true;
+}
+
   if (slp)
 {
   /* Analyze operations in the SLP instances.  Note this may
@@ -2305,6 +2357,19 @@ start_over:
   return false;
 }
 
+  if (LOOP_VINFO_NEED_MASKING (loop_vinfo)
+  && !LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+{
+  gcc_assert 

[PATCH, vec-tails 06/10] Mark the first vector store generated for a scalar store

2016-05-19 Thread Ilya Enkovich
Hi,

This patch adds a STMT_VINFO_FIRST_COPY_P field to the statement vec
info.  This is used to find the first vector store generated for a
scalar one.  For other statements I use the original scalar statement
to find the first and following vector statements.  For stores the
original scalar statement is removed, and this new field is used
to mark a chain start.  Also, the original data reference and vector
type are preserved in the first vector statement for masking
purposes.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vect-stmts.c (vectorizable_mask_load_store): Mark
the first copy of generated vector stores.
(vectorizable_store): Mark the first copy of generated
vector stores and provide it with vectype and the original
data reference.
* tree-vectorizer.h (struct _stmt_vec_info): Add first_copy_p
field.
(STMT_VINFO_FIRST_COPY_P): New.


diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 91ebe5a..84f4dc81 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2131,7 +2131,10 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
  ptr, vec_mask, vec_rhs);
  vect_finish_stmt_generation (stmt, new_stmt, gsi);
  if (i == 0)
-   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+   {
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true;
+   }
  else
STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
  prev_stmt_info = vinfo_for_stmt (new_stmt);
@@ -6203,7 +6206,16 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   if (!slp)
{
  if (j == 0)
-   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+   {
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true;
+ /* Original statement is replaced with the first vector one.
+Keep data reference and original vectype in the first
+vector copy for masking purposes.  */
+ STMT_VINFO_DATA_REF (vinfo_for_stmt (new_stmt))
+   = STMT_VINFO_DATA_REF (stmt_info);
+ STMT_VINFO_VECTYPE (vinfo_for_stmt (new_stmt)) = vectype;
+   }
  else
STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
  prev_stmt_info = vinfo_for_stmt (new_stmt);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 86c5371..3702c5d 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -633,6 +633,10 @@ typedef struct _stmt_vec_info {
   /* For both loads and stores.  */
   bool simd_lane_access_p;
 
+  /* True for the first vector statement copy when scalar
+ statement is vectorized into several vector ones.  */
+  bool first_copy_p;
+
   /* For reduction loops, this is the type of reduction.  */
   enum vect_reduction_type v_reduc_type;
 
@@ -666,6 +670,7 @@ STMT_VINFO_BB_VINFO (stmt_vec_info stmt_vinfo)
 #define STMT_VINFO_GATHER_SCATTER_P(S)(S)->gather_scatter_p
 #define STMT_VINFO_STRIDED_P(S)   (S)->strided_p
 #define STMT_VINFO_SIMD_LANE_ACCESS_P(S)   (S)->simd_lane_access_p
+#define STMT_VINFO_FIRST_COPY_P(S) (S)->first_copy_p
 #define STMT_VINFO_VEC_REDUCTION_TYPE(S)   (S)->v_reduc_type
 
 #define STMT_VINFO_DR_BASE_ADDRESS(S)  (S)->dr_base_address


[PATCH, vec-tails 05/10] Check if loop can be masked

2016-05-19 Thread Ilya Enkovich
Hi,

This patch introduces analysis to determine whether a loop can be masked
(computing LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
and to compute how much masking costs.
Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vect-loop.c: Include insn-config.h and recog.h.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Add vect_check_required_masks
call to compute LOOP_VINFO_CAN_BE_MASKED.
(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
LOOP_VINFO_NEED_MASKING before starting over.
(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
masking cost.
* tree-vect-stmts.c (can_mask_load_store): New.
(vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.
(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
and masking cost.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
* tree-vectorizer.h (vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index e25a0ce..31360d3 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h" /* FIXME: for insn_data */
 #include "diagnostic-core.h"
 #include "fold-const.h"
 #include "stor-layout.h"
@@ -1601,6 +1603,266 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
 vectorization_factor);
 }
 
+/* Function vect_check_required_masks_widening.
+
+   Return 1 if vector mask of type MASK_TYPE can be widened
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_widening (loop_vec_info loop_vinfo,
+   tree mask_type, unsigned req_elems)
+{
+  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  gcc_assert (mask_elems > req_elems);
+
+  /* Don't convert if it requires too many intermediate steps.  */
+  int steps = exact_log2 (mask_elems / req_elems);
+  if (steps > MAX_INTERM_CVT_STEPS + 1)
+return false;
+
+  /* Check we have conversion support for given mask mode.  */
+  machine_mode mode = TYPE_MODE (mask_type);
+  insn_code icode = optab_handler (vec_unpacks_lo_optab, mode);
+  if (icode == CODE_FOR_nothing
+  || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing)
+return false;
+
+  /* Make recursive call for multi-step conversion.  */
+  if (steps > 1)
+{
+  mask_elems = mask_elems >> 1;
+  mask_type = build_truth_vector_type (mask_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+
+  if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
+  req_elems))
+   return false;
+}
+  else
+{
+  mask_type = build_truth_vector_type (req_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+}
+
+  return true;
+}
+
+/* Function vect_check_required_masks_narrowing.
+
+   Return 1 if vector mask of type MASK_TYPE can be narrowed
+   to a type having REQ_ELEMS elements in a single vector.  */
+
+static bool
+vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
+tree mask_type, unsigned req_elems)
+{
+  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
+
+  gcc_assert (req_elems > mask_elems);
+
+  /* Don't convert if it requires too many intermediate steps.  */
+  int steps = exact_log2 (req_elems / mask_elems);
+  if (steps > MAX_INTERM_CVT_STEPS + 1)
+return false;
+
+  /* Check we have conversion support for given mask mode.  */
+  machine_mode mode = TYPE_MODE (mask_type);
+  insn_code icode = optab_handler (vec_pack_trunc_optab, mode);
+  if (icode == CODE_FOR_nothing)
+return false;
+
+  /* Make recursive call for multi-step conversion.  */
+  if (steps > 1)
+{
+  mask_elems = mask_elems << 1;
+  mask_type = build_truth_vector_type (mask_elems, current_vector_size);
+  if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
+   return false;
+
+  if 

Re: [PATCH] c/71115 - Missing warning: excess elements in struct initializer

2016-05-19 Thread Jeff Law

On 05/18/2016 06:12 PM, Martin Sebor wrote:

The bug points out that the following and similar invalid uses
of NULL are not diagnosed.

  #include 

  const char* a[1] = { "", NULL };

The attached patch implements the suggestion on the Diagnostics
Guidelines Wiki to call
expansion_point_location_if_in_system_header to determine the
location where the macro is used.  Making these changes and
noticing the already existing calls to the function made me
wonder if this approach (warning on system macros) should be
the default strategy rather than a special case.  Aren't there
many more contexts where we would like to see warnings for
them?
Seems like it would be a good idea.  Our ability to track this kind of 
stuff has improved greatly over the last several years and a rethink of 
stuff like this is in order.




In comment #8 on the bug Manuel also suggests to remove the note:
(near initialization for 'decl').  I tried it but decided not to
include it in this change because of the large number of tests it
will require making changes to (I counted at least 20).  I think
it's a worthwhile change but it seems that it might better be
made on its own.

Yes, that seems like a separate change.

As for the patch, it's fine for the trunk.  I note that the BZ entry 
claims this is a regression since 4.8, so rather than close, just remove 
the "7" from the regression marker.


Jeff


[PATCH, vec-tails 04/10] Add masking cost

2016-05-19 Thread Ilya Enkovich
Hi,

This patch extends the vectorizer cost model to include masking cost by
adding new cost model locations and a new target hook to compute
masking cost.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* config/i386/i386.c (ix86_init_cost): Extend costs array.
(ix86_add_stmt_masking_cost): New.
(ix86_finish_cost): Add masking_prologue_cost and masking_body_cost
args.
(TARGET_VECTORIZE_ADD_STMT_MASKING_COST): New.
* config/i386/i386.h (TARGET_INCREASE_MASK_STORE_COST): New.
* config/i386/x86-tune.def (X86_TUNE_INCREASE_MASK_STORE_COST): New.
* config/rs6000/rs6000.c (_rs6000_cost_data): Extend cost array.
(rs6000_init_cost): Initialize new cost elements.
(rs6000_finish_cost): Add masking_prologue_cost and masking_body_cost.
* config/spu/spu.c (spu_init_cost): Extend costs array.
(spu_finish_cost): Add masking_prologue_cost and masking_body_cost args.
* doc/tm.texi.in (TARGET_VECTORIZE_ADD_STMT_MASKING_COST): New.
* doc/tm.texi: Regenerated.
* target.def (add_stmt_masking_cost): New.
(finish_cost): Add masking_prologue_cost and masking_body_cost args.
* target.h (enum vect_cost_for_stmt): Add vector_mask_load and
vector_mask_store.
(enum vect_cost_model_location): Add vect_masking_prologue
and vect_masking_body.
* targhooks.c (default_builtin_vectorization_cost): Support
vector_mask_load and vector_mask_store.
(default_init_cost): Extend costs array.
(default_add_stmt_masking_cost): New.
(default_finish_cost): Add masking_prologue_cost and masking_body_cost
args.
* targhooks.h (default_add_stmt_masking_cost): New.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Adjust
finish_cost call.
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Likewise.
* tree-vectorizer.h (add_stmt_masking_cost): New.
(finish_cost): Add masking_prologue_cost and masking_body_cost args.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9f62089..6c2c364 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -53932,8 +53932,12 @@ ix86_spill_class (reg_class_t rclass, machine_mode mode)
 static void *
 ix86_init_cost (struct loop *)
 {
-  unsigned *cost = XNEWVEC (unsigned, 3);
-  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
+  unsigned *cost = XNEWVEC (unsigned, 5);
+  cost[vect_prologue] = 0;
+  cost[vect_body] = 0;
+  cost[vect_epilogue] = 0;
+  cost[vect_masking_prologue] = 0;
+  cost[vect_masking_body] = 0;
   return cost;
 }
 
@@ -53974,16 +53978,56 @@ ix86_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
   return retval;
 }
 
+/* Implement targetm.vectorize.add_stmt_masking_cost.  */
+
+static unsigned
+ix86_add_stmt_masking_cost (void *data, int count, enum vect_cost_for_stmt kind,
+   struct _stmt_vec_info *stmt_info, int misalign,
+   enum vect_cost_model_location where)
+{
+  bool embedded_masking = false;
+  unsigned *cost = (unsigned *) data;
+  unsigned retval = 0;
+
+  tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
+  if (vectype)
+{
+  machine_mode mode
+   = ix86_get_mask_mode (TYPE_VECTOR_SUBPARTS (vectype),
+ tree_to_uhwi (TYPE_SIZE_UNIT (vectype)));
+  embedded_masking = !VECTOR_MODE_P (mode);
+}
+  else
+embedded_masking = TARGET_AVX512F;
+
+  if (embedded_masking || kind == vector_load)
+return retval;
+
+  if (kind == vector_store)
+return TARGET_INCREASE_MASK_STORE_COST ? 10 : 0;
+
+  int stmt_cost = ix86_builtin_vectorization_cost (vector_stmt, vectype, misalign);
+  retval = (unsigned) (count * stmt_cost);
+
+  cost[where] += retval;
+
+  return retval;
+}
+
 /* Implement targetm.vectorize.finish_cost.  */
 
 static void
 ix86_finish_cost (void *data, unsigned *prologue_cost,
- unsigned *body_cost, unsigned *epilogue_cost)
+ unsigned *body_cost, unsigned *epilogue_cost,
+ unsigned *masking_prologue_cost,
+ unsigned *masking_body_cost)
 {
   unsigned *cost = (unsigned *) data;
   *prologue_cost = cost[vect_prologue];
   *body_cost = cost[vect_body];
   *epilogue_cost = cost[vect_epilogue];
+  *masking_prologue_cost = cost[vect_masking_prologue];
+  *masking_body_cost = cost[vect_masking_body];
 }
 
 /* Implement targetm.vectorize.destroy_cost_data.  */
@@ -54964,6 +55008,8 @@ ix86_addr_space_zero_address_valid (addr_space_t as)
 #define TARGET_VECTORIZE_INIT_COST ix86_init_cost
 #undef TARGET_VECTORIZE_ADD_STMT_COST
 #define TARGET_VECTORIZE_ADD_STMT_COST ix86_add_stmt_cost
+#undef TARGET_VECTORIZE_ADD_STMT_MASKING_COST
+#define TARGET_VECTORIZE_ADD_STMT_MASKING_COST ix86_add_stmt_masking_cost
 #undef TARGET_VECTORIZE_FINISH_COST
 #define 

[PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-05-19 Thread Ilya Enkovich
Hi,

This patch introduces the changes required to run the vectorizer on a loop
epilogue.  It also enables epilogue vectorization using a smaller vector size.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-if-conv.c (tree_if_conversion): Make public.
* tree-if-conv.h: New file.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
try to enhance alignment for epilogues.
* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
created loop.
* tree-vect-loop.c: include tree-if-conv.h.
(destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
loop->aux.
(vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
loop->aux.
(vect_analyze_loop): Reset loop->aux.
(vect_transform_loop): Check if created epilogue should be returned
for further vectorization.  If-convert epilogue if required.
* tree-vectorizer.c (vectorize_loops): Add a queue of loops to
process and insert vectorized loop epilogues into this queue.
* tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return created
loop.
(vect_transform_loop): Return created loop.


diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index c38e21b..41b6c99 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2801,7 +2801,7 @@ ifcvt_local_dce (basic_block bb)
profitability analysis.  Returns non-zero todo flags when something
changed.  */
 
-static unsigned int
+unsigned int
 tree_if_conversion (struct loop *loop)
 {
   unsigned int todo = 0;
diff --git a/gcc/tree-if-conv.h b/gcc/tree-if-conv.h
new file mode 100644
index 000..3a732c2
--- /dev/null
+++ b/gcc/tree-if-conv.h
@@ -0,0 +1,24 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_TREE_IF_CONV_H
+#define GCC_TREE_IF_CONV_H
+
+unsigned int tree_if_conversion (struct loop *);
+
+#endif  /* GCC_TREE_IF_CONV_H  */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7652e21..f275933 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1595,7 +1595,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
   || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
-  || loop->inner)
+  || loop->inner
+  /* Required peeling was performed in prologue and
+is not required for epilogue.  */
+  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
 do_peeling = false;
 
   if (do_peeling
@@ -1875,7 +1878,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   do_versioning =
optimize_loop_nest_for_speed_p (loop)
-   && (!loop->inner); /* FORNOW */
+   && (!loop->inner) /* FORNOW */
+/* Required versioning was performed for the
+  original loop and is not required for epilogue.  */
+   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
 
   if (do_versioning)
 {
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 7ec6dae..fab5879 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1742,9 +1742,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters,
NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO).
 
COND_EXPR and COND_EXPR_STMT_LIST are combined with a new generated
-   test.  */
+   test.
 
-void
+   Return created loop.  */
+
+struct loop *
 vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
tree ni_name, tree ratio_mult_vf_name,
unsigned int th, bool check_profitability)
@@ -1812,6 +1814,8 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
   scev_reset ();
 
   free_original_copy_tables ();
+
+  return new_loop;
 }
 
 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index aac0df9..a537ef4 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "gimple-fold.h"
 #include "cgraph.h"
+#include "tree-if-conv.h"
 
 /* Loop Vectorization Pass.
 
@@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
   

[PATCH, vec-tails 02/10] Extend _loop_vec_info structure with epilogue related fields

2016-05-19 Thread Ilya Enkovich
Hi,

This patch adds new fields to the _loop_vec_info structure to support loop
epilogue vectorization.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* tree-vectorizer.h (struct _loop_vec_info): Add new fields
can_be_masked, required_masks, mask_epilogue, combine_epilogue,
need_masking, orig_loop_info.
(LOOP_VINFO_CAN_BE_MASKED): New.
(LOOP_VINFO_REQUIRED_MASKS): New.
(LOOP_VINFO_COMBINE_EPILOGUE): New.
(LOOP_VINFO_MASK_EPILOGUE): New.
(LOOP_VINFO_NEED_MASKING): New.
(LOOP_VINFO_ORIG_LOOP_INFO): New.
(LOOP_VINFO_EPILOGUE_P): New.
(LOOP_VINFO_ORIG_MASK_EPILOGUE): New.
(LOOP_VINFO_ORIG_VECT_FACTOR): New.
* tree-vect-loop.c (new_loop_vec_info): Initialize new
_loop_vec_info fields.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index da98211..aac0df9 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1125,6 +1125,12 @@ new_loop_vec_info (struct loop *loop)
   LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
   LOOP_VINFO_PEELING_FOR_NITER (res) = false;
   LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
+  LOOP_VINFO_CAN_BE_MASKED (res) = false;
+  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
+  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
+  LOOP_VINFO_MASK_EPILOGUE (res) = false;
+  LOOP_VINFO_NEED_MASKING (res) = false;
+  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
 
   return res;
 }
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index bd1d55a..4c19317 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -336,6 +336,23 @@ typedef struct _loop_vec_info : public vec_info {
   /* Mark loops having masked stores.  */
   bool has_mask_store;
 
+  /* True if vectorized loop can be masked.  */
+  bool can_be_masked;
+  /* If vector mask with 2^N elements is required to mask the loop
+ then N-th bit of this field is set to 1.  */
+  unsigned required_masks;
+
+  /* True if we should vectorize loop epilogue with masking.  */
+  bool mask_epilogue;
+  /* True if we should combine main loop with epilogue using masking.  */
+  bool combine_epilogue;
+  /* True if loop vectorization requires masking.  E.g. we want to
+ vectorize loop with low trip count.  */
+  bool need_masking;
+  /* For loops being epilogues of already vectorized loops
+ this points to the original vectorized loop.  Otherwise NULL.  */
+  _loop_vec_info *orig_loop_info;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -374,6 +391,12 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_HAS_MASK_STORE(L)   (L)->has_mask_store
 #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
 #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
+#define LOOP_VINFO_CAN_BE_MASKED(L)(L)->can_be_masked
+#define LOOP_VINFO_REQUIRED_MASKS(L)   (L)->required_masks
+#define LOOP_VINFO_COMBINE_EPILOGUE(L) (L)->combine_epilogue
+#define LOOP_VINFO_MASK_EPILOGUE(L)(L)->mask_epilogue
+#define LOOP_VINFO_NEED_MASKING(L) (L)->need_masking
+#define LOOP_VINFO_ORIG_LOOP_INFO(L)   (L)->orig_loop_info
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
   ((L)->may_misalign_stmts.length () > 0)
@@ -383,6 +406,14 @@ typedef struct _loop_vec_info : public vec_info {
 #define LOOP_VINFO_NITERS_KNOWN_P(L)  \
   (tree_fits_shwi_p ((L)->num_iters) && tree_to_shwi ((L)->num_iters) > 0)
 
+#define LOOP_VINFO_EPILOGUE_P(L) \
+  (LOOP_VINFO_ORIG_LOOP_INFO(L) != NULL)
+
+#define LOOP_VINFO_ORIG_MASK_EPILOGUE(L) \
+  (LOOP_VINFO_MASK_EPILOGUE (LOOP_VINFO_ORIG_LOOP_INFO(L)))
+#define LOOP_VINFO_ORIG_VECT_FACTOR(L) \
+  (LOOP_VINFO_VECT_FACTOR (LOOP_VINFO_ORIG_LOOP_INFO(L)))
+
 static inline loop_vec_info
 loop_vec_info_for_loop (struct loop *loop)
 {


[PATCH, vec-tails 01/10] New compiler options

2016-05-19 Thread Ilya Enkovich
Hi,

This patch introduces the new options used for loop epilogue vectorization.

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* common.opt (flag_tree_vectorize_epilogues): New.
(ftree-vectorize-short-loops): New.
(ftree-vectorize-epilogues=): New.
(fno-tree-vectorize-epilogues): New.
(fvect-epilogue-cost-model=): New.
* flag-types.h (enum vect_epilogue_mode): New.
* opts.c (parse_vectorizer_options): New.
(common_handle_option): Support -ftree-vectorize-epilogues=
and -fno-tree-vectorize-epilogues options.


diff --git a/gcc/common.opt b/gcc/common.opt
index 682cb41..6b83b79 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -243,6 +243,10 @@ bool dump_base_name_prefixed = false
 Variable
 bool flag_disable_hsa = false
 
+; Flag holding modes for loop epilogue vectorization
+Variable
+unsigned int flag_tree_vectorize_epilogues
+
 ###
 Driver
 
@@ -2557,6 +2561,19 @@ ftree-vectorize
 Common Report Var(flag_tree_vectorize) Optimization
 Enable vectorization on trees.
 
+ftree-vectorize-short-loops
+Common Report Var(flag_tree_vectorize_short_loops) Optimization
+Enable vectorization of loops with low trip count using masking.
+
+ftree-vectorize-epilogues=
+Common Report Joined Optimization
+Comma separated list of loop epilogue vectorization modes.
+Available modes: combine, mask, nomask.
+
+fno-tree-vectorize-epilogues
+Common RejectNegative Optimization
+Disable epilogues vectorization.
+
 ftree-vectorizer-verbose=
 Common Joined RejectNegative Ignore
 Does nothing.  Preserved for backward compatibility.
@@ -2577,6 +2594,10 @@ fsimd-cost-model=
Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) Optimization
 Specifies the vectorization cost model for code marked with a simd directive.
 
+fvect-epilogue-cost-model=
+Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_epilogue_cost_model) Init(VECT_COST_MODEL_DEFAULT) Optimization
+Specifies the cost model for epilogue vectorization.
+
 Enum
Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
 
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index dd57e16..24081b1 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -200,6 +200,15 @@ enum vect_cost_model {
   VECT_COST_MODEL_DEFAULT = 3
 };
 
+/* Epilogue vectorization modes.  */
+enum vect_epilogue_mode {
+  VECT_EPILOGUE_COMBINE = 1 << 0,
+  VECT_EPILOGUE_MASK = 1 << 1,
+  VECT_EPILOGUE_NOMASK = 1 << 2,
+  VECT_EPILOGUE_ALL = VECT_EPILOGUE_COMBINE | VECT_EPILOGUE_MASK
+ | VECT_EPILOGUE_NOMASK
+};
+
 /* Different instrumentation modes.  */
 enum sanitize_code {
   /* AddressSanitizer.  */
diff --git a/gcc/opts.c b/gcc/opts.c
index 0f9431a..a0c0987 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1531,6 +1531,63 @@ parse_sanitizer_options (const char *p, location_t loc, int scode,
   return flags;
 }
 
+/* Parse comma separated vectorizer suboptions from P for option SCODE,
+   adjust previous FLAGS and return new ones.  If COMPLAIN is false,
+   don't issue diagnostics.  */
+
+unsigned int
+parse_vectorizer_options (const char *p, location_t loc, int scode,
+ unsigned int flags, int value, bool complain)
+{
+  if (scode != OPT_ftree_vectorize_epilogues_)
+return flags;
+
+  if (!p)
+return value;
+
+  while (*p != 0)
+{
+  size_t len;
+  const char *comma = strchr (p, ',');
+  unsigned int flag = 0;
+
+  if (comma == NULL)
+   len = strlen (p);
+  else
+   len = comma - p;
+  if (len == 0)
+   {
+ p = comma + 1;
+ continue;
+   }
+
+  /* Check to see if the string matches an option class name.  */
+  if (len == strlen ("combine")
+ && memcmp (p, "combine", len) == 0)
+   flag = VECT_EPILOGUE_COMBINE;
+  else if (len == strlen ("mask")
+ && memcmp (p, "mask", len) == 0)
+   flag = VECT_EPILOGUE_MASK;
+  else if (len == strlen ("nomask")
+ && memcmp (p, "nomask", len) == 0)
+   flag = VECT_EPILOGUE_NOMASK;
+  else if (complain)
+   error_at (loc, "unrecognized argument to -ftree-vectorize-epilogues= "
+ "option: %q.*s", (int) len, p);
+
+  if (value)
+   flags |= flag;
+  else
+   flags &= ~flag;
+
+  if (comma == NULL)
+   break;
+  p = comma + 1;
+}
+
+  return flags;
+}
+
 /* Handle target- and language-independent options.  Return zero to
generate an "unknown option" message.  Only options that need
extra handling need to be listed here; if you simply want
@@ -2018,6 +2075,18 @@ common_handle_option (struct gcc_options *opts,
   if (!opts_set->x_flag_tree_slp_vectorize)
 opts->x_flag_tree_slp_vectorize = value;
   break;
+
+case OPT_ftree_vectorize_epilogues_:
+  opts->x_flag_tree_vectorize_epilogues
+   = parse_vectorizer_options 

[RFC][PATCH, vec-tails 00/10] Support vectorization of loop epilogues

2016-05-19 Thread Ilya Enkovich
Hi,

This series is an extension of previous work on loop epilogue combining [1].

It introduces three ways to handle a vectorized loop's epilogue: combine it with
the vectorized loop, vectorize it with masking, or vectorize it using a smaller
vector size.

It also supports vectorization of loops with a low trip count.

Epilogue combining is used as the basic masking transformation.  Epilogue
masking and low-trip-count loop vectorization are treated as epilogue
combining with a zero-trip-count vector loop.

Epilogue vectorization is controlled via the new option -ftree-vectorize-epilogues=,
which takes a comma-separated list of enabled modes: combine, mask, nomask.
There is a separate option, -ftree-vectorize-short-loops, for low-trip-count
loops.

To support epilogue vectorization I use a queue of loops to be vectorized in
vectorize_loops and change vect_transform_loop to return the generated epilogue
(in case we want to try to vectorize it).  If an epilogue is returned, it is
queued for processing.  This variant of epilogue processing was chosen because
it is simple and works for all epilogue processing options.

There are currently some limitations implied by this scheme:
 - The copied loop misses some required optimization info (e.g. scev info),
which may result in an epilogue that cannot be vectorized
 - The loop epilogue may require if-conversion
 - Alias/alignment checks are not inherited and would therefore be performed
one more time for the epilogue.  For now epilogue vectorization is simply
disabled when alias versioning is required, and alignment enhancement is
disabled for epilogues.

There is a set of new fields added to _loop_vec_info to support epilogues
vectorization.

LOOP_VINFO_CAN_BE_MASKED - true if the vectorized loop can be masked.  It is
computed during vectorization analysis (in the various vectorizable_* functions).

LOOP_VINFO_REQUIRED_MASKS - for a loop which can be masked, this holds all masks
required to mask the loop.

LOOP_VINFO_COMBINE_EPILOGUE - true if we decided the vectorized loop should be
masked.

LOOP_VINFO_MASK_EPILOGUE - true if we decided an epilogue of this loop
should be vectorized and masked.

LOOP_VINFO_NEED_MASKING - true if the vectorized loop has to be masked (set for
epilogues we want to mask and for low-trip-count loops).

LOOP_VINFO_ORIG_LOOP_INFO - for epilogues this holds the loop_vec_info of the
original vectorized loop; otherwise NULL.

To decide whether we want to mask or combine a loop epilogue, the cost model
is extended with masking costs.  This includes vect_masking_prologue
and vect_masking_body elements added to the vect_cost_model_location enum, and
finish_cost extended with two additional return values, respectively.  In
addition to add_stmt_cost I also add add_stmt_masking_cost to compute
the cost of masking a statement.

vect_estimate_min_profitable_iters checks whether epilogue masking is profitable
and also computes the number of iterations required to make epilogue
combining profitable (this number may be used as a threshold in the
vectorized loop guard).

These patches do not enable any of the new features by default at any
optimization level.  Masking features are expected to be used mostly on AVX-512
targets, and the lack of hardware suitable for wide performance testing is the
reason the cost model is not tuned and the optimizations are not enabled by
default.  With small tests using a small number of loop iterations and 'heavy'
epilogues (e.g. the number of iterations is VF*2-1) I see the expected ~2x gain
on existing KNL hardware.
Later this year we expect to get access to KNL machines and have an
opportunity to tune the masking cost model.

On Haswell hardware I don't see performance gains on similar loops, which means
the masked code is no better than scalar code when mask usage is heavy.
Masking might still be useful when the number of statements requiring masking is
relatively small (I used the test a[i] += b[i], which needs masking for 3 out of
4 vector statements).  We will continue searching for cases where masking is
profitable on Haswell in order to tune the masking costs appropriately.
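To make the a[i] += b[i] discussion concrete, here is a scalar model of what a
combined/masked epilogue computes; the VF value and the mask emulation are
illustrative, not the vectorizer's actual output:

```cpp
#include <cassert>

enum { VF = 8 };  /* assumed vector factor, e.g. 256-bit vectors of int */

/* Scalar model of a[i] += b[i] with a combined/masked epilogue: full
   vector iterations run unmasked, then one extra "vector" iteration
   runs with the lanes at or beyond n disabled by a per-lane mask, so
   no scalar remainder loop is needed.  */
void add_with_masked_tail (int *a, const int *b, int n)
{
  int i = 0;
  for (; i + VF <= n; i += VF)       /* main vectorized body */
    for (int l = 0; l < VF; l++)
      a[i + l] += b[i + l];
  if (i < n)                         /* masked epilogue iteration */
    for (int l = 0; l < VF; l++)
      if (i + l < n)                 /* lane enabled by the mask */
        a[i + l] += b[i + l];
}
```

With n = VF*2-1 (the 'heavy epilogue' case mentioned above), the main body runs
once and the masked iteration covers the remaining VF-1 elements.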

Below are ChangeLogs for whole series.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03014.html

Thanks,
Ilya
--
gcc/

2016-05-19  Ilya Enkovich  

* common.opt (flag_tree_vectorize_epilogues): New.
(ftree-vectorize-short-loops): New.
(ftree-vectorize-epilogues=): New.
(fno-tree-vectorize-epilogues): New.
(fvect-epilogue-cost-model=): New.
* flag-types.h (enum vect_epilogue_mode): New.
* opts.c (parse_vectorizer_options): New.
(common_handle_option): Support -ftree-vectorize-epilogues=
and -fno-tree-vectorize-epilogues options.


gcc/

2016-05-19  Ilya Enkovich  

* tree-vectorizer.h (struct _loop_vec_info): Add new fields
can_be_masked, required_masks, mask_epilogue, combine_epilogue,
need_masking, orig_loop_info.
(LOOP_VINFO_CAN_BE_MASKED): New.
(LOOP_VINFO_REQUIRED_MASKS): New.

Re: [PATCH v2] gcov: Runtime configurable destination output

2016-05-19 Thread Jeff Law

On 05/19/2016 12:40 PM, Aaron Conole wrote:

Nathan Sidwell  writes:


On 02/24/16 16:52, Aaron Conole wrote:

The previous gcov behavior was to always output errors on the stderr channel.
This is fine for most uses, but some programs will require stderr to be
untouched by libgcov for certain tests. This change allows configuring
the gcov output via an environment variable which will be used to open
the appropriate file.


this is ok in principle.  I have a couple of questions & nits below though.


Thank you for the consideration.  I will be submitting a new patch that
I hope fully addresses your comments below, either tomorrow or Monday.

Thanks so much for the review.


I don't see a previous commit from you -- do you have a copyright
assignment with the FSF? (although this patch is simple, my guess is
the idea it implements is sufficiently novel to need one).  We can
handle that off list.


I'm happy to report that I did send in some FSF paperwork this week.
Hopefully it is on record now, but even if it isn't I live a train ride
away from the FSF headquarters so I'd be happy to take the time to make
sure it's all signed correctly.
Also note that Aaron works for Red Hat and should be covered by our 
existing assignments.


jeff
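For reference, the mechanism under review in the quoted patch (routing libgcov
diagnostics to a file named by an environment variable instead of stderr) can
be sketched as below.  The GCOV_ERROR_FILE variable name matches the thread;
the helper name and the append open mode are assumptions, not necessarily what
the final patch uses:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

/* Return the stream gcov diagnostics should go to: the file named by
   GCOV_ERROR_FILE when it is set and can be opened, otherwise stderr.  */
static FILE *
gcov_error_stream (void)
{
  const char *name = std::getenv ("GCOV_ERROR_FILE");
  if (name)
    {
      FILE *f = std::fopen (name, "a");
      if (f)
        return f;
    }
  return stderr;
}
```

A gcov_error-style wrapper would call this lazily, cache the result, and
vfprintf to it, which is what the patch's error-reporting path does with its
own helper.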



Re: New C++ PATCH for c++/10200 et al

2016-05-19 Thread Jason Merrill

On 05/18/2016 01:42 PM, Jason Merrill wrote:

On 05/13/2016 03:17 PM, Jason Merrill wrote:

On 02/16/2016 07:49 PM, Jason Merrill wrote:

Clearly the DR 141 change is requiring much larger adjustments in the
rest of the compiler than I'm comfortable making at this point in the
GCC 6 schedule, so I'm backing out my earlier changes for 10200 and
69753 and replacing them with a more modest fix for 10200: Now we will
still find member function templates by unqualified lookup, we just
won't find namespace-scope function templates.  The earlier approach
will return in GCC 7 stage 1.


As promised.  The prerequisite for the DR 141 change was fixing the
C++11 handling of type-dependence of member access expressions,
including calls.  14.6.2.2 says,

A class member access expression (5.2.5) is type-dependent if the
expression refers to a member of the current instantiation and the type
of the referenced member is dependent, or the class member access
expression refers to a member of an unknown specialization. [ Note: In
an expression of the form x.y or xp->y the type of the expression is
usually the type of the member y of the class of x (or the class pointed
to by xp). However, if x or xp refers to a dependent type that is not
the current instantiation, the type of y is always dependent. If x or xp
refers to a non-dependent type or refers to the current instantiation,
the type of y is the type of the class member access expression. —end
note ]

Previously we had been treating such expressions as type-dependent if
the object-expression is type-dependent, even if its type is the current
instantiation.  Fixing this required a few changes in other areas that
now have to deal with non-dependent member function calls within a
template.
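The distinction can be illustrated with a small example (illustrative, not one
of the patch's testcases): a member call on the current instantiation whose
member type is not dependent.

```cpp
#include <cassert>

template <class T>
struct S
{
  int f () { return 42; }   // member whose type does not depend on T

  int g ()
  {
    /* 'this' refers to the current instantiation and the type of f is
       not dependent, so per 14.6.2.2 the call below is not
       type-dependent: its type is known to be int at definition time.  */
    return this->f () + 1;
  }
};
```

Under the old behavior, this->f () was treated as dependent simply because the
object expression is type-dependent; the fix keys instead on whether the
referenced member's type is dependent.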


A small tweak to handling of value-dependent functions to better match
the text of the standard.


And some other places that needed to be updated to handle multiple 
levels of template args due to non-dependent member function calls.


Jason


commit 19e3a881e67edf3181b33cb064e9fea699aba8c9
Author: Jason Merrill 
Date:   Wed May 18 16:52:58 2016 -0400

	Fix handling of non-dependent calls with default template args.

	PR c++/10200
	* pt.c (fn_type_unification): Add outer template args if needed.
	(type_unification_real): Handle getting full args.
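A hypothetical sketch of the kind of code this touches: a member function
template whose default template argument names the enclosing template's
parameter, so substituting the default during deduction needs the outer
template arguments as well as the inner ones.

```cpp
#include <cassert>

template <class T>
struct A
{
  /* U's default argument names the enclosing template's parameter, so
     substituting it needs the full (outer + inner) template args.  */
  template <class U = T> U zero () { return U (); }

  /* Member call made inside a template; U is not deducible from the
     (empty) argument list and comes from the default.  */
  T sum (T t) { return t + zero (); }
};
```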

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fde3091..3908592 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17578,6 +17578,13 @@ fn_type_unification (tree fn,
   tree tinst;
   tree r = error_mark_node;
 
+  tree full_targs = targs;
+  if (TMPL_ARGS_DEPTH (targs)
+  < TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (fn)))
+full_targs = (add_outermost_template_args
+		  (DECL_TI_ARGS (DECL_TEMPLATE_RESULT (fn)),
+		   targs));
+
   if (decltype_p)
 complain |= tf_decltype;
 
@@ -17623,6 +17630,14 @@ fn_type_unification (tree fn,
   location_t loc = input_location;
   bool incomplete = false;
 
+  if (explicit_targs == error_mark_node)
+	goto fail;
+
+  if (TMPL_ARGS_DEPTH (explicit_targs)
+	  < TMPL_ARGS_DEPTH (full_targs))
+	explicit_targs = add_outermost_template_args (full_targs,
+		  explicit_targs);
+
   /* Adjust any explicit template arguments before entering the
 	 substitution context.  */
   explicit_targs
@@ -17702,6 +17717,7 @@ fn_type_unification (tree fn,
 	goto fail;
 
   /* Place the explicitly specified arguments in TARGS.  */
+  explicit_targs = INNERMOST_TEMPLATE_ARGS (explicit_targs);
   for (i = NUM_TMPL_ARGS (explicit_targs); i--;)
 	TREE_VEC_ELT (targs, i) = TREE_VEC_ELT (explicit_targs, i);
 }
@@ -17751,7 +17767,7 @@ fn_type_unification (tree fn,
   checks = NULL;
 
   ok = !type_unification_real (DECL_INNERMOST_TEMPLATE_PARMS (fn),
-			   targs, parms, args, nargs, /*subr=*/0,
+			   full_targs, parms, args, nargs, /*subr=*/0,
 			   strict, flags, , explain_p);
   if (!explain_p)
 pop_tinst_level ();
@@ -18247,7 +18263,7 @@ unify_one_argument (tree tparms, tree targs, tree parm, tree arg,
 
 static int
 type_unification_real (tree tparms,
-		   tree targs,
+		   tree full_targs,
 		   tree xparms,
 		   const tree *xargs,
 		   unsigned int xnargs,
@@ -18270,6 +18286,8 @@ type_unification_real (tree tparms,
   gcc_assert (xparms == NULL_TREE || TREE_CODE (xparms) == TREE_LIST);
   gcc_assert (ntparms > 0);
 
+  tree targs = INNERMOST_TEMPLATE_ARGS (full_targs);
+
   /* Reset the number of non-defaulted template arguments contained
  in TARGS.  */
   NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs) = NULL_TREE;
@@ -18304,7 +18322,7 @@ type_unification_real (tree tparms,
   arg = args[ia];
   ++ia;
 
-  if (unify_one_argument (tparms, targs, parm, arg, subr, strict,
+  if (unify_one_argument (tparms, full_targs, parm, arg, subr, strict,
 			  explain_p))
 	return 1;
 }
@@ -18324,7 +18342,7 @@ type_unification_real (tree tparms,
 
   /* Copy the parameter into parmvec.  */
 

Re: [patch] Allow configuration with --disable-sjlj-exceptions on biarch x86 Windows targets

2016-05-19 Thread Kai Tietz
Ok.  I just looked into the patch.  Sorry for the delay.

As it is still possible to build the old behavior, the patch is ok for me.

Thanks,
Kai


2016-05-19 20:55 GMT+02:00 Sandra Loosemore :
> On 05/19/2016 12:40 PM, Kai Tietz wrote:
>>
>> Hi,
>>
>> hopefully this time gmail uses  mail-encoding elmz ask for ...
>>
>> Sorry to object here.  I would like to point out that defaulting to
>> dw2 on 32-bit if SEH is used for 64-bit is nothing good in general.
>> This is reasoned by the problems existing in dw2 in combination with
>> other compiler-generated binaries.
>
>
> The patch does not change the default behavior for any target.  It only
> makes configuring with --disable-sjlj-exceptions work in a situation where
> it previously triggered a build error.
>
> The reason why I need this is precisely because of a compatibility issue --
> I have an existing 32-bit library supplied by a third party that uses DW2 EH
> that I need to link with.
>
> -Sandra
>
>


Re: [patch] Allow configuration with --disable-sjlj-exceptions on biarch x86 Windows targets

2016-05-19 Thread Jeff Law

On 05/19/2016 12:55 PM, Sandra Loosemore wrote:

On 05/19/2016 12:40 PM, Kai Tietz wrote:

Hi,

hopefully this time gmail uses  mail-encoding elmz ask for ...

Sorry to object here.  I would like to point out that defaulting to
dw2 on 32-bit if SEH is used for 64-bit is nothing good in general.
This is reasoned by the problems existing in dw2 in combination with
other compiler-generated binaries.


The patch does not change the default behavior for any target.  It only
makes configuring with --disable-sjlj-exceptions work in a situation
where it previously triggered a build error.

Precisely.

jeff


Re: [patch] Allow configuration with --disable-sjlj-exceptions on biarch x86 Windows targets

2016-05-19 Thread Sandra Loosemore

On 05/19/2016 12:40 PM, Kai Tietz wrote:

Hi,

hopefully this time gmail uses  mail-encoding elmz ask for ...

Sorry to object here.  I would like to point out that defaulting to
dw2 on 32-bit if SEH is used for 64-bit is nothing good in general.
This is reasoned by the problems existing in dw2 in combination with
other compiler-generated binaries.


The patch does not change the default behavior for any target.  It only 
makes configuring with --disable-sjlj-exceptions work in a situation 
where it previously triggered a build error.


The reason why I need this is precisely because of a compatibility issue 
-- I have an existing 32-bit library supplied by a third party that uses 
DW2 EH that I need to link with.


-Sandra




Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-19 Thread Cesar Philippidis
On 05/19/2016 04:29 AM, Alexander Monakov wrote:
> On Wed, 18 May 2016, Cesar Philippidis wrote:

> Note that the documentation suggests using 'make_safe_from' to concisely
> express conflict resolution:
> 
>> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
>> index 33a4862..69bbb22 100644
>> --- a/gcc/config/nvptx/nvptx.md
>> +++ b/gcc/config/nvptx/nvptx.md
>> @@ -794,6 +794,24 @@
>>""
>>"%.\\tsqrt%#%t0\\t%0, %1;")
>>  
>> +(define_expand "sincossf3"
>> +  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
>> +(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
>> +   UNSPEC_COS))
>> +   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
>> +(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
>> +  "flag_unsafe_math_optimizations"
> 
> ... here instead of special-casing the conflict case in curly braces you can
> just write:
> 
> "operands[2] = make_safe_from (operands[2], operands[0]);"
> 
>> +{
>> +  if (REGNO (operands[0]) == REGNO (operands[2]))
>> +{
>> +  rtx tmp = gen_reg_rtx (GET_MODE (operands[2]));
>> +  emit_insn (gen_rtx_SET (tmp, operands[2]));
>> +  emit_insn (gen_sinsf2 (operands[1], tmp));
>> +  emit_insn (gen_cossf2 (operands[0], tmp));
>> +  DONE;
>> +}
>> +})

Done. Is this ok for trunk?

Cesar

2016-05-19  Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.md (sincossf3): New pattern.

	gcc/testsuite/
	* gcc.target/nvptx/sincos.c: New test.


diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 33a4862..1dd256d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -794,6 +794,16 @@
   ""
   "%.\\tsqrt%#%t0\\t%0, %1;")
 
+(define_expand "sincossf3"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 2 "nvptx_register_operand" "R")]
+	   UNSPEC_COS))
+   (set (match_operand:SF 1 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_dup 2)] UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+  "operands[2] = make_safe_from (operands[2], operands[0]);"
+)
+
 (define_insn "sinsf2"
   [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
 	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
diff --git a/gcc/testsuite/gcc.target/nvptx/sincos.c b/gcc/testsuite/gcc.target/nvptx/sincos.c
new file mode 100644
index 000..921ec41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/sincos.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern float sinf (float);
+extern float cosf (float);
+
+float
+sincos_add (float x)
+{
+  float s = sinf (x);
+  float c = cosf (x);
+
+  return s + c;
+}
+
+/* { dg-final { scan-assembler-times "sin.approx.f32" 1 } } */
+/* { dg-final { scan-assembler-times "cos.approx.f32" 1 } } */


Re: [PATCH 1/9] Change ENABLE_VALGRIND_CHECKING to ENABLE_VALGRIND_ANNOTATIONS guard.

2016-05-19 Thread Jeff Law

On 05/19/2016 04:43 AM, marxin wrote:

The following change is very similar to what I did in:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02103.html

It's more logical to encapsulate the valgrind annotation magic within
an ENABLE_VALGRIND_ANNOTATIONS macro rather than ENABLE_VALGRIND_CHECKING.

libcpp/ChangeLog:

2016-05-18  Martin Liska  

* config.in: Regenerated.
* configure: Likewise.
* configure.ac: Handle --enable-valgrind-annotations.
* lex.c (new_buff): Use ENABLE_VALGRIND_ANNOTATIONS instead
of ENABLE_VALGRIND_CHECKING.
(_cpp_free_buff): Likewise.

OK.
jeff



Re: [PATCH v2] gcov: Runtime configurable destination output

2016-05-19 Thread Aaron Conole
Nathan Sidwell  writes:

> On 02/24/16 16:52, Aaron Conole wrote:
>> The previous gcov behavior was to always output errors on the stderr channel.
>> This is fine for most uses, but some programs will require stderr to be
>> untouched by libgcov for certain tests. This change allows configuring
>> the gcov output via an environment variable which will be used to open
>> the appropriate file.
>
> this is ok in principle.  I have a couple of questions & nits below though.

Thank you for the consideration.  I will be submitting a new patch that
I hope fully addresses your comments below, either tomorrow or Monday.

Thanks so much for the review.

> I don't see a previous commit from you -- do you have a copyright
> assignment with the FSF? (although this patch is simple, my guess is
> the idea it implements is sufficiently novel to need one).  We can
> handle that off list.

I'm happy to report that I did send in some FSF paperwork this week.
Hopefully it is on record now, but even if it isn't I live a train ride
away from the FSF headquarters so I'd be happy to take the time to make
sure it's all signed correctly.

>> diff --git a/libgcc/libgcov-driver-system.c b/libgcc/libgcov-driver-system.c
>> index 4e3b244..0eb9755 100644
>> --- a/libgcc/libgcov-driver-system.c
>> +++ b/libgcc/libgcov-driver-system.c
>> @@ -23,6 +23,24 @@ a copy of the GCC Runtime Library Exception along
>> with this program;
>>  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>  .  */
>>
>> +FILE *__gcov_error_file = NULL;
>
> Unless I'm missing something, isn't this only accessed from this file?
> (So could be static with a non-underbarred name)

Ack.

>> @@ -30,12 +48,27 @@ gcov_error (const char *fmt, ...)
>>  {
>>int ret;
>>va_list argp;
>> +
>> +  if (!__gcov_error_file)
>> +__gcov_error_file = get_gcov_error_file();
>
> Needs space before ()

Ack.

>> +
>>va_start (argp, fmt);
>> -  ret = vfprintf (stderr, fmt, argp);
>> +  ret = vfprintf (__gcov_error_file, fmt, argp);
>>va_end (argp);
>>return ret;
>>  }
>>
>> +#if !IN_GCOV_TOOL
>
> And this protection here, makes me wonder what happens if one is
> IN_GCOV_TOOL. Does it pay attention to GCOV_ERROR_FILE?  That would
> seem incorrect, and thus the above should be changed so that stderr is
> unconditionally used when IN_GCOV_TOOL?

You are correct.  I will fix it.

>> +static void
>> +gcov_error_exit(void)
>> +{
>> +  if (__gcov_error_file && __gcov_error_file != stderr)
>> +{
>
> Braces are not needed here.

Ack.

>> --- a/libgcc/libgcov-driver.c
>> +++ b/libgcc/libgcov-driver.c
>> @@ -46,6 +46,10 @@ void __gcov_init (struct gcov_info *p
>> __attribute__ ((unused))) {}
>
>> +  gcov_error_exit();
>
> Needs space before ().

Ack.

> nathan
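For readers following the review, the env-var-driven stream selection under discussion can be sketched roughly like this. The function and variable names follow the quoted hunks, but the bodies are assumptions based on the description, not the submitted patch (which uses varargs vfprintf and, per the review, should force stderr when IN_GCOV_TOOL):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static FILE *gcov_error_file;

/* Pick the error stream once: honour GCOV_ERROR_FILE when it is set
   and the file can be opened for append, else fall back to stderr.  */
static FILE *
get_gcov_error_file (void)
{
  const char *fname = getenv ("GCOV_ERROR_FILE");
  FILE *f = fname ? fopen (fname, "a") : NULL;
  return f ? f : stderr;
}

static int
gcov_error (const char *msg)
{
  if (!gcov_error_file)
    gcov_error_file = get_gcov_error_file ();
  return fprintf (gcov_error_file, "%s", msg);
}

/* Close the stream at exit, but never close stderr itself.  */
static void
gcov_error_exit (void)
{
  if (gcov_error_file && gcov_error_file != stderr)
    fclose (gcov_error_file);
  gcov_error_file = NULL;
}
```

The design point the review turns on is visible here: the lazy open keeps stderr as the default, and the exit hook must guard against closing stderr.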


Re: [patch] Allow configuration with --disable-sjlj-exceptions on biarch x86 Windows targets

2016-05-19 Thread Kai Tietz
Hi,

hopefully this time gmail uses the mail encoding the list asks for ...

Sorry to object here.  I would like to point out that defaulting to
DW2 on 32-bit while SEH is used for 64-bit is not a good idea in general,
because of the known problems DW2 unwinding has when mixed with
binaries generated by other compilers.

Regards
Kai

2016-05-19 20:05 GMT+02:00 Jeff Law :
> On 05/19/2016 11:36 AM, Sandra Loosemore wrote:
>>
>> This is a slightly revised version of the WIP patch against GCC 5.1 I
>> previously posted here:
>>
>> https://gcc.gnu.org/ml/gcc/2016-05/msg00135.html
>>
>> To recap, I needed a biarch x86_64 mingw-w64 target compiler that uses
>> DWARF-2 exception handling in 32-bit mode (for compatibility with an
>> older i686-mingw32 toolchain configured with --disable-sjlj-exceptions).
>>  But, the configuration machinery was rejecting this.  It used to be
>> that SJLJ exceptions were the only choice for 64-bit mode so it was
>> correct to reject --disable-sjlj-exceptions in a biarch toolchain, but
>> now the default for 64-bit mode is SEH instead.  With this patch,
>> configuring with --disable-sjlj-exceptions selects DWARF-2 EH in 32-bit
>> mode and SEH in 64-bit mode.
>>
>> I tested this in a cross-compiler configured to build C and C++ only for
>> x86_64 mingw-w64 target, and test results look about the same as the
>> default configuration, which uses SJLJ for 32-bit mode.  (If it's
>> relevant, I also used the compiler built with the 5.1 version of the
>> patch to build complete Windows-host toolchains for nios2-elf and
>> nios2-linux-gnu, which I tested by running the GDB testsuite.)  There
>> are no actual code changes here so I'd expect the 32-bit DWARF-2 EH
>> support to work exactly the same as in a 32-bit-only x86 configuration.
>>
>> OK to commit?
>
> OK.
>
> Thanks,
> Jeff


Re: [RFC] Type promotion pass and elimination of zext/sext

2016-05-19 Thread Jeff Law

On 05/15/2016 06:45 PM, Kugan Vivekanandarajah wrote:

Hi Richard,

Now that stage1 is open, I would like to get the type promotion passes
reviewed again. I have tested the patches on aarch64, x86-64, and
ppc64le without any new execution failures. There are some test cases that
fail for patterns. I will address them after getting feedback on the
basic structure.
I find myself wondering if this will eliminate some of the cases where 
Kai's type casting motion was useful.  And just to be clear, that would 
be a good thing.




1. When we promote an SSA name as part of promote_ssa, we either promote the
definition, or create a copy stmt that is inserted after the stmt that
defines it; i.e., we want to promote the SSA name and reflect the promotion
on all the uses (we promote in place). We do this because we don't
want to change all the uses.

+/* Promote definition DEF to promoted type.  If the stmt that defines def
+   is def_stmt, make the type of def promoted type.  If the stmt is such
+   that, result of the def_stmt cannot be of promoted type, create a new_def
+   of the original_type and make the def_stmt assign its value to newdef.
+   Then, create a NOP_EXPR to convert new_def to def of promoted type.
+
+   For example, for stmt with original_type char and promoted_type int:
+char _1 = mem;
+becomes:
+char _2 = mem;
+int _1 = (int)_2;
When does this case happen, and how is this any better than PRE or other 
elimination/code motion algorithms in improving the generated code?


I would hazard a guess that it could happen if you still needed the
char-sized use in a small number of cases, but generally wanted to promote
most uses to int?
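The equivalence the pass relies on can be illustrated in plain C. This is an illustration of the semantics only; the pass itself rewrites GIMPLE, and these helper names are made up:

```c
#include <assert.h>

/* Narrow arithmetic as written: C already promotes the operands to int
   and the return truncates back to unsigned char.  */
static unsigned char
add_narrow (unsigned char a, unsigned char b)
{
  return a + b;
}

/* The same computation with the promotion made explicit, mirroring the
   "char _2 = mem; int _1 = (int)_2;" shape from the quoted comment.  */
static unsigned char
add_promoted (unsigned char a, unsigned char b)
{
  int wa = (int) a;            /* promoted copy of the narrow def */
  int wb = (int) b;
  int sum = wa + wb;           /* all uses see the promoted values */
  return (unsigned char) sum;  /* truncate back to the original type */
}
```

Both functions must agree for all inputs, including on wrap-around; that is the invariant a type promotion pass has to preserve.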



+

However, if the defining stmt has to be the last stmt in the basic
block (e.g., a stmt that can throw), and if there is more than one normal
edge where we use this value, we can't insert the copy on all the
edges. Please note that the copy stmt copies the value to the promoted
SSA name with the same name.

Therefore I had to return false in this case for promote_ssa and fixup
uses. I ran into this while testing ppc64le. I am sure it can happen
in other cases.

Right.

Jeff


Re: Revert gcc r227962

2016-05-19 Thread Jeff Law

On 02/27/2016 03:39 AM, JonY wrote:

On 2/27/2016 05:26, Jeff Law wrote:

On 02/26/2016 04:04 AM, JonY wrote:

Hi,

I've submitted a patch that was committed as r227962, it causes some
unintended side effects (namely libuuid on Cygwin). Can someone please
revert?

Kai still needs some time to setup his gcc development environment.

We'd need to have a better sense of why this is causing problems.  If
Kai could chime in with a description of the problem it would be helpful.



Normally, /usr/lib is searched BEFORE /usr/lib/w32api. The patch caused
w32api to be searched before /usr/lib.

libuuid.a is known to exist in both directories, with the *nix version
in /usr/lib and the import library for Windows in w32api. The w32api
copy is completely unrelated to the *nix version. This break cygwin apps
that expect to link against *nix libuuid.
So if we make this change (revert 227962), my understanding is that 
cygwin bootstraps will fail because they won't find kernel32 and perhaps 
other libraries.


Jeff


Re: [JAVA PATCH] Builtin support for popcount* and bswap* functions

2016-05-19 Thread Jeff Law

On 02/22/2016 11:13 AM, ro...@nextmovesoftware.com wrote:


The following patch provides builtin support for byte swapping and bit counting.
On suitable hardware, these generate the x86 popcount instructions, as also
generated by the SUN HotSpot JIT/JVM for these method calls.

java.lang.Integer.bitCount -> __builtin_popcount
java.lang.Long.bitCount -> __builtin_popcountl

java.lang.Short.reverseBytes -> __builtin_bswap16
java.lang.Integer.reverseBytes -> __builtin_bswap32
java.lang.Long.reverseBytes -> __builtin_bswap64
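The mapping is mechanical because the GCC builtins already have the Java semantics. A quick C illustration (the builtins are real GCC/Clang builtins; the wrapper names are made up for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* java.lang.Integer.bitCount: number of set bits.  */
static int
bit_count (uint32_t x)
{
  return __builtin_popcount (x);
}

/* java.lang.Integer.reverseBytes: byte-swap a 32-bit value.  */
static uint32_t
reverse_bytes (uint32_t x)
{
  return __builtin_bswap32 (x);
}
```

On suitable hardware these compile to single instructions (e.g. popcnt/bswap on x86), which is exactly what the HotSpot intrinsics produce for the same methods.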

New test cases have been added to libjava to double check nothing breaks.
Whilst I was there I noticed that the math builtins (many of which I added
support for back in 2003/2004) weren't marked as ECF_LEAF, indicating that
they don't/can't invoke any user-specified functions.  This has also been
fixed in the patch below.

The following patch has been tested on x86_64-pc-linux-gnu with a full
make bootstrap and make check with no new failures/regressions.

Ok for stage1 once it reopens?

Cheers,

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, 
Cambridge CB4 0EY

2016-02-21  Roger Sayle  

gcc/java:
* builtins.c (java_builtins): Use popcount* and bswap* builtins to
implement bitCount() and reverseBytes() methods in java.lang.Integer
and friends.
(initialize_builtins): Annotate math builtins with ECF_LEAF.  Call
define_builtin for the new popcount* and bswap* builtins.

libjava:
* testsuite/libjava.lang/BuiltinBitCount.java: New test case.
* testsuite/libjava.lang/BuiltinReverseBytes.java: Likewise.

OK for the trunk.

Thanks,
Jeff



Re: [patch] Allow configuration with --disable-sjlj-exceptions on biarch x86 Windows targets

2016-05-19 Thread Jeff Law

On 05/19/2016 11:36 AM, Sandra Loosemore wrote:

This is a slightly revised version of the WIP patch against GCC 5.1 I
previously posted here:

https://gcc.gnu.org/ml/gcc/2016-05/msg00135.html

To recap, I needed a biarch x86_64 mingw-w64 target compiler that uses
DWARF-2 exception handling in 32-bit mode (for compatibility with an
older i686-mingw32 toolchain configured with --disable-sjlj-exceptions).
 But, the configuration machinery was rejecting this.  It used to be
that SJLJ exceptions were the only choice for 64-bit mode so it was
correct to reject --disable-sjlj-exceptions in a biarch toolchain, but
now the default for 64-bit mode is SEH instead.  With this patch,
configuring with --disable-sjlj-exceptions selects DWARF-2 EH in 32-bit
mode and SEH in 64-bit mode.

I tested this in a cross-compiler configured to build C and C++ only for
x86_64 mingw-w64 target, and test results look about the same as the
default configuration, which uses SJLJ for 32-bit mode.  (If it's
relevant, I also used the compiler built with the 5.1 version of the
patch to build complete Windows-host toolchains for nios2-elf and
nios2-linux-gnu, which I tested by running the GDB testsuite.)  There
are no actual code changes here so I'd expect the 32-bit DWARF-2 EH
support to work exactly the same as in a 32-bit-only x86 configuration.

OK to commit?

OK.

Thanks,
Jeff


Re: [cilkplus] Fix precompiled header bug (PR cilkplus/70865)

2016-05-19 Thread Jeff Law

On 05/19/2016 11:58 AM, Ryan Burn wrote:

The file cilk.h defines the memory managed cilk_trees variable, but
fails to include the header in the GTFILES list. When a precompiled
header is loaded, the array is then not properly restored and points
to garbage memory, causing a segfault. This patch fixes the problem by
adding the cilk files to GTFILES and gengtype.c.

Bootstrapped and regression tested on x86_64-linux.

2016-05-16  Ryan Burn  

  * Makefile.in (GTFILES): Add cilk.h and cilk-common.c.
  * gengtype.c (open_base_files): Add cilk.h to ifiles.


Thanks.  Installed on the trunk.

jeff


MAINTAINERS update

2016-05-19 Thread Jeff Law


Spurred by the lack of response to Sandra's message WRT a cygwin/mingw 
issue, I did a quick pass through the MAINTAINERS file for folks that 
are listed as maintainers, but aren't (to the best of my knowledge) 
acting in those positions anymore.


I removed their names from the maintainers sections, but kept/added them
to write-after-approval.  Affected individuals:


Geoff Keating
Kazu Hirata
Steve Ellcey
Eric Christopher
Kai Tietz
Dave Korn
Bryce McKinlay
Michael Hayes
Torbjorn Granlund
Jason Eckhardt
Roger Sayle
Daniel Berlin


I'm sure further cleanup is possible/advisable.  This was just a quick 
pass through to clean up some cruft.


OK for the trunk?

Jeff


* MAINTAINERS: Move several inactive maintainers to the
write-after-approval section.

diff --git a/MAINTAINERS b/MAINTAINERS
index c615168..abda292 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24,7 +24,6 @@ Richard Earnshaw  

 Richard Biener 
 Richard Henderson  
 Jakub Jelinek  
-Geoffrey Keating   
 Richard Kenner 
 Jeff Law   
 Michael Meissner   
@@ -59,14 +58,12 @@ frv portNick Clifton

 frv port   Alexandre Oliva 
 ft32 port  James Bowman
 h8 portJeff Law
-h8 portKazu Hirata 
 hppa port  Jeff Law
 hppa port  John David Anglin   
 i386 port  Jan Hubicka 
 i386 port  Uros Bizjak 
 i386 vector ISA extns  Kirill Yukhin   
 ia64 port  Jim Wilson  
-ia64 port  Steve Ellcey
 iq2000 portNick Clifton
 lm32 port  Sebastien Bourdeauducq  
 m32c port  DJ Delorie  
@@ -77,7 +74,6 @@ m68k-motorola-sysv port   Philippe De Muyter  

 mcore port Nick Clifton
 microblaze Michael Eager   
 mips port  Catherine Moore 
-mips port  Eric Christopher
 mips port  Matthew Fortune 
 mmix port  Hans-Peter Nilsson  
 mn10300 port   Jeff Law
@@ -125,12 +121,10 @@ xtensa port   Sterling Augustine  

 aixDavid Edelsohn  
 Android sub-port   Maxim Kuvyrkov  
 darwin portMike Stump  
-darwin portEric Christopher
 DJGPP  DJ Delorie  
 freebsdAndreas Tobler  
 GNU/Hurd   Thomas Schwinge 
 hpux   John David Anglin   
-hpux   Steve Ellcey
 solarisRainer Orth 

 netbsd Jason Thorpe
 netbsd Krister Walfridsson 
@@ -141,8 +135,6 @@ RTEMS Ports Sebastian Huber 

 VMSDouglas Rupp
 VMSTristan Gingold 
 VxWorks ports  Nathan Sidwell  
-windows, cygwin, mingw Kai Tietz   
-windows, cygwin, mingw Dave Korn   
 
Language Front Ends Maintainers
 
@@ -169,7 +161,6 @@ fp-bit  Ian Lance Taylor

 libdecnumber   Ben Elliston
 libgcc Ian Lance Taylor
 libgcj Tom Tromey  
-libgcj Bryce McKinlay  
 libgo  Ian 

[cilkplus] Fix precompiled header bug (PR cilkplus/70865)

2016-05-19 Thread Ryan Burn
The file cilk.h defines the memory managed cilk_trees variable, but
fails to include the header in the GTFILES list. When a precompiled
header is loaded, the array is then not properly restored and points
to garbage memory, causing a segfault. This patch fixes the problem by
adding the cilk files to GTFILES and gengtype.c.

Bootstrapped and regression tested on x86_64-linux.

2016-05-16  Ryan Burn  

  * Makefile.in (GTFILES): Add cilk.h and cilk-common.c.
  * gengtype.c (open_base_files): Add cilk.h to ifiles.


pr70865.patch
Description: Binary data


Re: C++ PATCH to improve dump_decl (PR c++/71075)

2016-05-19 Thread Marek Polacek
On Thu, May 19, 2016 at 12:55:18PM -0400, Jason Merrill wrote:
> Well, constants aren't declarations.  Why are we calling dump_decl here?

Eh, not sure what I was thinking...

We call dump_decl because unify_template_argument_mismatch has %qD:
inform (input_location,
"  template argument %qE does not match %qD", arg, parm);
so I suspect we want to print the expression even for PARM:

Untested, but I think it should test fine.  Ok if testing passes?

2016-05-19  Marek Polacek  

PR c++/71075
* pt.c (unify_template_argument_mismatch): Use %qE instead of %qD.

* g++.dg/diagnostic/pr71075.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index fde3091..ea7778d 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -6165,7 +6165,7 @@ unify_template_argument_mismatch (bool explain_p, tree parm, tree arg)
 {
   if (explain_p)
 inform (input_location,
-   "  template argument %qE does not match %qD", arg, parm);
+   "  template argument %qE does not match %qE", arg, parm);
   return 1;
 }
 
diff --git gcc/testsuite/g++.dg/diagnostic/pr71075.C gcc/testsuite/g++.dg/diagnostic/pr71075.C
index e69de29..6bb1e68 100644
--- gcc/testsuite/g++.dg/diagnostic/pr71075.C
+++ gcc/testsuite/g++.dg/diagnostic/pr71075.C
@@ -0,0 +1,8 @@
+// PR c++/71075
+
+template struct A {};
+template void foo(A) {}
+int main() {
+  foo(A()); // { dg-error "no matching" }
+// { dg-message "template argument .2. does not match .1." "" { target *-*-* } 6 }
+}

Marek


[patch] Allow configuration with --disable-sjlj-exceptions on biarch x86 Windows targets

2016-05-19 Thread Sandra Loosemore
This is a slightly revised version of the WIP patch against GCC 5.1 I 
previously posted here:


https://gcc.gnu.org/ml/gcc/2016-05/msg00135.html

To recap, I needed a biarch x86_64 mingw-w64 target compiler that uses 
DWARF-2 exception handling in 32-bit mode (for compatibility with an 
older i686-mingw32 toolchain configured with --disable-sjlj-exceptions). 
 But, the configuration machinery was rejecting this.  It used to be 
that SJLJ exceptions were the only choice for 64-bit mode so it was 
correct to reject --disable-sjlj-exceptions in a biarch toolchain, but 
now the default for 64-bit mode is SEH instead.  With this patch, 
configuring with --disable-sjlj-exceptions selects DWARF-2 EH in 32-bit 
mode and SEH in 64-bit mode.


I tested this in a cross-compiler configured to build C and C++ only for 
x86_64 mingw-w64 target, and test results look about the same as the 
default configuration, which uses SJLJ for 32-bit mode.  (If it's 
relevant, I also used the compiler built with the 5.1 version of the 
patch to build complete Windows-host toolchains for nios2-elf and 
nios2-linux-gnu, which I tested by running the GDB testsuite.)  There 
are no actual code changes here so I'd expect the 32-bit DWARF-2 EH 
support to work exactly the same as in a 32-bit-only x86 configuration.


OK to commit?

-Sandra

2016-05-19  Sandra Loosemore  

	gcc/
	* config/i386/cygming.h (DWARF2_UNWIND_INFO): Allow 
	--disable-sjlj-exceptions for TARGET_BI_ARCH to select DWARF-2 EH
	for 32-bit mode and SEH for 64-bit.
	* config/i386/mingw32.h (SHARED_LIBGCC_UNDEFS_SPEC): Handle
	TARGET_64BIT_DEFAULT.

	libgcc/
	* config.host [x86_64-*-cygwin*]: Handle tmake_eh_file for mixed 
	dw2/seh configuration.
	[x86_64-*-mingw*]: Likewise.
Index: gcc/config/i386/cygming.h
===
--- gcc/config/i386/cygming.h	(revision 236256)
+++ gcc/config/i386/cygming.h	(working copy)
@@ -339,16 +339,13 @@ do {		\
 #define ASM_COMMENT_START " #"
 
 #ifndef DWARF2_UNWIND_INFO
-/* If configured with --disable-sjlj-exceptions, use DWARF2, else
-   default to SJLJ.  */
+/* If configured with --disable-sjlj-exceptions, use DWARF2 for 32-bit
+   mode else default to SJLJ.  64-bit code uses SEH unless you request
+   SJLJ.  */
 #if  (defined (CONFIG_SJLJ_EXCEPTIONS) && !CONFIG_SJLJ_EXCEPTIONS)
 /* The logic of this #if must be kept synchronised with the logic
-   for selecting the tmake_eh_file fragment in config.gcc.  */
+   for selecting the tmake_eh_file fragment in libgcc/config.host.  */
 #define DWARF2_UNWIND_INFO 1
-/* If multilib is selected break build as sjlj is required.  */
-#if defined (TARGET_BI_ARCH)
-#error For 64-bit windows and 32-bit based multilib version of gcc just SJLJ exceptions are supported.
-#endif
 #else
 #define DWARF2_UNWIND_INFO 0
 #endif
Index: gcc/config/i386/mingw32.h
===
--- gcc/config/i386/mingw32.h	(revision 236256)
+++ gcc/config/i386/mingw32.h	(working copy)
@@ -100,10 +100,12 @@ along with GCC; see the file COPYING3.  
 #if DWARF2_UNWIND_INFO
 /* DW2-unwind is just available for 32-bit mode.  */
 #if TARGET_64BIT_DEFAULT
-#error DW2 unwind is not available for 64-bit.
-#endif
+#define SHARED_LIBGCC_UNDEFS_SPEC \
+  "%{m32: %{shared-libgcc: -u ___register_frame_info -u ___deregister_frame_info}}"
+#else
 #define SHARED_LIBGCC_UNDEFS_SPEC \
  "%{shared-libgcc: -u ___register_frame_info -u ___deregister_frame_info}"
+#endif
 #else
 #define SHARED_LIBGCC_UNDEFS_SPEC ""
 #endif
Index: libgcc/config.host
===
--- libgcc/config.host	(revision 236256)
+++ libgcc/config.host	(working copy)
@@ -678,6 +678,9 @@ x86_64-*-cygwin*)
 	# This has to match the logic for DWARF2_UNWIND_INFO in gcc/config/i386/cygming.h
 	if test x$ac_cv_sjlj_exceptions = xyes; then
 		tmake_eh_file="i386/t-sjlj-eh"
+	elif test "${host_address}" = 32; then
+	# biarch -m32 with --disable-sjlj-exceptions
+	 	tmake_eh_file="i386/t-dw2-eh"
 	else
 		tmake_eh_file="i386/t-seh-eh"
 	fi
@@ -730,6 +733,10 @@ x86_64-*-mingw*)
 	# This has to match the logic for DWARF2_UNWIND_INFO in gcc/config/i386/cygming.h
 	if test x$ac_cv_sjlj_exceptions = xyes; then
 		tmake_eh_file="i386/t-sjlj-eh"
+	elif test "${host_address}" = 32; then
+	# biarch -m32 with --disable-sjlj-exceptions
+	 	tmake_eh_file="i386/t-dw2-eh"
+		md_unwind_header=i386/w32-unwind.h
 	else
 		tmake_eh_file="i386/t-seh-eh"
 	fi


Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

2016-05-19 Thread Joseph Myers
On Thu, 19 May 2016, Jiong Wang wrote:

> Then,
> 
>   * if we add scalar HF mode to standard patterns, vector HF modes operation
> will be
> turned into scalar HF operations instead of scalar SF operations.
> 
>   * if we add vector HF mode to standard patterns, vector HF modes operations
> will
> generate vector HF instructions directly.
> 
>   Will this still cause precision inconsistence with old gcc when there are
> cascade
>   vector float operations?

I'm not sure inconsistency with old GCC is what's relevant here.

Standard-named RTL patterns have particular semantics.  Those semantics do 
not depend on the target architecture (except where there are target 
macros / hooks to define such dependence).  If you have an instruction 
that matches those target-independent semantics, it should be available 
for the standard-named pattern.  I believe that is the case here, for both 
the scalar and the vector instructions - they have the standard semantics, 
so should be available for the standard patterns.

It is the responsibility of the target-independent parts of the compiler 
to ensure that the RTL generated matches the source code semantics, so 
that providing a standard pattern for an instruction that matches the 
pattern's semantics does not cause any problems regarding source code 
semantics.

That said: if the expander in old GCC is converting a vector HF operation 
into scalar SF operations, I'd expect it also to include a conversion from 
SFmode back to HFmode after those operations, since it will be producing a 
vector HF result.  And that would apply for each individual operation 
expanded.  So I would not expect inconsistency to arise from making direct 
HFmode operations available (given that the semantics of scalar + - * / 
are the same whether you do them directly on HFmode or promote to SFmode, 
do the operation there and then convert the result back to HFmode before 
doing any further operations on it).
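Joseph's promote-then-round-back argument can be checked in plain C, with float/double standing in for HFmode/SFmode. The same reasoning applies in both cases: the wide format carries more than 2p+2 bits of the narrow precision p, so the intermediate wide result rounds back to exactly the directly computed narrow result. This is an illustration, not GCC code:

```c
#include <assert.h>

/* Direct narrow-mode multiply: one correctly rounded float operation.  */
static float
mul_direct (float a, float b)
{
  return a * b;
}

/* Promote, operate wide, round back.  The double product of two floats
   is exact (24 + 24 <= 53 bits), so the single rounding back to float
   yields the correctly rounded product, i.e. the same value.  */
static float
mul_promoted (float a, float b)
{
  double wa = a, wb = b;     /* widening conversions are exact */
  return (float) (wa * wb);  /* one rounding back to the narrow mode */
}
```

Under this equivalence, making direct HFmode patterns available cannot change results for individual + - * / operations; it only removes the detour through the wider mode.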

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] PR c/71171: Fix uninitialized source_range in c_parser_postfix_expression

2016-05-19 Thread Jeff Law

On 05/18/2016 07:08 PM, David Malcolm wrote:

PR c/71171 reports yet another instance of the src_range of a
c_expr being used without initialization.  Investigation shows
that this was due to error-handling, where the "value" field of
a c_expr is set to error_mark_node without touching the
src_range, leading to complaints from valgrind.

This seems to be a common mistake, so this patch introduces a
new method, c_expr::set_error, which sets the value to
error_mark_node whilst initializing the src_range to
UNKNOWN_LOCATION.
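A schematic of the helper's intent, with field names and types as stand-ins for the real GCC tree and source_range members (this is not the actual c_expr definition):

```c
#include <assert.h>

struct c_expr_sketch
{
  int value;                      /* stands in for the tree 'value' field */
  int range_start, range_finish;  /* stands in for 'src_range' */
};

/* The point of the new method: setting the error sentinel and resetting
   the source range happen in one place, so no error path can leave
   src_range uninitialized again.  */
static void
set_error (struct c_expr_sketch *e)
{
  e->value = -1;                         /* error_mark_node stand-in */
  e->range_start = e->range_finish = 0;  /* UNKNOWN_LOCATION stand-in */
}
```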

This fixes the valgrind issue seen in PR c/71171, along with various
similar issues seen when running the testsuite using the checker
patch I posted here:
  https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00887.html
(this checker still doesn't fully work yet, but it seems to be good
for easily detecting these issues without needing Valgrind).

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk and for gcc-6-branch?

gcc/c/ChangeLog:
PR c/71171
* c-parser.c (c_parser_generic_selection): Use c_expr::set_error
in error-handling.
(c_parser_postfix_expression): Likewise.
* c-tree.h (c_expr::set_error): New method.
* c-typeck.c (parser_build_binary_op): In error-handling, ensure
that result's range is initialized.

OK.
jeff



Re: [PATCH 2/3] function: Factor out make_*logue_seq

2016-05-19 Thread Jeff Law

On 05/19/2016 01:16 AM, Jakub Jelinek wrote:

On Wed, May 18, 2016 at 05:13:25PM -0500, Segher Boessenkool wrote:

On Wed, May 18, 2016 at 01:35:16PM -0500, Segher Boessenkool wrote:

On Wed, May 18, 2016 at 11:20:29AM -0700, H.J. Lu wrote:

* function.c (make_split_prologue_seq, make_prologue_seq,
make_epilogue_seq): New functions, factored out from...
(thread_prologue_and_epilogue_insns): Here.


It breaks x86:


Are you sure it is this patch causing it?  As noted, it was tested on x86.


I am pretty sure.  How did you test it on x86?


"make -k check".  I'll test 32-bit now.


Actually, it also fails on 64 bit.  It passed my testing because it does
not fail together with patch 3/3, and does not fail on powerpc at all.


If 3/3 isn't approved soon, can you please revert the problematic commit
until it is (if that patch can't work right on its own and needs the other
patch too)? The trunk is in a terrible state right now at least on
x86_64/i686-linux, various tests hang forever (e.g. some cleanup-* tests)
and there are hundreds of failures, making it impossible to do proper
regression testing.

FWIW, I'm still looking at 3/3.

jeff


Re: [PATCH, rtl-optimization]: Fix TRAP_IF dependencies by forcing pending loads to memory

2016-05-19 Thread Jeff Law

On 05/19/2016 11:02 AM, Uros Bizjak wrote:

Hello!

I was looking at recent g++.dg/ext/sync-4.C testsuite FAILure on
alpha-linux-gnu. The testcase installs SIGSEGV handler and among other
tests, it does various tests with atomic operations on NULL addresses.

One test (f19):

FN(19, void, (__atomic_exchange((ditype*)p, (ditype*)0, (ditype*)0,
__ATOMIC_SEQ_CST)))

expands to following initial RTX sequence:

5: r71:DI=0
6: r70:DI=[r71:DI]
7: {trap_if 0x1;use $29:DI;}
8: barrier

where _.sched2 pass is free to reorder insns from

6: $1:DI=[0]
  REG_UNUSED $1:DI
7: {trap_if 0x1;use $29:DI;}
  REG_DEAD $29:DI
8: barrier

to:

7: {trap_if 0x1;use $29:DI;}
  REG_DEAD $29:DI
6: $1:DI=[0]
  REG_UNUSED $1:DI
8: barrier

resulting in:

$ ./a.out
Trace/breakpoint trap

Please note that the handler is able to recover from SIGSEGV, but not from
SIGTRAP. If these two signals are reordered, the testcase fails.

Proposed patch solves this issue by also forcing pending loads to
memory. This way, memory access is ordered with trap insn, and the
testcase passes.

2016-05-19  Uros Bizjak  

* sched-deps.c (sched_analyze_2) : Also
force pending loads from memory.

Patch was bootstrapped and regression tested on alphaev68-linux-gnu
and x86_64-linux-gnu {,-m32}.

OK for mainline and release branches?

OK.

jeff


Re: [PATCH v3] Take known zero bits into account when checking extraction.

2016-05-19 Thread Jeff Law

On 05/19/2016 05:18 AM, Dominik Vogt wrote:

On Mon, May 16, 2016 at 01:09:36PM -0600, Jeff Law wrote:

> On 05/11/2016 02:52 AM, Dominik Vogt wrote:

> >On Wed, May 11, 2016 at 10:40:11AM +0200, Bernd Schmidt wrote:
> >That's what I mentioned somewhere during the discussion.  The s390
> >backend just uses COSTS_N_INSNS(1) for AND as well as ZERO_EXTEND,
> >so this won't ever trigger.  I just left the rtx_cost call in the
> >patch for further discussion as Jeff said he liked the approach.
> >We don't need it to achieve the behaviour we want for s390.

> I liked it, just based on the general theory that we should be
> comparing costs of a transform to the original much more often than
> we currently do.
>
> If Bernd prefers it gone and you don't need it to achieve your
> goals, then I won't object to the costing stuff going away.

All right, third version attached, without the rtx_cost call;
bootstrapped and regression tested on s390, s390x, x86_64.

On Wed, Apr 27, 2016 at 09:20:05AM +0100, Dominik Vogt wrote:

> The attached patch is a result of discussing an S/390 issue with
> "and with complement" in some cases.
>
>   https://gcc.gnu.org/ml/gcc/2016-03/msg00163.html
>   https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01586.html
>
> Combine would merge a ZERO_EXTEND and a SET taking the known zero
> bits into account, resulting in an AND.  Later on,
> make_compound_operation() fails to replace that with a ZERO_EXTEND
> which we get for free on S/390 but leaves the AND, eventually
> resulting in two consecutive AND instructions.
>
> The current code in make_compound_operation() that detects
> opportunities for ZERO_EXTEND does not work here because it does
> not take the known zero bits into account:
>
>   /* If the constant is one less than a power of two, this might be
>representable by an extraction even if no shift is present.
>If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
>we are in a COMPARE.  */
>   else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
>   new_rtx = make_extraction (mode,
>  make_compound_operation (XEXP (x, 0),
>   next_code),
>  0, NULL_RTX, i, 1, 0, in_code == COMPARE);
>
> An attempt to use the zero bits in the above conditions resulted
> in many situations that generated worse code, so the patch tries
> to fix this in a more conservative way.  While the effect is
> completely positive on S/390, this will very likely have
> unforeseeable consequences on other targets.
>
> Bootstrapped and regression tested on s390 and s390x only at the
> moment.
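The exact_log2 (mask + 1) test quoted above can be sketched in isolation. This is a stand-alone illustration of the idea, not GCC's implementation:

```c
#include <assert.h>

/* If MASK is one less than a power of two, an AND with it keeps exactly
   the N low bits and is therefore equivalent to a zero-extraction of
   width N, which make_compound_operation can turn into a zero_extend.
   Returns the width N, or -1 if MASK has no such form.  */
static int
extraction_width (unsigned long long mask)
{
  unsigned long long v = mask + 1;
  if (v == 0 || (v & (v - 1)) != 0)  /* mask + 1 must be a power of two */
    return -1;
  int i = 0;
  while (v > 1)
    {
      v >>= 1;
      i++;                           /* i = exact_log2 (mask + 1) */
    }
  return i;
}
```

The patch's refinement is about when this rewrite is attempted: the AND produced by combine already encodes known-zero high bits, so the extraction opportunity exists even though no explicit shift is present.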

Ciao

Dominik ^_^  ^_^

-- Dominik Vogt IBM Germany


0001-v3-ChangeLog


gcc/ChangeLog

* combine.c (make_compound_operation): Take known zero bits into
account when checking for possible zero_extend.
gcc/testsuite/ChangeLog

* gcc.dg/zero_bits_compound-1.c: New test.
* gcc.dg/zero_bits_compound-2.c: New test.
I'm a little worried about the tests.  They check for lp64, but the 
tests actually need a stronger set of conditions to pass.


I'm thinking that the tests ought to be opt-in as I don't think we have 
a set of effective-target tests we can use.


So OK with the tests restricted to the targets where you've verified 
they work.


Jeff




[PATCH, rtl-optimization]: Fix TRAP_IF dependencies by forcing pending loads to memory

2016-05-19 Thread Uros Bizjak
Hello!

I was looking at recent g++.dg/ext/sync-4.C testsuite FAILure on
alpha-linux-gnu. The testcase installs SIGSEGV handler and among other
tests, it does various tests with atomic operations on NULL addresses.

One test (f19):

FN(19, void, (__atomic_exchange((ditype*)p, (ditype*)0, (ditype*)0,
__ATOMIC_SEQ_CST)))

expands to following initial RTX sequence:

5: r71:DI=0
6: r70:DI=[r71:DI]
7: {trap_if 0x1;use $29:DI;}
8: barrier

where _.sched2 pass is free to reorder insns from

6: $1:DI=[0]
  REG_UNUSED $1:DI
7: {trap_if 0x1;use $29:DI;}
  REG_DEAD $29:DI
8: barrier

to:

7: {trap_if 0x1;use $29:DI;}
  REG_DEAD $29:DI
6: $1:DI=[0]
  REG_UNUSED $1:DI
8: barrier

resulting in:

$ ./a.out
Trace/breakpoint trap

Please note that the handler is able to recover from SIGSEGV, but not from
SIGTRAP. If these two signals are reordered, the testcase fails.

Proposed patch solves this issue by also forcing pending loads to
memory. This way, memory access is ordered with trap insn, and the
testcase passes.

2016-05-19  Uros Bizjak  

* sched-deps.c (sched_analyze_2) : Also
force pending loads from memory.

Patch was bootstrapped and regression tested on alphaev68-linux-gnu
and x86_64-linux-gnu {,-m32}.

OK for mainline and release branches?

Uros.
Index: sched-deps.c
===
--- sched-deps.c(revision 236461)
+++ sched-deps.c(working copy)
@@ -2709,9 +2709,12 @@ sched_analyze_2 (struct deps_desc *deps, rtx x, rt
return;
   }
 
-/* Force pending stores to memory in case a trap handler needs them.  */
+/* Force pending stores to memory in case a trap handler needs them.
+   Also force pending loads from memory; loads and stores can segfault
+   and the signal handler won't be triggered if the trap insn was moved
+   above load or store insn.  */
 case TRAP_IF:
-  flush_pending_lists (deps, insn, true, false);
+  flush_pending_lists (deps, insn, true, true);
   break;
 
 case PREFETCH:


Re: [PATCH, libgcc/ARM 1/6, ping1] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-05-19 Thread Kyrill Tkachov


On 19/05/16 17:55, Thomas Preudhomme wrote:

On Thursday 19 May 2016 17:42:26 Kyrill Tkachov wrote:

Hi Thomas,

I'm not very familiar with the libgcc machinery, but I have a comment on an
arm.h hunk inline.
On 17/05/16 10:58, Thomas Preudhomme wrote:

Ping?

*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

  * config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and
  __ARM_ARCH_ISA_ARM to
  decide whether to prevent some libgcc routines being included for
  some
  multilibs rather than __ARM_ARCH_6M__ and add comment to indicate
  the
  link between this condition and the one in
  libgcc/config/arm/lib1func.S.
  * config/arm/arm.h (TARGET_ARM_V6M): Add check to
  TARGET_ARM_ARCH.
  (TARGET_ARM_V7M): Likewise.

*** gcc/testsuite/ChangeLog ***

2015-11-10  Thomas Preud'homme  

  * lib/target-supports.exp (check_effective_target_arm_cortex_m):
  Use
  __ARM_ARCH_ISA_ARM to test for Cortex-M devices.

*** libgcc/ChangeLog ***

2015-12-17  Thomas Preud'homme  

  * config/arm/bpabi-v6m.S: Fix header comment to mention Thumb-1
  rather
  than ARMv6-M.
  * config/arm/lib1funcs.S (__prefer_thumb__): Define among other
  cases
  for all Thumb-1 only targets.
  (__only_thumb1__): Define for all Thumb-1 only targets.
  (THUMB_LDIV0): Test for __only_thumb1__ rather than
  __ARM_ARCH_6M__.
  (EQUIV): Likewise.
  (ARM_FUNC_ALIAS): Likewise.
  (umodsi3): Add check to __only_thumb1__ to guard the idiv
  version.
  (modsi3): Likewise.
  (HAVE_ARM_CLZ): Remove block defining it.
  (clzsi2): Test for __only_thumb1__ rather than __ARM_ARCH_6M__
  and
  check __ARM_FEATURE_CLZ instead of HAVE_ARM_CLZ.
  (clzdi2): Likewise.
  (ctzsi2): Likewise.
  (L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather
  than
  __ARM_ARCH_6M__ in guard for checking whether it is defined.
  (final includes): Test for __only_thumb1__ rather than
  __ARM_ARCH_6M__ and add comment to indicate the connection
  between
  this condition and the one in gcc/config/arm/elf.h.
  * config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
  __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
  * config/arm/t-softfp: Likewise.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
5b1a03080d0a00fc1ef6934f6bce552e65230640..7eb11d920944c693700d80bb3fb3f9fe
66611c19 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2188,8 +2188,10 @@ extern int making_const_table;

   #define TARGET_ARM_ARCH  \
   
 (arm_base_arch)	\


-#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
-#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
+#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm
\ + && !arm_arch_thumb2)
+#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm
\ + && arm_arch_thumb2)

I think now that you're checking TARGET_ARM_ARCH you don't need
the "!arm_arch_notm && !arm_arch_thumb2" && "!arm_arch_notm &&
arm_arch_thumb2".

% git --no-pager grep TARGET_ARM_V.M config/arm
config/arm/arm.h:#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
config/arm/arm.h:#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)

What about just removing those? I kept them because I wasn't sure of their
purpose but I think we should just remove them.


That's fine with me.
Then you'd also want to remove the TARGET_ARM_V8M definition from your second 
patch
that I flagged up in that review.

Kyrill


Best regards,

Thomas





Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

2016-05-19 Thread Jiong Wang



On 18/05/16 01:58, Joseph Myers wrote:

On Tue, 17 May 2016, Matthew Wahab wrote:


As with the VFP FP16 arithmetic instructions, operations on __fp16
values are done by conversion to single-precision. Any new optimization
supported by the instruction descriptions can only apply to code
generated using intrinsics added in this patch series.

As with the scalar instructions, I think it is legitimate in most cases to
optimize arithmetic via single precision to work direct on __fp16 values
(and this would be natural for vectorization of __fp16 arithmetic).


Hi Joseph,

  Currently there is no type promotion for vector types like v4hf; they
survive on arm until the vector lowering pass, where they are split into
HF operations.  These HF operations are then widened into SF operations
during RTL expansion, since we don't have scalar HF support in the
standard patterns.

Then,

  * if we add scalar HF mode to the standard patterns, vector HF mode
    operations will be turned into scalar HF operations instead of
    scalar SF operations.

  * if we add vector HF modes to the standard patterns, vector HF mode
    operations will generate vector HF instructions directly.

  Will this still cause precision inconsistencies with old gcc when
  there are cascaded vector float operations?

  Thanks

Regards,
Jiong
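[Editorial illustration, not part of the original message: the precision question above can be shown with a portable stand-in, where float plays the role of __fp16 and double the role of float. Whether the expansion rounds back to the narrow type after every widened operation, or evaluates the whole chain in the wider type, can change the result of cascaded operations.]

```c
#include <assert.h>

/* float stands in for __fp16, double for float.  Widening each
   operation and rounding back after every step can differ from
   performing the whole chain in the wider type and rounding once.  */
float
round_each_step (float a, float b, float c)
{
  float t = (float) ((double) a * (double) b);  /* round after the mul */
  return (float) ((double) t + (double) c);     /* round after the add */
}

float
round_once (float a, float b, float c)
{
  return (float) ((double) a * (double) b + (double) c);  /* one rounding */
}
```

With most inputs the two functions agree, but inputs near the narrow type's precision limit expose the difference, which is the consistency concern raised above.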


Re: [PATCH, ARM 2/7, ping1] Add support for ARMv8-M

2016-05-19 Thread Kyrill Tkachov

Hi Thomas,

On 17/05/16 11:08, Thomas Preudhomme wrote:

Ping?

*** gcc/ChangeLog ***

2015-11-23  Thomas Preud'homme  

 * config/arm/arm-arches.def (armv8-m.base): Define new architecture.
 (armv8-m.main): Likewise.
 (armv8-m.main+dsp): Likewise
 * config/arm/arm-protos.h (FL_FOR_ARCH8M_BASE): Define.
 (FL_FOR_ARCH8M_MAIN): Likewise.
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/bpabi.h: Add armv8-m.base, armv8-m.main and
 armv8-m.main+dsp to BE8_LINK_SPEC.
 * config/arm/arm.h (TARGET_HAVE_LDACQ): Exclude ARMv8-M.
 (enum base_architecture): Add BASE_ARCH_8M_BASE and BASE_ARCH_8M_MAIN.
 (TARGET_ARM_V8M): Define.
 * config/arm/arm.c (arm_arch_name): Increase size to work with ARMv8-M
 Baseline and Mainline.
 (arm_option_override_internal): Also disable arm_restrict_it when
 !arm_arch_notm.
 (arm_file_start): Increase architecture buffer size.


I'd say "buffer size for printing architecture name" rather than architecture :)


 * doc/invoke.texi: Document architectures armv8-m.base, armv8-m.main
 and armv8-m.main+dsp.
 (mno-unaligned-access): Clarify that this is disabled by default for
 ARMv8-M Baseline architecture as well.


*** gcc/testsuite/ChangeLog ***

2015-11-10  Thomas Preud'homme  

 * lib/target-supports.exp: Generate add_options_for_arm_arch_FUNC and
 check_effective_target_arm_arch_FUNC_multilib for ARMv8-M Baseline and
 ARMv8-M Mainline architectures.


*** libgcc/ChangeLog ***

2015-11-10  Thomas Preud'homme  

 * config/arm/lib1funcs.S (__ARM_ARCH__): Define to 8 for ARMv8-M.


diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index
fd02b18db01d03da19fd66412f4bf1df00ce25da..be46521c9eaea54f9ad78a92874567589289dbdf
100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -62,6 +62,12 @@ ARM_ARCH("armv8.1-a", cortexa53,  8A,
  ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 FL2_FOR_ARCH8_1A))
+ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,
+ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE))
+ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
+ARM_FSET_MAKE_CPU1(FL_CO_PROC |  FL_FOR_ARCH8M_MAIN))
+ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN,
+ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN))
  ARM_ARCH("iwmmxt",  iwmmxt, 5TE,ARM_FSET_MAKE_CPU1 (FL_LDSCHED |
FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
  ARM_ARCH("iwmmxt2", iwmmxt2,5TE,ARM_FSET_MAKE_CPU1 (FL_LDSCHED |
FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
  
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

index
d8179c441bb53dced94d2ebf497aad093e4ac600..63235cb63acf3e676fac5b61e1195081efd64075
100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -420,6 +420,8 @@ extern bool arm_is_constant_pool_ref (rtx);
  #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
  #define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8)
  #define FL2_FOR_ARCH8_1A  FL2_ARCH8_1
+#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
+#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
  
  /* There are too many feature bits to fit in a single word so the set of cpu

and
 fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index
adec6c95367f686a6732d259dc789933bc23b780..1544ef7fcd4a853d0881d37e8e3b7a630c7692f0
100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -428,10 +428,19 @@ EnumValue
  Enum(arm_arch) String(armv8.1-a+crc) Value(28)
  
  EnumValue

-Enum(arm_arch) String(iwmmxt) Value(29)
+Enum(arm_arch) String(armv8-m.base) Value(29)
  
  EnumValue

-Enum(arm_arch) String(iwmmxt2) Value(30)
+Enum(arm_arch) String(armv8-m.main) Value(30)
+
+EnumValue
+Enum(arm_arch) String(armv8-m.main+dsp) Value(31)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(32)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(33)
  
  Enum

  Name(arm_fpu) Type(int)
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
7eb11d920944c693700d80bb3fb3f9fe66611c19..1d976b36300d92d538098b3cf83c60d62ed2be1c
100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -266,7 +266,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 || arm_arch7) && arm_arch_notm)
  
  /* Nonzero if this chip supports load-acquire and store-release.  */

-#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8)
+#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
  
  /* Nonzero if integer division instructions 

Re: [PATCH, libgcc/ARM 1/6, ping1] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-05-19 Thread Thomas Preudhomme
On Thursday 19 May 2016 17:42:26 Kyrill Tkachov wrote:
> Hi Thomas,
> 
> I'm not very familiar with the libgcc machinery, but I have a comment on an
> arm.h hunk inline.
> On 17/05/16 10:58, Thomas Preudhomme wrote:
> > Ping?
> > 
> > *** gcc/ChangeLog ***
> > 
> > 2015-11-13  Thomas Preud'homme  
> > 
> >  * config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and
> >  __ARM_ARCH_ISA_ARM to
> >  decide whether to prevent some libgcc routines being included for
> >  some
> >  multilibs rather than __ARM_ARCH_6M__ and add comment to indicate
> >  the
> >  link between this condition and the one in
> >  libgcc/config/arm/lib1func.S.
> >  * config/arm/arm.h (TARGET_ARM_V6M): Add check to
> >  TARGET_ARM_ARCH.
> >  (TARGET_ARM_V7M): Likewise.
> > 
> > *** gcc/testsuite/ChangeLog ***
> > 
> > 2015-11-10  Thomas Preud'homme  
> > 
> >  * lib/target-supports.exp (check_effective_target_arm_cortex_m):
> >  Use
> >  __ARM_ARCH_ISA_ARM to test for Cortex-M devices.
> > 
> > *** libgcc/ChangeLog ***
> > 
> > 2015-12-17  Thomas Preud'homme  
> > 
> >  * config/arm/bpabi-v6m.S: Fix header comment to mention Thumb-1
> >  rather
> >  than ARMv6-M.
> >  * config/arm/lib1funcs.S (__prefer_thumb__): Define among other
> >  cases
> >  for all Thumb-1 only targets.
> >  (__only_thumb1__): Define for all Thumb-1 only targets.
> >  (THUMB_LDIV0): Test for __only_thumb1__ rather than
> >  __ARM_ARCH_6M__.
> >  (EQUIV): Likewise.
> >  (ARM_FUNC_ALIAS): Likewise.
> >  (umodsi3): Add check to __only_thumb1__ to guard the idiv
> >  version.
> >  (modsi3): Likewise.
> >  (HAVE_ARM_CLZ): Remove block defining it.
> >  (clzsi2): Test for __only_thumb1__ rather than __ARM_ARCH_6M__
> >  and
> >  check __ARM_FEATURE_CLZ instead of HAVE_ARM_CLZ.
> >  (clzdi2): Likewise.
> >  (ctzsi2): Likewise.
> >  (L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather
> >  than
> >  __ARM_ARCH_6M__ in guard for checking whether it is defined.
> >  (final includes): Test for __only_thumb1__ rather than
> >  __ARM_ARCH_6M__ and add comment to indicate the connection
> >  between
> >  this condition and the one in gcc/config/arm/elf.h.
> >  * config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
> >  __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
> >  * config/arm/t-softfp: Likewise.
> > 
> > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> > index
> > 5b1a03080d0a00fc1ef6934f6bce552e65230640..7eb11d920944c693700d80bb3fb3f9fe
> > 66611c19 100644
> > --- a/gcc/config/arm/arm.h
> > +++ b/gcc/config/arm/arm.h
> > @@ -2188,8 +2188,10 @@ extern int making_const_table;
> > 
> >   #define TARGET_ARM_ARCH   \
> >   
> > (arm_base_arch) \
> > 
> > -#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
> > -#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
> > +#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm
> > \ + && !arm_arch_thumb2)
> > +#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm
> > \ + && arm_arch_thumb2)
> 
> I think now that you're checking TARGET_ARM_ARCH you don't need
> the "!arm_arch_notm && !arm_arch_thumb2" && "!arm_arch_notm &&
> arm_arch_thumb2".

% git --no-pager grep TARGET_ARM_V.M config/arm
config/arm/arm.h:#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
config/arm/arm.h:#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)

What about just removing those? I kept them because I wasn't sure of their 
purpose but I think we should just remove them.

Best regards,

Thomas


Re: C++ PATCH to improve dump_decl (PR c++/71075)

2016-05-19 Thread Jason Merrill

Well, constants aren't declarations.  Why are we calling dump_decl here?

Jason



Re: [PATCH, ARM 7/7, ping1] Enable atomics for ARMv8-M Mainline

2016-05-19 Thread Thomas Preudhomme
On Thursday 19 May 2016 17:18:29 Kyrill Tkachov wrote:
> Hi Thomas,
> 
> On 17/05/16 11:15, Thomas Preudhomme wrote:
> > Ping?
> > 
> > *** gcc/ChangeLog ***
> > 
> > 2015-12-17  Thomas Preud'homme  
> > 
> >  * config/arm/arm.h (TARGET_HAVE_LDACQ): Enable for ARMv8-M
> >  Mainline.
> > 
> > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> > index
> > 347b5b0a5cc0bc1e3b5020c8124d968e76ce48a4..e154bd31b8084f9f45ad4409e7b38de6
> > 52538c51 100644
> > --- a/gcc/config/arm/arm.h
> > +++ b/gcc/config/arm/arm.h
> > @@ -266,7 +266,7 @@ extern void (*arm_lang_output_object_attributes_hook)
> > (void);
> > 
> >  || arm_arch7) && arm_arch_notm)
> >   
> >   /* Nonzero if this chip supports load-acquire and store-release.  */
> > 
> > -#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
> > +#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && TARGET_32BIT)
> 
> So this change is correct because ARMv8-M Mainline uses Thumb2
> and is therefore TARGET_32BIT.
> 
> This is ok but I'd like to see a follow up patch to enable the tests
> that exercise acquire-release instructions in the arm.exp testsuite
> for ARMv8-M Mainline so that we can be sure they get proper testsuite
> coverage.

Good thing I already have one around. I need to separate it from other stuff 
though, so I'll probably send it on Monday.

Cheers,

Thomas
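[Editorial note: for context, the operations gated by TARGET_HAVE_LDACQ are the C11 acquire/release accesses; on targets where the macro holds (ARMv8-A, and ARMv8-M Mainline with this patch) they can map to single LDA/STL instructions rather than barrier sequences. A minimal sketch, with names of our choosing:]

```c
#include <stdatomic.h>

/* Acquire load and release store: the access kinds that LDA/STL
   implement directly when load-acquire/store-release instructions
   are available.  */
int
load_acquire (atomic_int *p)
{
  return atomic_load_explicit (p, memory_order_acquire);
}

void
store_release (atomic_int *p, int v)
{
  atomic_store_explicit (p, v, memory_order_release);
}
```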


C++ PATCH to improve dump_decl (PR c++/71075)

2016-05-19 Thread Marek Polacek
This PR complains about ugly diagnostics where we print
template argument '2' does not match '#'integer_cst' not supported by 
dump_decl#'
The following patch teaches dump_decl how to print constants properly.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-19  Marek Polacek  

PR c++/71075
* error.c (dump_decl): Handle *_CST.

* g++.dg/diagnostic/pr71075.C: New test.

diff --git gcc/cp/error.c gcc/cp/error.c
index 7d70f89..d3232bd 100644
--- gcc/cp/error.c
+++ gcc/cp/error.c
@@ -1269,6 +1269,14 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags)
   dump_type (pp, t, flags);
   break;
 
+case VOID_CST:
+case INTEGER_CST:
+case REAL_CST:
+case STRING_CST:
+case COMPLEX_CST:
+  pp->constant (t);
+  break;
+
 default:
   pp_unsupported_tree (pp, t);
   /* Fall through to error.  */
diff --git gcc/testsuite/g++.dg/diagnostic/pr71075.C 
gcc/testsuite/g++.dg/diagnostic/pr71075.C
index e69de29..6bb1e68 100644
--- gcc/testsuite/g++.dg/diagnostic/pr71075.C
+++ gcc/testsuite/g++.dg/diagnostic/pr71075.C
@@ -0,0 +1,8 @@
+// PR c++/71075
+
+template struct A {};
+template void foo(A) {}
+int main() {
+  foo(A()); // { dg-error "no matching" }
+// { dg-message "template argument .2. does not match .1." "" { target *-*-* } 
6 }
+}

Marek


Re: [PATCH, libgcc/ARM 1/6, ping1] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-05-19 Thread Kyrill Tkachov

Hi Thomas,

I'm not very familiar with the libgcc machinery, but I have a comment on an 
arm.h hunk inline.

On 17/05/16 10:58, Thomas Preudhomme wrote:

Ping?

*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

 * config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to
 decide whether to prevent some libgcc routines being included for some
 multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the
 link between this condition and the one in
 libgcc/config/arm/lib1func.S.
 * config/arm/arm.h (TARGET_ARM_V6M): Add check to TARGET_ARM_ARCH.
 (TARGET_ARM_V7M): Likewise.


*** gcc/testsuite/ChangeLog ***

2015-11-10  Thomas Preud'homme  

 * lib/target-supports.exp (check_effective_target_arm_cortex_m): Use
 __ARM_ARCH_ISA_ARM to test for Cortex-M devices.


*** libgcc/ChangeLog ***

2015-12-17  Thomas Preud'homme  

 * config/arm/bpabi-v6m.S: Fix header comment to mention Thumb-1 rather
 than ARMv6-M.
 * config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases
 for all Thumb-1 only targets.
 (__only_thumb1__): Define for all Thumb-1 only targets.
 (THUMB_LDIV0): Test for __only_thumb1__ rather than __ARM_ARCH_6M__.
 (EQUIV): Likewise.
 (ARM_FUNC_ALIAS): Likewise.
 (umodsi3): Add check to __only_thumb1__ to guard the idiv version.
 (modsi3): Likewise.
 (HAVE_ARM_CLZ): Remove block defining it.
 (clzsi2): Test for __only_thumb1__ rather than __ARM_ARCH_6M__ and
 check __ARM_FEATURE_CLZ instead of HAVE_ARM_CLZ.
 (clzdi2): Likewise.
 (ctzsi2): Likewise.
 (L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than
 __ARM_ARCH_6M__ in guard for checking whether it is defined.
 (final includes): Test for __only_thumb1__ rather than
 __ARM_ARCH_6M__ and add comment to indicate the connection between
 this condition and the one in gcc/config/arm/elf.h.
 * config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
 __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
 * config/arm/t-softfp: Likewise.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
5b1a03080d0a00fc1ef6934f6bce552e65230640..7eb11d920944c693700d80bb3fb3f9fe66611c19
100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2188,8 +2188,10 @@ extern int making_const_table;
  #define TARGET_ARM_ARCH   \
(arm_base_arch) \
  
-#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)

-#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
+#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm \
+   && !arm_arch_thumb2)
+#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm \
+   && arm_arch_thumb2)
  


I think now that you're checking TARGET_ARM_ARCH you don't need
the "!arm_arch_notm && !arm_arch_thumb2" && "!arm_arch_notm && arm_arch_thumb2".

Kyrill


  /* The highest Thumb instruction set version supported by the chip.  */
  #define TARGET_ARM_ARCH_ISA_THUMB \
diff --git a/gcc/config/arm/elf.h b/gcc/config/arm/elf.h
index
77f30554d5286bd83aeab0c8dc308cfd44e732dc..246de5492665ba2a0292736a9c53fbaaef184d72
100644
--- a/gcc/config/arm/elf.h
+++ b/gcc/config/arm/elf.h
@@ -148,8 +148,9 @@
while (0)
  
  /* Horrible hack: We want to prevent some libgcc routines being included

-   for some multilibs.  */
-#ifndef __ARM_ARCH_6M__
+   for some multilibs.  The condition should match the one in
+   libgcc/config/arm/lib1funcs.S.  */
+#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1
  #undef L_fixdfsi
  #undef L_fixunsdfsi
  #undef L_truncdfsf2
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-
supports.exp
index
04ca17656f2f26dda710e8a0f9ca77dd963ab39b..38151375c29cd007f1cc34ead3aa495606224061
100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3320,10 +3320,8 @@ proc check_effective_target_arm_cortex_m { } {
return 0
  }
  return [check_no_compiler_messages arm_cortex_m assembly {
-   #if !defined(__ARM_ARCH_7M__) \
-&& !defined (__ARM_ARCH_7EM__) \
-&& !defined (__ARM_ARCH_6M__)
-   #error !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && !__ARM_ARCH_6M__
+   #if defined(__ARM_ARCH_ISA_ARM)
+   #error __ARM_ARCH_ISA_ARM is defined
#endif
int i;
  } "-mthumb"]
diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S
index
5d35aa6afca224613c94cf923f8a2ee8dac949f2..b1d375dfb88efb899ce6213013205b85e531a884
100644
--- a/libgcc/config/arm/bpabi-v6m.S
+++ b/libgcc/config/arm/bpabi-v6m.S
@@ -1,4 +1,4 @@
-/* Miscellaneous BPABI functions.  ARMv6M implementation
+/* Miscellaneous BPABI functions.  

Re: [C++ PATCH] PR c++/69855

2016-05-19 Thread Jason Merrill

On 05/05/2016 09:11 AM, Ville Voutilainen wrote:

On 5 May 2016 at 13:36, Paolo Carlini  wrote:

.. minor nit: the new testcase has a number of trailing blank lines.


New patch attached. :)


Sorry for the delay.

Please use ".diff" for patches so that they are properly recognized as 
text/x-patch.


The patch looks good, but it could use a comment explaining what it's doing.

Any thoughts on doing something similar for extern variable declarations?

Jason



Re: [Patch ARM/AArch64 11/11] Add missing tests for vreinterpret, operating of fp16 type.

2016-05-19 Thread Kyrill Tkachov


On 11/05/16 14:24, Christophe Lyon wrote:

2016-05-04  Christophe Lyon  

 * gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c: Add fp16 tests.
 * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: Likewise.
 * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c: Likewise.


Ok.
Thanks for working on these!
Kyrill



Change-Id: Ic8061f1a5f3e042844a33a70c0f42a5f92c43c98

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
index 2570f73..0de2ab3 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
@@ -21,6 +21,8 @@ VECT_VAR_DECL(expected_s8_8,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 
0xf3,
0xf4, 0xf5, 0xf6, 0xf7 };
  VECT_VAR_DECL(expected_s8_9,int,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
0xf2, 0xff, 0xf3, 0xff };
+VECT_VAR_DECL(expected_s8_10,int,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
+0x00, 0xcb, 0x80, 0xca };
  
  /* Expected results for vreinterpret_s16_xx.  */

  VECT_VAR_DECL(expected_s16_1,int,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 
};
@@ -32,6 +34,7 @@ VECT_VAR_DECL(expected_s16_6,int,16,4) [] = { 0xfff0, 0x, 
0xfff1, 0x };
  VECT_VAR_DECL(expected_s16_7,int,16,4) [] = { 0xfff0, 0x, 0x, 0x 
};
  VECT_VAR_DECL(expected_s16_8,int,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 
};
  VECT_VAR_DECL(expected_s16_9,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 
};
+VECT_VAR_DECL(expected_s16_10,int,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 
};
  
  /* Expected results for vreinterpret_s32_xx.  */

  VECT_VAR_DECL(expected_s32_1,int,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
@@ -43,6 +46,7 @@ VECT_VAR_DECL(expected_s32_6,int,32,2) [] = { 0xfff0, 
0xfff1 };
  VECT_VAR_DECL(expected_s32_7,int,32,2) [] = { 0xfff0, 0x };
  VECT_VAR_DECL(expected_s32_8,int,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
  VECT_VAR_DECL(expected_s32_9,int,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
+VECT_VAR_DECL(expected_s32_10,int,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
  
  /* Expected results for vreinterpret_s64_xx.  */

  VECT_VAR_DECL(expected_s64_1,int,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
@@ -54,6 +58,7 @@ VECT_VAR_DECL(expected_s64_6,int,64,1) [] = { 
0xfff1fff0 };
  VECT_VAR_DECL(expected_s64_7,int,64,1) [] = { 0xfff0 };
  VECT_VAR_DECL(expected_s64_8,int,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
  VECT_VAR_DECL(expected_s64_9,int,64,1) [] = { 0xfff3fff2fff1fff0 };
+VECT_VAR_DECL(expected_s64_10,int,64,1) [] = { 0xca80cb00cb80cc00 };
  
  /* Expected results for vreinterpret_u8_xx.  */

  VECT_VAR_DECL(expected_u8_1,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
@@ -74,6 +79,8 @@ VECT_VAR_DECL(expected_u8_8,uint,8,8) [] = { 0xf0, 0xf1, 
0xf2, 0xf3,
 0xf4, 0xf5, 0xf6, 0xf7 };
  VECT_VAR_DECL(expected_u8_9,uint,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
 0xf2, 0xff, 0xf3, 0xff };
+VECT_VAR_DECL(expected_u8_10,uint,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
+ 0x00, 0xcb, 0x80, 0xca };
  
  /* Expected results for vreinterpret_u16_xx.  */

  VECT_VAR_DECL(expected_u16_1,uint,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 
};
@@ -85,6 +92,7 @@ VECT_VAR_DECL(expected_u16_6,uint,16,4) [] = { 0xfff0, 
0x, 0xfff1, 0x };
  VECT_VAR_DECL(expected_u16_7,uint,16,4) [] = { 0xfff0, 0x, 0x, 0x 
};
  VECT_VAR_DECL(expected_u16_8,uint,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 
};
  VECT_VAR_DECL(expected_u16_9,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 
};
+VECT_VAR_DECL(expected_u16_10,uint,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 
};
  
  /* Expected results for vreinterpret_u32_xx.  */

  VECT_VAR_DECL(expected_u32_1,uint,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
@@ -96,6 +104,7 @@ VECT_VAR_DECL(expected_u32_6,uint,32,2) [] = { 0xfff1fff0, 
0xfff3fff2 };
  VECT_VAR_DECL(expected_u32_7,uint,32,2) [] = { 0xfff0, 0x };
  VECT_VAR_DECL(expected_u32_8,uint,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
  VECT_VAR_DECL(expected_u32_9,uint,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
+VECT_VAR_DECL(expected_u32_10,uint,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
  
  /* Expected results for vreinterpret_u64_xx.  */

  VECT_VAR_DECL(expected_u64_1,uint,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
@@ -107,6 +116,7 @@ VECT_VAR_DECL(expected_u64_6,uint,64,1) [] = { 
0xfff3fff2fff1fff0 };
  VECT_VAR_DECL(expected_u64_7,uint,64,1) [] = { 0xfff1fff0 };
  VECT_VAR_DECL(expected_u64_8,uint,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
  VECT_VAR_DECL(expected_u64_9,uint,64,1) [] = { 0xfff3fff2fff1fff0 };
+VECT_VAR_DECL(expected_u64_10,uint,64,1) [] = { 0xca80cb00cb80cc00 };
  
  /* Expected results for 

Re: [Patch ARM/AArch64 10/11] Add missing tests for intrinsics operating on poly64 and poly128 types.

2016-05-19 Thread Kyrill Tkachov


On 13/05/16 16:16, James Greenhalgh wrote:

On Wed, May 11, 2016 at 03:24:00PM +0200, Christophe Lyon wrote:

2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (result):
Add poly64x1_t and poly64x2_t cases if supported.
* gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
(buffer, buffer_pad, buffer_dup, buffer_dup_pad): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/p64_p128.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
@@ -0,0 +1,665 @@
+/* This file contains tests for all the *p64 intrinsics, except for
+   vreinterpret which have their own testcase.  */
+
+/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-add-options arm_crypto } */
+
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results: vbsl.  */
+VECT_VAR_DECL(vbsl_expected,poly,64,1) [] = { 0xfff1 };
+VECT_VAR_DECL(vbsl_expected,poly,64,2) [] = { 0xfff1,
+ 0xfff1 };
+
+/* Expected results: vceq.  */
+VECT_VAR_DECL(vceq_expected,uint,64,1) [] = { 0x0 };

vceqq_p64
vceqz_p64
vceqzq_p64
vtst_p64
vtstq_p64

are missing, but will not be trivial to add. Could you raise a bug report
(or fix it if you like :-) )?

This is OK without a fix for those intrinsics with a suitable bug report
opened.


That's ok by me too.
Thanks,
Kyrill


Thanks,
James





Re: [Patch ARM/AArch64 09/11] Add missing vrnd{,a,m,n,p,x} tests.

2016-05-19 Thread Kyrill Tkachov

Hi Christophe,

On 11/05/16 14:23, Christophe Lyon wrote:

2016-05-02  Christophe Lyon  

* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c: New.
* gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c: New.


Drop the gcc/testsuite from the ChangeLog entry. Just "* gcc.target/aarch64/..."

Ok with the fixed ChangeLog.

Thanks,
Kyrill



Change-Id: Iab5f98dc4b15f9a2f61b622a9f62b207872f1737

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
new file mode 100644
index 000..5f492d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-add-options arm_v8_neon } */
+
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc180, 0xc170 };
+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc180, 0xc170,
+  0xc160, 0xc150 };
+
+#define INSN vrnd
+#define TEST_MSG "VRND"
+
+#include "vrndX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
new file mode 100644
index 000..629240d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
@@ -0,0 +1,43 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1 (NAME)
+
+void FNNAME (INSN) (void)
+{
+  /* vector_res = vrndX (vector), then store the result.  */
+#define TEST_VRND2(INSN, Q, T1, T2, W, N)  \
+  VECT_VAR (vector_res, T1, W, N) =\
+INSN##Q##_##T2##W (VECT_VAR (vector, T1, W, N));   \
+vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N),\
+  VECT_VAR (vector_res, T1, W, N))
+
+  /* Two auxliary macros are necessary to expand INSN.  */
+#define TEST_VRND1(INSN, Q, T1, T2, W, N)  \
+  TEST_VRND2 (INSN, Q, T1, T2, W, N)
+
+#define TEST_VRND(Q, T1, T2, W, N) \
+  TEST_VRND1 (INSN, Q, T1, T2, W, N)
+
+  DECL_VARIABLE (vector, float, 32, 2);
+  DECL_VARIABLE (vector, float, 32, 4);
+
+  DECL_VARIABLE (vector_res, float, 32, 2);
+  DECL_VARIABLE (vector_res, float, 32, 4);
+
+  clean_results ();
+
+  VLOAD (vector, buffer, , float, f, 32, 2);
+  VLOAD (vector, buffer, q, float, f, 32, 4);
+
+  TEST_VRND ( , float, f, 32, 2);
+  TEST_VRND (q, float, f, 32, 4);
+
+  CHECK_FP (TEST_MSG, float, 32, 2, PRIx32, expected, "");
+  CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
+}
+
+int
+main (void)
+{
+  FNNAME (INSN) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
new file mode 100644
index 000..816fd28d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-add-options arm_v8_neon } */
+
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc180, 0xc170 };
+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc180, 0xc170,
+  0xc160, 0xc150 };
+
+#define INSN vrnda
+#define TEST_MSG "VRNDA"
+
+#include "vrndX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
new file mode 100644
index 000..029880c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-add-options arm_v8_neon } */
+
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc180, 0xc170 };
+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc180, 0xc170,
+  0xc160, 0xc150 };
+
+#define INSN vrndm
+#define TEST_MSG "VRNDM"
+
+#include "vrndX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
new file mode 100644
index 000..571243c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
@@ -0,0 +1,16 @@
+/* { 

Re: [Patch ARM/AArch64 08/11] Add missing vstX_lane fp16 tests.

2016-05-19 Thread Kyrill Tkachov


On 11/05/16 14:23, Christophe Lyon wrote:

2016-05-02  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c: Add fp16 tests.


Ok.
Thanks,
Kyrill



Change-Id: I64e30bc30a9a9cc5c47eff212e7d745bf3230fe7

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
index b923b64..282edd5 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
@@ -14,6 +14,7 @@ VECT_VAR_DECL(expected_st2_0,uint,32,2) [] = { 0xfff0, 
0xfff1 };
  VECT_VAR_DECL(expected_st2_0,poly,8,8) [] = { 0xf0, 0xf1, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st2_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_0,hfloat,32,2) [] = { 0xc180, 0xc170 };
  VECT_VAR_DECL(expected_st2_0,int,16,8) [] = { 0xfff0, 0xfff1, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
@@ -24,6 +25,8 @@ VECT_VAR_DECL(expected_st2_0,uint,32,4) [] = { 0xfff0, 
0xfff1,
   0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0x0, 0x0,
   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st2_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0x0, 0x0,
+0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_0,hfloat,32,4) [] = { 0xc180, 0xc170,
 0x0, 0x0 };
  
@@ -39,6 +42,7 @@ VECT_VAR_DECL(expected_st2_1,uint,32,2) [] = { 0x0, 0x0 };

  VECT_VAR_DECL(expected_st2_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st2_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_1,hfloat,32,2) [] = { 0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
@@ -48,6 +52,8 @@ VECT_VAR_DECL(expected_st2_1,uint,16,8) [] = { 0x0, 0x0, 0x0, 
0x0,
  VECT_VAR_DECL(expected_st2_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st2_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st2_1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
  
  /* Expected results for vst3, chunk 0.  */

@@ -62,6 +68,7 @@ VECT_VAR_DECL(expected_st3_0,uint,32,2) [] = { 0xfff0, 
0xfff1 };
  VECT_VAR_DECL(expected_st3_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0x0,
  0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st3_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0 };
+VECT_VAR_DECL(expected_st3_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0x0 };
  VECT_VAR_DECL(expected_st3_0,hfloat,32,2) [] = { 0xc180, 0xc170 };
  VECT_VAR_DECL(expected_st3_0,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0,
  0x0, 0x0, 0x0, 0x0 };
@@ -73,6 +80,8 @@ VECT_VAR_DECL(expected_st3_0,uint,32,4) [] = { 0xfff0, 
0xfff1,
   0xfff2, 0x0 };
  VECT_VAR_DECL(expected_st3_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0,
   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st3_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0x0,
+0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st3_0,hfloat,32,4) [] = { 0xc180, 0xc170,
 0xc160, 0x0 };
  
@@ -88,6 +97,7 @@ VECT_VAR_DECL(expected_st3_1,uint,32,2) [] = { 0xfff2, 0x0 };

  VECT_VAR_DECL(expected_st3_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st3_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st3_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st3_1,hfloat,32,2) [] = { 0xc160, 0x0 };
  VECT_VAR_DECL(expected_st3_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
  0x0, 0x0, 0x0, 0x0 };
@@ -97,6 +107,8 @@ VECT_VAR_DECL(expected_st3_1,uint,16,8) [] = { 0x0, 0x0, 
0x0, 0x0,
  VECT_VAR_DECL(expected_st3_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
  VECT_VAR_DECL(expected_st3_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_st3_1,hfloat,16,8) [] = { 

Re: [PATCH, ARM 7/7, ping1] Enable atomics for ARMv8-M Mainline

2016-05-19 Thread Kyrill Tkachov

Hi Thomas,

On 17/05/16 11:15, Thomas Preudhomme wrote:

Ping?

*** gcc/ChangeLog ***

2015-12-17  Thomas Preud'homme  

 * config/arm/arm.h (TARGET_HAVE_LDACQ): Enable for ARMv8-M Mainline.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 347b5b0a5cc0bc1e3b5020c8124d968e76ce48a4..e154bd31b8084f9f45ad4409e7b38de652538c51 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -266,7 +266,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 || arm_arch7) && arm_arch_notm)
  
  /* Nonzero if this chip supports load-acquire and store-release.  */

-#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
+#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && TARGET_32BIT)
  


So this change is correct because ARMv8-M Mainline uses Thumb2
and is therefore TARGET_32BIT.

This is ok but I'd like to see a follow up patch to enable the tests
that exercise acquire-release instructions in the arm.exp testsuite
for ARMv8-M Mainline so that we can be sure they get proper testsuite
coverage.

Thanks,
Kyrill



  /* Nonzero if this chip provides the movw and movt instructions.  */
  #define TARGET_HAVE_MOVT  (arm_arch_thumb2 || arm_arch8)


Best regards,

Thomas

On Thursday 17 December 2015 17:39:29 Thomas Preud'homme wrote:

Hi,

This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
This specific patch enable atomics for ARMv8-M Mainline. No change is
needed to existing patterns since Thumb-2 backend can already handle them
fine.

[1] For a quick overview of ARMv8-M please refer to the initial cover
letter.


ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2015-12-17  Thomas Preud'homme  

 * config/arm/arm.h (TARGET_HAVE_LDACQ): Enable for ARMv8-M Mainline.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1f79c37b5c36a410a2d500ba92c62a5ba4ca1178..fa2a6fb03ffd2ca53bfb7e7c8f03022b626880e0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -258,7 +258,7 @@ extern void
(*arm_lang_output_object_attributes_hook)(void);
 || arm_arch7) && arm_arch_notm)

  /* Nonzero if this chip supports load-acquire and store-release.  */
-#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
+#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && TARGET_32BIT)

  /* Nonzero if this chip provides the movw and movt instructions.  */
  #define TARGET_HAVE_MOVT  (arm_arch_thumb2 || arm_arch8)


Testing:

* Toolchain was built successfully with and without the ARMv8-M support
patches with the following multilib list:
armv6-m,armv7-m,armv7e-m,cortex-m7. The code generation for crtbegin.o,
crtend.o, crti.o, crtn.o, libgcc.a, libgcov.a, libc.a, libg.a,
libgloss-linux.a, libm.a, libnosys.a, librdimon.a, librdpmon.a, libstdc++.a
and libsupc++.a is unchanged for all these targets.

* GCC also showed no testsuite regression when targeting ARMv8-M Baseline
compared to ARMv6-M on ARM Fast Models and when targeting ARMv6-M and
ARMv7-M (compared to without the patch) * GCC was bootstrapped successfully
targeting Thumb-1 and targeting Thumb-2

Is this ok for stage3?

Best regards,

Thomas




Re: [PATCH, ARM 5/7, ping1] Add support for MOVT/MOVW to ARMv8-M Baseline

2016-05-19 Thread Thomas Preudhomme
On Wednesday 18 May 2016 12:30:41 Kyrill Tkachov wrote:
> Hi Thomas,
> 
> This looks mostly good with a few nits inline.
> Please repost with the comments addressed.

Updated ChangeLog entries:

*** gcc/ChangeLog ***

2016-05-18  Thomas Preud'homme  

* config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as having MOVT.
* config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check MOVT/MOVW
availability with TARGET_HAVE_MOVT.
(thumb_legitimate_constant_p): Strip the high part of a label_ref.
(thumb1_rtx_costs): Also return 0 if setting a half word constant and
MOVW is available and replace (unsigned HOST_WIDE_INT) INTVAL by
UINTVAL.
(thumb1_size_rtx_costs): Make set of half word constant also cost 1
extra instruction if MOVW is available.  Use a cost variable
incremented by COSTS_N_INSNS (1) when the condition match rather than
returning an arithmetic expression based on COSTS_N_INSNS.  Make
constant with bottom half word zero cost 2 instruction if MOVW is
available.
* config/arm/arm.md (define_attr "arch"): Add v8mb.
(define_attr "arch_enabled"): Set to yes if arch value is v8mb and
target is ARMv8-M Baseline.
* config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M Baseline only
alternative for constants satisfying j constraint.
(thumb1_movsi_insn): Likewise.
(movsi splitter for K alternative): Tighten condition to not trigger
if movt is available and j constraint is satisfied.
(Pe immediate splitter): Likewise.
(thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for
constant fitting in an halfword to use MOVW.
* doc/sourcebuild.texi (arm_thumb1_movt_ko): Document new ARM
effective target.


*** gcc/testsuite/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_arm_thumb1_movt_ko):
Define effective target.
* gcc.target/arm/pr42574.c: Require arm_thumb1_movt_ko instead of
arm_thumb1_ok as effective target to exclude ARMv8-M Baseline.


and patch:

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 47216b4a1959ccdb18e329db411bf7f941e67163..f42e996e5a7ce979fe406b8261d50fb2ba005f6b 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -269,7 +269,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 #define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
 
 /* Nonzero if this chip provides the movw and movt instructions.  */
-#define TARGET_HAVE_MOVT   (arm_arch_thumb2)
+#define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)
 
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d75a34f10d5ed22cff0a0b5d3ad433f111b059ee..a05e559c905daa55e686491a038342360c721912 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8220,6 +8220,12 @@ arm_legitimate_constant_p_1 (machine_mode, rtx x)
 static bool
 thumb_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
+  /* Splitters for TARGET_USE_MOVT call arm_emit_movpair which creates high
+ RTX.  These RTX must therefore be allowed for Thumb-1 so that when run
+ for ARMv8-M baseline or later the result is valid.  */
+  if (TARGET_HAVE_MOVT && GET_CODE (x) == HIGH)
+x = XEXP (x, 0);
+
   return (CONST_INT_P (x)
  || CONST_DOUBLE_P (x)
  || CONSTANT_ADDRESS_P (x)
@@ -8306,7 +8312,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum 
rtx_code outer)
 case CONST_INT:
   if (outer == SET)
{
- if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
+ if (UINTVAL (x) < 256
+ || (TARGET_HAVE_MOVT && !(INTVAL (x) & 0x)))
return 0;
  if (thumb_shiftable_const (INTVAL (x)))
return COSTS_N_INSNS (2);
@@ -9009,7 +9016,7 @@ static inline int
 thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
 {
   machine_mode mode = GET_MODE (x);
-  int words;
+  int words, cost;
 
   switch (code)
 {
@@ -9055,17 +9062,26 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum 
rtx_code outer)
   /* A SET doesn't have a mode, so let's look at the SET_DEST to get
 the mode.  */
   words = ARM_NUM_INTS (GET_MODE_SIZE (GET_MODE (SET_DEST (x;
-  return COSTS_N_INSNS (words)
-+ COSTS_N_INSNS (1) * (satisfies_constraint_J (SET_SRC (x))
-   || satisfies_constraint_K (SET_SRC (x))
-  /* thumb1_movdi_insn.  */
-   || ((words > 1) && MEM_P (SET_SRC (x;
+  cost = COSTS_N_INSNS (words);
+  if (satisfies_constraint_J (SET_SRC (x))
+ || satisfies_constraint_K 

Re: [PATCH, ARM 4/7, ping1] Factor out MOVW/MOVT availability and desirability checks

2016-05-19 Thread Thomas Preudhomme
On Wednesday 18 May 2016 11:47:47 Kyrill Tkachov wrote:
> Hi Thomas,

Hi Kyrill,

Please find below the updated patch and associated ChangeLog entry.

*** gcc/ChangeLog ***

2016-05-18  Thomas Preud'homme  

* config/arm/arm.h (TARGET_USE_MOVT): Check MOVT/MOVW availability
with TARGET_HAVE_MOVT.
(TARGET_HAVE_MOVT): Define.
* config/arm/arm.c (const_ok_for_op): Check MOVT/MOVW
availability with TARGET_HAVE_MOVT.
* config/arm/arm.md (arm_movt): Use TARGET_HAVE_MOVT to check movt
availability.
(addsi splitter): Use TARGET_THUMB && TARGET_HAVE_MOVT rather than
TARGET_THUMB2.
(symbol_refs movsi splitter): Remove TARGET_32BIT check.
(arm_movtas_ze): Use TARGET_HAVE_MOVT to check movt availability.
* config/arm/constraints.md (define_constraint "j"): Use
TARGET_HAVE_MOVT to check movt availability.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1d976b36300d92d538098b3cf83c60d62ed2be1c..d199e5ebb89194fdcc962ae9653dd159a67bb7bc 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -237,7 +237,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 
 /* Should MOVW/MOVT be used in preference to a constant pool.  */
 #define TARGET_USE_MOVT \
-  (arm_arch_thumb2 \
+  (TARGET_HAVE_MOVT \
&& (arm_disable_literal_pool \
|| (!optimize_size && !current_tune->prefer_constant_pool)))
 
@@ -268,6 +268,9 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 /* Nonzero if this chip supports load-acquire and store-release.  */
 #define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
 
+/* Nonzero if this chip provides the MOVW and MOVT instructions.  */
+#define TARGET_HAVE_MOVT   (arm_arch_thumb2)
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \
 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7b95ba0b379c31ee650e714ce2198a43b1cadbac..d75a34f10d5ed22cff0a0b5d3ad433f111b059ee 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3897,7 +3897,7 @@ const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code)
 {
 case SET:
   /* See if we can use movw.  */
-  if (arm_arch_thumb2 && (i & 0x) == 0)
+  if (TARGET_HAVE_MOVT && (i & 0x) == 0)
return 1;
   else
/* Otherwise, try mvn.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4049f104c6d5fd8bfd8f68ecdfae6a3d34d4333f..8aa9fedf5c07e78bc7ba793b39bebcc45a4d5921 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5705,7 +5705,7 @@
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
(lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
   (match_operand:SI 2 "general_operand"  "i")))]
-  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
+  "TARGET_HAVE_MOVT && arm_valid_symbolic_address_p (operands[2])"
   "movt%?\t%0, #:upper16:%c2"
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
@@ -5765,7 +5765,8 @@
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
(const:SI (plus:SI (match_operand:SI 1 "general_operand" "")
   (match_operand:SI 2 "const_int_operand" ""]
-  "TARGET_THUMB2
+  "TARGET_THUMB
+   && TARGET_HAVE_MOVT
&& arm_disable_literal_pool
&& reload_completed
&& GET_CODE (operands[1]) == SYMBOL_REF"
@@ -5796,8 +5797,7 @@
 (define_split
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
(match_operand:SI 1 "general_operand" ""))]
-  "TARGET_32BIT
-   && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
+  "TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
&& !flag_pic && !target_word_relocations
&& !arm_tls_referenced_p (operands[1])"
   [(clobber (const_int 0))]
@@ -10965,7 +10965,7 @@
(const_int 16)
(const_int 16))
 (match_operand:SI 1 "const_int_operand" ""))]
-  "arm_arch_thumb2"
+  "TARGET_HAVE_MOVT"
   "movt%?\t%0, %L1"
  [(set_attr "predicable" "yes")
   (set_attr "predicable_short_it" "no")
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 3b71c4a527064290066348cb234c6abb8c8e2e43..4ece5f013c92adee04157b5c909e1d47c894c994 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -66,7 +66,7 @@
 
 (define_constraint "j"
  "A constant suitable for a MOVW instruction. (ARM/Thumb-2)"
- (and (match_test "TARGET_32BIT && arm_arch_thumb2")
+ (and (match_test "TARGET_HAVE_MOVT")
   (ior (and (match_code "high")
(match_test "arm_valid_symbolic_address_p (XEXP (op, 0))"))
   (and (match_code "const_int")


Best regards,

Thomas
> 
> On 17/05/16 11:11, Thomas Preudhomme wrote:
> > Ping?
> > 
> > *** 

Re: [PING][PATCH] New plugin event when evaluating a constexpr call

2016-05-19 Thread Jason Merrill

On 05/06/2016 10:23 AM, Andres Tiraboschi wrote:

+static tree
+eval_call_plugin_callback (const constexpr_ctx *ctx, tree fun,
+  bool lval, bool *non_constant_p, bool *overflow_p)


This function needs a comment.


-static void
+void
 cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t,


Why expose this function?  If you want the reduced forms of the 
arguments in your plugin, can't you call cxx_eval_constant_expression on 
them directly?


Jason



Re: [PATCH] Fix PR c++/70822 (bogus error with parenthesized SCOPE_REF)

2016-05-19 Thread Jason Merrill

OK.

Jason


Make do_loop use estimated_num_iterations/expected_num_iterations

2016-05-19 Thread Jan Hubicka
Hi,
this patch makes doloop_optimize use
get_estimated_loop_iterations_int/get_max_loop_iterations_int instead of the
weaker check for const_iter.  Bootstrapped/regtested on x86_64-linux, OK?

Honza

* loop-doloop.c (doloop_optimize): Use get_estimated_loop_iterations_int
and get_max_loop_iterations_int.
Index: loop-doloop.c
===
--- loop-doloop.c   (revision 236450)
+++ loop-doloop.c   (working copy)
@@ -610,7 +610,8 @@ doloop_optimize (struct loop *loop)
   widest_int iterations, iterations_max;
   rtx_code_label *start_label;
   rtx condition;
-  unsigned level, est_niter;
+  unsigned level;
+  HOST_WIDE_INT est_niter;
   int max_cost;
   struct niter_desc *desc;
   unsigned word_mode_size;
@@ -635,21 +636,16 @@ doloop_optimize (struct loop *loop)
 }
   mode = desc->mode;
 
-  est_niter = 3;
-  if (desc->const_iter)
-est_niter = desc->niter;
-  /* If the estimate on number of iterations is reliable (comes from profile
- feedback), use it.  Do not use it normally, since the expected number
- of iterations of an unrolled loop is 2.  */
-  if (loop->header->count)
-est_niter = expected_loop_iterations (loop);
+  est_niter = get_estimated_loop_iterations_int (loop);
+  if (est_niter == -1)
+est_niter = get_max_loop_iterations_int (loop);
 
-  if (est_niter < 3)
+  if (est_niter >= 0 && est_niter < 3)
 {
   if (dump_file)
fprintf (dump_file,
 "Doloop: Too few iterations (%u) to be profitable.\n",
-est_niter);
+(unsigned int)est_niter);
   return false;
 }
 


Re: C PATCH to add -Wswitch-unreachable (PR c/49859)

2016-05-19 Thread Jason Merrill
Why implement this in the front end rather than at the gimple level?

Jason


On Tue, May 10, 2016 at 2:19 PM, Marek Polacek  wrote:
> Over the years, we got several PRs that suggested to add a warning that would
> warn about unreachable statements between `switch (cond)' and the first case.
> In some cases our -Wuninitialized warning can detect such a case, but mostly
> not.  This patch is an attempt to add a proper warning about this peculiar
> case.  I chose to not warn about declarations between switch and the first
> case, because we use that in the codebase and I think this kind of use is
> fine.  As it's usually the case, some obscure cases cropped up during testing,
> this time those were Duff's devices, so I had to go the extra mile to handle
> them specially.
>
> This is a C FE part only; I'd like to hear some feedback first before I plunge
> into the C++ FE.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2016-05-10  Marek Polacek  
>
> PR c/49859
> * c.opt (Wswitch-unreachable): New option.
>
> * c-parser.c: Include tree-iterator.h.
> (c_parser_switch_statement): Implement -Wswitch-unreachable warning.
>
> * doc/invoke.texi: Document -Wswitch-unreachable.
>
> * gcc.dg/Wjump-misses-init-1.c: Use -Wno-switch-unreachable.
> * gcc.dg/nested-func-1.c: Likewise.
> * gcc.dg/pr67784-4.c: Likewise.
> * gcc.dg/Wswitch-unreachable-1.c: New test.
> * gcc.dg/c99-vla-jump-5.c (f): Add dg-warning.
> * gcc.dg/switch-warn-1.c (foo1): Likewise.
> * c-c++-common/goacc/sb-2.c: Likewise.
> * gcc.dg/gomp/block-10.c: Likewise.
> * gcc.dg/gomp/block-9.c: Likewise.
> * gcc.dg/gomp/target-1.c: Likewise.
> * gcc.dg/gomp/target-2.c: Likewise.
> * gcc.dg/gomp/taskgroup-1.c: Likewise.
> * gcc.dg/gomp/teams-1.c: Likewise.
>
> diff --git gcc/c-family/c.opt gcc/c-family/c.opt
> index bdc6ee0..8b6fdbb 100644
> --- gcc/c-family/c.opt
> +++ gcc/c-family/c.opt
> @@ -634,6 +634,11 @@ Wswitch-bool
>  C ObjC C++ ObjC++ Var(warn_switch_bool) Warning Init(1)
>  Warn about switches with boolean controlling expression.
>
> +Wswitch-unreachable
> +C ObjC C++ ObjC++ Var(warn_switch_unreachable) Warning Init(1)
> +Warn about statements between switch's controlling expression and the first
> +case.
> +
>  Wtemplates
>  C++ ObjC++ Var(warn_templates) Warning
>  Warn on primary template declaration.
> diff --git gcc/c/c-parser.c gcc/c/c-parser.c
> index 6523c08..bdf8e8e 100644
> --- gcc/c/c-parser.c
> +++ gcc/c/c-parser.c
> @@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-indentation.h"
>  #include "gimple-expr.h"
>  #include "context.h"
> +#include "tree-iterator.h"
>
>  /* We need to walk over decls with incomplete struct/union/enum types
> after parsing the whole translation unit.
> @@ -5605,7 +5606,30 @@ c_parser_switch_statement (c_parser *parser, bool 
> *if_p)
>c_start_case (switch_loc, switch_cond_loc, expr, explicit_cast_p);
>save_break = c_break_label;
>c_break_label = NULL_TREE;
> +  location_t first_loc = (c_parser_next_token_is (parser, CPP_OPEN_BRACE)
> + ? c_parser_peek_2nd_token (parser)->location
> + : c_parser_peek_token (parser)->location);
>body = c_parser_c99_block_statement (parser, if_p);
> +  tree first = expr_first (TREE_CODE (body) == BIND_EXPR
> +  ? BIND_EXPR_BODY (body) : body);
> +  /* Statements between `switch' and the first case are never executed.  */
> +  if (warn_switch_unreachable
> +  && first != NULL_TREE
> +  && TREE_CODE (first) != CASE_LABEL_EXPR
> +  && TREE_CODE (first) != LABEL_EXPR)
> +{
> +  if (TREE_CODE (first) == DECL_EXPR
> + && DECL_INITIAL (DECL_EXPR_DECL (first)) == NULL_TREE)
> +   /* Don't warn for a declaration, but warn for an initialization.  */;
> +  else if (TREE_CODE (first) == GOTO_EXPR
> +  && TREE_CODE (GOTO_DESTINATION (first)) == LABEL_DECL
> +  && DECL_ARTIFICIAL (GOTO_DESTINATION (first)))
> +   /* Don't warn for compiler-generated gotos.  These occur in Duff's
> +  devices, for example.  */;
> +  else
> +   warning_at (first_loc, OPT_Wswitch_unreachable,
> +   "statement will never be executed");
> +}
>c_finish_case (body, ce.original_type);
>if (c_break_label)
>  {
> diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
> index a54a0af..8f9c186 100644
> --- gcc/doc/invoke.texi
> +++ gcc/doc/invoke.texi
> @@ -297,7 +297,8 @@ Objective-C and Objective-C++ Dialects}.
>  -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
>  -Wsuggest-final-types @gol -Wsuggest-final-methods -Wsuggest-override @gol
>  -Wmissing-format-attribute -Wsubobject-linkage @gol
> --Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool 

Make vectorizer to use max_loop_iterations_int

2016-05-19 Thread Jan Hubicka
Hi,
this patch makes vect_analyze_loop_2 give up on loops with low maximum iteration
counts (instead of only considering estimated_stmt_executions_int).  This
change was earlier approved by Richi in stage4 and then reverted.
I got the test wrong: instead of == -1 it read != -1, hence the regressions.
I noticed that only now.

Bootstrapped/regtested on x86_64-linux, committed.

Honza

Index: ChangeLog
===
--- ChangeLog   (revision 236477)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2016-05-19  Jan Hubicka  
+
+   * tree-vect-loop.c (vect_analyze_loop_2): Use also 
+   max_loop_iterations_int.
+
 2016-05-19  Marek Polacek  
 
PR tree-optimization/71031
Index: tree-vect-loop.c
===
--- tree-vect-loop.c(revision 236450)
+++ tree-vect-loop.c(working copy)
@@ -2065,6 +2065,8 @@ start_over:
 
   estimated_niter
 = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+  if (estimated_niter == -1)
+estimated_niter = max_niter;
   if (estimated_niter != -1
   && ((unsigned HOST_WIDE_INT) estimated_niter
   <= MAX (th, (unsigned)min_profitable_estimate)))


Re: C PATCH to add -Wswitch-unreachable (PR c/49859)

2016-05-19 Thread Marek Polacek
Any comments on this patch?  Should I pursue the C++ part?

On Tue, May 10, 2016 at 08:19:29PM +0200, Marek Polacek wrote:
> Over the years, we got several PRs that suggested to add a warning that would
> warn about unreachable statements between `switch (cond)' and the first case.
> In some cases our -Wuninitialized warning can detect such a case, but mostly
> not.  This patch is an attempt to add a proper warning about this peculiar
> case.  I chose to not warn about declarations between switch and the first
> case, because we use that in the codebase and I think this kind of use is
> fine.  As it's usually the case, some obscure cases cropped up during testing,
> this time those were Duff's devices, so I had to go the extra mile to handle
> them specially.
> 
> This is a C FE part only; I'd like to hear some feedback first before I plunge
> into the C++ FE.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2016-05-10  Marek Polacek  
> 
>   PR c/49859
>   * c.opt (Wswitch-unreachable): New option.
> 
>   * c-parser.c: Include tree-iterator.h.
>   (c_parser_switch_statement): Implement -Wswitch-unreachable warning.
> 
>   * doc/invoke.texi: Document -Wswitch-unreachable.
> 
>   * gcc.dg/Wjump-misses-init-1.c: Use -Wno-switch-unreachable.
>   * gcc.dg/nested-func-1.c: Likewise.
>   * gcc.dg/pr67784-4.c: Likewise.
>   * gcc.dg/Wswitch-unreachable-1.c: New test.
>   * gcc.dg/c99-vla-jump-5.c (f): Add dg-warning.
>   * gcc.dg/switch-warn-1.c (foo1): Likewise.
>   * c-c++-common/goacc/sb-2.c: Likewise.
>   * gcc.dg/gomp/block-10.c: Likewise.
>   * gcc.dg/gomp/block-9.c: Likewise.
>   * gcc.dg/gomp/target-1.c: Likewise.
>   * gcc.dg/gomp/target-2.c: Likewise.
>   * gcc.dg/gomp/taskgroup-1.c: Likewise.
>   * gcc.dg/gomp/teams-1.c: Likewise.
> 
> diff --git gcc/c-family/c.opt gcc/c-family/c.opt
> index bdc6ee0..8b6fdbb 100644
> --- gcc/c-family/c.opt
> +++ gcc/c-family/c.opt
> @@ -634,6 +634,11 @@ Wswitch-bool
>  C ObjC C++ ObjC++ Var(warn_switch_bool) Warning Init(1)
>  Warn about switches with boolean controlling expression.
>  
> +Wswitch-unreachable
> +C ObjC C++ ObjC++ Var(warn_switch_unreachable) Warning Init(1)
> +Warn about statements between switch's controlling expression and the first
> +case.
> +
>  Wtemplates
>  C++ ObjC++ Var(warn_templates) Warning
>  Warn on primary template declaration.
> diff --git gcc/c/c-parser.c gcc/c/c-parser.c
> index 6523c08..bdf8e8e 100644
> --- gcc/c/c-parser.c
> +++ gcc/c/c-parser.c
> @@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-indentation.h"
>  #include "gimple-expr.h"
>  #include "context.h"
> +#include "tree-iterator.h"
>  
>  /* We need to walk over decls with incomplete struct/union/enum types
> after parsing the whole translation unit.
> @@ -5605,7 +5606,30 @@ c_parser_switch_statement (c_parser *parser, bool 
> *if_p)
>c_start_case (switch_loc, switch_cond_loc, expr, explicit_cast_p);
>save_break = c_break_label;
>c_break_label = NULL_TREE;
> +  location_t first_loc = (c_parser_next_token_is (parser, CPP_OPEN_BRACE)
> +   ? c_parser_peek_2nd_token (parser)->location
> +   : c_parser_peek_token (parser)->location);
>body = c_parser_c99_block_statement (parser, if_p);
> +  tree first = expr_first (TREE_CODE (body) == BIND_EXPR
> +? BIND_EXPR_BODY (body) : body);
> +  /* Statements between `switch' and the first case are never executed.  */
> +  if (warn_switch_unreachable
> +  && first != NULL_TREE
> +  && TREE_CODE (first) != CASE_LABEL_EXPR
> +  && TREE_CODE (first) != LABEL_EXPR)
> +{
> +  if (TREE_CODE (first) == DECL_EXPR
> +   && DECL_INITIAL (DECL_EXPR_DECL (first)) == NULL_TREE)
> + /* Don't warn for a declaration, but warn for an initialization.  */;
> +  else if (TREE_CODE (first) == GOTO_EXPR
> +&& TREE_CODE (GOTO_DESTINATION (first)) == LABEL_DECL
> +&& DECL_ARTIFICIAL (GOTO_DESTINATION (first)))
> + /* Don't warn for compiler-generated gotos.  These occur in Duff's
> +devices, for example.  */;
> +  else
> + warning_at (first_loc, OPT_Wswitch_unreachable,
> + "statement will never be executed");
> +}
>c_finish_case (body, ce.original_type);
>if (c_break_label)
>  {
> diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
> index a54a0af..8f9c186 100644
> --- gcc/doc/invoke.texi
> +++ gcc/doc/invoke.texi
> @@ -297,7 +297,8 @@ Objective-C and Objective-C++ Dialects}.
>  -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
>  -Wsuggest-final-types @gol -Wsuggest-final-methods -Wsuggest-override @gol
>  -Wmissing-format-attribute -Wsubobject-linkage @gol
> --Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool -Wsync-nand @gol
> +-Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool @gol
> 

Re: [PATCH], PR 71201, Fix xxperm on PowerPC ISA 3.0, add vpermr/xxpermr support

2016-05-19 Thread Segher Boessenkool
On Thu, May 19, 2016 at 10:53:41AM -0400, Michael Meissner wrote:
> GCC 6.1 added support for the XXPERM instruction for the PowerPC ISA 3.0.  The
> XXPERM instruction is essentially a 4 operand instruction, with only 3 
> operands
> in the instruction (the target register overlaps with the first input
> register).  The Power9 hardware has fusion support where if the instruction
> that precedes the XXPERM is a XXLOR move instruction to set the first input
> argument, it is fused with the XXPERM.  I added code to support this fusion.
> 
> Unfortunately, in running the testsuite on the power9 simulator, we discovered
> that the test gcc.c-torture/execute/pr56866.c would fail because the fusion
> alternatives confused the register allocator and/or the passes after the
> register allocator.  This patch removes the explicit fusion support from
> XXPERM.

Okay.  Please keep the PR open until that problem is fixed.  It also
shouldn't be "target" category, if the problem is RA.

> In addition, ISA 3.0 added XXPERMR and VPERMR instructions for little endian
> support where the permute vector reverses the bytes.  This patch adds support
> for XXPERMR/VPERMR.

Please send that as a separate patch, it has nothing to do with the PR.

> + x = gen_rtx_UNSPEC (mode,
> + gen_rtvec (3, target, reg, 

Trailing space.

> +  if (TARGET_P9_VECTOR)
> +{
> +  unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op0, op1, sel), 

And another.

> +  The VNAND is preferred for future fusion opportunities.  */
> +  notx = gen_rtx_NOT (V16QImode, sel);
> +  iorx = (TARGET_P8_VECTOR
> +   ? gen_rtx_IOR (V16QImode, notx, notx)
> +   : gen_rtx_AND (V16QImode, notx, notx));
> +  emit_insn (gen_rtx_SET (norreg, iorx));
> +  

Some more.

> +/* { dg-final { scan-assembler  "vpermr\|xxpermr" } } */

Tab in the middle of the line.


Segher


Re: [PATCH] Assuage ICE in VRP with [1, X] + UINT_MAX (PR tree-optimization/71031)

2016-05-19 Thread Richard Biener
On May 19, 2016 5:14:24 PM GMT+02:00, Marek Polacek  wrote:
>On Thu, May 19, 2016 at 03:54:05PM +0200, Richard Biener wrote:
>> On Thu, 19 May 2016, Marek Polacek wrote:
>> 
>> > Since Bin's changes to the niter analysis in r231097, we find
>ourselves in
>> > a situation where extract_range_from_binary_expr is given [1, od_5]
>+ UINT_MAX
>> > on type unsigned.  We combine the lower bounds, which is 1 +
>UINT_MAX = 0(OVF).
>> > We then combine the upper bounds, because the max_op0 is not a
>constant, the
>> > result of that is UINT_MAX.  That results in min overflow -- and an
>assert is
>> > unhappy about that.  As suggested in the PR, a safe thing would be
>to change
>> > the assert to dropping to varying.
>> > 
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?
>> 
>> Ok if you leave the assert in (there is at least one unhandled case,
>> min_ovf == 1 && max_ovf == -1).
>> 
>> Alternatively make the code match the comment and drop the
>> == 0 checks in your added condition.  Which would suggest
>> to make the else { an else if (asserted condition) and common
>> the varying case to else { }.
>
>Oh, this is indeed better.  So like this?

Yes.

Thanks,
Richard.

>Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?
>
>2016-05-19  Marek Polacek  
>
>   PR tree-optimization/71031
>   * tree-vrp.c (extract_range_from_binary_expr_1): Turn assert into a
>   condition and adjust the code a bit.
>
>   * gcc.dg/tree-ssa/vrp100.c: New test.
>
>diff --git gcc/testsuite/gcc.dg/tree-ssa/vrp100.c gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
>index e69de29..c0fe4b5 100644
>--- gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
>+++ gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
>@@ -0,0 +1,32 @@
>+/* PR tree-optimization/71031 */
>+/* { dg-do compile } */
>+/* { dg-options "-Os" } */
>+
>+int zj;
>+int **yr;
>+
>+void
>+nn (void)
>+{
>+  unsigned int od = 4;
>+
>+  for (;;)
>+{
>+  int lk;
>+
>+  for (lk = 0; lk < 2; ++lk)
>+{
>+  static int cm;
>+
>+  zj = 0;
>+  if (od == 0)
>+return;
>+  ++od;
>+  for (cm = 0; cm < 2; ++cm)
>+{
>+  --od;
>+  **yr = 0;
>+}
>+}
>+}
>+}
>diff --git gcc/tree-vrp.c gcc/tree-vrp.c
>index 69e6248..92d889c 100644
>--- gcc/tree-vrp.c
>+++ gcc/tree-vrp.c
>@@ -2519,20 +2519,13 @@ extract_range_from_binary_expr_1 (value_range
>*vr,
> min = wide_int_to_tree (expr_type, tmin);
> max = wide_int_to_tree (expr_type, tmax);
>   }
>-else if (min_ovf == -1 && max_ovf == 1)
>-  {
>-/* Underflow and overflow, drop to VR_VARYING.  */
>-set_value_range_to_varying (vr);
>-return;
>-  }
>-else
>+else if ((min_ovf == -1 && max_ovf == 0)
>+ || (max_ovf == 1 && min_ovf == 0))
>   {
> /* Min underflow or max overflow.  The range kind
>changes to VR_ANTI_RANGE.  */
> bool covers = false;
> wide_int tem = tmin;
>-gcc_assert ((min_ovf == -1 && max_ovf == 0)
>-|| (max_ovf == 1 && min_ovf == 0));
> type = VR_ANTI_RANGE;
> tmin = tmax + 1;
> if (wi::cmp (tmin, tmax, sgn) < 0)
>@@ -2551,6 +2544,12 @@ extract_range_from_binary_expr_1 (value_range
>*vr,
> min = wide_int_to_tree (expr_type, tmin);
> max = wide_int_to_tree (expr_type, tmax);
>   }
>+else
>+  {
>+/* Other underflow and/or overflow, drop to VR_VARYING.  */
>+set_value_range_to_varying (vr);
>+return;
>+  }
>   }
> else
>   {
>
>   Marek




[PATCH][wwwdocs] Improve arm and aarch64-related info in readings.html

2016-05-19 Thread Kyrill Tkachov

Hi all,

I noticed that we have a readings.html page that has pointers to documentation 
of various backends that GCC supports.
The info on arm seems a bit out of date and somewhat confusing, and there is no 
entry for aarch64.
This patch tries to address that.

The arm entry is updated to no longer mention armv2(?) and thumb, and an
aarch64 entry is added with a link to the ARM documentation.

Ok to commit?

Thanks,
Kyrill
? readings.html~
Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.242
diff -U 3 -r1.242 readings.html
--- readings.html	14 Nov 2015 23:40:21 -	1.242
+++ readings.html	15 Feb 2016 13:29:03 -
@@ -59,6 +59,11 @@
 
 
 
+ AArch64
+  http://infocenter.arm.com/help/index.jsp;>
+	ARM Documentation
+ 
+
  alpha
Manufacturer: Compaq (DEC)
http://www.tru64unix.compaq.com/docs/base_doc/DOCUMENTATION/V51A_HTML/ARH9MBTE/TITLE.HTM;>Calling
@@ -81,12 +86,12 @@
   http://www.synopsys.com/IP/PROCESSORIP/ARCPROCESSORS/Pages/default.aspx;>ARC Documentation
  
 
- arm (armv2, thumb)
-  Manufacturer: Various, by license from ARM
+ ARM
+  Manufacturer: Various, by license from ARM.
   CPUs include: ARM7 and ARM7T series (eg. ARM7TDMI), ARM9 and StrongARM
   http://infocenter.arm.com/help/index.jsp;>ARM Documentation
  
- 
+
  AVR
   Manufacturer: Atmel
   http://www.atmel.com/products/microcontrollers/avr/;>AVR Documentation


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-19 Thread Eric Botcazou
> Index: trunk/gcc/tree.def
> ===
> *** trunk.orig/gcc/tree.def   2016-05-17 17:19:41.783958489 +0200
> --- trunk/gcc/tree.def2016-05-19 10:23:35.779141973 +0200
> *** DEFTREECODE (ADDR_EXPR, "addr_expr", tcc
> *** 852,857 
> --- 852,871 
>  descriptor of type ptr_mode.  */
>   DEFTREECODE (FDESC_EXPR, "fdesc_expr", tcc_expression, 2)
> 
> + /* Given a word, a value and a bit position within the word,
> +produce the value that results if replacing the parts of word
> +starting at the bit position with value.
> +Operand 0 is a tree for the word of integral or vector type;
> +Operand 1 is a tree for the value of integral or vector element type;
> +Operand 2 is a tree giving the constant position of the first
> referenced bit; +The number of bits replaced is given by the precision
> of the value +type if that is integral or by its size if it is
> non-integral. +???  The reason to make the size of the replacement
> implicit is to not +have a quaternary operation.
> +The replaced bits shall be fully inside the word.  If the word is of
> +vector type the replaced bits shall be aligned with its elements.  */
> + DEFTREECODE (BIT_INSERT_EXPR, "bit_field_insert", tcc_expression, 3)
> +

"word" is ambiguous (what is a word of vector type?).  What's allowed as 
operand #0 exactly?  If that's anything, I'd call it a value too, possibly 
with a qualifier, for example:

 /* Given a container value, a replacement value and a bit position within
the container, produce the value that results from replacing the part of
the container starting at the bit position with the replacement value.
Operand 0 is a tree for the container value of integral or vector type;
Operand 1 is a tree for the replacement value of another integral or
vector element type;
Operand 2 is a tree giving the constant bit position;
The number of bits replaced is given by the precision of the type of the
replacement value if it is integral or by its size if it is non-integral.
???  The reason to make the size of the replacement implicit is to avoid
introducing a quaternary operation.
The replaced bits shall be fully inside the container.  If the container
is of vector type, then these bits shall be aligned with its elements.  */
DEFTREECODE (BIT_INSERT_EXPR, "bit_field_insert", tcc_expression, 3)
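Viewed on a plain integer container, the operation being specified here is just a masked replace. A sketch (a hypothetical helper, independent of GCC's tree representation — the names and the 32-bit container width are assumptions for illustration):

```cpp
#include <assert.h>

/* Replace WIDTH bits of CONTAINER starting at bit POS with VALUE.
   The replaced bits must lie fully inside the container, mirroring
   the "replaced bits shall be fully inside the container" rule.  */
static unsigned int
bit_insert (unsigned int container, unsigned int value, int pos, int width)
{
  assert (pos >= 0 && width > 0 && width < 32 && pos + width <= 32);
  unsigned int mask = ((1u << width) - 1u) << pos;
  return (container & ~mask) | ((value << pos) & mask);
}
```

For a vector container the same picture applies per element, with POS aligned to an element boundary as the description requires.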

-- 
Eric Botcazou


Re: [PATCH] Assuage ICE in VRP with [1, X] + UINT_MAX (PR tree-optimization/71031)

2016-05-19 Thread Marek Polacek
On Thu, May 19, 2016 at 03:54:05PM +0200, Richard Biener wrote:
> On Thu, 19 May 2016, Marek Polacek wrote:
> 
> > Since Bin's changes to the niter analysis in r231097, we find ourselves in
> > a situation where extract_range_from_binary_expr is given [1, od_5] + 
> > UINT_MAX
> > on type unsigned.  We combine the lower bounds, which is 1 + UINT_MAX = 
> > 0(OVF).
> > We then combine the upper bounds, because the max_op0 is not a constant, the
> > result of that is UINT_MAX.  That results in min overflow -- and an assert 
> > is
> > unhappy about that.  As suggested in the PR, a safe thing would be to change
> > the assert to dropping to varying.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?
> 
> Ok if you leave the assert in (there is at least one unhandled case,
> min_ovf == 1 && max_ovf == -1).
> 
> Alternatively make the code match the comment and drop the
> == 0 checks in your added condition.  Which would suggest
> to make the else { an else if (asserted condition) and common
> the varying case to else { }.

Oh, this is indeed better.  So like this?

Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?

2016-05-19  Marek Polacek  

PR tree-optimization/71031
* tree-vrp.c (extract_range_from_binary_expr_1): Turn assert into a
condition and adjust the code a bit.

* gcc.dg/tree-ssa/vrp100.c: New test.

diff --git gcc/testsuite/gcc.dg/tree-ssa/vrp100.c 
gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
index e69de29..c0fe4b5 100644
--- gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
@@ -0,0 +1,32 @@
+/* PR tree-optimization/71031 */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+int zj;
+int **yr;
+
+void
+nn (void)
+{
+  unsigned int od = 4;
+
+  for (;;)
+{
+  int lk;
+
+  for (lk = 0; lk < 2; ++lk)
+{
+  static int cm;
+
+  zj = 0;
+  if (od == 0)
+return;
+  ++od;
+  for (cm = 0; cm < 2; ++cm)
+{
+  --od;
+  **yr = 0;
+}
+}
+}
+}
diff --git gcc/tree-vrp.c gcc/tree-vrp.c
index 69e6248..92d889c 100644
--- gcc/tree-vrp.c
+++ gcc/tree-vrp.c
@@ -2519,20 +2519,13 @@ extract_range_from_binary_expr_1 (value_range *vr,
  min = wide_int_to_tree (expr_type, tmin);
  max = wide_int_to_tree (expr_type, tmax);
}
- else if (min_ovf == -1 && max_ovf == 1)
-   {
- /* Underflow and overflow, drop to VR_VARYING.  */
- set_value_range_to_varying (vr);
- return;
-   }
- else
+ else if ((min_ovf == -1 && max_ovf == 0)
+  || (max_ovf == 1 && min_ovf == 0))
{
  /* Min underflow or max overflow.  The range kind
 changes to VR_ANTI_RANGE.  */
  bool covers = false;
  wide_int tem = tmin;
- gcc_assert ((min_ovf == -1 && max_ovf == 0)
- || (max_ovf == 1 && min_ovf == 0));
  type = VR_ANTI_RANGE;
  tmin = tmax + 1;
  if (wi::cmp (tmin, tmax, sgn) < 0)
@@ -2551,6 +2544,12 @@ extract_range_from_binary_expr_1 (value_range *vr,
  min = wide_int_to_tree (expr_type, tmin);
  max = wide_int_to_tree (expr_type, tmax);
}
+ else
+   {
+ /* Other underflow and/or overflow, drop to VR_VARYING.  */
+ set_value_range_to_varying (vr);
+ return;
+   }
}
  else
{

Marek


[PATCH, libstdc++] Add missing atomic-builtins argument to experimental/memory_resource/1.cc

2016-05-19 Thread Thomas Preudhomme
Hi Jonathan,

The dg-require-atomic-builtins in experimental/memory_resource/1.cc does not 
currently work as intended because it is missing its argument. This patch fixes 
that.

ChangeLog entry is as follows:

*** libstdc++-v3/ChangeLog ***

2016-05-18  Thomas Preud'homme  

* testsuite/experimental/memory_resource/1.cc: Add required argument
to dg-require-atomic-builtins.


diff --git a/libstdc++-v3/testsuite/experimental/memory_resource/1.cc b/libstdc++-v3/testsuite/experimental/memory_resource/1.cc
index 22d4e0d966f9c83bb1341137d62c834b441c08f3..08c02e5e31b287cb678c12f499a985899e612748 100644
--- a/libstdc++-v3/testsuite/experimental/memory_resource/1.cc
+++ b/libstdc++-v3/testsuite/experimental/memory_resource/1.cc
@@ -1,5 +1,5 @@
 // { dg-options "-std=gnu++14" }
-// { dg-require-atomic-builtins }
+// { dg-require-atomic-builtins "" }
 
 // Copyright (C) 2015-2016 Free Software Foundation, Inc.
 //


Is this ok for trunk?

Best regards,

Thomas


Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-19 Thread Matthew Wahab

On 18/05/16 16:20, Joseph Myers wrote:

On Wed, 18 May 2016, Matthew Wahab wrote:


AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like
flush-to-zero that could affect the outcome of a calculation.


The result of a float computation on two values immediately promoted from
fp16 cannot be within the subnormal range for float.  Thus, only one flush
to zero can happen, on the final conversion back to fp16, and that cannot
make the result different from doing direct arithmetic in fp16 (assuming
flush to zero affects conversion from float to fp16 the same way it
affects direct fp16 arithmetic).


[..]


In short: instructions for direct HFmode arithmetic should be described
with patterns with the standard names.  It's the job of the
architecture-independent compiler to ensure that fp16 arithmetic in the
user's source code only generates direct fp16 arithmetic in GIMPLE (and
thus ends up using those patterns) if that is a correct representation of
the source code's semantics according to ACLE.

The intrinsics you provide can then be written to use direct arithmetic,
and rely on convert_to_real_1 eliminating the promotions, rather than
needing built-in functions at all, just like many arm_neon.h intrinsics
make direct use of GNU C vector arithmetic.


I think it's clear that this has exhausted my knowledge of FP semantics.

Forcing promotion to single precision was done to settle concerns brought up in
internal discussions about __fp16 semantics.  I'll see if anybody has a problem
with the changes you suggest.


Thanks,
Matthew
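Joseph's range argument above can be sanity-checked numerically. The constants below are the standard IEEE-754 binary16/binary32 format parameters, not anything taken from the patch, and the check covers only the extreme multiplication case (addition of two fp16 values is bounded even further from the subnormal range):

```cpp
#include <assert.h>
#include <math.h>

/* IEEE-754 binary16: smallest nonzero magnitude is 2^-24 (a subnormal);
   IEEE-754 binary32: subnormals only start below 2^-126.  Even the
   extreme product of two promoted fp16 inputs, 2^-24 * 2^-24 = 2^-48,
   sits far above the float subnormal threshold, so a single promoted
   float operation cannot itself produce a float subnormal.  */
static bool
promoted_fp16_product_stays_normal (void)
{
  double fp16_min = ldexp (1.0, -24);         /* 2^-24 */
  double f32_min_normal = ldexp (1.0, -126);  /* 2^-126 */
  return fp16_min * fp16_min > f32_min_normal;
}
```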



[PATCH], PR 71201, Fix xxperm on PowerPC ISA 3.0, add vpermr/xxpermr support

2016-05-19 Thread Michael Meissner
GCC 6.1 added support for the XXPERM instruction for the PowerPC ISA 3.0.  The
XXPERM instruction is essentially a 4 operand instruction, with only 3 operands
in the instruction (the target register overlaps with the first input
register).  The Power9 hardware has fusion support where if the instruction
that precedes the XXPERM is a XXLOR move instruction to set the first input
argument, it is fused with the XXPERM.  I added code to support this fusion.

Unfortunately, in running the testsuite on the power9 simulator, we discovered
that the test gcc.c-torture/execute/pr56866.c would fail because the fusion
alternatives confused the register allocator and/or the passes after the
register allocator.  This patch removes the explicit fusion support from
XXPERM.

In addition, ISA 3.0 added XXPERMR and VPERMR instructions for little endian
support where the permute vector reverses the bytes.  This patch adds support
for XXPERMR/VPERMR.
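A rough model of the two permute flavours being discussed — an illustrative sketch of the byte-selection idea, not the ISA pseudocode (the function name and the exact 31 - index complement for the reversed variant are assumptions for illustration):

```cpp
#include <assert.h>
#include <string.h>

/* VPERM selects each result byte out of the 32-byte concatenation
   vA||vB using the low 5 bits of the corresponding selector byte;
   the VPERMR variant (REVERSED here) uses the complemented index
   31 - sel, which is what makes it convenient on little endian.  */
static void
byte_perm (const unsigned char va[16], const unsigned char vb[16],
           const unsigned char sel[16], unsigned char out[16], bool reversed)
{
  unsigned char src[32];
  memcpy (src, va, 16);
  memcpy (src + 16, vb, 16);
  for (int i = 0; i < 16; i++)
    {
      int idx = sel[i] & 0x1f;
      out[i] = src[reversed ? 31 - idx : idx];
    }
}
```

The fusion problem described above is outside this model: the real XXPERM additionally forces the target register to overlap the first input register, and it is that register-allocation constraint which interacted badly with the fusion alternatives.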

[gcc]
2016-05-19  Michael Meissner  
Kelvin Nilsen  

PR target/71201
* config/rs6000/rs6000.c (rs6000_expand_vector_set): Generate
vpermr/xxpermr on ISA 3.0.
(altivec_expand_vec_perm_le): Likewise.
* config/rs6000/altivec.md (UNSPEC_VPERMR): New unspec.
(altivec_vperm__internal): Drop ISA 3.0 xxperm fusion
alternative.
(altivec_vperm_v8hiv16qi): Likewise.
(altivec_vperm__uns_internal): Likewise.
(vperm_v8hiv4si): Likewise.
(vperm_v16qiv8hi): Likewise.
(altivec_vpermr__internal): Add VPERMR/XXPERMR support for
ISA 3.0.

[gcc/testsuite]
2016-05-19  Michael Meissner  
Kelvin Nilsen  

* gcc.target/powerpc/p9-permute.c: Run test on big endian as well
as little endian.
* gcc.target/powerpc/p9-vpermr.c: New test for ISA 3.0 vpermr
support.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236423)
+++ gcc/config/rs6000/rs6000.c  (.../gcc/config/rs6000) (working copy)
@@ -6858,19 +6858,27 @@ rs6000_expand_vector_set (rtx target, rt
UNSPEC_VPERM);
   else 
 {
-  /* Invert selector.  We prefer to generate VNAND on P8 so
- that future fusion opportunities can kick in, but must
- generate VNOR elsewhere.  */
-  rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
-  rtx iorx = (TARGET_P8_VECTOR
- ? gen_rtx_IOR (V16QImode, notx, notx)
- : gen_rtx_AND (V16QImode, notx, notx));
-  rtx tmp = gen_reg_rtx (V16QImode);
-  emit_insn (gen_rtx_SET (tmp, iorx));
-
-  /* Permute with operands reversed and adjusted selector.  */
-  x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
- UNSPEC_VPERM);
+  if (TARGET_P9_VECTOR)
+   x = gen_rtx_UNSPEC (mode,
+   gen_rtvec (3, target, reg, 
+  force_reg (V16QImode, x)),
+   UNSPEC_VPERMR);
+  else
+   {
+ /* Invert selector.  We prefer to generate VNAND on P8 so
+that future fusion opportunities can kick in, but must
+generate VNOR elsewhere.  */
+ rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x));
+ rtx iorx = (TARGET_P8_VECTOR
+ ? gen_rtx_IOR (V16QImode, notx, notx)
+ : gen_rtx_AND (V16QImode, notx, notx));
+ rtx tmp = gen_reg_rtx (V16QImode);
+ emit_insn (gen_rtx_SET (tmp, iorx));
+ 
+ /* Permute with operands reversed and adjusted selector.  */
+ x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
+ UNSPEC_VPERM);
+   }
 }
 
   emit_insn (gen_rtx_SET (target, x));
@@ -34238,17 +34246,25 @@ altivec_expand_vec_perm_le (rtx operands
   if (!REG_P (target))
 tmp = gen_reg_rtx (mode);
 
-  /* Invert the selector with a VNAND if available, else a VNOR.
- The VNAND is preferred for future fusion opportunities.  */
-  notx = gen_rtx_NOT (V16QImode, sel);
-  iorx = (TARGET_P8_VECTOR
- ? gen_rtx_IOR (V16QImode, notx, notx)
- : gen_rtx_AND (V16QImode, notx, notx));
-  emit_insn (gen_rtx_SET (norreg, iorx));
-
-  /* Permute with operands reversed and adjusted selector.  */
-  unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op1, op0, norreg),
-  UNSPEC_VPERM);
+  if (TARGET_P9_VECTOR)
+{
+  unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op0, op1, sel), 
+  UNSPEC_VPERMR);
+}
+  else
+{
+  /* 

Re: [PATCH][CilkPlus] Allow parenthesized initialization in for-loops

2016-05-19 Thread Jason Merrill

On 05/10/2016 03:28 PM, Ilya Verbin wrote:

What about (some_class i { 0 }; some_class < ...; some_class++)
and similar syntax?


It's allowed, thanks, I missed this in the initial patch.


The testsuite coverage is insufficient (nothing e.g.
tests templates or #pragma simd).


Patch is updated.  Is it sufficient now?



- if (!CLASS_TYPE_P (TREE_TYPE (decl))
- && !type_dependent_expression_p (decl))
+ if (!is_class && !type_dependent_expression_p (decl))
goto non_class;
}
-   
+
  cp_finish_decl (decl, init, !is_non_constant_init,
  asm_specification,
  LOOKUP_ONLYCONVERTING);
  orig_init = init;
- if (CLASS_TYPE_P (TREE_TYPE (decl)))
+ if (is_class)


This change is wrong; do_auto_deduction will have changed TREE_TYPE 
(decl), so it could be a class now.



+ else if (is_cilk && next_is_op_paren)
+   {
+ cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN);
+ init = cp_parser_assignment_expression (parser);
+ cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
+ goto non_class;
+   }
+ else if (is_cilk && next_is_eq)
+   {
+ bool braced = false;
+ cp_parser_require (parser, CPP_EQ, RT_EQ);
+ if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+   {
+ braced = true;
+ cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE);
+   }
+ init = cp_parser_assignment_expression (parser);
+ if (braced)
+   cp_parser_require (parser, CPP_CLOSE_BRACE, RT_CLOSE_BRACE);
+ goto non_class;
+   }
+ else if (is_cilk && next_is_op_brace)
+   {
+ cp_lexer_set_source_position (parser->lexer);
+ maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
+ cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE);
+ init = cp_parser_assignment_expression (parser);
+ cp_parser_require (parser, CPP_CLOSE_BRACE, RT_CLOSE_BRACE);
+ goto non_class;
+   }


Why not use cp_parser_initializer for scalars?

Jason



Re: [PR 70646] Store size to inlining predicate conditions

2016-05-19 Thread Martin Jambor
Hi,

On Wed, May 18, 2016 at 12:19:11PM +0200, jh wrote:
> On 2016-05-11 17:45, Martin Jambor wrote:
> > Hi,
> > 
> > 
> > 2016-04-20  Martin Jambor  
> > 
> > PR ipa/70646
> > * ipa-inline.h (condition): New field size.
> > * ipa-inline-analysis.c (add_condition): New parameter SIZE, use it
> > for comparison and store it into the new condition.
> > (evaluate_conditions_for_known_args): Use condition size to check
> > access sizes for all but CHANGED conditions.
> > (unmodified_parm_1): New parameter size_p, store access size into it.
> > (unmodified_parm): Likewise.
> > (unmodified_parm_or_parm_agg_item): Likewise.
> > (eliminated_by_inlining_prob): Pass NULL to unmodified_parm as size_p.
> > (set_cond_stmt_execution_predicate): Extract access sizes and store
> > them to conditions.
> > (set_switch_stmt_execution_predicate): Likewise.
> > (will_be_nonconstant_expr_predicate): Likewise.
> > (will_be_nonconstant_predicate): Likewise.
> > (inline_read_section): Stream condition size.
> > (inline_write_summary): Likewise.
> 
> This is OK for mainline, and for the branches a week later.  You will need
> to bump up the LTO stream revision.  Just one question:

Thanks, I have committed the patch to trunk and added hunks bumping
the minor LTO versions to the patches for release versions.

> 
> Thanks,
> Honza
> > 
> > -  if (operand_equal_p (TYPE_SIZE (TREE_TYPE (c->val)),
> > -  TYPE_SIZE (TREE_TYPE (val)), 0))
> > +  if (tree_to_shwi (TYPE_SIZE (TREE_TYPE (val))) != c->size)
> 
> Will it work for variable-sized types and/or types whose size will not fit
> in a SHWI?  Can these happen here, or are they ruled out earlier in the
> analysis?
> 

At this point, val needs to be an ipa_invariant, which implies a fixed
and small size.

The patch I posted applies as-is to the gcc-6 branch but I had to do
some manual backporting for the gcc-5 and gcc-4_9 branches.  In the
latter, I had to change the prototype of ipa_load_from_parm_agg to
also return size of the load.

For the reference, the 4.9 patch is below.  I have bootstrapped and
tested the different alternatives for all branches and will start
committing them now.

Thanks a lot,

Martin


2016-05-18  Martin Jambor  

PR ipa/70646
* ipa-inline.h (condition): New field size.
* ipa-inline-analysis.c (add_condition): New parameter SIZE, use it
for comparison and store it into the new condition.
(evaluate_conditions_for_known_args): Use condition size to check
access sizes for all but CHANGED conditions.
(unmodified_parm_1): New parameter size_p, store access size into it.
(unmodified_parm): Likewise.
(unmodified_parm_or_parm_agg_item): Likewise.
(eliminated_by_inlining_prob): Pass NULL to unmodified_parm as size_p.
(set_cond_stmt_execution_predicate): Extract access sizes and store
them to conditions.
(set_switch_stmt_execution_predicate): Likewise.
(will_be_nonconstant_expr_predicate): Likewise.
(will_be_nonconstant_predicate): Likewise.
(inline_read_section): Stream condition size.
(inline_write_summary): Likewise.
* lto-streamer.h (LTO_minor_version): Bump.
* ipa-prop.c (ipa_load_from_parm_agg): Added size_p parameter, pass it
to ipa_load_from_parm_agg_1.
* ipa-prop.h (ipa_load_from_parm_agg): Update declaration.

testsuite/
* gcc.dg/ipa/pr70646.c: New test.
---
 gcc/ipa-inline-analysis.c  | 125 ++---
 gcc/ipa-inline.h   |   2 +
 gcc/ipa-prop.c |   4 +-
 gcc/ipa-prop.h |   2 +-
 gcc/lto-streamer.h |   2 +-
 gcc/testsuite/gcc.dg/ipa/pr70646.c |  40 
 6 files changed, 122 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr70646.c

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 9e71f43..47a3810 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -233,13 +233,14 @@ struct agg_position_info
   bool by_ref;
 };
 
-/* Add condition to condition list CONDS.  AGGPOS describes whether the used
-   oprand is loaded from an aggregate and where in the aggregate it is.  It can
-   be NULL, which means this not a load from an aggregate.  */
+/* Add condition to condition list SUMMARY. OPERAND_NUM, SIZE, CODE and VAL
+   correspond to fields of condition structure.  AGGPOS describes whether the
+   used operand is loaded from an aggregate and where in the aggregate it is.
+   It can be NULL, which means this is not a load from an aggregate.  */
 
 static struct predicate
 add_condition (struct inline_summary *summary, int operand_num,
-  struct agg_position_info *aggpos,
+  HOST_WIDE_INT size, struct agg_position_info *aggpos,
   enum tree_code code, tree val)
 {
   int i;

[PATCH] Speed up if-cvt

2016-05-19 Thread Richard Biener

This avoids re-computing BB predicates.  The more interesting job
will be to avoid re-computing post dominators (for the whole function)
all the time.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-05-19  Richard Biener  

* tree-if-conv.c (add_bb_predicate_gimplified_stmts): Use
gimple_seq_add_seq_without_update.
(release_bb_predicate): Assert we have no operands to free.
(if_convertible_loop_p_1): Calculate post dominators later.
Do not free BB predicates here.
(combine_blocks): Do not recompute BB predicates.
(version_loop_for_if_conversion): Save BB predicates around
loop versioning.

* gcc.dg/tree-ssa/ifc-cd.c: Adjust.

Index: gcc/tree-if-conv.c
===
*** gcc/tree-if-conv.c  (revision 236441)
--- gcc/tree-if-conv.c  (working copy)
*** set_bb_predicate_gimplified_stmts (basic
*** 257,263 
  static inline void
  add_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
  {
!   gimple_seq_add_seq
  (&(((struct bb_predicate *) bb->aux)->predicate_gimplified_stmts), stmts);
  }
  
--- 257,263 
  static inline void
  add_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
  {
!   gimple_seq_add_seq_without_update
  (&(((struct bb_predicate *) bb->aux)->predicate_gimplified_stmts), stmts);
  }
  
*** release_bb_predicate (basic_block bb)
*** 280,289 
gimple_seq stmts = bb_predicate_gimplified_stmts (bb);
if (stmts)
  {
!   gimple_stmt_iterator i;
  
-   for (i = gsi_start (stmts); !gsi_end_p (i); gsi_next (&i))
-   free_stmt_operands (cfun, gsi_stmt (i));
set_bb_predicate_gimplified_stmts (bb, NULL);
  }
  }
--- 280,290 
gimple_seq stmts = bb_predicate_gimplified_stmts (bb);
if (stmts)
  {
!   if (flag_checking)
!   for (gimple_stmt_iterator i = gsi_start (stmts);
!	 !gsi_end_p (i); gsi_next (&i))
! gcc_assert (! gimple_use_ops (gsi_stmt (i)));
  
set_bb_predicate_gimplified_stmts (bb, NULL);
  }
  }
*** if_convertible_loop_p_1 (struct loop *lo
*** 1322,1328 
  return false;
  
calculate_dominance_info (CDI_DOMINATORS);
-   calculate_dominance_info (CDI_POST_DOMINATORS);
  
/* Allow statements that can be handled during if-conversion.  */
ifc_bbs = get_loop_body_in_if_conv_order (loop);
--- 1323,1328 
*** if_convertible_loop_p_1 (struct loop *lo
*** 1370,1375 
--- 1370,1376 
  = new hash_map;
baseref_DR_map = new hash_map;
  
+   calculate_dominance_info (CDI_POST_DOMINATORS);
predicate_bbs (loop);
  
for (i = 0; refs->iterate (i, ); i++)
*** if_convertible_loop_p_1 (struct loop *lo
*** 1421,1429 
return false;
  }
  
-   for (i = 0; i < loop->num_nodes; i++)
- free_bb_predicate (ifc_bbs[i]);
- 
/* Checking PHIs needs to be done after stmts, as the fact whether there
   are any masked loads or stores affects the tests.  */
for (i = 0; i < loop->num_nodes; i++)
--- 1422,1427 
*** combine_blocks (struct loop *loop)
*** 2298,2304 
edge e;
edge_iterator ei;
  
-   predicate_bbs (loop);
remove_conditions_and_labels (loop);
insert_gimplified_predicates (loop);
predicate_all_scalar_phis (loop);
--- 2296,2301 
*** version_loop_for_if_conversion (struct l
*** 2428,2440 
--- 2425,2447 
  integer_zero_node);
gimple_call_set_lhs (g, cond);
  
+   /* Save BB->aux around loop_version as that uses the same field.  */
+   void **saved_preds = XALLOCAVEC (void *, loop->num_nodes);
+   for (unsigned i = 0; i < loop->num_nodes; i++)
+ saved_preds[i] = ifc_bbs[i]->aux;
+ 
initialize_original_copy_tables ();
new_loop = loop_version (loop, cond, &cond_bb,
   REG_BR_PROB_BASE, REG_BR_PROB_BASE,
   REG_BR_PROB_BASE, true);
free_original_copy_tables ();
+ 
+   for (unsigned i = 0; i < loop->num_nodes; i++)
+ ifc_bbs[i]->aux = saved_preds[i];
+ 
if (new_loop == NULL)
  return false;
+ 
new_loop->dont_vectorize = true;
new_loop->force_vectorize = false;
gsi = gsi_last_bb (cond_bb);
Index: gcc/testsuite/gcc.dg/tree-ssa/ifc-cd.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/ifc-cd.c  (revision 236441)
--- gcc/testsuite/gcc.dg/tree-ssa/ifc-cd.c  (working copy)
*** void foo (int *x1, int *x2, int *x3, int
*** 25,28 
  }
  }
  
! /* { dg-final { scan-tree-dump-times "Use predicate of bb" 8 "ifcvt" } } */
--- 25,28 
  }
  }
  
! /* { dg-final { scan-tree-dump-times "Use predicate of bb" 4 "ifcvt" } } */


Re: [C++ Patch/RFC] PR 70572 ("[4.9/5/6/7 Regression] ICE on code with decltype (auto) on x86_64-linux-gnu in digest_init_r")

2016-05-19 Thread Jason Merrill

On 05/18/2016 07:13 PM, Paolo Carlini wrote:

+ error ("cannot declare variable %q+D with function type", decl);


I think the error message would be more helpful if it mentioned 
decltype(auto), maybe


"initializer for % has function type, did you forget 
the %<()%>?", DECL_NAME (decl)


(or some other way to print the variable type as declared rather than as 
deduced).
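For concreteness, a sketch of the deduction at issue (hypothetical names; the commented-out line is the rejected case the suggested diagnostic would cover):

```cpp
#include <assert.h>
#include <type_traits>

// decltype(auto) deduces the declared type of the initializer exactly,
// so naming a function without the "()" deduces a function type, which
// is not a valid type for a variable.
inline int f () { return 42; }

decltype(auto) ok = f ();   // deduces int: fine
// decltype(auto) bad = f;  // would deduce int(): the case being diagnosed

static_assert (std::is_same<decltype (ok), int>::value,
               "decltype(auto) deduced the call's result type");
```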


Jason



Re: [PATCH] Assuage ICE in VRP with [1, X] + UINT_MAX (PR tree-optimization/71031)

2016-05-19 Thread Richard Biener
On Thu, 19 May 2016, Marek Polacek wrote:

> Since Bin's changes to the niter analysis in r231097, we find ourselves in
> a situation where extract_range_from_binary_expr is given [1, od_5] + UINT_MAX
> on type unsigned.  We combine the lower bounds, which is 1 + UINT_MAX = 
> 0(OVF).
> We then combine the upper bounds, because the max_op0 is not a constant, the
> result of that is UINT_MAX.  That results in min overflow -- and an assert is
> unhappy about that.  As suggested in the PR, a safe thing would be to change
> the assert to dropping to varying.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?

Ok if you leave the assert in (there is at least one unhandled case,
min_ovf == 1 && max_ovf == -1).

Alternatively make the code match the comment and drop the
== 0 checks in your added condition.  Which would suggest
to make the else { an else if (asserted condition) and common
the varying case to else { }.

Thanks,
Richard.

> 2016-05-19  Marek Polacek  
> 
>   PR tree-optimization/71031
>   * tree-vrp.c (extract_range_from_binary_expr_1): Drop to varying for
>   min overflow or max underflow and remove an assert.
> 
>   * gcc.dg/tree-ssa/vrp100.c: New test.
> 
> diff --git gcc/testsuite/gcc.dg/tree-ssa/vrp100.c gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
> index e69de29..c0fe4b5 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
> @@ -0,0 +1,32 @@
> +/* PR tree-optimization/71031 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +
> +int zj;
> +int **yr;
> +
> +void
> +nn (void)
> +{
> +  unsigned int od = 4;
> +
> +  for (;;)
> +{
> +  int lk;
> +
> +  for (lk = 0; lk < 2; ++lk)
> +{
> +  static int cm;
> +
> +  zj = 0;
> +  if (od == 0)
> +return;
> +  ++od;
> +  for (cm = 0; cm < 2; ++cm)
> +{
> +  --od;
> +  **yr = 0;
> +}
> +}
> +}
> +}
> diff --git gcc/tree-vrp.c gcc/tree-vrp.c
> index 69e6248..791d738 100644
> --- gcc/tree-vrp.c
> +++ gcc/tree-vrp.c
> @@ -2525,14 +2525,19 @@ extract_range_from_binary_expr_1 (value_range *vr,
> set_value_range_to_varying (vr);
> return;
>   }
> +   else if ((min_ovf == 1 && max_ovf == 0)
> +|| (min_ovf == 0 && max_ovf == -1))
> + {
> +   /* Min overflow or max underflow, drop to VR_VARYING.  */
> +   set_value_range_to_varying (vr);
> +   return;
> + }
> else
>   {
> /* Min underflow or max overflow.  The range kind
>changes to VR_ANTI_RANGE.  */
> bool covers = false;
> wide_int tem = tmin;
> -   gcc_assert ((min_ovf == -1 && max_ovf == 0)
> -   || (max_ovf == 1 && min_ovf == 0));
> type = VR_ANTI_RANGE;
> tmin = tmax + 1;
> if (wi::cmp (tmin, tmax, sgn) < 0)
> 
>   Marek
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][ARM] Fix costing of sign-extending load in rtx costs

2016-05-19 Thread Ramana Radhakrishnan
On 27/04/16 15:13, Kyrill Tkachov wrote:
> Hi all,
> 
> Another costs issue that came out of the investigation for PR 65932 is that
> sign-extending loads get a higher cost than they should in the arm backend.
> The problem is that when handling a sign-extend of a MEM we add the cost
> of the load_sign_extend cost field and then recursively add the cost of
> the inner MEM rtx, which is bogus: it ends up adding an extra load cost.
> 
> The solution in this patch is to just remove that recursive step.
> With this patch, in various CSE dumps I see much more sane costs assigned
> to these expressions (such as 12 instead of 32 or higher).
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-04-27  Kyrylo Tkachov  
> 
> * config/arm/arm.c (arm_new_rtx_costs, SIGN_EXTEND case):
> Don't add cost of inner memory when handling sign-extended
> loads.


Ok ... it took me a while to work out that this was sane.


regards
Ramana
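
For reference, a minimal C function that produces the pattern under
discussion (a sketch for illustration, not code from the patch itself):

```c
/* A sign-extending halfword load: on ARM this compiles to a single
   ldrsh instruction, i.e. RTL of the form
   (set (reg:SI) (sign_extend:SI (mem:HI ...))).
   The patch makes its rtx cost reflect one load instead of
   double-counting the inner MEM.  */
int
widen (const short *p)
{
  return *p;
}
```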





[PATCH] Assuage ICE in VRP with [1, X] + UINT_MAX (PR tree-optimization/71031)

2016-05-19 Thread Marek Polacek
Since Bin's changes to the niter analysis in r231097, we find ourselves in
a situation where extract_range_from_binary_expr is given [1, od_5] + UINT_MAX
on an unsigned type.  We combine the lower bounds, which is 1 + UINT_MAX = 0 (OVF).
We then combine the upper bounds; because max_op0 is not a constant, the
result of that is UINT_MAX.  That results in min overflow -- and an assert is
unhappy about that.  As suggested in the PR, the safe thing is to change
the assert to dropping to varying.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?

2016-05-19  Marek Polacek  

PR tree-optimization/71031
* tree-vrp.c (extract_range_from_binary_expr_1): Drop to varying for
min overflow or max underflow and remove an assert.

* gcc.dg/tree-ssa/vrp100.c: New test.

diff --git gcc/testsuite/gcc.dg/tree-ssa/vrp100.c 
gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
index e69de29..c0fe4b5 100644
--- gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp100.c
@@ -0,0 +1,32 @@
+/* PR tree-optimization/71031 */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+int zj;
+int **yr;
+
+void
+nn (void)
+{
+  unsigned int od = 4;
+
+  for (;;)
+{
+  int lk;
+
+  for (lk = 0; lk < 2; ++lk)
+{
+  static int cm;
+
+  zj = 0;
+  if (od == 0)
+return;
+  ++od;
+  for (cm = 0; cm < 2; ++cm)
+{
+  --od;
+  **yr = 0;
+}
+}
+}
+}
diff --git gcc/tree-vrp.c gcc/tree-vrp.c
index 69e6248..791d738 100644
--- gcc/tree-vrp.c
+++ gcc/tree-vrp.c
@@ -2525,14 +2525,19 @@ extract_range_from_binary_expr_1 (value_range *vr,
  set_value_range_to_varying (vr);
  return;
}
+ else if ((min_ovf == 1 && max_ovf == 0)
+  || (min_ovf == 0 && max_ovf == -1))
+   {
+ /* Min overflow or max underflow, drop to VR_VARYING.  */
+ set_value_range_to_varying (vr);
+ return;
+   }
  else
{
  /* Min underflow or max overflow.  The range kind
 changes to VR_ANTI_RANGE.  */
  bool covers = false;
  wide_int tem = tmin;
- gcc_assert ((min_ovf == -1 && max_ovf == 0)
- || (max_ovf == 1 && min_ovf == 0));
  type = VR_ANTI_RANGE;
  tmin = tmax + 1;
  if (wi::cmp (tmin, tmax, sgn) < 0)

Marek
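
The overflowing bound combination described above can be modeled
numerically.  Below is a small Python sketch (an illustration of the
arithmetic, not GCC's wide-int code) of how combining the bounds of
[1, od_5] + UINT_MAX yields the asymmetric overflow case the patch
now maps to VR_VARYING:

```python
# Illustrative model of the bound arithmetic in
# extract_range_from_binary_expr_1, assuming 32-bit unsigned wrapping.
UINT_MAX = 2**32 - 1

def add_bound(a, b):
    """Return (wrapped 32-bit sum, overflow flag: 1 if it wrapped, else 0)."""
    s = a + b
    return s & UINT_MAX, (1 if s > UINT_MAX else 0)

# Range [1, od_5] + UINT_MAX: the constant lower bounds combine to
# 1 + UINT_MAX, which wraps to 0 with min_ovf == 1 ...
tmin, min_ovf = add_bound(1, UINT_MAX)
# ... while the symbolic upper bound is conservatively UINT_MAX with
# max_ovf == 0: min overflow without max overflow, the case that used
# to trip the assert.
tmax, max_ovf = UINT_MAX, 0

assert (tmin, min_ovf) == (0, 1)
assert min_ovf == 1 and max_ovf == 0  # the new "drop to varying" case
```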


Re: [PATCH] PR c++/71184: Fix NULL dereference in cp_parser_operator

2016-05-19 Thread Jason Merrill

On 05/18/2016 08:59 PM, David Malcolm wrote:

+   cp_token *close_token =
+ cp_parser_require (parser, CPP_CLOSE_SQUARE, RT_CLOSE_SQUARE);
+   if (close_token)
+ end_loc = close_token->location;


You could combine these into

if (cp_token *close_token
= cp_parser_require (parser, CPP_CLOSE_SQUARE, RT_CLOSE_SQUARE))
  end_loc = close_token->location;

(also splitting the line before the = rather than after).

OK.

Jason



Re: [PATCH][ARM] PR target/71056: Don't use vectorized builtins when NEON is not available

2016-05-19 Thread Ramana Radhakrishnan
On 11/05/16 15:32, Kyrill Tkachov wrote:
> Hi all,
> 
> In this PR a NEON builtin is introduced during SLP vectorisation even when 
> NEON is not available
> because arm_builtin_vectorized_function is missing an appropriate check in 
> the BSWAP handling code.
> 
> Then during expand when we try to expand the NEON builtin the code in 
> arm_expand_neon_builtin rightly
> throws an error telling the user to enable NEON, even though the testcase 
> doesn't use any intrinsics.
> 
> This patch fixes the bug by bailing out early if !TARGET_NEON. This allows us 
> to remove a redundant
> TARGET_NEON check further down in the function as well.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> Ok for trunk?
> 
> This appears on GCC 6 as well.
> On older branches the test failure doesn't trigger but the logic looks buggy 
> anyway.
> Ok for the branches as well if testing is clean?
> 
> Thanks,
> Kyrill
> 
> 2016-05-11  Kyrylo Tkachov  
> 
> PR target/71056
> * config/arm/arm-builtins.c (arm_builtin_vectorized_function): Return
> NULL_TREE early if NEON is not available.  Remove now redundant check
> in ARM_CHECK_BUILTIN_MODE.
> 
> 2016-05-11  Kyrylo Tkachov  
> 
> PR target/71056
> * gcc.target/arm/pr71056.c: New test.

OK. LGTM - please apply if no regressions and backport onto GCC 6 after the 
auto-testers have let this bake on trunk for a little while.

I'd rather not apply it to the release branches unless we can trigger it there,
but it may be newer logic in the bswap pass that detects this.


regards
Ramana
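
A hypothetical sketch (not the actual gcc.target/arm/pr71056.c) of the
kind of code where SLP vectorization can turn adjacent byte-swaps into a
vector bswap builtin -- which must not be attempted when NEON is
unavailable:

```c
/* Four adjacent 32-bit byte-swaps: a straight-line group that SLP can
   vectorize into a single vector bswap, which on ARM requires NEON.  */
unsigned int dst[4], src[4];

void
swap_words (void)
{
  dst[0] = __builtin_bswap32 (src[0]);
  dst[1] = __builtin_bswap32 (src[1]);
  dst[2] = __builtin_bswap32 (src[2]);
  dst[3] = __builtin_bswap32 (src[3]);
}
```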


Re: [PATCH 0/3] Support for mandatory tail calls

2016-05-19 Thread Jason Merrill
On Thu, May 19, 2016 at 9:28 AM, Richard Biener
 wrote:
> On Thu, May 19, 2016 at 3:19 PM, Jason Merrill  wrote:
>> On Thu, May 19, 2016 at 12:30 AM, Basile Starynkevitch
>>  wrote:
>>> On 05/19/2016 12:12 AM, Jeff Law wrote:

 On 05/17/2016 04:01 PM, David Malcolm wrote:
>
> There have been requests [1] for libgccjit to better support
> functional programming by supporting the continuation-passing style,
> in which every function "returns" by calling a "continuation"
> function pointer.
>
> These calls must be guaranteed to be implemented as a jump,
> otherwise the program could consume an arbitrary amount of stack
> space as it executed.
>
> This patch kit implements this.
>
> Patch 1 is a preliminary tweak to calls.c
>
> Patch 2 implements a new flag in tree.h: CALL_EXPR_MUST_TAIL_CALL,
> which makes calls.c try harder to implement a flagged call as a
> tail-call/sibling call, and makes it issue an error if
> the optimization is impossible.  It doesn't implement any
> frontend support for setting the flag (instead using a plugin
> to test it).  We had some discussion on the jit list about possibly
> introducing a new builtin for this, but the patch punts on this
> issue.

 I wonder if we should have an attribute so that the flag can be set for
 C/C++ code.  I've seen requests for forcing tail calls in C/C++ code 
 several
 times in the past, precisely to support continuations.
>>>
>>> Why an attribute? Attributes are on declarations. I think it should better
>>> be some pragma like _Pragma(GCC tail cail, foo(x,y)) or some builtin (or
>>> else some syntax extension like goto return foo(x,y); ...) because what we
>>> really want is to annotate a particular call to be tail-recursive.
>>
>> C++11 attributes can apply to expression-statements as well, e.g.
>>
>> [[gnu::tail_call]] fn();
>>
>> though not to sub-expressions.
>
> That's nice.  Can they apply to things like loops?
>
>  [[gnu::no_unroll]] for (int i=0; i<4; ++i)
>a[i] = 0;

Yes, to any statement.

Jason
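
The continuation-passing pattern motivating the series can be sketched
in plain C (illustrative only; the patch kit deliberately punts on a
C-level surface syntax for CALL_EXPR_MUST_TAIL_CALL):

```c
/* Continuation-passing style: a function "returns" by calling its
   continuation.  The final call must be compiled as a jump (a
   guaranteed tail call), or a long chain of such calls grows the
   stack without bound.  */
typedef void (*cont_t) (int);

static int result;

static void
store (int v)
{
  result = v;
}

static void
add_cps (int a, int b, cont_t k)
{
  k (a + b);  /* the call CALL_EXPR_MUST_TAIL_CALL would guarantee */
}
```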


Re: [PATCH] c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

2016-05-19 Thread Jason Merrill

On 05/18/2016 09:40 PM, Martin Sebor wrote:

The handling of flexible array members whose element type was
dependent tried to deal with the case when the element type
was not yet completed but it did it wrong.  The attached patch
corrects the handling by trying to complete the element type
first.


How about changing complete_type to complete the element type even for 
an array of unknown bound?  It seems to me that it would be useful to do 
that to set alignment and 'structor flags even if we can't set TYPE_SIZE.


It would also be useful to have a 'complete type or array of unknown 
bound of complete type' predicate to use here and in layout_var_decl, 
grokdeclarator, and type_with_alias_set_p.


Jason
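
For concreteness, a hedged sketch (not Martin's actual testcase) of the
construct under discussion:

```cpp
// A flexible array member whose element type depends on a template
// parameter (a GNU extension).  The element type is incomplete until
// instantiation, which is what the dependent-type handling tripped
// over.  Names are illustrative only.
template <typename T>
struct buf
{
  int n;
  T data[];  // dependent element type; completed at instantiation
};

buf<int> b;  // must be accepted: int is complete at this point
```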


