date:20160504

[RFC][PR70841] Reassoc fails to handle FP division

2016-05-04 Thread kugan


Hi,

I tried to fix reassoc's failure to handle FP division. I think the best 
way to do this is to follow what is done for MINUS_EXPR, i.e., convert 
the RDIV_EXPR to a MULT_EXPR by (1/x) early and optimize it later in 
opt_rdiv_with_multiply.
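
As a rough illustration of the idea only (this is not code from the patch, and
struct factor and cancel_reciprocals are made-up names), once A / B has been
rewritten as A * (1/B), a linearized product such as x * y * (1/x) can be
simplified by cancelling each factor against its reciprocal:

```c
#include <stddef.h>

/* Hypothetical operand entry: a non-negative value id plus a flag saying
   whether it enters the product as a reciprocal (1/x).  This is not
   GCC's operand_entry.  */
struct factor
{
  int id;          /* stands in for an SSA name; must be >= 0 */
  int reciprocal;  /* nonzero if the factor is 1/id */
};

/* Cancel pairs x and 1/x in a linearized product.  Returns the new
   length; surviving factors are kept in order at the front of OPS.  */
static size_t
cancel_reciprocals (struct factor *ops, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (ops[i].id < 0)
        continue;  /* already cancelled */
      for (size_t j = i + 1; j < n; j++)
        if (ops[j].id == ops[i].id
            && ops[j].reciprocal != ops[i].reciprocal)
          {
            ops[i].id = ops[j].id = -1;  /* mark both as dead */
            break;
          }
    }

  /* Compact the survivors to the front.  */
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    if (ops[i].id >= 0)
      ops[out++] = ops[i];
  return out;
}
```

For x * y / x the list is { x, y, 1/x }, and only y should survive, which is
what the new test expects to see in the optimized dump.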


Here is a patch that passes bootstrap and regression testing on 
x86-64-linux-gnu.


Does this look Ok for trunk?

Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-05-05  Kugan Vivekanandarajah  

PR middle-end/70841
* gcc.dg/tree-ssa/pr70841.c: New test.

gcc/ChangeLog:

2016-05-05  Kugan Vivekanandarajah  

PR middle-end/70841
* tree-ssa-reassoc.c (should_break_up_rdiv): New.
(break_up_rdiv): New.
(break_up_subtract_bb): Call should_break_up_rdiv and break_up_rdiv.
(do_reassoc): Rename break_up_subtract_bb to 
break_up_subtract_and_div_bb.
(sort_cmp_int): New.
(opt_rdiv_with_multiply): New.
(reassociate_bb): Call opt_rdiv_with_multiply.
(do_reassoc): Renamed called function break_up_subtract_bb to
break_up_subtract_and_div_bb.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr70841.c b/gcc/testsuite/gcc.dg/tree-ssa/pr70841.c
index e69de29..0b456aa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr70841.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr70841.c
@@ -0,0 +1,15 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -freciprocal-math -fdump-tree-optimized" } */
+
+float foo (float x, float y)
+{
+return x * y / x;
+}
+
+float foo2 (float x, float y)
+{
+return (y / x) * x ;
+}
+
+/* { dg-final { scan-tree-dump-times "return y_" 2 "optimized" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index d23dabd..29a5422 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4168,6 +4168,40 @@ should_break_up_subtract (gimple *stmt)
   return false;
 }
 
+/* Return true if we should break up the RDIV in STMT into a MULT_EXPR
+   with reciprocal.  */
+
+static bool
+should_break_up_rdiv (gimple *stmt)
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  tree binlhs = gimple_assign_rhs1 (stmt);
+  tree binrhs = gimple_assign_rhs2 (stmt);
+  gimple *immusestmt;
+
+  if (VECTOR_TYPE_P (TREE_TYPE (lhs)))
+    return false;
+
+  if (TREE_CODE (lhs) == SSA_NAME
+      && (immusestmt = get_single_immediate_use (lhs))
+      && is_gimple_assign (immusestmt)
+      && gimple_assign_rhs_code (immusestmt) == MULT_EXPR)
+    return true;
+  if (TREE_CODE (binlhs) == SSA_NAME
+      && (immusestmt = SSA_NAME_DEF_STMT (binlhs))
+      && get_single_immediate_use (binlhs)
+      && is_gimple_assign (immusestmt)
+      && gimple_assign_rhs_code (immusestmt) == MULT_EXPR)
+    return true;
+  if (TREE_CODE (binrhs) == SSA_NAME
+      && (immusestmt = SSA_NAME_DEF_STMT (binrhs))
+      && get_single_immediate_use (binrhs)
+      && is_gimple_assign (immusestmt)
+      && gimple_assign_rhs_code (immusestmt) == MULT_EXPR)
+    return true;
+  return false;
+}
+
 /* Transform STMT from A - B into A + -B.  */
 
 static void
@@ -4187,6 +4221,23 @@ break_up_subtract (gimple *stmt, gimple_stmt_iterator *gsip)
   update_stmt (stmt);
 }
 
+/* Transform STMT from A / B into A X (1/B).  */
+static void
+break_up_rdiv (gimple *stmt, gimple_stmt_iterator *gsip)
+{
+  tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree rhs2 = gimple_assign_rhs2 (stmt);
+  tree tmp = make_ssa_name (TREE_TYPE (rhs1));
+  tree one = fold_convert (TREE_TYPE (rhs1),
+  build_int_cst (integer_type_node, 1));
+  gassign *div_stmt = gimple_build_assign (tmp, RDIV_EXPR, one, rhs2);
+  gimple_set_uid (div_stmt, gimple_uid (stmt));
+  gsi_insert_before (gsip, div_stmt, GSI_NEW_STMT);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gimple_assign_set_rhs_with_ops (&gsi, MULT_EXPR, rhs1, tmp);
+  update_stmt (stmt);
+}
+
 /* Determine whether STMT is a builtin call that raises an SSA name
to an integer power and has only one use.  If so, and this is early
reassociation and unsafe math optimizations are permitted, place
@@ -4492,7 +4543,7 @@ can_reassociate_p (tree op)
and set UIDs within each basic block.  */
 
 static void
-break_up_subtract_bb (basic_block bb)
+break_up_subtract_and_div_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   basic_block son;
@@ -4522,6 +4573,15 @@ break_up_subtract_bb (basic_block bb)
               if (should_break_up_subtract (stmt))
                 break_up_subtract (stmt, &gsi);
             }
+          else if (flag_reciprocal_math
+                   && gimple_assign_rhs_code (stmt) == RDIV_EXPR)
+            {
+              if (!can_reassociate_p (gimple_assign_rhs1 (stmt))
+                  || !can_reassociate_p (gimple_assign_rhs2 (stmt)))
+                continue;
+              if (should_break_up_rdiv (stmt))
+                break_up_rdiv (stmt, &gsi);
+            }
           else if (gimple_assign_rhs_code (stmt) == NEGATE_EXPR
                    && can_reassociate_p (gimple_assign_rhs1 (stmt)))
             plus_negates.safe_push (gimple_assign_lhs (stmt));
@@

Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-05-04 Thread kugan


Hi Richard,



maybe insert_stmt_after will help here, I don't think you got the insertion
logic correct, thus insert_stmt_after (mul_stmt, def_stmt) which I think
misses GIMPLE_NOP handling.  At least

+  if (SSA_NAME_VAR (op) != NULL

huh?  I suppose you could have tested SSA_NAME_IS_DEFAULT_DEF
but just the GIMPLE_NOP def-stmt test should be enough.

+ && gimple_code (def_stmt) == GIMPLE_NOP)
+   {
+ gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+ stmt = gsi_stmt (gsi);
+ gsi_insert_before (&gsi, mul_stmt, GSI_NEW_STMT);

not sure if that is the best insertion point choice, it un-does all
code-sinking done
(and no further sinking is run after the last reassoc pass).  We do know we
are handling all uses of op in our chain so inserting before the plus-expr
chain root should work here (thus 'stmt' in the caller context).  I'd
use that here instead.
I think I'd use that unconditionally even if it works and not bother
finding something
more optimal.



I have now tried using insert_stmt_after with special handling for 
GIMPLE_PHI as you described.




Apart from this, this now looks ok to me.

But the testcases need some work


--- a/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
...
+
+/* { dg-final { scan-tree-dump-times "\\\*" 4 "reassoc1" } } */

I would have expected 3.


We now have an additional _15 = x_1(D) * 2

  Also please check for \\\* 5 for example

to be more specific (and change the cases so you get different constants
for the different functions).




That said, please make the scans more specific.


I have now changed the test cases to use more specific multiplication 
scans, as you wanted.
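
To show where the scanned constants come from, here is a small standalone
sketch (not the reassoc implementation; struct term and count_repeats are
made-up names) that collapses the operands of an addition chain into
(operand, count) pairs. Each count greater than one corresponds to a
"* count" multiplication that the scans look for:

```c
#include <stddef.h>

/* Hypothetical summary of one distinct operand in an addition chain.  */
struct term
{
  char name;       /* stands in for an SSA name such as x or z */
  unsigned count;  /* how many times it is added */
};

/* Collapse the operands of x + z + x + ... into (name, count) pairs, in
   order of first appearance.  OUT must have room for N entries.
   Returns the number of distinct operands.  */
static size_t
count_repeats (const char *ops, size_t n, struct term *out)
{
  size_t distinct = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t j;
      /* Look for an existing entry for this operand.  */
      for (j = 0; j < distinct; j++)
        if (out[j].name == ops[i])
          break;
      if (j == distinct)
        {
          out[distinct].name = ops[i];
          out[distinct].count = 0;
          distinct++;
        }
      out[j].count++;
    }
  return distinct;
}
```

For f1 in pr63586.c the chain is x + z followed by six more additions of x,
i.e. the operand string "xzxxxxxx", which gives x a count of 7 and z a count
of 1, matching the "\* 7" scan.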



Does this now look better?


Thanks,
Kugan
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
index e69de29..0dcfe32 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+
+float f1_float (float x, float z)
+{
+float y = x + z;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+return y;
+}
+
+float f1_float2 (float x)
+{
+float y = x + 3 * x + x;
+return y;
+}
+
+int f1_int (int x)
+{
+int y = x + 4 * x + x;
+return y;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\* 8\\\.0e\\\+0" 1 "reassoc1" } } */
+/* { dg-final { scan-tree-dump-times "\\\* 5\\\.0e\\\+0" 1 "reassoc1" } } */
+/* { dg-final { scan-tree-dump-times "\\\* 6" 1 "reassoc1" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c b/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
index e69de29..470be8c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr63586.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1" } */
+
+unsigned f1 (unsigned x, unsigned z)
+{
+unsigned y = x + z;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + x;
+return y;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\* 7" 1 "reassoc1" } } */
+
+unsigned f2 (unsigned x, unsigned z)
+{
+unsigned y = x + z;
+y = y + x;
+y = y + x;
+y = y + x;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + z;
+return y;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\* 5" 1 "reassoc1" } } */
+/* { dg-final { scan-tree-dump-times "\\\* 4" 1 "reassoc1" } } */
+
+unsigned f3 (unsigned x, unsigned z, unsigned k)
+{
+unsigned y = x + z;
+y = y + x;
+y = y + z;
+y = y + z;
+y = y + k;
+return y;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\* 2" 1 "reassoc1" } } */
+/* { dg-final { scan-tree-dump-times "\\\* 3" 1 "reassoc1" } } */
+
+unsigned f4 (unsigned x, unsigned z, unsigned k)
+{
+unsigned y = k + x;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + z;
+y = y + z;
+return y;
+}
+/* { dg-final { scan-tree-dump-times "\\\* 8" 1 "reassoc1" } } */
+
+unsigned f5 (unsigned x, unsigned y, unsigned z)
+{
+return x + y + y + y + y + y \
+  + y + z + z + z + z + z + z + z + z + z;
+}
+
+/* { dg-final { scan-tree-dump-times "\\\* 6" 1 "reassoc1" } } */
+/* { dg-final { scan-tree-dump-times "\\\* 9" 1 "reassoc1" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
index 62802d1..16ebc86 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-14.c
@@ -19,6 +19,7 @@ unsigned int test2 (unsigned int x, unsigned int y, unsigned int z,
   return tmp1 + tmp2 + tmp3;
 }
 
-/* There should be one multiplication left in test1 and three in test2.  */
+/* There should be two multiplications left in test1 (including one generated
+

Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-05-04 Thread kugan


Hi Richard,



+ int last = ops.length () - 1;
+ bool negate_result = false;

Do

   oe  = ops.last ();



Done.



+ if (rhs_code == MULT_EXPR
+ && ops.length () > 1
+ && ((TREE_CODE (ops[last]->op) == INTEGER_CST

and last.op here and below

+  && integer_minus_onep (ops[last]->op))
+ || ((TREE_CODE (ops[last]->op) == REAL_CST)
+ && real_equal (&TREE_REAL_CST
(ops[last]->op), 



Done.


Here the checks !HONOR_SNANS () && (!HONOR_SIGNED_ZEROS ||
!COMPLEX_FLOAT_TYPE_P)
are missing.  The * -1 might appear literally and you are only allowed
to turn it into a negate
under the above conditions.


Done.



+   {
+ ops.unordered_remove (last);

use ops.pop ();


Done.


+ negate_result = true;

Please move the whole thing under the else { } case of the ops.length
== 0, ops.length == 1 test chain
as you did for the actual emit of the negate.



I see your point. However, when we remove the (-1) from the ops list, 
that in turn can result in ops.length becoming 1. Therefore, I moved 
the following if (negate_result) outside the condition.
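
A simplified standalone model of that situation (not the patch itself;
struct operand and strip_minus_one are made-up names, and the constant is
assumed to be the last operand as in the quoted hunk):

```c
#include <stddef.h>

/* Hypothetical operand of a multiplication chain.  */
struct operand
{
  int is_const;  /* nonzero if VALUE is a literal constant */
  long value;    /* the constant, or an id standing in for an SSA name */
};

/* If the chain has more than one operand and the last one is the
   constant -1, drop it and return nonzero to say that the final result
   must be negated.  *LEN is updated in place.  */
static int
strip_minus_one (const struct operand *ops, size_t *len)
{
  if (*len > 1 && ops[*len - 1].is_const && ops[*len - 1].value == -1)
    {
      (*len)--;
      return 1;
    }
  return 0;
}
```

With the two operands { y, -1 } the length drops from 2 to 1 while the negate
flag is set, so a negate emitted only on the "more than one operand" path
would be lost.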





+ if (negate_result)
+   {
+ tree tmp = make_ssa_name (TREE_TYPE (lhs));
+ gimple_set_lhs (stmt, tmp);
+ gassign *neg_stmt = gimple_build_assign (lhs, NEGATE_EXPR,
+  tmp);
+ gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+ gsi_insert_after (&gsi, neg_stmt, GSI_NEW_STMT);
+ update_stmt (stmt);

I think that if powi_result is also built you end up using the wrong
stmt so you miss a

 stmt = SSA_NAME_DEF_STMT (lhs);


Yes, indeed. This can happen and I have added this.



here.  Also see the new_lhs handling of the powi_result case - again
you need sth
similar here (its handling looks a bit fishy as well - this all needs
some comments
and possibly a (lot more) testcases).

So, please do the above requested changes and verify the 'lhs' issues I pointed
out by trying to add a few more testcases that also cover the case where a powi
is detected in addition to a negation.  Please also add a testcase that catches
(-y) * x * (-z).



Added this to the testcase.

Does this look better now?

Thanks,
Kugan



gcc/testsuite/ChangeLog:

2016-04-23  Kugan Vivekanandarajah  

 PR middle-end/40921
 * gcc.dg/tree-ssa/pr40921.c: New test.

gcc/ChangeLog:

2016-04-23  Kugan Vivekanandarajah  

 PR middle-end/40921
 * tree-ssa-reassoc.c (try_special_add_to_ops): New.
 (linearize_expr_tree): Call try_special_add_to_ops.
 (reassociate_bb): Convert MULT_EXPR by (-1) to NEGATE_EXPR.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c b/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
index e69de29..3a5a23a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr40921.c
@@ -0,0 +1,26 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2  -fdump-tree-optimized -ffast-math" } */
+
+unsigned int foo (unsigned int x, unsigned int y, unsigned int z)
+{
+  return x + (-y * z * z);
+}
+
+float bar (float x, float y, float z)
+{
+  return x + (-y * z * z);
+}
+
+float bar2 (float x, float y, float z)
+{
+  return x + (-y * z * z * 5.0f);
+}
+
+float bar3 (float x, float y, float z)
+{
+  return x + (-y * x * -z);
+}
+
+
+/* { dg-final { scan-tree-dump-times "_* = -y_" 0 "optimized" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 4e1251b..1df6681 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4112,6 +4112,45 @@ acceptable_pow_call (gimple *stmt, tree *base, HOST_WIDE_INT *exponent)
   return true;
 }
 
+/* Try to derive and add operand entry for OP to *OPS.  Return false if
+   unsuccessful.  */
+
+static bool
+try_special_add_to_ops (vec *ops,
+   enum tree_code code,
+   tree op, gimple* def_stmt)
+{
+  tree base = NULL_TREE;
+  HOST_WIDE_INT exponent = 0;
+
+  if (TREE_CODE (op) != SSA_NAME)
+return false;
+
+  if (code == MULT_EXPR
+  && acceptable_pow_call (def_stmt, &base, &exponent))
+{
+  add_repeat_to_ops_vec (ops, base, exponent);
+  gimple_set_visited (def_stmt, true);
+  return true;
+}
+  else if (code == MULT_EXPR
+  && is_gimple_assign (def_stmt)
+  && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR
+  && !HONOR_SNANS (TREE_TYPE (op))
+  && (!HONOR_SIGNED_ZEROS (TREE_TYPE (op))
+  || !COMPLEX_FLOAT_TYPE_P (TREE_TYPE (op))))
+{
+  tree rhs1 = gimple_assign_rhs1 (def_stmt);
+  tree cst = build_minus_one_cst (TREE_TYPE (op));
+  add_to_ops_vec (ops, rhs1);
+  add_to_ops_vec (ops, cst);
+  gimple_set_visited (def_stmt,

Re: [MIPS,committed] Update MIPS P5600 processor definition to avoid IMADD

2016-05-04 Thread Maciej W. Rozycki

On Wed, 4 May 2016, Matthew Fortune wrote:

> diff --git a/gcc/config/mips/mips-cpus.def b/gcc/config/mips/mips-cpus.def
> index 17034f2..5df9807 100644
> --- a/gcc/config/mips/mips-cpus.def
> +++ b/gcc/config/mips/mips-cpus.def
> @@ -44,10 +44,7 @@ MIPS_CPU ("mips4", PROCESSOR_R1, 4, 0)
> isn't tuned to a specific processor.  */
>  MIPS_CPU ("mips32", PROCESSOR_4KC, 32, PTF_AVOID_BRANCHLIKELY)
>  MIPS_CPU ("mips32r2", PROCESSOR_74KF2_1, 33, PTF_AVOID_BRANCHLIKELY)
> -/* mips32r3 is micromips hense why it uses the M4K processor.
> -   mips32r5 should use the p5600 processor, but there is no definition 
> -   for this yet, so in the short term we will use the same processor entry 
> -   as mips32r2.  */
> +/* mips32r3 is micromips hense why it uses the M4K processor.  */

 Typo here -> s/hense/hence/ -- since you've reworked the comment and 
changed the line in the process anyway, you might as well have taken the 
opportunity to fix it.

> @@ -150,7 +147,8 @@ MIPS_CPU ("1004kf1_1", PROCESSOR_24KF1_1, 33, 0)
>  MIPS_CPU ("interaptiv", PROCESSOR_24KF2_1, 33, 0)
>  
>  /* MIPS32 Release 5 processors.  */
> -MIPS_CPU ("p5600", PROCESSOR_P5600, 36, PTF_AVOID_BRANCHLIKELY)
> +MIPS_CPU ("p5600", PROCESSOR_P5600, 36, PTF_AVOID_BRANCHLIKELY
> + | PTF_AVOID_IMADD)

 Not:

MIPS_CPU ("p5600", PROCESSOR_P5600, 36, (PTF_AVOID_BRANCHLIKELY
 | PTF_AVOID_IMADD))

?

  Maciej

Re: [RS6000] Rewrite rs6000_frame_related to use simplify_replace_rtx

2016-05-04 Thread Segher Boessenkool

On Thu, May 05, 2016 at 06:49:04AM +0930, Alan Modra wrote:
> >  And it's a better name anyway?
> 
> No, "real" seems silly to me.  "patt" is a common idiom used in lots
> of places for the pattern of an instruction.

"patt" is used only once (in fwprop), everything else uses "pat".

> What is "real" supposed
> to mean?  The real pattern vs. some imaginary one?

Yes exactly.  This function is making a note that does the same thing
as the real insn.

> The final pattern
> we want?  The last meaning might have made some sense in the very
> first implementation of rs6000_frame_related where the code did
> something like
>   real = replace_rtx (PATTERN (insn), ...);
> then made simplify_rtx calls.
> 
> I think "real" is confusing when we're making substitutions step by
> step.

True enough, okay, the later part of the function changes "real".

> > > -  if (REGNO (reg) == STACK_POINTER_REGNUM && reg2 == NULL_RTX)
> > > +  repl = NULL_RTX;
> > > +  if (REGNO (reg) == STACK_POINTER_REGNUM)
> > > +gcc_checking_assert (val == 0);
> > > +  else
> > > +repl = gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM),
> > > +  GEN_INT (val));
> > 
> > Put the NULL_RTX assignment in the first arm, please.
> 
> OK, I'll make that style change, but only because we have that
> gcc_checking_assert there.
> 
> Otherwise I would have written
>   repl = NULL_RTX;
>   if (REGNO (reg) != STACK_POINTER_REGNUM)
> repl = gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM),
>GEN_INT (val));
> 
> which is better than
>   if (REGNO (reg) == STACK_POINTER_REGNUM)
> repl = NULL_RTX;
>   else
> repl = gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM),
>GEN_INT (val));

I think we're supposed to use a ? : thing for that.  Sucks as well; I would
try to rewrite the whole code to avoid such nasties.  Or move on :-)


Segher

[C++ PATCH] PR c++/69855

2016-05-04 Thread Ville Voutilainen

Tested on Linux-PPC64. Comments very much welcomed on the change
to g++.old-deja/g++.pt/crash3.C, I'm not at all sure what that test
is trying to do; it looks like it may have never cared about the names
of the local functions, but rather about the fact that the function
bodies of the member functions of the class template are not instantiated
just because the member functions return types that are specializations
of the class template.

/cp
PR c++/69855.
name-lookup.c (pushdecl_maybe_friend_1): Push local function
decls into the global scope after stripping template bits
and setting DECL_ANTICIPATED.

/testsuite
PR c++/69855.
g++.dg/overload/69855.C: New.
g++.old-deja/g++.law/missed-error2.C: Adjust.
g++.old-deja/g++.pt/crash3.C: Likewise.


69855.diff4
Description: Binary data

Re: [RS6000] TARGET_RELOCATABLE

2016-05-04 Thread Alan Modra

On Wed, May 04, 2016 at 11:55:31AM -0500, Segher Boessenkool wrote:
> On Wed, May 04, 2016 at 02:21:18PM +0930, Alan Modra wrote:
> > Also, since flag_pic is set by -mrelocatable, a number of places that
> > currently test TARGET_RELOCATABLE can be simplified.  I also made
> > -mrelocatable set TARGET_NO_FP_IN_TOC, allowing TARGET_RELOCATABLE to
> > be removed from ASM_OUTPUT_SPECIAL_POOL_ENTRY_P.  Reducing occurrences
> > of TARGET_RELOCATABLE is a good thing.
> 
> Does this TARGET_NO_FP_IN_TOC setting need documenting somewhere?

It's not actually a change in behaviour.  We didn't put fp in toc
with -mrelocatable before this patch.

-- 
Alan Modra
Australia Development Lab, IBM

Re: Enabling -frename-registers?

2016-05-04 Thread Pat Haugen

On 05/04/2016 10:20 AM, Wilco Dijkstra wrote:
> Also when people claim they can't see any benefit, did they check the 
> codesize difference on SPEC2006?
> On AArch64 codesize reduced uniformly due to fewer moves (and in a few cases 
> significantly so). I expect
> that to be true for other RISC targets. Simply put, reduced codesize at no 
> performance loss = gain.

Comparing text size on powerpc64 for CPU2006 executables, 20 decreased in size, 
3 stayed the same, and 6 increased in size.

-Pat

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Samuel Thibault

Svante Signell, on Wed 04 May 2016 23:25:28 +0200, wrote:
> On Wed, 2016-05-04 at 23:06 +0200, Samuel Thibault wrote:
> > Svante Signell, on Wed 04 May 2016 19:43:27 +0200, wrote:
> > > May I comment on Debian way of apt-get source gcc-*: Doing that does
> > > not unpack the sources, neither does it apply the patches, you have to
> > > unpack and patch before you can change sources and update patches. I've
> > > patched the sources several times and still find that the updated
> > > patches are not included in the next build. Really confusing.
> > 
> > Did you read debian/README.source?
> 
> Now I have read it, but still cannot find a convincing reason for doing
> things this way, sorry! Matthias, why? There should be very strong
> arguments for the present procedure.

See rules.patch. You can't get this behavior with the simple dpkg
patching.

Samuel

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Svante Signell

On Wed, 2016-05-04 at 23:06 +0200, Samuel Thibault wrote:
> Svante Signell, on Wed 04 May 2016 19:43:27 +0200, wrote:
> > May I comment on Debian way of apt-get source gcc-*: Doing that does
> > not unpack the sources, neither does it apply the patches, you have to
> > unpack and patch before you can change sources and update patches. I've
> > patched the sources several times and still find that the updated
> > patches are not included in the next build. Really confusing.
> 
> Did you read debian/README.source?

Now I have read it, but still cannot find a convincing reason for doing
things this way, sorry! Matthias, why? There should be very strong
arguments for the present procedure.

Re: [RS6000] Rewrite rs6000_frame_related to use simplify_replace_rtx

2016-05-04 Thread Alan Modra

On Wed, May 04, 2016 at 11:26:18AM -0500, Segher Boessenkool wrote:
> On Wed, May 04, 2016 at 11:14:41AM +0930, Alan Modra wrote:
> > * config/rs6000/rs6000.c (rs6000_frame_related): Rewrite.
> 
> > -  rtx real, temp;
> > +  rtx patt, repl;
> 
> If you don't rename "real" here it is probably easier to read?

Easier to read the diff, yes.

>  And it's a better name anyway?

No, "real" seems silly to me.  "patt" is a common idiom used in lots
of places for the pattern of an instruction.  What is "real" supposed
to mean?  The real pattern vs. some imaginary one?  The final pattern
we want?  The last meaning might have made some sense in the very
first implementation of rs6000_frame_related where the code did
something like
  real = replace_rtx (PATTERN (insn), ...);
then made simplify_rtx calls.

I think "real" is confusing when we're making substitutions step by
step.

> > -  if (REGNO (reg) == STACK_POINTER_REGNUM && reg2 == NULL_RTX)
> > +  repl = NULL_RTX;
> > +  if (REGNO (reg) == STACK_POINTER_REGNUM)
> > +gcc_checking_assert (val == 0);
> > +  else
> > +repl = gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM),
> > +GEN_INT (val));
> 
> Put the NULL_RTX assignment in the first arm, please.

OK, I'll make that style change, but only because we have that
gcc_checking_assert there.

Otherwise I would have written
  repl = NULL_RTX;
  if (REGNO (reg) != STACK_POINTER_REGNUM)
repl = gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM),
 GEN_INT (val));

which is better than
  if (REGNO (reg) == STACK_POINTER_REGNUM)
repl = NULL_RTX;
  else
repl = gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM),
 GEN_INT (val));

-- 
Alan Modra
Australia Development Lab, IBM

Re: [C++ Patch] PR 68722

2016-05-04 Thread Jason Merrill

Agreed, I tend not to backport bugs on invalid code, definitely not if
we already give a useful diagnostic.

Jason


On Wed, May 4, 2016 at 4:10 PM, Paolo Carlini  wrote:
> Hi again,
>
> On 12/04/2016 15:53, Jason Merrill wrote:
>>
>> Let's go with the first patch.
>
> What about this one? Today I returned to it, and technically it still
> represents a regression in gcc-4_9-branch and gcc-5-branch, but personally
> I'd rather not backport the fix: in release-mode we just emit an additional
> "confused by earlier errors, bailing out" at the end of a rather long series
> of diagnostic messages, the first ones meaningful, the last redundant anyway
> and the snippet triggering it seems particularly broken to me...
>
> Thanks,
> Paolo.
>

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Samuel Thibault

Svante Signell, on Wed 04 May 2016 19:43:27 +0200, wrote:
> May I comment on Debian way of apt-get source gcc-*: Doing that does
> not unpack the sources, neither does it apply the patches, you have to
> unpack and patch before you can change sources and update patches. I've
> patched the sources several times and still find that the updated
> patches are not included in the next build. Really confusing.

Did you read debian/README.source?

Samuel

[PATCH, i386]: Fix PR 70873 - 20% performance regression at 482.sphinx3 after r235442 with -O2 -m32 on Haswell.

2016-05-04 Thread Uros Bizjak

Hello!

This patch moves all TARGET_SSE_PARTIAL_REG_DEPENDENCY FP conversion
splitters to a later split pass. Plus, the patch substantially cleans
these and related patterns.

The functionality of post-reload conversion splitters goes this way:

- process FP conversions for TARGET_USE_VECTOR_FP_CONVERTS in an early
post-reload splitter. This pass will rewrite FP conversions to vector
insns and is thus incompatible with the next two passes. AMDFAM10
processors depend on this transformation.

- process FP conversions for TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS in
a peephole2 pass. This will transform mem->reg insns to reg->reg
insns, and these insns can then be processed by the next pass. Some Intel
processors depend on this transformation.

- process FP conversions for TARGET_SSE_PARTIAL_REG_DEPENDENCY in a
late post-reload splitter, when allocated registers are stable. AMD
and Intel processors depend on this pass, so it is part of generic
tuning.

As mentioned by HJ in the PR, there looks to be a problem with the
generic splitting infrastructure. When a splitter is matched but
FAILs in the preparatory statements, no other splitters with the same
pattern are executed. IMO, this is an implementation bug: after a
splitter FAILs, the others should still be executed.
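
The difference between the two behaviours can be modelled with a small
standalone sketch. This is not the generated splitter code; struct splitter,
try_split and the helper callbacks are hypothetical names used only to
illustrate the dispatch order described above:

```c
#include <stddef.h>

/* Hypothetical splitter: MATCHES tests the pattern and condition, SPLIT
   returns nonzero on success and zero to model FAIL in the preparatory
   statements.  */
struct splitter
{
  int (*matches) (int insn);
  int (*split) (int insn);
};

/* Return the index of the splitter that handled INSN, or -1 if none
   did.  With STOP_ON_FAIL nonzero the search ends at the first matching
   splitter even if it FAILs; with zero, later splitters with the same
   pattern still get a chance.  */
static int
try_split (const struct splitter *tab, size_t n, int insn, int stop_on_fail)
{
  for (size_t i = 0; i < n; i++)
    {
      if (!tab[i].matches (insn))
        continue;
      if (tab[i].split (insn))
        return (int) i;
      if (stop_on_fail)
        return -1;
    }
  return -1;
}

/* Trivial callbacks for experimenting with the dispatch order.  */
static int match_always (int insn) { (void) insn; return 1; }
static int split_fail (int insn) { (void) insn; return 0; }
static int split_ok (int insn) { (void) insn; return 1; }
```

With a table of { match_always, split_fail } followed by
{ match_always, split_ok }, the stop-on-FAIL variant handles nothing, while
the continuing variant reaches the second splitter.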

2016-05-04  Uros Bizjak  

PR target/70873
* config/i386/i386.md (extendsfdf2): Use nonimm_ssenomem_operand
as operand 0 predicate.
(TARGET_SSE_PARTIAL_REG_DEPENDENCY float_extend sf->df peephole2):
Change to post-epilogue_completed late splitter.  Use sse_reg_operand
as operand 0 predicate.
(TARGET_SSE_PARTIAL_REG_DEPENDENCY float_truncate df->sf peephole2):
Ditto.
(TARGET_SSE_PARTIAL_REG_DEPENDENCY float {si,di}->{sf,df} peephole2):
Ditto.  Emit the pattern using RTX.

(TARGET_USE_VECTOR_FP_CONVERTS float_extend sf->df splitter):
Use sse_reg_operand as operand 0 predicate.  Do not use true_regnum in
the post-reload splitter.  Use lowpart_subreg instead of gen_rtx_REG.
(TARGET_USE_VECTOR_FP_CONVERTS float_truncate df->sf splitter):
Ditto.
(TARGET_USE_VECTOR_CONVERTS float si->{sf,df} splitter): Use
sse_reg_operand as operand 0 predicate.

(TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS float_extend sf->df peephole2):
Use sse_reg_operand as operand 0 predicate.  Use lowpart_subreg
instead of gen_rtx_REG.
(TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS float_truncate df->sf peephole2):
Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ba1ff8b..dd56b05 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4231,12 +4231,12 @@
that might lead to ICE on 32bit target.  The sequence unlikely combine
anyway.  */
 (define_split
-  [(set (match_operand:DF 0 "register_operand")
+  [(set (match_operand:DF 0 "sse_reg_operand")
 (float_extend:DF
  (match_operand:SF 1 "nonimmediate_operand")))]
   "TARGET_USE_VECTOR_FP_CONVERTS
&& optimize_insn_for_speed_p ()
-   && reload_completed && SSE_REG_P (operands[0])
+   && reload_completed
&& (!EXT_REX_SSE_REG_P (operands[0])
|| TARGET_AVX512VL)"
[(set (match_dup 2)
@@ -4253,13 +4253,11 @@
 {
   /* If it is unsafe to overwrite upper half of source, we need
 to move to destination and unpack there.  */
-  if (((ORIGINAL_REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
-   || PSEUDO_REGNO_BYTES (ORIGINAL_REGNO (operands[1])) > 4)
-  && true_regnum (operands[0]) != true_regnum (operands[1]))
+  if (REGNO (operands[0]) != REGNO (operands[1])
  || (EXT_REX_SSE_REG_P (operands[1])
  && !TARGET_AVX512VL))
{
- rtx tmp = gen_rtx_REG (SFmode, true_regnum (operands[0]));
+ rtx tmp = lowpart_subreg (SFmode, operands[0], DFmode);
  emit_move_insn (tmp, operands[1]);
}
   else
@@ -4267,7 +4265,7 @@
   /* FIXME: vec_interleave_lowv4sf for AVX512VL should allow
 =v, v, then vbroadcastss will be only needed for AVX512F without
 AVX512VL.  */
-  if (!EXT_REX_SSE_REGNO_P (true_regnum (operands[3])))
+  if (!EXT_REX_SSE_REGNO_P (REGNO (operands[3])))
emit_insn (gen_vec_interleave_lowv4sf (operands[3], operands[3],
   operands[3]));
   else
@@ -4283,15 +4281,14 @@
 
 ;; It's more profitable to split and then extend in the same register.
 (define_peephole2
-  [(set (match_operand:DF 0 "register_operand")
+  [(set (match_operand:DF 0 "sse_reg_operand")
(float_extend:DF
  (match_operand:SF 1 "memory_operand")))]
   "TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS
-   && optimize_insn_for_speed_p ()
-   && SSE_REG_P (operands[0])"
+   && optimize_insn_for_speed_p ()"
   [(set (match_dup 2) (match_dup 1))
(set (match_dup 0) (float_extend:DF (match_dup 2)))]
-  "operands[2] =

Fix dangling reference in comment

2016-05-04 Thread Eric Botcazou

var_map_base_init is gone and all the machinery is now in tree-ssa-coalesce.c 
(and the 2 functions have explicit back references to gimple_can_coalesce_p).

Tested on x86_64-suse-linux, applied on the mainline and 6 branch as obvious.


2016-05-04  Eric Botcazou  

* tree-ssa-coalesce.c (gimple_can_coalesce_p): Fix reference in head
comment.
(compute_samebase_partition_bases): Fix typo.

-- 
Eric Botcazou
Index: tree-ssa-coalesce.c
===
--- tree-ssa-coalesce.c	(revision 235858)
+++ tree-ssa-coalesce.c	(working copy)
@@ -1505,7 +1505,8 @@ dump_part_var_map (FILE *f, partition pa
 /* Given SSA_NAMEs NAME1 and NAME2, return true if they are candidates for
coalescing together, false otherwise.
 
-   This must stay consistent with var_map_base_init in tree-ssa-live.c.  */
+   This must stay consistent with compute_samebase_partition_bases and 
+   compute_optimized_partition_bases.  */
 
 bool
 gimple_can_coalesce_p (tree name1, tree name2)
@@ -1759,7 +1760,7 @@ compute_samebase_partition_bases (var_ma
   else
 	/* This restricts what anonymous SSA names we can coalesce
 	   as it restricts the sets we compute conflicts for.
-	   Using TREE_TYPE to generate sets is the easies as
+	   Using TREE_TYPE to generate sets is the easiest as
 	   type equivalency also holds for SSA names with the same
 	   underlying decl.

[PATCH 3/4] Extract deferred-location handling from jit

2016-05-04 Thread David Malcolm

In order to faithfully load RTL dumps that contain references to
source locations, the RTL frontend needs to be able to parse file
and line information and turn them into location_t values.

Unfortunately, the libcpp API makes it rather fiddly to create
location_t values from a sequence of arbitrary file/line pairs: the
API assumes that the locations are created in ascending order as
if we were parsing the source file, but as we read an RTL dump,
the insns could be jumping forwards and backwards in lines and
between files.  Also, if we want to support column numbers, the
presence of a very high column number could exceed the bits available
in a line_map_ordinary for storing it.

The JIT has some code for handling this, in gcc/jit/jit-playback.[ch],
(since the JIT supports source location information, and doesn't impose
any ordering requirement on users of the API).

This patch moves the relevant code from
  gcc/jit/jit-playback.[ch]
into a new pair of files:
  gcc/deferred-locations.[ch]

The idea is that a deferred_locations instance manages these
"deferred locations"; they are created, and then all of the location_t
values are created at once by calling
  deferred_locations::add_to_line_table
After this call, the actual location_t values can be read out of the
deferred_location instances.
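
To illustrate the batching idea, here is a minimal, self-contained sketch.
It does not use the real libcpp linemap API: toy_location,
toy_deferred_locations and resolve_all are hypothetical names standing in
for deferred_location, deferred_locations and add_to_line_table, and the
"location value" is just an increasing integer.  Locations are recorded in
arbitrary order and only assigned values, in ascending (file, line) order,
once all of them are known.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a deferred location: a file/line pair whose
// final value is unknown until every location has been collected.
struct toy_location
{
  std::string file;
  int line;
  unsigned value;   // 0 means "not yet resolved"
};

// Hypothetical stand-in for deferred_locations.  Locations may be added
// in any order; resolve_all then assigns values in ascending (file, line)
// order, mimicking an API that must be fed in source order.
class toy_deferred_locations
{
public:
  // Record a location and return its index for later lookup.
  std::size_t add (const std::string &file, int line)
  {
    m_locs.push_back (toy_location {file, line, 0});
    return m_locs.size () - 1;
  }

  // Assign values to all recorded locations in one batch.
  void resolve_all ()
  {
    std::vector<std::size_t> order (m_locs.size ());
    for (std::size_t i = 0; i < order.size (); i++)
      order[i] = i;
    std::sort (order.begin (), order.end (),
               [this] (std::size_t a, std::size_t b)
               {
                 return std::tie (m_locs[a].file, m_locs[a].line)
                        < std::tie (m_locs[b].file, m_locs[b].line);
               });
    unsigned next = 1;
    for (std::size_t idx : order)
      m_locs[idx].value = next++;
    m_resolved = true;
  }

  // Read back a resolved value; only valid after resolve_all.
  unsigned get_value (std::size_t idx) const
  {
    assert (m_resolved);
    return m_locs[idx].value;
  }

private:
  std::vector<toy_location> m_locs;
  bool m_resolved = false;
};
```

The sketch gives every recorded pair its own value, even for repeated
file/line pairs; whether the real code shares them is not modelled here.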

There are some suboptimal parts of the code (some linear searches, and
the use of gc), but it's mostly a move of existing code from out of the
jit subdirectory and into "gcc" proper for reuse by the RTL frontend.

This is likely to be useful for the gimple frontend as well.

OK for trunk?

gcc/ChangeLog:
* Makefile.in (OBJS): Add deferred-locations.o.
* deferred-locations.c: New file, adapted from parts of
jit/jit-playback.c.
* deferred-locations.h: New file, adapted from parts of
jit/jit-playback.h.

gcc/jit/ChangeLog:
* jit-common.h: Include deferred-locations.h.
(gcc::jit::playback::source_file): Remove forward decl.
(gcc::jit::playback::source_line): Likewise.
(gcc::jit::playback::location): Replace forward decl, with
a typedef, aliasing deferred_location.
* jit-playback.c (gcc::jit::playback::context::context): Remove
create call on m_source_files.
(line_comparator): Move to deferred-locations.c.
(location_comparator): Likewise.
(handle_locations): Move logic to deferred-locations.c, as
deferred_locations::add_to_line_table.
(get_recording_loc): New function.
(gcc::jit::playback::context::add_error): Call get_recording_loc
as a function, rather than as a method.
(gcc::jit::playback::context::add_error_va): Likewise.
(gcc::jit::playback::context::get_source_file): Update return type
to reflect move of source_file to deferred-locations.h.
Replace body with a call to m_deferred_locations.get_source_file.
(gcc::jit::playback::source_file::source_file): Move to
deferred-locations.h, losing the namespaces.
(gcc::jit::playback::source_file::finalizer): Likewise.
(gcc::jit::playback::source_file::get_source_line): Likewise.
(gcc::jit::playback::source_line::source_line): Likewise.
(gcc::jit::playback::source_line::finalizer): Likewise.
(gcc::jit::playback::source_line::get_location): Likewise.
(gcc::jit::playback::location::location): Likewise, renaming to
deferred_location.
* jit-playback.h: Include deferred-locations.h.
(gcc::jit::playback::context::m_source_files): Replace field with
m_deferred_locations.
(gcc::jit::playback::source_file): Move to deferred-locations.h,
losing the namespaces.
(gcc::jit::playback::source_line): Likewise.
(gcc::jit::playback::location): Likewise, renaming to
deferred_location.  Eliminate get_recording_loc accessor and
m_recording_loc field in favor of get_user_data and m_user_data
respectively.
---
 gcc/Makefile.in  |   1 +
 gcc/deferred-locations.c | 240 +++
 gcc/deferred-locations.h | 139 +++
 gcc/jit/jit-common.h |   5 +-
 gcc/jit/jit-playback.c   | 194 --
 gcc/jit/jit-playback.h   |  73 +-
 6 files changed, 402 insertions(+), 250 deletions(-)
 create mode 100644 gcc/deferred-locations.c
 create mode 100644 gcc/deferred-locations.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6c5adc0..c61f303 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1239,6 +1239,7 @@ OBJS = \
dce.o \
ddg.o \
debug.o \
+   deferred-locations.o \
df-core.o \
df-problems.o \
df-scan.o \
diff --git a/gcc/deferred-locations.c b/gcc/deferred-locations.c
new file mode 100644
index 000..cd02938
--- /dev/null
+++ b/gcc/deferred-locations.c
@@ -0,0 +1,240 @@
+/* Dealing with the linemap API.
+

[PATCH 0/4] RFC: RTL frontend

2016-05-04 Thread David Malcolm

This patch kit introduces an RTL frontend, for the purpose
of unit testing: primarily for unit testing of RTL passes, and
possibly for unit testing of .md files.

It's very much a work-in-progress; I'm posting it now to get feedback.
I've successfully bootstrapped patches 1-3 of the kit on
x86_64-pc-linux-gnu, but patch 4 (which is the heart of the
implementation) doesn't survive bootstrap yet (dependency issues
in the Makefile).

The rest of this post is from gcc/rtl/notes.rst from patch 4; I'm
adding a duplicate copy up-front here to make it easier to get an
overview.

RTL frontend
============

Purpose
*******

Historically GCC testing has been done by providing source files
to be built with various command-line options (via DejaGnu
directives), dumping state at pertinent places, and verifying
properties of the state via these dumps.

A strength of this approach is that we have excellent integration
testing, as every test case exercises the toolchain as a whole, but
it has the drawback that when testing a specific pass,
we have little control of the input to that specific pass.  We
provide input, and the various passes transform the state
of the internal representation::

  INPUT -> PASS-1 -> STATE-1 -> PASS-2 -> STATE-2 -> ...
    -> etc ->
    -> ... -> PASS-n-1 -> STATE-n-1 -> PASS-n -> STATE-n
                          ^            ^         ^
                          |            |         Output from the pass
                          |            The pass we care about
                          The actual input to the pass

so the intervening passes before "PASS-n" could make changes to the
IR that affect the input seen by our pass ("STATE-n-1" above).  This
can break our test cases, sometimes in a form that's visible,
sometimes invisibly (e.g. where a test case silently stops providing
coverage).

The aim of the RTL frontend is to provide a convenient way to test
individual passes in the backend, by loading dumps of specific RTL
state (possibly edited by hand), and then running just one specific
pass on them, so that we effectively have this::

  INPUT -> PASS-n -> OUTPUT

thus fixing the problem above.

My hope is that this makes it easy to write more fine-grained and
robust test coverage for the RTL phase of GCC.  However I see this
as *complementary* to the existing "integrated testing" approach:
patches should include both RTL frontend tests *and* integrated tests,
to avoid regressing the great integration testing we currently have.

The idea is to use the existing dump format as an input format, since
presumably existing GCC developers are very familiar with the dump
format.

One other potential benefit of this approach is to allow unit-testing
of machine descriptions - we could provide specific RTL fragments,
and have the rtl.dg testsuite directly verify that we recognize all
instructions and addressing modes that a given target ought to support.

Structure
*********

The RTL frontend is similar to a regular frontend: a gcc/rtl
subdirectory within the source tree contains frontend-specific hooks.
These provide a new "rtl" frontend, which can be optionally
enabled at configuration time within --enable-languages.

If enabled, it builds an rtl1 binary, which is invoked by the
gcc driver on files with a .rtl extension.

The testsuite is below gcc/testsuite/rtl.dg.  There's also
a "roundtrip" subdirectory below this, in which every .rtl
file is loaded and then dumped; roundtrip.exp verifies that
the dump is identical to the original file, thus ensuring that
the RTL loaders faithfully rebuild the input dump.

Limitations
***********

* It's a work-in-progress.  There will be bugs.

* The existing RTL code is structured around a single function being
  optimized, so, as a simplification, the RTL frontend can only handle
  one function per input file.  Also, the dump format currently uses
  comments to separate functions::

    ;; Function test_1 (test_1, funcdef_no=0, decl_uid=1758, cgraph_uid=0, symbol_order=0)

    ... various pass-specific things, sometimes expressed as comments,
    sometimes not

    ;;
    ;; Full RTL generated for this function:
    ;;
    (note 1 0 6 NOTE_INSN_DELETED)
    ;; etc, insns for function "test_1" go here
    (insn 27 26 0 6 (use (reg/i:SI 0 ax)) ../../src/gcc/testsuite/rtl.dg/test.c:7 -1
         (nil))

    ;; Function test_2 (test_2, funcdef_no=1, decl_uid=1765, cgraph_uid=1, symbol_order=1)
    ... various pass-specific things, sometimes expressed as comments,
    sometimes not
    ;;
    ;; Full RTL generated for this function:
    ;;
    (note 1 0 5 NOTE_INSN_DELETED)
    ;; etc, insns for function "test_2" go here
    (insn 59 58 0 8 (use (reg/i:SF 21 xmm0)) ../../src/gcc/testsuite/rtl.dg/test.c:31 -1
         (nil))

  so that there's no clear separation of the instructions between the
  two functions (and no metadata e.g. function names).

  This could be fixed by adding a new clause to the dump e.g.::

(function "test_1" [
  (note 1 0 6

[PATCH 2/4] Move name_to_pass_map into class pass_manager

2016-05-04 Thread David Malcolm

The RTL frontend needs to be able to look up passes by name.

passes.c has global state name_to_pass_map (albeit static, scoped
to passes.c), for use by enable_disable_pass.

Move it to be a field of class pass_manager, and add
a get_pass_by_name method.
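
As a rough illustration of the shape of this change (not the patch itself),
the sketch below keeps the name-to-pass map as a field of a manager class
rather than as file-scope state.  toy_pass and toy_pass_manager are
hypothetical stand-ins for opt_pass and pass_manager, and std::unordered_map
is used in place of GCC's hash_map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for opt_pass; only the name matters here.
struct toy_pass
{
  std::string name;
};

// Hypothetical stand-in for pass_manager, owning the name -> pass map
// as a field instead of relying on file-scope global state.
class toy_pass_manager
{
public:
  // Register PASS under NAME; the first registration for a name wins,
  // so a later duplicate is silently ignored.
  void register_pass_name (toy_pass *pass, const std::string &name)
  {
    assert (pass != nullptr);
    m_name_to_pass_map.emplace (name, pass);
  }

  // Return the pass registered under NAME, or nullptr if there is none.
  toy_pass *get_pass_by_name (const std::string &name) const
  {
    auto it = m_name_to_pass_map.find (name);
    return it == m_name_to_pass_map.end () ? nullptr : it->second;
  }

private:
  std::unordered_map<std::string, toy_pass *> m_name_to_pass_map;
};
```

With the map owned by the manager, any client holding a manager can look up
a pass by name without reaching into another file's static data.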

OK for trunk?

gcc/ChangeLog:
* pass_manager.h (pass_manager::register_pass_name): New method.
(pass_manager::get_pass_by_name): New method.
(pass_manager::create_pass_tab): New method.
(pass_manager::m_name_to_pass_map): New field.
* passes.c (name_to_pass_map): Delete global in favor of field
"m_name_to_pass_map" of pass_manager.
(register_pass_name): Rename from a function to...
(pass_manager::register_pass_name): ...this method, updating
for renaming of global "name_to_pass_map" to field
"m_name_to_pass_map".
(create_pass_tab): Rename from a function to...
(pass_manager::create_pass_tab): ...this method, updating
for renaming of global "name_to_pass_map" to field.
(get_pass_by_name): Rename from a function to...
(pass_manager::get_pass_by_name): ...this method.
(enable_disable_pass): Convert use of get_pass_by_name to
a method call, locating the pass_manager singleton.
---
 gcc/pass_manager.h |  6 ++
 gcc/passes.c   | 34 +++---
 2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index 4f89d31..464e25f 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -78,6 +78,10 @@ public:
   opt_pass *get_pass_peephole2 () const { return pass_peephole2_1; }
   opt_pass *get_pass_profile () const { return pass_profile_1; }
 
+  void register_pass_name (opt_pass *pass, const char *name);
+
+  opt_pass *get_pass_by_name (const char *name);
+
 public:
   /* The root of the compilation pass tree, once constructed.  */
   opt_pass *all_passes;
@@ -95,9 +99,11 @@ public:
 private:
   void set_pass_for_id (int id, opt_pass *pass);
   void register_dump_files (opt_pass *pass);
+  void create_pass_tab () const;
 
 private:
   context *m_ctxt;
+  hash_map *m_name_to_pass_map;
 
   /* References to all of the individual passes.
  These fields are generated via macro expansion.
diff --git a/gcc/passes.c b/gcc/passes.c
index 2b70846..0565cfa 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -66,8 +66,6 @@ using namespace gcc;
The variable current_pass is also used for statistics and plugins.  */
 opt_pass *current_pass;
 
-static void register_pass_name (opt_pass *, const char *);
-
 /* Most passes are single-instance (within their context) and thus don't
need to implement cloning, but passes that support multiple instances
*must* provide their own implementation of the clone method.
@@ -844,21 +842,19 @@ pass_manager::register_dump_files (opt_pass *pass)
   while (pass);
 }
 
-static hash_map *name_to_pass_map;
-
 /* Register PASS with NAME.  */
 
-static void
-register_pass_name (opt_pass *pass, const char *name)
+void
+pass_manager::register_pass_name (opt_pass *pass, const char *name)
 {
-  if (!name_to_pass_map)
-name_to_pass_map = new hash_map (256);
+  if (!m_name_to_pass_map)
+m_name_to_pass_map = new hash_map (256);
 
-  if (name_to_pass_map->get (name))
+  if (m_name_to_pass_map->get (name))
 return; /* Ignore plugin passes.  */
 
-  const char *unique_name = xstrdup (name);
-  name_to_pass_map->put (unique_name, pass);
+  const char *unique_name = xstrdup (name);
+  m_name_to_pass_map->put (unique_name, pass);
 }
 
 /* Map from pass id to canonicalized pass name.  */
@@ -882,14 +878,14 @@ passes_pass_traverse (const char *const , opt_pass 
*const , void *)
 /* The function traverses NAME_TO_PASS_MAP and creates a pass info
table for dumping purpose.  */
 
-static void
-create_pass_tab (void)
+void
+pass_manager::create_pass_tab (void) const
 {
   if (!flag_dump_passes)
 return;
 
-  pass_tab.safe_grow_cleared (g->get_passes ()->passes_by_id_size + 1);
-  name_to_pass_map->traverse  (NULL);
+  pass_tab.safe_grow_cleared (passes_by_id_size + 1);
+  m_name_to_pass_map->traverse  (NULL);
 }
 
 static bool override_gate_status (opt_pass *, tree, bool);
@@ -960,10 +956,10 @@ pass_manager::dump_passes () const
 
 /* Returns the pass with NAME.  */
 
-static opt_pass *
-get_pass_by_name (const char *name)
+opt_pass *
+pass_manager::get_pass_by_name (const char *name)
 {
-  opt_pass **p = name_to_pass_map->get (name);
+  opt_pass **p = m_name_to_pass_map->get (name);
   if (p)
 return *p;
 
@@ -1025,7 +1021,7 @@ enable_disable_pass (const char *arg, bool is_enable)
   free (argstr);
   return;
 }
-  pass = get_pass_by_name (phase_name);
+  pass = g->get_passes ()->get_pass_by_name (phase_name);
   if (!pass || pass->static_pass_number == -1)
 {
   if

[PATCH 1/4] Make argv const char ** in read_md_files etc

2016-05-04 Thread David Malcolm

This patch makes the argv param to read_md_files const, needed
so that the RTL frontend can call it on a const char *.

While we're at it, it similarly makes const the argv for all
of the "main" functions of the various gen*.
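
To show why the constness matters to a caller, here is a small sketch under
stated assumptions: count_dash_options and count_for_fixed_args are
hypothetical helpers, not functions from the patch.  A caller that assembles
its own argument vector from constant strings can pass it directly only when
the callee takes const char **.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical reader in the spirit of read_md_files: it only inspects
// its arguments, so the strings can be const.
static int
count_dash_options (int argc, const char **argv)
{
  assert (argc >= 0);
  int n = 0;
  // Skip argv[0] (the program name) and count arguments such as "-I.",
  // ignoring a lone "-".
  for (int i = 1; i < argc; i++)
    if (argv[i][0] == '-' && std::strlen (argv[i]) > 1)
      n++;
  return n;
}

// A caller that builds its own argument vector out of string literals.
// This compiles without casts because the parameter is const char **;
// an array of const char * does not convert implicitly to char **.
static int
count_for_fixed_args ()
{
  const char *args[] = { "prog", "-I.", "input.md", "-" };
  return count_dash_options (4, args);
}
```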

OK for trunk?

gcc/ChangeLog:
* genattr-common.c (main): Convert argv from
char ** to const char **.
* genattr.c (main): Likewise.
* genattrtab.c (main): Likewise.
* genautomata.c (initiate_automaton_gen): Likewise.
(main): Likewise.
* gencodes.c (main): Likewise.
* genconditions.c (main): Likewise.
* genconfig.c (main): Likewise.
* genconstants.c (main): Likewise.
* genemit.c (main): Likewise.
* genenums.c (main): Likewise.
* genextract.c (main): Likewise.
* genflags.c (main): Likewise.
* genmddeps.c (main): Likewise.
* genopinit.c (main): Likewise.
* genoutput.c (main): Likewise.
* genpeep.c (main): Likewise.
* genpreds.c (main): Likewise.
* genrecog.c (main): Likewise.
* gensupport.c (init_rtx_reader_args_cb): Likewise.
(init_rtx_reader_args): Likewise.
* gensupport.h (init_rtx_reader_args_cb): Likewise.
(init_rtx_reader_args): Likewise.
* gentarget-def.c (main): Likewise.
* read-md.c (read_md_files): Likewise.
* read-md.h (read_md_files): Likewise.
---
 gcc/genattr-common.c | 2 +-
 gcc/genattr.c| 2 +-
 gcc/genattrtab.c | 2 +-
 gcc/genautomata.c| 4 ++--
 gcc/gencodes.c   | 2 +-
 gcc/genconditions.c  | 2 +-
 gcc/genconfig.c  | 2 +-
 gcc/genconstants.c   | 2 +-
 gcc/genemit.c| 2 +-
 gcc/genenums.c   | 2 +-
 gcc/genextract.c | 2 +-
 gcc/genflags.c   | 2 +-
 gcc/genmddeps.c  | 2 +-
 gcc/genopinit.c  | 2 +-
 gcc/genoutput.c  | 4 ++--
 gcc/genpeep.c| 4 ++--
 gcc/genpreds.c   | 2 +-
 gcc/genrecog.c   | 2 +-
 gcc/gensupport.c | 4 ++--
 gcc/gensupport.h | 5 +++--
 gcc/gentarget-def.c  | 2 +-
 gcc/read-md.c| 2 +-
 gcc/read-md.h| 2 +-
 23 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/gcc/genattr-common.c b/gcc/genattr-common.c
index e073faf..a11fbf7 100644
--- a/gcc/genattr-common.c
+++ b/gcc/genattr-common.c
@@ -61,7 +61,7 @@ gen_attr (md_rtx_info *info)
 }
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   bool have_delay = false;
   bool have_sched = false;
diff --git a/gcc/genattr.c b/gcc/genattr.c
index c6db37f..656a9a7 100644
--- a/gcc/genattr.c
+++ b/gcc/genattr.c
@@ -138,7 +138,7 @@ find_tune_attr (rtx exp)
 }
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   bool have_annul_true = false;
   bool have_annul_false = false;
diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index c956527..d39d4a7 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -5197,7 +5197,7 @@ handle_arg (const char *arg)
 }
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   struct attr_desc *attr;
   struct insn_def *id;
diff --git a/gcc/genautomata.c b/gcc/genautomata.c
index e3a6c59..dcde604 100644
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
@@ -9300,7 +9300,7 @@ parse_automata_opt (const char *str)
 /* The following is top level function to initialize the work of
pipeline hazards description translator.  */
 static void
-initiate_automaton_gen (char **argv)
+initiate_automaton_gen (const char **argv)
 {
   const char *base_name;
 
@@ -9592,7 +9592,7 @@ write_automata (void)
 }
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   progname = "genautomata";
 
diff --git a/gcc/gencodes.c b/gcc/gencodes.c
index e0dd32a..3b0fc5c 100644
--- a/gcc/gencodes.c
+++ b/gcc/gencodes.c
@@ -47,7 +47,7 @@ gen_insn (md_rtx_info *info)
 }
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   progname = "gencodes";
 
diff --git a/gcc/genconditions.c b/gcc/genconditions.c
index 8abf1c2..e4f45b0 100644
--- a/gcc/genconditions.c
+++ b/gcc/genconditions.c
@@ -212,7 +212,7 @@ write_writer (void)
 }
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   progname = "genconditions";
 
diff --git a/gcc/genconfig.c b/gcc/genconfig.c
index b6ca35a..815e30d 100644
--- a/gcc/genconfig.c
+++ b/gcc/genconfig.c
@@ -269,7 +269,7 @@ gen_peephole2 (md_rtx_info *info)
 }
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   progname = "genconfig";
 
diff --git a/gcc/genconstants.c b/gcc/genconstants.c
index b96bc50..c10e3e3 100644
--- a/gcc/genconstants.c
+++ b/gcc/genconstants.c
@@ -75,7 +75,7 @@ print_enum_type (void **slot, void *info ATTRIBUTE_UNUSED)
 }
 
 int
-main (int argc, char **argv)
+main (int argc, const char **argv)
 {
   progname = "genconstants";
 
diff --git a/gcc/genemit.c b/gcc/genemit.c
index 87f5301..33040aa 100644
--- a/gcc/genemit.c
+++ b/gcc/genemit.c
@@ -745,7 +745,7 @@ output_peephole2_scratches (rtx split)

Re: [PATCH] Fix operand_equal_p hash checking (PR c++/70906, PR c++/70933)

2016-05-04 Thread Richard Biener

On May 4, 2016 9:29:37 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>These 2 PRs were DUPed, yet they are actually different, but somewhat
>related.
>One of the ICEs is due to the OEP_ADDRESS_OF consistency checks that
>both operand_equal_p and inchash::add_expr have (that want to verify
>that we don't e.g. have ADDR_EXPR of ADDR_EXPR).
>operand_equal_p never returns true for TARGET_EXPR though, unless there
>is pointer equality, but if we need to hash e.g. ADDR_EXPR of
>TARGET_EXPR with ADDR_EXPR inside of TARGET_EXPR_INITIAL, we currently
>ICE.  We could process TARGET_EXPR_{INITIAL,CLEANUP} with
>OEP_ADDRESS_OF masked off, but I believe different TARGET_EXPRs should
>use different TARGET_EXPR_SLOT variables and thus it should be enough
>to hash just the TARGET_EXPR_SLOT and ignore the other arguments.
>
>The second issue is that in the FEs, we can end up calling
>operand_equal_p
>and e.g. for not really equal, but similar (e.g. useless NOP_EXPR of
>SAVE_EXPR
>and the SAVE_EXPR itself) it can be on trees that contain various FE
>specific trees, including constants (like PTRMEM_CST), and others.
>
>The patch arranges to just not ICE in that case if called from the
>operand_equal_p checking, which has the advantage that we will still
>disallow it when people call inchash::add_expr otherwise.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

>2016-05-04  Jakub Jelinek  
>
>   PR c++/70906
>   PR c++/70933
>   * tree-core.h (enum operand_equal_flag): Add OEP_HASH_CHECK.
>   * tree.c (inchash::add_expr): If !IS_EXPR_CODE_CLASS (tclass),
>   assert flags & OEP_HASH_CHECK, instead of asserting it
>   never happens.  Handle TARGET_EXPR.
>   * fold-const.c (operand_equal_p): For hash verification,
>   or in OEP_HASH_CHECK into flags.
>
>   * g++.dg/opt/pr70906.C: New test.
>   * g++.dg/opt/pr70933.C: New test.
>
>--- gcc/tree-core.h.jj 2016-04-27 15:29:05.0 +0200
>+++ gcc/tree-core.h2016-05-04 12:13:59.361459074 +0200
>@@ -767,7 +767,9 @@ enum operand_equal_flag {
>   OEP_MATCH_SIDE_EFFECTS = 4,
>   OEP_ADDRESS_OF = 8,
>   /* Internal within operand_equal_p:  */
>-  OEP_NO_HASH_CHECK = 16
>+  OEP_NO_HASH_CHECK = 16,
>+  /* Internal within inchash::add_expr:  */
>+  OEP_HASH_CHECK = 32
> };
> 
> /* Enum and arrays used for tree allocation stats.
>--- gcc/tree.c.jj  2016-05-03 10:00:25.0 +0200
>+++ gcc/tree.c 2016-05-04 12:20:00.354569734 +0200
>@@ -7915,9 +7915,12 @@ add_expr (const_tree t, inchash::hash 
>  && integer_zerop (TREE_OPERAND (t, 1)))
>   inchash::add_expr (TREE_OPERAND (TREE_OPERAND (t, 0), 0),
>  hstate, flags);
>+  /* Don't ICE on FE specific trees, or their arguments etc.
>+   during operand_equal_p hash verification.  */
>+  else if (!IS_EXPR_CODE_CLASS (tclass))
>+  gcc_assert (flags & OEP_HASH_CHECK);
>   else
>   {
>-gcc_assert (IS_EXPR_CODE_CLASS (tclass));
> unsigned int sflags = flags;
> 
> hstate.add_object (code);
>@@ -7966,6 +7969,13 @@ add_expr (const_tree t, inchash::hash 
>   hstate.add_int (CALL_EXPR_IFN (t));
> break;
> 
>+  case TARGET_EXPR:
>+/* For TARGET_EXPR, just hash on the TARGET_EXPR_SLOT.
>+   Usually different TARGET_EXPRs just should use
>+   different temporaries in their slots.  */
>+inchash::add_expr (TARGET_EXPR_SLOT (t), hstate, flags);
>+return;
>+
>   default:
> break;
>   }
>--- gcc/fold-const.c.jj2016-05-02 18:16:00.0 +0200
>+++ gcc/fold-const.c   2016-05-04 12:14:33.188000923 +0200
>@@ -2758,8 +2758,8 @@ operand_equal_p (const_tree arg0, const_
> if (arg0 != arg1)
>   {
> inchash::hash hstate0 (0), hstate1 (0);
>-inchash::add_expr (arg0, hstate0, flags);
>-inchash::add_expr (arg1, hstate1, flags);
>+inchash::add_expr (arg0, hstate0, flags | OEP_HASH_CHECK);
>+inchash::add_expr (arg1, hstate1, flags | OEP_HASH_CHECK);
> hashval_t h0 = hstate0.end ();
> hashval_t h1 = hstate1.end ();
> gcc_assert (h0 == h1);
>--- gcc/testsuite/g++.dg/opt/pr70906.C.jj  2016-05-04 11:33:32.799387826
>+0200
>+++ gcc/testsuite/g++.dg/opt/pr70906.C 2016-05-04 11:33:02.0
>+0200
>@@ -0,0 +1,69 @@
>+// PR c++/70906
>+// { dg-do compile { target c++11 } }
>+// { dg-options "-Wall" }
>+
>+template  struct B;
>+template  struct F { typedef U *t; };
>+template  struct D {};
>+template  struct L {
>+  typedef VP np;
>+  typedef typename F::t cnp;
>+};
>+struct P { typedef L nt; };
>+template  struct I { typedef typename N::template A t;
>};
>+template  struct Q { typedef typename I::t t; };
>+template  struct G;
>+template 
>+struct mh {
>+  template  struct A { typedef G pvt; };
>+};
>+template

Re: [C++ Patch] PR 68722

2016-05-04 Thread Paolo Carlini


Hi again,

On 12/04/2016 15:53, Jason Merrill wrote:

Let's go with the first patch.
What about this one? Today I returned to it, and technically it still 
represents a regression in gcc-4_9-branch and gcc-5-branch, but 
personally I'd rather not backport the fix: in release-mode we just emit 
an additional "confused by earlier errors, bailing out" at the end of a 
rather long series of diagnostic messages, the first ones meaningful, 
the last redundant anyway and the snippet triggering it seems 
particularly broken to me...


Thanks,
Paolo.

[PATCH] Handle also switch for -Wdangling-else

2016-05-04 Thread Jakub Jelinek

Hi!

This patch lets us warn about a dangling else even if there is a switch
without {}s around the body.
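
For illustration only (this is not part of the patch), the following sketch
shows the ambiguity in question.  which_branch and check_binding are
hypothetical names.  The else is indented as if it belonged to "if (a)",
but it binds to the nearest unmatched if, which is the "if (c)" inside the
braceless switch body.

```cpp
#include <cassert>

// Returns 1 or 2 depending on which branch runs, 0 if neither does.
// The indentation suggests the else pairs with "if (a)", but the grammar
// binds it to "if (c)" inside the switch.
static int
which_branch (int a, int b, int c)
{
  int result = 0;
  if (a)
    switch (b)
      case 0:
        if (c)
          result = 1;
  else
    result = 2;
  return result;
}

// Self-check documenting the binding described above.
static void
check_binding ()
{
  // If the else belonged to "if (a)", this call would return 2.
  assert (which_branch (0, 0, 0) == 0);
  // With a set and b matching case 0, a false c reaches the else.
  assert (which_branch (1, 0, 0) == 2);
}
```

A compiler that implements the warning is expected to diagnose the
"if (a)" line here; the code is deliberately misleading.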

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* c-parser.c (c_parser_switch_statement): Add IF_P argument,
pass it through to c_parser_c99_block_statement.
(c_parser_statement_after_labels): Adjust c_parser_switch_statement
caller.

* parser.c (cp_parser_selection_statement): For RID_SWITCH,
pass if_p instead of NULL to cp_parser_implicitly_scoped_statement.

* c-c++-common/Wdangling-else-4.c: New test.

--- gcc/c/c-parser.c.jj 2016-05-03 00:12:10.0 +0200
+++ gcc/c/c-parser.c2016-05-04 18:09:27.384953312 +0200
@@ -1305,7 +1305,7 @@ static void c_parser_statement (c_parser
 static void c_parser_statement_after_labels (c_parser *, bool *,
 vec * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
-static void c_parser_switch_statement (c_parser *);
+static void c_parser_switch_statement (c_parser *, bool *);
 static void c_parser_while_statement (c_parser *, bool, bool *);
 static void c_parser_do_statement (c_parser *, bool);
 static void c_parser_for_statement (c_parser *, bool, bool *);
@@ -5138,7 +5138,7 @@ c_parser_statement_after_labels (c_parse
  c_parser_if_statement (parser, if_p, chain);
  break;
case RID_SWITCH:
- c_parser_switch_statement (parser);
+ c_parser_switch_statement (parser, if_p);
  break;
case RID_WHILE:
  c_parser_while_statement (parser, false, if_p);
@@ -5570,7 +5570,7 @@ c_parser_if_statement (c_parser *parser,
 */
 
 static void
-c_parser_switch_statement (c_parser *parser)
+c_parser_switch_statement (c_parser *parser, bool *if_p)
 {
   struct c_expr ce;
   tree block, expr, body, save_break;
@@ -5605,7 +5605,7 @@ c_parser_switch_statement (c_parser *par
   c_start_case (switch_loc, switch_cond_loc, expr, explicit_cast_p);
   save_break = c_break_label;
   c_break_label = NULL_TREE;
-  body = c_parser_c99_block_statement (parser, NULL/*if??*/);
+  body = c_parser_c99_block_statement (parser, if_p);
   c_finish_case (body, ce.original_type);
   if (c_break_label)
 {
--- gcc/cp/parser.c.jj  2016-05-03 00:12:11.0 +0200
+++ gcc/cp/parser.c 2016-05-04 18:15:30.614109144 +0200
@@ -10978,7 +10978,7 @@ cp_parser_selection_statement (cp_parser
in_statement = parser->in_statement;
parser->in_switch_statement_p = true;
parser->in_statement |= IN_SWITCH_STMT;
-   cp_parser_implicitly_scoped_statement (parser, NULL,
+   cp_parser_implicitly_scoped_statement (parser, if_p,
   guard_tinfo);
parser->in_switch_statement_p = in_switch_statement_p;
parser->in_statement = in_statement;
--- gcc/testsuite/c-c++-common/Wdangling-else-4.c.jj2016-05-04 
18:40:17.628299460 +0200
+++ gcc/testsuite/c-c++-common/Wdangling-else-4.c   2016-05-04 
18:36:19.0 +0200
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-Wdangling-else" } */
+
+void bar (int);
+
+void
+foo (int a, int b, int c)
+{
+  if (a)   /* { dg-warning "suggest explicit braces to avoid ambiguous 
.else." } */
+switch (b)
+  case 0:
+   if (c)
+ bar (1);
+  else
+bar (2);
+}
+
+void
+baz (int a, int b, int c)
+{
+  if (a)
+switch (b)
+  {
+  case 0:
+   if (c)
+ bar (1);
+  }
+  else
+bar (2);
+}
+

Jakub

[PATCH] Improve min/max

2016-05-04 Thread Jakub Jelinek

Hi!

AVX512BW has EVEX insns for these.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (*v8hi3, *v16qi3): Add
avx512bw alternative.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -10442,19 +10459,20 @@ (define_insn "*sse4_1_3v8hi3"
-  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
+  [(set (match_operand:V8HI 0 "register_operand" "=x,x,v")
(smaxmin:V8HI
- (match_operand:V8HI 1 "vector_operand" "%0,x")
- (match_operand:V8HI 2 "vector_operand" "xBm,xm")))]
+ (match_operand:V8HI 1 "vector_operand" "%0,x,v")
+ (match_operand:V8HI 2 "vector_operand" "xBm,xm,vm")))]
   "TARGET_SSE2 && ix86_binary_operator_ok (, V8HImode, operands)"
   "@
pw\t{%2, %0|%0, %2}
+   vpw\t{%2, %1, %0|%0, %1, %2}
vpw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sseiadd")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix_extra" "*,1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix_extra" "*,1,1")
+   (set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "TI")])
 
 (define_expand "3"
@@ -10526,19 +10544,20 @@ (define_insn "*sse4_1_3v16qi3"
-  [(set (match_operand:V16QI 0 "register_operand" "=x,x")
+  [(set (match_operand:V16QI 0 "register_operand" "=x,x,v")
(umaxmin:V16QI
- (match_operand:V16QI 1 "vector_operand" "%0,x")
- (match_operand:V16QI 2 "vector_operand" "xBm,xm")))]
+ (match_operand:V16QI 1 "vector_operand" "%0,x,v")
+ (match_operand:V16QI 2 "vector_operand" "xBm,xm,vm")))]
   "TARGET_SSE2 && ix86_binary_operator_ok (, V16QImode, operands)"
   "@
pb\t{%2, %0|%0, %2}
+   vpb\t{%2, %1, %0|%0, %1, %2}
vpb\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sseiadd")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix_extra" "*,1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix_extra" "*,1,1")
+   (set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "TI")])
 
 ;

Jakub

[PATCH] Improve whole vector right shift

2016-05-04 Thread Jakub Jelinek

Hi!

In this case the situation is more complicated, because for
V*HI we need avx512bw and avx512vl, while for V*SI only avx512vl
is needed and both are in the same pattern.  But we already have
a pattern that does the right thing right after the "ashr3"
- but as it is after it, the "ashr3" will win during recog
and will limit RA decisions.

The testcase shows that moving the pattern improves it.
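
The "earlier pattern wins" effect can be modelled with a very small sketch.
This is a simplification under stated assumptions: toy_pattern, first_match
and winning_reg_class are hypothetical, and real recog matching involves
far more than an enabled flag.  The point is only that, when two patterns
both match, their order in the file decides which register class the
register allocator gets to work with.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of an insn pattern: a name,
// whether its condition holds for the current target, and the register
// class its constraints allow.
struct toy_pattern
{
  std::string name;
  bool enabled;
  std::string reg_class;
};

// Return the first enabled pattern, in definition order, or nullptr.
// This models only the "earliest match wins" rule, nothing else.
static const toy_pattern *
first_match (const std::vector<toy_pattern> &patterns)
{
  for (const toy_pattern &p : patterns)
    if (p.enabled)
      return &p;
  return nullptr;
}

// Register class chosen for the insn; requires that some pattern matches.
static std::string
winning_reg_class (const std::vector<toy_pattern> &patterns)
{
  const toy_pattern *p = first_match (patterns);
  assert (p != nullptr);
  return p->reg_class;
}
```

In this model, listing the "v" pattern before the "x" pattern makes the
wider register class the one that is selected.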

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (ashr3): Move
before the ashr3 pattern.

* gcc.target/i386/avx512bw-vpsraw-3.c: New test.
* gcc.target/i386/avx512vl-vpsrad-3.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-04 16:54:31.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 16:55:31.155848054 +0200
@@ -10088,6 +10088,20 @@ (define_expand "usadv32qi"
   DONE;
 })
 
+(define_insn "ashr3"
+  [(set (match_operand:VI24_AVX512BW_1 0 "register_operand" "=v,v")
+   (ashiftrt:VI24_AVX512BW_1
+ (match_operand:VI24_AVX512BW_1 1 "nonimmediate_operand" "v,vm")
+ (match_operand:SI 2 "nonmemory_operand" "v,N")))]
+  "TARGET_AVX512VL"
+  "vpsra\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sseishft")
+   (set (attr "length_immediate")
+ (if_then_else (match_operand 2 "const_int_operand")
+   (const_string "1")
+   (const_string "0")))
+   (set_attr "mode" "")])
+
 (define_insn "ashr3"
   [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x")
(ashiftrt:VI24_AVX2
@@ -10107,20 +10121,6 @@ (define_insn "ashr3"
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
 
-(define_insn "ashr3"
-  [(set (match_operand:VI24_AVX512BW_1 0 "register_operand" "=v,v")
-   (ashiftrt:VI24_AVX512BW_1
- (match_operand:VI24_AVX512BW_1 1 "nonimmediate_operand" "v,vm")
- (match_operand:SI 2 "nonmemory_operand" "v,N")))]
-  "TARGET_AVX512VL"
-  "vpsra\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sseishft")
-   (set (attr "length_immediate")
- (if_then_else (match_operand 2 "const_int_operand")
-   (const_string "1")
-   (const_string "0")))
-   (set_attr "mode" "")])
-
 (define_insn "ashrv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=v,v")
(ashiftrt:V2DI
--- gcc/testsuite/gcc.target/i386/avx512bw-vpsraw-3.c.jj2016-05-04 
17:01:52.332810541 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-vpsraw-3.c   2016-05-04 
17:02:56.104966537 +0200
@@ -0,0 +1,44 @@
+/* { dg-do assemble { target { avx512bw && { avx512vl && { ! ia32 } } } } } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
+
+#include 
+
+void
+f1 (__m128i x, int y)
+{
+  register __m128i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm_srai_epi16 (a, y);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f2 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm_srai_epi16 (a, 16);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f3 (__m256i x, int y)
+{
+  register __m256i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm256_srai_epi16 (a, y);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f4 (__m256i x)
+{
+  register __m256i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm256_srai_epi16 (a, 16);
+  asm volatile ("" : "+v" (a));
+}
--- gcc/testsuite/gcc.target/i386/avx512vl-vpsrad-3.c.jj2016-05-04 
17:01:58.770725338 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-vpsrad-3.c   2016-05-04 
17:00:16.0 +0200
@@ -0,0 +1,44 @@
+/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
+/* { dg-options "-O2 -mavx512vl" } */
+
+#include 
+
+void
+f1 (__m128i x, int y)
+{
+  register __m128i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm_srai_epi32 (a, y);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f2 (__m128i x)
+{
+  register __m128i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm_srai_epi32 (a, 16);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f3 (__m256i x, int y)
+{
+  register __m256i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm256_srai_epi32 (a, y);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f4 (__m256i x)
+{
+  register __m256i a __asm ("xmm16");
+  a = x;
+  asm volatile ("" : "+v" (a));
+  a = _mm256_srai_epi32 (a, 16);
+  asm volatile ("" : "+v" (a));
+}

Jakub

[PATCH] Improve *pmaddwd

2016-05-04 Thread Jakub Jelinek

Hi!

As the testcase shows, we unnecessarily disallow xmm16+, even when
we can use them for -mavx512bw.
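
For readers unfamiliar with the RTL in the patch below, here is a minimal scalar sketch of what the *sse2_pmaddwd / *avx2_pmaddwd patterns compute: the even and odd 16-bit elements are sign-extended, multiplied pairwise and the two products added. The function name pmaddwd_model is hypothetical and is not part of GCC; the code only illustrates the arithmetic, not the register constraints the patch changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (hypothetical helper, not GCC code) of the pmaddwd RTL:
   out[i] = a[2*i] * b[2*i] + a[2*i + 1] * b[2*i + 1], with the 16-bit
   inputs sign-extended before the multiplication.  */
static void
pmaddwd_model (const int16_t *a, const int16_t *b, int32_t *out, size_t n_out)
{
  assert (a != NULL && b != NULL && out != NULL);
  for (size_t i = 0; i < n_out; i++)
    {
      /* Compute in 64 bits so the C arithmetic itself never overflows.  */
      int64_t sum = (int64_t) a[2 * i] * b[2 * i]
                    + (int64_t) a[2 * i + 1] * b[2 * i + 1];
      /* The only sum that does not fit in 32 bits is 2^31 (all four
         inputs equal to -32768); the 32-bit addition wraps it to
         INT32_MIN.  */
      out[i] = sum > INT32_MAX ? INT32_MIN : (int32_t) sum;
    }
}
```

With four output lanes this models the 128-bit form; with eight it models the 256-bit form.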

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (*avx2_pmaddwd, *sse2_pmaddwd): Use
v instead of x in vex or maybe_vex alternatives, use
maybe_evex instead of vex in prefix.

* gcc.target/i386/avx512bw-vpmaddwd-3.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -9803,19 +9817,19 @@ (define_expand "avx2_pmaddwd"
   "ix86_fixup_binary_operands_no_copy (MULT, V16HImode, operands);")
 
 (define_insn "*avx2_pmaddwd"
-  [(set (match_operand:V8SI 0 "register_operand" "=x")
+  [(set (match_operand:V8SI 0 "register_operand" "=x,v")
(plus:V8SI
  (mult:V8SI
(sign_extend:V8SI
  (vec_select:V8HI
-   (match_operand:V16HI 1 "nonimmediate_operand" "%x")
+   (match_operand:V16HI 1 "nonimmediate_operand" "%x,v")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)
   (const_int 8) (const_int 10)
   (const_int 12) (const_int 14)])))
(sign_extend:V8SI
  (vec_select:V8HI
-   (match_operand:V16HI 2 "nonimmediate_operand" "xm")
+   (match_operand:V16HI 2 "nonimmediate_operand" "xm,vm")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)
   (const_int 8) (const_int 10)
@@ -9836,7 +9850,8 @@ (define_insn "*avx2_pmaddwd"
   "TARGET_AVX2 && ix86_binary_operator_ok (MULT, V16HImode, operands)"
   "vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseiadd")
-   (set_attr "prefix" "vex")
+   (set_attr "isa" "*,avx512bw")
+   (set_attr "prefix" "vex,evex")
(set_attr "mode" "OI")])
 
 (define_expand "sse2_pmaddwd"
@@ -9866,17 +9881,17 @@ (define_expand "sse2_pmaddwd"
   "ix86_fixup_binary_operands_no_copy (MULT, V8HImode, operands);")
 
 (define_insn "*sse2_pmaddwd"
-  [(set (match_operand:V4SI 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=x,x,v")
(plus:V4SI
  (mult:V4SI
(sign_extend:V4SI
  (vec_select:V4HI
-   (match_operand:V8HI 1 "vector_operand" "%0,x")
+   (match_operand:V8HI 1 "vector_operand" "%0,x,v")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)])))
(sign_extend:V4SI
  (vec_select:V4HI
-   (match_operand:V8HI 2 "vector_operand" "xBm,xm")
+   (match_operand:V8HI 2 "vector_operand" "xBm,xm,vm")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)]
  (mult:V4SI
@@ -9891,12 +9906,13 @@ (define_insn "*sse2_pmaddwd"
   "TARGET_SSE2 && ix86_binary_operator_ok (MULT, V8HImode, operands)"
   "@
pmaddwd\t{%2, %0|%0, %2}
+   vpmaddwd\t{%2, %1, %0|%0, %1, %2}
vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sseiadd")
(set_attr "atom_unit" "simul")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "TI")])
 
 (define_insn "avx512dq_mul3"
--- gcc/testsuite/gcc.target/i386/avx512bw-vpmaddwd-3.c.jj  2016-05-04 16:37:21.196223424 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-vpmaddwd-3.c 2016-05-04 16:37:51.867819502 +0200
@@ -0,0 +1,24 @@
+/* { dg-do assemble { target { avx512bw && { avx512vl && { ! ia32 } } } } } */
+/* { dg-options "-O2 -mavx512bw -mavx512vl" } */
+
+#include 
+
+void
+f1 (__m128i x, __m128i y)
+{
+  register __m128i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x; b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm_madd_epi16 (a, b);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f2 (__m256i x, __m256i y)
+{
+  register __m256i a __asm ("xmm16"), b __asm ("xmm17");
+  a = x; b = y;
+  asm volatile ("" : "+v" (a), "+v" (b));
+  a = _mm256_madd_epi16 (a, b);
+  asm volatile ("" : "+v" (a));
+}

Jakub

[PATCH] Improve vec extraction

2016-05-04 Thread Jakub Jelinek

Hi!

While EVEX doesn't have vextracti128, we can use vextracti32x4;
unfortunately without avx512dq we need to use full zmm input operand,
but that shouldn't be a big deal when we hardcode 1 as immediate.
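
A small sketch of why using the full zmm register as input is harmless when the immediate is hardcoded to 1: the ymm register is the low half of the zmm register, so 128-bit lane 1 is the same data in both views. The register images and the helper extract_lane128 are illustrative assumptions, not GCC code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LANE_ELTS = 8 };  /* eight 16-bit elements per 128-bit lane */

/* Hypothetical helper: copy 128-bit lane LANE out of a register image
   REG that holds N_LANES lanes (2 for a ymm view, 4 for a zmm view).  */
static void
extract_lane128 (const uint16_t *reg, int n_lanes, int lane,
                 uint16_t out[LANE_ELTS])
{
  assert (lane >= 0 && lane < n_lanes);
  memcpy (out, reg + lane * LANE_ELTS, LANE_ELTS * sizeof (uint16_t));
}
```

Extracting lane 1 from a 4-lane image and from the 2-lane image made of its first 16 elements gives identical results, which corresponds to elements 8..15 selected by vec_extract_hi_v16hi.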

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (*vec_extractv4sf_0, *sse4_1_extractps,
*vec_extractv4sf_mem, vec_extract_lo_v16hi, vec_extract_hi_v16hi,
vec_extract_lo_v32qi, vec_extract_hi_v32qi): Use v instead of x
in vex or maybe_vex alternatives, use maybe_evex instead of vex
in prefix.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -6613,9 +6613,9 @@ (define_expand "vec_set"
 })
 
 (define_insn_and_split "*vec_extractv4sf_0"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=x,m,f,r")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=v,m,f,r")
(vec_select:SF
- (match_operand:V4SF 1 "nonimmediate_operand" "xm,x,m,m")
+ (match_operand:V4SF 1 "nonimmediate_operand" "vm,v,m,m")
  (parallel [(const_int 0)])))]
   "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
@@ -6624,9 +6624,9 @@ (define_insn_and_split "*vec_extractv4sf
   "operands[1] = gen_lowpart (SFmode, operands[1]);")
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,x,x")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,v,v")
(vec_select:SF
- (match_operand:V4SF 1 "register_operand" "Yr,*x,0,x")
+ (match_operand:V4SF 1 "register_operand" "Yr,*v,0,v")
  (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n")])))]
   "TARGET_SSE4_1"
   "@
@@ -6665,7 +6665,7 @@ (define_insn_and_split "*sse4_1_extractp
(set_attr "mode" "V4SF,V4SF,*,*")])
 
 (define_insn_and_split "*vec_extractv4sf_mem"
-  [(set (match_operand:SF 0 "register_operand" "=x,*r,f")
+  [(set (match_operand:SF 0 "register_operand" "=v,*r,f")
(vec_select:SF
  (match_operand:V4SF 1 "memory_operand" "o,o,o")
  (parallel [(match_operand 2 "const_0_to_3_operand" "n,n,n")])))]
@@ -7239,9 +7239,9 @@ (define_insn "vec_extract_hi_v32hi"
(set_attr "mode" "XI")])
 
 (define_insn_and_split "vec_extract_lo_v16hi"
-  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
+  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,m")
(vec_select:V8HI
- (match_operand:V16HI 1 "nonimmediate_operand" "xm,x")
+ (match_operand:V16HI 1 "nonimmediate_operand" "vm,v")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)
 (const_int 4) (const_int 5)
@@ -7253,20 +7253,27 @@ (define_insn_and_split "vec_extract_lo_v
   "operands[1] = gen_lowpart (V8HImode, operands[1]);")
 
 (define_insn "vec_extract_hi_v16hi"
-  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m")
+  [(set (match_operand:V8HI 0 "nonimmediate_operand" "=x,m,v,m,v,m")
(vec_select:V8HI
- (match_operand:V16HI 1 "register_operand" "x,x")
+ (match_operand:V16HI 1 "register_operand" "x,x,v,v,v,v")
  (parallel [(const_int 8) (const_int 9)
 (const_int 10) (const_int 11)
 (const_int 12) (const_int 13)
 (const_int 14) (const_int 15)])))]
   "TARGET_AVX"
-  "vextract%~128\t{$0x1, %1, %0|%0, %1, 0x1}"
+  "@
+   vextract%~128\t{$0x1, %1, %0|%0, %1, 0x1}
+   vextract%~128\t{$0x1, %1, %0|%0, %1, 0x1}
+   vextracti32x4\t{$0x1, %1, %0|%0, %1, 0x1}
+   vextracti32x4\t{$0x1, %1, %0|%0, %1, 0x1}
+   vextracti32x4\t{$0x1, %g1, %0|%0, %g1, 0x1}
+   vextracti32x4\t{$0x1, %g1, %0|%0, %g1, 0x1}"
   [(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
-   (set_attr "memory" "none,store")
-   (set_attr "prefix" "vex")
+   (set_attr "isa" "*,*,avx512dq,avx512dq,avx512f,avx512f")
+   (set_attr "memory" "none,store,none,store,none,store")
+   (set_attr "prefix" "vex,vex,evex,evex,evex,evex")
(set_attr "mode" "OI")])
 
 (define_insn_and_split "vec_extract_lo_v64qi"
@@ -7325,9 +7332,9 @@ (define_insn "vec_extract_hi_v64qi"
(set_attr "mode" "XI")])
 
 (define_insn_and_split "vec_extract_lo_v32qi"
-  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=x,m")
+  [(set (match_operand:V16QI 0 "nonimmediate_operand" "=v,m")
(vec_select:V16QI
- (match_operand:V32QI 1 "nonimmediate_operand" "xm,x")
+ (match_operand:V32QI 1 "nonimmediate_operand" "vm,v")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)
 (const_int 4) (const_int 5)
@@ -7343,9 +7350,9 @@ (define_insn_and_split "vec_extract_lo_v
   "operands[1] = gen_lowpart (V16QImode, operands[1]);")
 
 (define_insn "vec_extract_hi_v32qi"
-  [(set (match_operand:V16QI 0

[PATCH] Improve vec_concatv?sf*

2016-05-04 Thread Jakub Jelinek

Hi!

Another pair of define_insns.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (*vec_concatv2sf_sse4_1, *vec_concatv4sf): Use
v instead of x in vex or maybe_vex alternatives, use
maybe_evex instead of vex in prefix.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -6415,12 +6415,12 @@ (define_insn "avx512f_vec_dup_1"
 ;; unpcklps with register source since it is shorter.
 (define_insn "*vec_concatv2sf_sse4_1"
   [(set (match_operand:V2SF 0 "register_operand"
- "=Yr,*x,x,Yr,*x,x,x,*y ,*y")
+ "=Yr,*x,v,Yr,*x,v,v,*y ,*y")
(vec_concat:V2SF
  (match_operand:SF 1 "nonimmediate_operand"
- "  0, 0,x, 0,0, x,m, 0 , m")
+ "  0, 0,v, 0,0, v,m, 0 , m")
  (match_operand:SF 2 "vector_move_operand"
- " Yr,*x,x, m,m, m,C,*ym, C")))]
+ " Yr,*x,v, m,m, m,C,*ym, C")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
unpcklps\t{%2, %0|%0, %2}
@@ -6437,7 +6437,7 @@ (define_insn "*vec_concatv2sf_sse4_1"
(set_attr "prefix_data16" "*,*,*,1,1,*,*,*,*")
(set_attr "prefix_extra" "*,*,*,1,1,1,*,*,*")
(set_attr "length_immediate" "*,*,*,1,1,1,*,*,*")
-   (set_attr "prefix" "orig,orig,vex,orig,orig,vex,maybe_vex,orig,orig")
+   (set_attr "prefix" "orig,orig,maybe_evex,orig,orig,maybe_evex,maybe_vex,orig,orig")
(set_attr "mode" "V4SF,V4SF,V4SF,V4SF,V4SF,V4SF,SF,DI,DI")])
 
 ;; ??? In theory we can match memory for the MMX alternative, but allowing
@@ -6458,10 +6458,10 @@ (define_insn "*vec_concatv2sf_sse"
(set_attr "mode" "V4SF,SF,DI,DI")])
 
 (define_insn "*vec_concatv4sf"
-  [(set (match_operand:V4SF 0 "register_operand"   "=x,x,x,x")
+  [(set (match_operand:V4SF 0 "register_operand"   "=x,v,x,v")
(vec_concat:V4SF
- (match_operand:V2SF 1 "register_operand" " 0,x,0,x")
- (match_operand:V2SF 2 "nonimmediate_operand" " x,x,m,m")))]
+ (match_operand:V2SF 1 "register_operand" " 0,v,0,v")
+ (match_operand:V2SF 2 "nonimmediate_operand" " x,v,m,m")))]
   "TARGET_SSE"
   "@
movlhps\t{%2, %0|%0, %2}
@@ -6470,7 +6470,7 @@ (define_insn "*vec_concatv4sf"
vmovhps\t{%2, %1, %0|%0, %1, %q2}"
   [(set_attr "isa" "noavx,avx,noavx,avx")
(set_attr "type" "ssemov")
-   (set_attr "prefix" "orig,vex,orig,vex")
+   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex")
(set_attr "mode" "V4SF,V4SF,V2SF,V2SF")])
 
 (define_expand "vec_init"

Jakub

[PATCH] Improve other 13 define_insns

2016-05-04 Thread Jakub Jelinek

Hi!

This patch tweaks more define_insns at once; again, all the insns
should already be available in AVX512F or AVX512VL.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (sse_shufps_, sse_storehps, sse_loadhps,
sse_storelps, sse_movss, avx2_vec_dup, avx2_vec_dupv8sf_1,
sse2_shufpd_, sse2_storehpd, sse2_storelpd, sse2_loadhpd,
sse2_loadlpd, sse2_movsd): Use v instead of x in vex or maybe_vex
alternatives, use maybe_evex instead of vex in prefix.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -6219,11 +6219,11 @@ (define_insn "sse_shufps_v4sf_mask"
(set_attr "mode" "V4SF")])
 
 (define_insn "sse_shufps_"
-  [(set (match_operand:VI4F_128 0 "register_operand" "=x,x")
+  [(set (match_operand:VI4F_128 0 "register_operand" "=x,v")
(vec_select:VI4F_128
  (vec_concat:
-   (match_operand:VI4F_128 1 "register_operand" "0,x")
-   (match_operand:VI4F_128 2 "vector_operand" "xBm,xm"))
+   (match_operand:VI4F_128 1 "register_operand" "0,v")
+   (match_operand:VI4F_128 2 "vector_operand" "xBm,vm"))
  (parallel [(match_operand 3 "const_0_to_3_operand")
 (match_operand 4 "const_0_to_3_operand")
 (match_operand 5 "const_4_to_7_operand")
@@ -6250,13 +6250,13 @@ (define_insn "sse_shufps_"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseshuf")
(set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,maybe_evex")
(set_attr "mode" "V4SF")])
 
 (define_insn "sse_storehps"
-  [(set (match_operand:V2SF 0 "nonimmediate_operand" "=m,x,x")
+  [(set (match_operand:V2SF 0 "nonimmediate_operand" "=m,v,v")
(vec_select:V2SF
- (match_operand:V4SF 1 "nonimmediate_operand" "x,x,o")
+ (match_operand:V4SF 1 "nonimmediate_operand" "v,v,o")
  (parallel [(const_int 2) (const_int 3)])))]
   "TARGET_SSE"
   "@
@@ -6288,12 +6288,12 @@ (define_expand "sse_loadhps_exp"
 })
 
 (define_insn "sse_loadhps"
-  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,x,x,x,o")
+  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,o")
(vec_concat:V4SF
  (vec_select:V2SF
-   (match_operand:V4SF 1 "nonimmediate_operand" " 0,x,0,x,0")
+   (match_operand:V4SF 1 "nonimmediate_operand" " 0,v,0,v,0")
(parallel [(const_int 0) (const_int 1)]))
- (match_operand:V2SF 2 "nonimmediate_operand"   " m,m,x,x,x")))]
+ (match_operand:V2SF 2 "nonimmediate_operand"   " m,m,x,v,v")))]
   "TARGET_SSE"
   "@
movhps\t{%2, %0|%0, %q2}
@@ -6303,13 +6303,13 @@ (define_insn "sse_loadhps"
%vmovlps\t{%2, %H0|%H0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
(set_attr "type" "ssemov")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
+   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
(set_attr "mode" "V2SF,V2SF,V4SF,V4SF,V2SF")])
 
 (define_insn "sse_storelps"
-  [(set (match_operand:V2SF 0 "nonimmediate_operand"   "=m,x,x")
+  [(set (match_operand:V2SF 0 "nonimmediate_operand"   "=m,v,v")
(vec_select:V2SF
- (match_operand:V4SF 1 "nonimmediate_operand" " x,x,m")
+ (match_operand:V4SF 1 "nonimmediate_operand" " v,v,m")
  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_SSE"
   "@
@@ -6341,11 +6341,11 @@ (define_expand "sse_loadlps_exp"
 })
 
 (define_insn "sse_loadlps"
-  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,x,x,x,m")
+  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,m")
(vec_concat:V4SF
- (match_operand:V2SF 2 "nonimmediate_operand"   " 0,x,m,m,x")
+ (match_operand:V2SF 2 "nonimmediate_operand"   " 0,v,m,m,v")
  (vec_select:V2SF
-   (match_operand:V4SF 1 "nonimmediate_operand" " x,x,0,x,0")
+   (match_operand:V4SF 1 "nonimmediate_operand" " x,v,0,v,0")
(parallel [(const_int 2) (const_int 3)]]
   "TARGET_SSE"
   "@
@@ -6357,14 +6357,14 @@ (define_insn "sse_loadlps"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
(set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
(set_attr "length_immediate" "1,1,*,*,*")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
+   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
(set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 
 (define_insn "sse_movss"
-  [(set (match_operand:V4SF 0 "register_operand"   "=x,x")
+  [(set (match_operand:V4SF 0 "register_operand"   "=x,v")
(vec_merge:V4SF
- (match_operand:V4SF 2 "register_operand" " x,x")
- (match_operand:V4SF 1 "register_operand" " 0,x")
+ (match_operand:V4SF 2 "register_operand" " x,v")
+ (match_operand:V4SF 1 "register_operand" " 0,v")
  (const_int 1)))]
   "TARGET_SSE"
   "@
@@

[PATCH] Improve vec_interleave*

2016-05-04 Thread Jakub Jelinek

Hi!

Another 3 define_insns that can handle xmm16+ operands.
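
The three patterns below are all a vec_select over a vec_concat of the two inputs. As an illustration only, here is a scalar sketch of that shape; select_from_concat is a hypothetical name, and the index lists come from the parallel expressions in the patch (0, 4, 1, 5 for vec_interleave_lowv4sf).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of (vec_select (vec_concat op1 op2) sel):
   out[i] is element sel[i] of the 2*n-element concatenation.  */
static void
select_from_concat (const float *op1, const float *op2, size_t n,
                    const size_t *sel, size_t n_sel, float *out)
{
  for (size_t i = 0; i < n_sel; i++)
    {
      assert (sel[i] < 2 * n);
      out[i] = sel[i] < n ? op1[sel[i]] : op2[sel[i] - n];
    }
}
```

For op1 = {1, 2, 3, 4} and op2 = {5, 6, 7, 8} the selector {0, 4, 1, 5} yields {1, 5, 2, 6}, i.e. the low halves interleaved.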

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (vec_interleave_lowv4sf,
*vec_interleave_highv2df, *vec_interleave_lowv2df): Use
v instead of x in vex or maybe_vex alternatives, use
maybe_evex instead of vex in prefix.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -5987,11 +5987,11 @@ (define_expand "vec_interleave_lowv8sf"
 })
 
 (define_insn "vec_interleave_lowv4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+  [(set (match_operand:V4SF 0 "register_operand" "=x,v")
(vec_select:V4SF
  (vec_concat:V8SF
-   (match_operand:V4SF 1 "register_operand" "0,x")
-   (match_operand:V4SF 2 "vector_operand" "xBm,xm"))
+   (match_operand:V4SF 1 "register_operand" "0,v")
+   (match_operand:V4SF 2 "vector_operand" "xBm,vm"))
  (parallel [(const_int 0) (const_int 4)
 (const_int 1) (const_int 5)])))]
   "TARGET_SSE"
@@ -6000,7 +6000,7 @@ (define_insn "vec_interleave_lowv4sf"
vunpcklps\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sselog")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix" "orig,maybe_evex")
(set_attr "mode" "V4SF")])
 
 ;; These are modeled with the same vec_concat as the others so that we
@@ -7480,11 +7494,11 @@ (define_expand "vec_interleave_highv2df"
 })
 
 (define_insn "*vec_interleave_highv2df"
-  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,x,x,x,x,m")
+  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,v,v,x,v,m")
(vec_select:V2DF
  (vec_concat:V4DF
-   (match_operand:V2DF 1 "nonimmediate_operand" " 0,x,o,o,o,x")
-   (match_operand:V2DF 2 "nonimmediate_operand" " x,x,1,0,x,0"))
+   (match_operand:V2DF 1 "nonimmediate_operand" " 0,v,o,o,o,v")
+   (match_operand:V2DF 2 "nonimmediate_operand" " x,v,1,0,v,0"))
  (parallel [(const_int 1)
 (const_int 3)])))]
   "TARGET_SSE2 && ix86_vec_interleave_v2df_operator_ok (operands, 1)"
@@ -7498,7 +7512,7 @@ (define_insn "*vec_interleave_highv2df"
   [(set_attr "isa" "noavx,avx,sse3,noavx,avx,*")
(set_attr "type" "sselog,sselog,sselog,ssemov,ssemov,ssemov")
(set_attr "prefix_data16" "*,*,*,1,*,1")
-   (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex")
+   (set_attr "prefix" "orig,maybe_evex,maybe_vex,orig,maybe_evex,maybe_vex")
(set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,V1DF")])
 
 (define_expand "avx512f_movddup512"
@@ -7639,11 +7653,11 @@ (define_expand "vec_interleave_lowv2df"
 })
 
 (define_insn "*vec_interleave_lowv2df"
-  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,x,x,x,x,o")
+  [(set (match_operand:V2DF 0 "nonimmediate_operand" "=x,v,v,x,v,o")
(vec_select:V2DF
  (vec_concat:V4DF
-   (match_operand:V2DF 1 "nonimmediate_operand" " 0,x,m,0,x,0")
-   (match_operand:V2DF 2 "nonimmediate_operand" " x,x,1,m,m,x"))
+   (match_operand:V2DF 1 "nonimmediate_operand" " 0,v,m,0,v,0")
+   (match_operand:V2DF 2 "nonimmediate_operand" " x,v,1,m,m,v"))
  (parallel [(const_int 0)
 (const_int 2)])))]
   "TARGET_SSE2 && ix86_vec_interleave_v2df_operator_ok (operands, 0)"
@@ -7657,7 +7671,7 @@ (define_insn "*vec_interleave_lowv2df"
   [(set_attr "isa" "noavx,avx,sse3,noavx,avx,*")
(set_attr "type" "sselog,sselog,sselog,ssemov,ssemov,ssemov")
(set_attr "prefix_data16" "*,*,*,1,*,1")
-   (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex")
+   (set_attr "prefix" "orig,maybe_evex,maybe_vex,orig,maybe_evex,maybe_vex")
(set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,V1DF")])
 
 (define_split

Jakub

Re: [PATCH] Clean up vec_interleave* expanders

2016-05-04 Thread Uros Bizjak

On Wed, May 4, 2016 at 9:40 PM, Jakub Jelinek  wrote:
> Hi!
>
> When looking for constraints that only have x's and not v's, these
> useless constraints caught my search too.  In define_expand, constraints
> aren't really needed, they are needed only on define_insn* etc.
>
> So, I'd like to kill these.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-04  Jakub Jelinek  
>
> * config/i386/sse.md (vec_interleave_highv8sf,
> vec_interleave_lowv8sf, vec_interleave_highv4df,
> vec_interleave_lowv4df): Remove constraints from expanders.

OK as a trivial patch.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
> +++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
> @@ -5851,8 +5851,8 @@ (define_expand "vec_interleave_highv8sf"
>[(set (match_dup 3)
> (vec_select:V8SF
>   (vec_concat:V16SF
> -   (match_operand:V8SF 1 "register_operand" "x")
> -   (match_operand:V8SF 2 "nonimmediate_operand" "xm"))
> +   (match_operand:V8SF 1 "register_operand")
> +   (match_operand:V8SF 2 "nonimmediate_operand"))
>   (parallel [(const_int 0) (const_int 8)
>  (const_int 1) (const_int 9)
>  (const_int 4) (const_int 12)
> @@ -5956,8 +5956,8 @@ (define_expand "vec_interleave_lowv8sf"
>[(set (match_dup 3)
> (vec_select:V8SF
>   (vec_concat:V16SF
> -   (match_operand:V8SF 1 "register_operand" "x")
> -   (match_operand:V8SF 2 "nonimmediate_operand" "xm"))
> +   (match_operand:V8SF 1 "register_operand")
> +   (match_operand:V8SF 2 "nonimmediate_operand"))
>   (parallel [(const_int 0) (const_int 8)
>  (const_int 1) (const_int 9)
>  (const_int 4) (const_int 12)
> @@ -7424,8 +7438,8 @@ (define_expand "vec_interleave_highv4df"
>[(set (match_dup 3)
> (vec_select:V4DF
>   (vec_concat:V8DF
> -   (match_operand:V4DF 1 "register_operand" "x")
> -   (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
> +   (match_operand:V4DF 1 "register_operand")
> +   (match_operand:V4DF 2 "nonimmediate_operand"))
>   (parallel [(const_int 0) (const_int 4)
>  (const_int 2) (const_int 6)])))
> (set (match_dup 4)
> @@ -7584,8 +7598,8 @@ (define_expand "vec_interleave_lowv4df"
>[(set (match_dup 3)
> (vec_select:V4DF
>   (vec_concat:V8DF
> -   (match_operand:V4DF 1 "register_operand" "x")
> -   (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
> +   (match_operand:V4DF 1 "register_operand")
> +   (match_operand:V4DF 2 "nonimmediate_operand"))
>   (parallel [(const_int 0) (const_int 4)
>  (const_int 2) (const_int 6)])))
> (set (match_dup 4)
>
> Jakub

[PATCH] Clean up vec_interleave* expanders

2016-05-04 Thread Jakub Jelinek

Hi!

When looking for constraints that only have x's and not v's, these
useless constraints caught my search too.  In define_expand, constraints
aren't really needed, they are needed only on define_insn* etc.

So, I'd like to kill these.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (vec_interleave_highv8sf,
vec_interleave_lowv8sf, vec_interleave_highv4df,
vec_interleave_lowv4df): Remove constraints from expanders.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -5851,8 +5851,8 @@ (define_expand "vec_interleave_highv8sf"
   [(set (match_dup 3)
(vec_select:V8SF
  (vec_concat:V16SF
-   (match_operand:V8SF 1 "register_operand" "x")
-   (match_operand:V8SF 2 "nonimmediate_operand" "xm"))
+   (match_operand:V8SF 1 "register_operand")
+   (match_operand:V8SF 2 "nonimmediate_operand"))
  (parallel [(const_int 0) (const_int 8)
 (const_int 1) (const_int 9)
 (const_int 4) (const_int 12)
@@ -5956,8 +5956,8 @@ (define_expand "vec_interleave_lowv8sf"
   [(set (match_dup 3)
(vec_select:V8SF
  (vec_concat:V16SF
-   (match_operand:V8SF 1 "register_operand" "x")
-   (match_operand:V8SF 2 "nonimmediate_operand" "xm"))
+   (match_operand:V8SF 1 "register_operand")
+   (match_operand:V8SF 2 "nonimmediate_operand"))
  (parallel [(const_int 0) (const_int 8)
 (const_int 1) (const_int 9)
 (const_int 4) (const_int 12)
@@ -7424,8 +7438,8 @@ (define_expand "vec_interleave_highv4df"
   [(set (match_dup 3)
(vec_select:V4DF
  (vec_concat:V8DF
-   (match_operand:V4DF 1 "register_operand" "x")
-   (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
+   (match_operand:V4DF 1 "register_operand")
+   (match_operand:V4DF 2 "nonimmediate_operand"))
  (parallel [(const_int 0) (const_int 4)
 (const_int 2) (const_int 6)])))
(set (match_dup 4)
@@ -7584,8 +7598,8 @@ (define_expand "vec_interleave_lowv4df"
   [(set (match_dup 3)
(vec_select:V4DF
  (vec_concat:V8DF
-   (match_operand:V4DF 1 "register_operand" "x")
-   (match_operand:V4DF 2 "nonimmediate_operand" "xm"))
+   (match_operand:V4DF 1 "register_operand")
+   (match_operand:V4DF 2 "nonimmediate_operand"))
  (parallel [(const_int 0) (const_int 4)
 (const_int 2) (const_int 6)])))
(set (match_dup 4)

Jakub

[PATCH] Improve sse_mov{hl,lh}ps

2016-05-04 Thread Jakub Jelinek

Hi!

Another pair of define_insns where all the VEX insns have an EVEX variant
in AVX512VL.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (sse_movhlps, sse_movlhps): Use
v instead of x in vex or maybe_vex alternatives, use
maybe_evex instead of vex in prefix.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -5744,11 +5744,11 @@ (define_expand "sse_movhlps_exp"
 })
 
 (define_insn "sse_movhlps"
-  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,x,x,x,m")
+  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,m")
(vec_select:V4SF
  (vec_concat:V8SF
-   (match_operand:V4SF 1 "nonimmediate_operand" " 0,x,0,x,0")
-   (match_operand:V4SF 2 "nonimmediate_operand" " x,x,o,o,x"))
+   (match_operand:V4SF 1 "nonimmediate_operand" " 0,v,0,v,0")
+   (match_operand:V4SF 2 "nonimmediate_operand" " x,v,o,o,v"))
  (parallel [(const_int 6)
 (const_int 7)
 (const_int 2)
@@ -5762,7 +5762,7 @@ (define_insn "sse_movhlps"
%vmovhps\t{%2, %0|%q0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
(set_attr "type" "ssemov")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
+   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
(set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 
 (define_expand "sse_movlhps_exp"
@@ -5789,11 +5789,11 @@ (define_expand "sse_movlhps_exp"
 })
 
 (define_insn "sse_movlhps"
-  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,x,x,x,o")
+  [(set (match_operand:V4SF 0 "nonimmediate_operand" "=x,v,x,v,o")
(vec_select:V4SF
  (vec_concat:V8SF
-   (match_operand:V4SF 1 "nonimmediate_operand" " 0,x,0,x,0")
-   (match_operand:V4SF 2 "nonimmediate_operand" " x,x,m,m,x"))
+   (match_operand:V4SF 1 "nonimmediate_operand" " 0,v,0,v,0")
+   (match_operand:V4SF 2 "nonimmediate_operand" " x,v,m,v,v"))
  (parallel [(const_int 0)
 (const_int 1)
 (const_int 4)
@@ -5807,7 +5807,7 @@ (define_insn "sse_movlhps"
%vmovlps\t{%2, %H0|%H0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
(set_attr "type" "ssemov")
-   (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
+   (set_attr "prefix" "orig,maybe_evex,orig,maybe_evex,maybe_vex")
(set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 
 (define_insn "avx512f_unpckhps512"

Jakub

[PATCH] Improve *avx_cvtp?2??256_2

2016-05-04 Thread Jakub Jelinek

Hi!

I'm not sure how to easily construct a testcase for this (these insns are
usually used for vectorization, and then it really depends on register
pressure).
But in any case, looking at the documentation, it seems all the used insns are
available.  Generally, even for further patches, what I'm looking for is
whether the insns are available already in AVX512F, or, if all the operands
are 128-bit or 256-bit vectors, in AVX512VL, or whether they need further ISA
extensions; HARD_REGNO_MODE_OK should guarantee that 128-bit and 256-bit
vectors are not assigned to xmm16+ unless -mavx512vl is used.
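
The rule of thumb described above can be written down as a tiny predicate. This is a simplified sketch of the statement in this mail only, not of GCC's actual HARD_REGNO_MODE_OK; the function name and parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: may a vector of VECTOR_BITS bits be
   allocated to xmm16 or above?  512-bit vectors need AVX512F; 128-bit
   and 256-bit vectors additionally need AVX512VL.  */
static bool
vector_may_use_xmm16_plus (int vector_bits, bool have_avx512f,
                           bool have_avx512vl)
{
  assert (vector_bits == 128 || vector_bits == 256 || vector_bits == 512);
  if (!have_avx512f)
    return false;
  if (vector_bits == 512)
    return true;
  return have_avx512vl;
}
```

Under this model, switching a constraint from x to v is safe for an insn only if an EVEX encoding exists in the ISA that the predicate requires for the operand sizes involved.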

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (*avx_cvtpd2dq256_2, *avx_cvtps2pd256_2): Use
v constraint instead of x.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -4735,9 +4735,9 @@ (define_expand "avx_cvtpd2dq256_2"
   "operands[2] = CONST0_RTX (V4SImode);")
 
 (define_insn "*avx_cvtpd2dq256_2"
-  [(set (match_operand:V8SI 0 "register_operand" "=x")
+  [(set (match_operand:V8SI 0 "register_operand" "=v")
(vec_concat:V8SI
- (unspec:V4SI [(match_operand:V4DF 1 "nonimmediate_operand" "xm")]
+ (unspec:V4SI [(match_operand:V4DF 1 "nonimmediate_operand" "vm")]
   UNSPEC_FIX_NOTRUNC)
  (match_operand:V4SI 2 "const0_operand")))]
   "TARGET_AVX"
@@ -5050,10 +5050,10 @@ (define_insn "_cvtps2p
(set_attr "mode" "")])
 
 (define_insn "*avx_cvtps2pd256_2"
-  [(set (match_operand:V4DF 0 "register_operand" "=x")
+  [(set (match_operand:V4DF 0 "register_operand" "=v")
(float_extend:V4DF
  (vec_select:V4SF
-   (match_operand:V8SF 1 "nonimmediate_operand" "xm")
+   (match_operand:V8SF 1 "nonimmediate_operand" "vm")
(parallel [(const_int 0) (const_int 1)
   (const_int 2) (const_int 3)]]
   "TARGET_AVX"

Jakub

[PATCH] Improve _fmadd__mask3

2016-05-04 Thread Jakub Jelinek

Hi!

As the testcase shows, we should be using the v constraint here, which
lets us generate better code.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

* config/i386/sse.md (_fmadd__mask3): Use
v constraint instead of x.

* gcc.target/i386/avx512f-vfmadd-1.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-04 14:36:08.0 +0200
+++ gcc/config/i386/sse.md  2016-05-04 15:16:44.180894303 +0200
@@ -3327,10 +3327,10 @@ (define_insn "_fmadd__mask
(set_attr "mode" "")])
 
 (define_insn "_fmadd__mask3"
-  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=x")
+  [(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
(vec_merge:VF_AVX512VL
  (fma:VF_AVX512VL
-   (match_operand:VF_AVX512VL 1 "register_operand" "x")
+   (match_operand:VF_AVX512VL 1 "register_operand" "v")
(match_operand:VF_AVX512VL 2 "nonimmediate_operand" 
"")
(match_operand:VF_AVX512VL 3 "register_operand" "0"))
  (match_dup 3)
--- gcc/testsuite/gcc.target/i386/avx512f-vfmadd-1.c.jj 2016-05-04 15:35:54.919506742 +0200
+++ gcc/testsuite/gcc.target/i386/avx512f-vfmadd-1.c    2016-05-04 15:36:08.648326113 +0200
@@ -0,0 +1,24 @@
+/* { dg-do assemble { target { avx512f && { ! ia32 } } } } */
+/* { dg-options "-O2 -mavx512f" } */
+
+#include 
+
+void
+f1 (__m512d x, __m512d y, __m512d z, __mmask8 m)
+{
+  register __m512d a __asm ("xmm16"), b __asm ("xmm17"), c __asm ("xmm18");
+  a = x; b = y; c = z;
+  asm volatile ("" : "+v" (a), "+v" (b), "+v" (c));
+  a = _mm512_mask3_fmadd_round_pd (c, b, a, m, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  asm volatile ("" : "+v" (a));
+}
+
+void
+f2 (__m512 x, __m512 y, __m512 z, __mmask8 m)
+{
+  register __m512 a __asm ("xmm16"), b __asm ("xmm17"), c __asm ("xmm18");
+  a = x; b = y; c = z;
+  asm volatile ("" : "+v" (a), "+v" (b), "+v" (c));
+  a = _mm512_mask3_fmadd_round_ps (c, b, a, m, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+  asm volatile ("" : "+v" (a));
+}

Jakub

[PATCH] Fix operand_equal_p hash checking (PR c++/70906, PR c++/70933)

2016-05-04 Thread Jakub Jelinek

Hi!

These 2 PRs were DUPed, yet they are actually different issues, though
somewhat related.
One of the ICEs is due to the OEP_ADDRESS_OF consistency checks that
both operand_equal_p and inchash::add_expr have (that want to verify
that we don't e.g. have ADDR_EXPR of ADDR_EXPR).
operand_equal_p never returns true for TARGET_EXPR though, unless there
is pointer equality, but if we need to hash e.g. ADDR_EXPR of
TARGET_EXPR with ADDR_EXPR inside of TARGET_EXPR_INITIAL, we currently
ICE.  We could process TARGET_EXPR_{INITIAL,CLEANUP} with
OEP_ADDRESS_OF masked off, but I believe different TARGET_EXPRs should
use different TARGET_EXPR_SLOT variables and thus it should be enough
to hash just the TARGET_EXPR_SLOT and ignore the other arguments.

The second issue is that in the FEs we can end up calling operand_equal_p
on trees that are not really equal, but similar (e.g. a useless NOP_EXPR of
a SAVE_EXPR and the SAVE_EXPR itself), and those trees can contain various
FE specific trees, including constants (like PTRMEM_CST), and others.

The patch arranges to just not ICE in that case if called from the
operand_equal_p checking, which has the advantage that we will still
disallow it when people call inchash::add_expr otherwise.
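
A toy model of that arrangement, as a rough sketch only: the types and names
below (toy_node, toy_add_expr, TOY_HASH_CHECK) are hypothetical and are not
GCC's, and the function returns false where the real code would hit
gcc_assert.  The point is just that an unknown node is tolerated only when
the caller passes the "hash verification" flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, loosely modelled on operand_equal_flag.  */
enum toy_flags { TOY_ADDRESS_OF = 8, TOY_NO_HASH_CHECK = 16, TOY_HASH_CHECK = 32 };

/* Hypothetical node kinds: ordinary expressions vs. front-end specific.  */
enum toy_kind { TOY_EXPR, TOY_FE_SPECIFIC };

struct toy_node
{
  enum toy_kind kind;
  int code;
  const struct toy_node *op;   /* Single operand, or NULL.  */
};

/* Mix T and its operand chain into *HASH.  Returns false where the real
   hasher would assert-fail (ICE).  */
bool
toy_add_expr (const struct toy_node *t, unsigned *hash, unsigned flags)
{
  if (t == NULL)
    return true;
  if (t->kind != TOY_EXPR)
    /* Unknown node: acceptable only during hash verification.  */
    return (flags & TOY_HASH_CHECK) != 0;
  *hash = *hash * 31u + (unsigned) t->code;
  return toy_add_expr (t->op, hash, flags);
}
```

A caller doing the verification would pass flags | TOY_HASH_CHECK, as the
fold-const.c hunk does with OEP_HASH_CHECK; any other caller still gets the
failure.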

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-04  Jakub Jelinek  

PR c++/70906
PR c++/70933
* tree-core.h (enum operand_equal_flag): Add OEP_HASH_CHECK.
* tree.c (inchash::add_expr): If !IS_EXPR_CODE_CLASS (tclass),
assert flags & OEP_HASH_CHECK, instead of asserting it
never happens.  Handle TARGET_EXPR.
* fold-const.c (operand_equal_p): For hash verification,
or in OEP_HASH_CHECK into flags.

* g++.dg/opt/pr70906.C: New test.
* g++.dg/opt/pr70933.C: New test.

--- gcc/tree-core.h.jj  2016-04-27 15:29:05.0 +0200
+++ gcc/tree-core.h 2016-05-04 12:13:59.361459074 +0200
@@ -767,7 +767,9 @@ enum operand_equal_flag {
   OEP_MATCH_SIDE_EFFECTS = 4,
   OEP_ADDRESS_OF = 8,
   /* Internal within operand_equal_p:  */
-  OEP_NO_HASH_CHECK = 16
+  OEP_NO_HASH_CHECK = 16,
+  /* Internal within inchash::add_expr:  */
+  OEP_HASH_CHECK = 32
 };
 
 /* Enum and arrays used for tree allocation stats.
--- gcc/tree.c.jj   2016-05-03 10:00:25.0 +0200
+++ gcc/tree.c  2016-05-04 12:20:00.354569734 +0200
@@ -7915,9 +7915,12 @@ add_expr (const_tree t, inchash::hash 
   && integer_zerop (TREE_OPERAND (t, 1)))
inchash::add_expr (TREE_OPERAND (TREE_OPERAND (t, 0), 0),
   hstate, flags);
+  /* Don't ICE on FE specific trees, or their arguments etc.
+during operand_equal_p hash verification.  */
+  else if (!IS_EXPR_CODE_CLASS (tclass))
+   gcc_assert (flags & OEP_HASH_CHECK);
   else
{
- gcc_assert (IS_EXPR_CODE_CLASS (tclass));
  unsigned int sflags = flags;
 
  hstate.add_object (code);
@@ -7966,6 +7969,13 @@ add_expr (const_tree t, inchash::hash 
hstate.add_int (CALL_EXPR_IFN (t));
  break;
 
+   case TARGET_EXPR:
+ /* For TARGET_EXPR, just hash on the TARGET_EXPR_SLOT.
+Usually different TARGET_EXPRs just should use
+different temporaries in their slots.  */
+ inchash::add_expr (TARGET_EXPR_SLOT (t), hstate, flags);
+ return;
+
default:
  break;
}
--- gcc/fold-const.c.jj 2016-05-02 18:16:00.0 +0200
+++ gcc/fold-const.c2016-05-04 12:14:33.188000923 +0200
@@ -2758,8 +2758,8 @@ operand_equal_p (const_tree arg0, const_
  if (arg0 != arg1)
{
  inchash::hash hstate0 (0), hstate1 (0);
- inchash::add_expr (arg0, hstate0, flags);
- inchash::add_expr (arg1, hstate1, flags);
+ inchash::add_expr (arg0, hstate0, flags | OEP_HASH_CHECK);
+ inchash::add_expr (arg1, hstate1, flags | OEP_HASH_CHECK);
  hashval_t h0 = hstate0.end ();
  hashval_t h1 = hstate1.end ();
  gcc_assert (h0 == h1);
--- gcc/testsuite/g++.dg/opt/pr70906.C.jj   2016-05-04 11:33:32.799387826 +0200
+++ gcc/testsuite/g++.dg/opt/pr70906.C  2016-05-04 11:33:02.0 +0200
@@ -0,0 +1,69 @@
+// PR c++/70906
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wall" }
+
+template  struct B;
+template  struct F { typedef U *t; };
+template  struct D {};
+template  struct L {
+  typedef VP np;
+  typedef typename F::t cnp;
+};
+struct P { typedef L nt; };
+template  struct I { typedef typename N::template A t; };
+template  struct Q { typedef typename I::t t; };
+template  struct G;
+template 
+struct mh {
+  template  struct A { typedef G pvt; };
+};
+template  struct B { static T pt(T); };
+struct R : public D { typedef P ht; };
+class lmh : public R {};
+template  struct G {
+  typedef Hook Ht;
+  typedef typename Ht::ht::nt nt;
+

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Eric Botcazou

> I.e. the proposed change below.

Applied on mainline and 6 branch.  Please post patches as attachments instead 
of plain text though, this avoids nasty surprises from mail readers.

-- 
Eric Botcazou

Re: Enabling -frename-registers?

2016-05-04 Thread Eric Botcazou

> I do not see that working unfortunately - Thumb-2 codesize increases by a
> few percent even with -Os. This is primarily due to replacing a low
> register with IP, which often changes a 16-bit instruction like:
> 
> movs r2, #8
> 
> into a 32-bit one:
> 
> mov ip, #8
> 
> This will also affect other targets with multiple instruction sizes. So I
> think it should check the size of the new instruction patterns and only
> accept a rename if it is not larger (certainly with -Os).

I'd rather let the back-end do that, either through preferred_rename_class or 
another hook.

-- 
Eric Botcazou

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Svante Signell

On Wed, 2016-05-04 at 18:43 +0200, Svante Signell wrote:
> Samuel Thibault, on Wed 04 May 2016 17:29:48 +0200, wrote:
> > 
> > > -   --  From: /usr/include/unistd.h __getpagesize or
> > > getpagesize??
> > > -   function Get_Page_Size return int;
> > > +   --  From: /usr/include/i386-gnu/bits/shm.h __getpagesize or
> > > getpagesize??
> > > +   function Get_Page_Size return size_t;
> > > +   function Get_Page_Size return Address;
> > > 
> > > Why using size_t and Address?  Other OSes use int, and the
> > > prototype for
> > > getpagesize is returning int.
> > > 
> > > Also, don't use the __ versions of the glibc functions, they are
> > > internal aliases, the API is without __.
> > > 
> > 
> I thought I did change that, but apparently not. I did such a change,
> but it was probably somewhere else. Please submit your patch upstream
> and to Debian.

No it was not somewhere else. I did change that, see Debian bug
#811063. I even built gcc-6 to make sure everything was OK. And still
it did not get into the updated patch, strange.

May I comment on the Debian way of apt-get source gcc-*: doing that
neither unpacks the sources nor applies the patches; you have to
unpack and patch before you can change sources and update patches. I've
patched the sources several times and still find that the updated
patches are not included in the next build. Really confusing.

Thanks!

Re: Enabling -frename-registers?

2016-05-04 Thread Segher Boessenkool

On Wed, May 04, 2016 at 12:11:04PM +0200, Bernd Schmidt wrote:
> Given how many latent bugs it has shown up I think that alone would make 
> it valuable to have enabled at -O2.

It is finding so many latent bugs simply because it is changing register
allocation so much, and very aggressively.  That does not mean enabling
it by default is a good thing, quite the opposite, if there is no
performance (or code size, etc.) advantage to doing so.

Segher

[PATCH] tail merge ICE

2016-05-04 Thread Nathan Sidwell


This patch fixes an ICE Thomas observed in tree-ssa-tail-merge.c:

On 05/03/16 06:34, Thomas Schwinge wrote:


I'm also seeing the following regression for C and C++,
libgomp.oacc-c-c++-common/loop-auto-1.c with -O2:

source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: In 
function 'vector_1._omp_fn.0':
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c:104:9: 
internal compiler error: Segmentation fault
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
firstprivate (size)
 ^

#4  0x00f73d46 in internal_error (gmsgid=gmsgid@entry=0x105be63 
"%s")
at [...]/source-gcc/gcc/diagnostic.c:1270
#5  0x009fccb0 in crash_signal (signo=)
at [...]/source-gcc/gcc/toplev.c:333
#6  
#7  0x00beaf2e in same_succ_flush_bb (bb=, 
bb=)
at [...]/source-gcc/gcc/hash-table.h:919
#8  0x00bec499 in same_succ_flush_bbs (bbs=)
at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:823


What's happening is we're trying to delete an object from a hash table, and 
asserting that we did indeed find the object.  The hash's equality function 
compares gimple sequences and ends up calling gimple_call_same_target_p.  That 
returns false if the call is IFN_UNIQUE, and so the deletion fails to find 
anything.  IFN_UNIQUE function calls should not compare equal, but they should 
compare eq (in the lispy sense).


The local fix is to augment the hash compare function with a check for pointer 
equality.  That way deleting items from the table works and comparing different 
sequences functions as before.


The more general fix is to augment gimple_call_same_target_p so that unique fns 
are eq but not equal.  A cursory look at the other users of that function did 
not indicate this currently causes a problem, but IMHO it is odd for a value to 
not compare the same as itself -- though IEEE NaNs do that :)
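
To illustrate the eq vs. equal distinction, here is a small sketch.  It is
not GCC code: toy_call, toy_same_target_old/new and toy_find are made-up
names, and the "table" is just a linear scan.  It only shows why a lookup
(and hence a delete) of a unique call fails with a purely structural
comparison and succeeds once identity is accepted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call statement; not GCC's gimple type.  */
struct toy_call
{
  int fn;        /* Which function is called.  */
  bool unique;   /* True for calls that must never be merged.  */
};

/* Structural comparison only: a unique call never matches, not even
   itself.  */
bool
toy_same_target_old (const struct toy_call *c1, const struct toy_call *c2)
{
  return c1->fn == c2->fn && !c1->unique;
}

/* Identity-aware comparison: a unique call matches only itself.  */
bool
toy_same_target_new (const struct toy_call *c1, const struct toy_call *c2)
{
  return c1->fn == c2->fn && (!c1->unique || c1 == c2);
}

/* Return the index of the first element of TAB that EQ considers the
   same as ENTRY, or -1 if there is none.  */
int
toy_find (const struct toy_call *const *tab, size_t n,
          const struct toy_call *entry,
          bool (*eq) (const struct toy_call *, const struct toy_call *))
{
  for (size_t i = 0; i < n; i++)
    if (eq (tab[i], entry))
      return (int) i;
  return -1;
}
```

With two distinct unique calls to the same function in the table, the old
comparison finds neither of them, while the new one finds exactly the object
that was passed in and still keeps the two from comparing equal to each other.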


I placed the pointer equality comparison in gimple_call_same_target_p after the 
check for unique_fn_p, as I suspect that it is the rare case for that to be 
called with the same gimple call object for both parameters.  Although pointer 
equality would be applicable to all cases, in most instances it's going to be false.


Of course, the gimple_call_same_target_p change fixes the problem on its own, 
but the local change to same_succ::equal seems beneficial on its own merits.


ok?

nathan
--
Nathan Sidwell
2016-05-04  Nathan Sidwell  

	* gimple.c (gimple_call_same_target_p): Unique functions are eq.
	* tree-ssa-tail-merge.c (same_succ::equal): Check pointer
	equality first.

Index: gimple.c
===
--- gimple.c	(revision 235871)
+++ gimple.c	(working copy)
@@ -1355,7 +1355,8 @@ gimple_call_same_target_p (const gimple
   if (gimple_call_internal_p (c1))
 return (gimple_call_internal_p (c2)
 	&& gimple_call_internal_fn (c1) == gimple_call_internal_fn (c2)
-	&& !gimple_call_internal_unique_p (as_a  (c1)));
+	&& (!gimple_call_internal_unique_p (as_a  (c1))
+		|| c1 == c2));
   else
 return (gimple_call_fn (c1) == gimple_call_fn (c2)
 	|| (gimple_call_fndecl (c1)
Index: tree-ssa-tail-merge.c
===
--- tree-ssa-tail-merge.c	(revision 235871)
+++ tree-ssa-tail-merge.c	(working copy)
@@ -538,6 +538,9 @@ same_succ::equal (const same_succ *e1, c
   gimple *s1, *s2;
   basic_block bb1, bb2;
 
+  if (e1 == e2)
+return 1;
+
   if (e1->hashval != e2->hashval)
 return 0;

Re: [RS6000] TARGET_RELOCATABLE

2016-05-04 Thread Segher Boessenkool

On Wed, May 04, 2016 at 02:21:18PM +0930, Alan Modra wrote:
> For ABI_V4, -mrelocatable and -fPIC both generate position independent
> code, with some extra "fixup" output for -mrelocatable.  The
> similarity of these two options has led to the situation where the
> sysv4.h SUBTARGET_OVERRIDE_OPTIONS sets flag_pic on seeing
> -mrelocatable, and sets TARGET_RELOCATABLE on seeing -fPIC.  That
> prevents LTO from properly optimizing position dependent executables,
> because the mutual dependence of the flags and the fact that LTO
> streaming records the state of rs6000_isa_flags, result in flag_pic
> being set when it shouldn't be.
> 
> So, don't set TARGET_RELOCATABLE when -fPIC.  Places that currently
> test TARGET_RELOCATABLE can instead test
> TARGET_RELOCATABLE || (DEFAULT_ABI == ABI_V4 && flag_pic > 1)
> or since TARGET_RELOCATABLE can only be enabled when ABI_V4,
> DEFAULT_ABI == ABI_V4 && (TARGET_RELOCATABLE || flag_pic > 1).

That last one is even readable!  :-)
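
For what it's worth, the equivalence of the two quoted forms can be checked by
brute force.  The sketch below uses plain booleans in place of the real macros
(form_a, form_b and count_mismatches are made-up names), and assumes, as the
quoted text says, that TARGET_RELOCATABLE can only be set when the ABI is V4.

```c
#include <stdbool.h>

/* TARGET_RELOCATABLE || (DEFAULT_ABI == ABI_V4 && flag_pic > 1)  */
bool
form_a (bool relocatable, bool abi_v4, int flag_pic)
{
  return relocatable || (abi_v4 && flag_pic > 1);
}

/* DEFAULT_ABI == ABI_V4 && (TARGET_RELOCATABLE || flag_pic > 1)  */
bool
form_b (bool relocatable, bool abi_v4, int flag_pic)
{
  return abi_v4 && (relocatable || flag_pic > 1);
}

/* Count the inputs on which the two forms disagree.  When
   ENFORCE_INVARIANT is true, skip combinations where relocatable is set
   without ABI_V4.  */
int
count_mismatches (bool enforce_invariant)
{
  int mismatches = 0;
  for (int r = 0; r <= 1; r++)
    for (int v4 = 0; v4 <= 1; v4++)
      for (int pic = 0; pic <= 2; pic++)
        {
          if (enforce_invariant && r && !v4)
            continue;
          if (form_a (r, v4, pic) != form_b (r, v4, pic))
            mismatches++;
        }
  return mismatches;
}
```

The two forms only differ on the combinations the invariant rules out
(relocatable set without ABI_V4), which is why the second form is a safe
replacement.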

> Also, since flag_pic is set by -mrelocatable, a number of places that
> currently test TARGET_RELOCATABLE can be simplified.  I also made
> -mrelocatable set TARGET_NO_FP_IN_TOC, allowing TARGET_RELOCATABLE to
> be removed from ASM_OUTPUT_SPECIAL_POOL_ENTRY_P.  Reducing occurrences
> of TARGET_RELOCATABLE is a good thing.

Does this TARGET_NO_FP_IN_TOC setting need documenting somewhere?

> Bootstrapped and regression tested powerpc64-linux.  OK?

Okay for trunk, one nit...

> @@ -23868,7 +23869,9 @@ rs6000_stack_info (void)
> && !TARGET_PROFILE_KERNEL)
>|| (DEFAULT_ABI == ABI_V4 && cfun->calls_alloca)
>  #ifdef TARGET_RELOCATABLE
> -  || (TARGET_RELOCATABLE && (get_pool_size () != 0))
> +  || (DEFAULT_ABI == ABI_V4
> +   && (TARGET_RELOCATABLE || flag_pic > 1)
> +   && (get_pool_size () != 0))

Superfluous parens on that last line.


Segher

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Svante Signell

On Wed, 2016-05-04 at 17:34 +0200, Samuel Thibault wrote:
> Samuel Thibault, on Wed 04 May 2016 17:29:48 +0200, wrote:
> > The gcc-6 build failed. I see that one of the change is:
> > 
> > -   --  From: /usr/include/unistd.h __getpagesize or getpagesize??
> > -   function Get_Page_Size return int;
> > +   --  From: /usr/include/i386-gnu/bits/shm.h __getpagesize or
> > getpagesize??
> > +   function Get_Page_Size return size_t;
> > +   function Get_Page_Size return Address;
> > 
> > Why using size_t and Address?  Other OSes use int, and the
> > prototype for
> > getpagesize is returning int.
> > 
> > Also, don't use the __ versions of the glibc functions, they are
> > internal aliases, the API is without __.
> 
> I.e. the proposed change below.
> 
> Samuel
> 
> 
> 2016-05-04  Samuel Thibault  
> 
> * s-osinte-gnu.ads: Make Get_Page_Size return int, and make it use
> getpagesize instead of __getpagesize.
> 
> --- a/src/gcc/ada/s-osinte-gnu.ads
> +++ b/src/gcc/ada/s-osinte-gnu.ads
> @@ -344,10 +344,9 @@ package System.OS_Interface is
> --  returns the stack base of the specified thread. Only call
> this function
> --  when Stack_Base_Available is True.
>  
> -   --  From: /usr/include/i386-gnu/bits/shm.h __getpagesize or
> getpagesize??
> -   function Get_Page_Size return size_t;
> -   function Get_Page_Size return Address;
> -   pragma Import (C, Get_Page_Size, "__getpagesize");
> +   --  From: /usr/include/i386-gnu/bits/shm.h
> +   function Get_Page_Size return int;
> +   pragma Import (C, Get_Page_Size, "getpagesize");
> --  Returns the size of a page
>  
> --  From /usr/include/i386-gnu/bits/mman.h

I thought I did change that, but apparently not. I did such a change,
but it was probably somewhere else. Please submit your patch upstream
and to Debian.

Thanks!

[PATCH] add myself to MAINTAINERS

2016-05-04 Thread Aaron Sawdey

Hi,

Having submitted my first patch, I need to add myself to MAINTAINERS.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 235841)
+++ MAINTAINERS (working copy)
@@ -560,6 +560,7 @@
 Duncan Sands   
 Sujoy Saraswati

 Trevor Saunders
+Aaron Sawdey   
 William Schmidt

 Tilo Schwarz   
 Martin Sebor   

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Re: [RS6000] Rewrite rs6000_frame_related to use simplify_replace_rtx

2016-05-04 Thread Segher Boessenkool

On Wed, May 04, 2016 at 11:14:41AM +0930, Alan Modra wrote:
>   * config/rs6000/rs6000.c (rs6000_frame_related): Rewrite.

> -  rtx real, temp;
> +  rtx patt, repl;

If you don't rename "real" here it is probably easier to read?  And it's
a better name anyway?

> -  if (REGNO (reg) == STACK_POINTER_REGNUM && reg2 == NULL_RTX)
> +  repl = NULL_RTX;
> +  if (REGNO (reg) == STACK_POINTER_REGNUM)
> +gcc_checking_assert (val == 0);
> +  else
> +repl = gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM),
> +  GEN_INT (val));

Put the NULL_RTX assignment in the first arm, please.

Okay for trunk with those changes, thanks,


Segher

[PATCH] add reassociation width target function for power8

2016-05-04 Thread Aaron Sawdey

Hi,

This patch enables TARGET_SCHED_REASSOCIATION_WIDTH for power8 and up.
The widths returned are derived from testing with SPEC 2006 and some
simple tests on power8.

Bootstrapped and regtested on powerpc64le-unknown-linux-gnu, ok for
trunk?
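
As background for readers, a reassociation width of N roughly means the
reassociation pass may split one long dependency chain into N independent
ones.  The sketch below is only an illustration of that idea in source form
(sum_serial, sum_width and MAX_WIDTH are made-up names); it is not what the
pass emits.  It uses integer addition, where regrouping does not change the
result; for floating point the regrouping can change rounding.

```c
#include <stddef.h>

#define MAX_WIDTH 8

/* One dependency chain: every add waits for the previous one.  */
long
sum_serial (const long *v, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}

/* WIDTH independent partial sums, combined at the end.  Adds that feed
   different accumulators do not depend on each other.  */
long
sum_width (const long *v, size_t n, unsigned width)
{
  long acc[MAX_WIDTH] = { 0 };
  if (width == 0 || width > MAX_WIDTH)
    width = 1;
  for (size_t i = 0; i < n; i++)
    acc[i % width] += v[i];

  long total = 0;
  for (unsigned k = 0; k < width; k++)
    total += acc[k];
  return total;
}
```

With width 1 this degenerates to the serial chain; with width 4 or 6 (the
values the patch returns for power8) there are that many chains that the
hardware can overlap.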

2016-05-04  Aaron Sawdey 

* config/rs6000/rs6000.c (rs6000_reassociation_width): Add
function for TARGET_SCHED_REASSOCIATION_WIDTH to enable
parallel reassociation for power8 and forward.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 235841)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1755,6 +1755,9 @@
 #undef TARGET_CONDITIONAL_REGISTER_USAGE
 #define TARGET_CONDITIONAL_REGISTER_USAGE rs6000_conditional_register_usage

+#undef TARGET_SCHED_REASSOCIATION_WIDTH
+#define TARGET_SCHED_REASSOCIATION_WIDTH rs6000_reassociation_width
+
 #undef TARGET_TRAMPOLINE_INIT
 #define TARGET_TRAMPOLINE_INIT rs6000_trampoline_init

@@ -8633,6 +8636,40 @@
 true, worst_case);
 }

+/* Determine the reassociation width to be used in reassociate_bb.
+   This takes into account how many parallel operations we
+   can actually do of a given type, and also the latency.
+   P8:
+ int add/sub 6/cycle
+ mul 2/cycle
+ vect add/sub/mul 2/cycle
+ fp   add/sub/mul 2/cycle
+ dfp  1/cycle
+*/
+
+static int
+rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED,
+enum machine_mode mode)
+{
+  switch (rs6000_cpu)
+{
+case PROCESSOR_POWER8:
+case PROCESSOR_POWER9:
+  if (DECIMAL_FLOAT_MODE_P (mode))
+   return 1;
+  if (VECTOR_MODE_P (mode))
+   return 4;
+  if (INTEGRAL_MODE_P (mode))
+   return opc == MULT_EXPR ? 4 : 6;
+  if (FLOAT_MODE_P (mode))
+   return 4;
+  break;
+default:
+  break;
+}
+  return 1;
+}
+
 /* Change register usage conditional on target flags.  */
 static void
 rs6000_conditional_register_usage (void)


-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Re: [patch] libstdc++/69703 ignore endianness in codecvt_utf8

2016-05-04 Thread Andre Vieira (lists)

On 20/04/16 18:40, Jonathan Wakely wrote:
> On 19/04/16 19:07 +0100, Jonathan Wakely wrote:
>> This was reported as a bug in the Filesystem library, but it's
>> actually a problem in the codecvt_utf8 facet that it uses.
> 
> The fix had a silly typo meaning it didn't work for big endian
> targets, which was revealed by the improved tests I added.
> 
> Tested x86_64-linux and powerpc64-linux, committed to trunk.
> 
> 
Hi Jonathan,

We are seeing experimental/filesystem/path/native/string.cc fail on
baremetal targets. I'm guessing this is missing a
'dg-require-filesystem-ts', as seen on other tests like
experimental/filesystem/path/modifiers/swap.cc.

Cheers,
Andre

Re: [PING][PATCH] New plugin event when evaluating a constexpr call

2016-05-04 Thread Jason Merrill


On 05/02/2016 03:28 PM, Andres Tiraboschi wrote:

+  constexpr_call_info call_info;
+  call_info.function = t;
+  call_info.call_stack = call_stack;
+  call_info.ctx = ctx;
+  call_info.lval_p = lval;
+  call_info.non_constant_p = non_constant_p;
+  call_info.overflow_p = overflow_p;
+  call_info.result = NULL_TREE;
+
+  invoke_plugin_callbacks (PLUGIN_EVAL_CALL_CONSTEXPR, &call_info);


Let's move this into a separate function so that it doesn't increase the 
stack footprint of cxx_eval_call_expression.


Jason

Re: [PATCH], Add PowerPC ISA 3.0 vector d-form addressing

2016-05-04 Thread Segher Boessenkool

Hi Mike,

On Tue, May 03, 2016 at 06:39:55PM -0400, Michael Meissner wrote:
> With this patch, I enable -mlra if the user did not specify either -mlra or
> -mno-lra on the command line, and -mcpu=power9 or -mpower9-dform-vector were
> used. I also enabled -mvsx-timode if LRA was used, which also is a RELOAD
> issue, that works with LRA.

I don't like enabling LRA if the user didn't ask for it; it is a bit too
surprising.  What do you do if there is -mno-lra explicitly?  You can just
do the same if no-lra is implicit?

>   * doc/md.texi (wO constraint): Likewise.

Everything is "likewise", that isn't very helpful.  Writing big changelogs
is annoying, I totally agree, but please try a bit harder.

> --- gcc/config/rs6000/rs6000.opt  
> (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
> (revision 235831)
> +++ gcc/config/rs6000/rs6000.opt  (.../gcc/config/rs6000) (working copy)
> @@ -470,8 +470,8 @@ Target RejectNegative Joined UInteger Va
>  -mlong-double-Specify size of long double (64 or 128 bits).
>  
>  mlra
> -Target Report Var(rs6000_lra_flag) Init(0) Save
> -Use LRA instead of reload.
> +Target Undocumented Mask(LRA) Var(rs6000_isa_flags)
> +Use the LRA register allocator instead of the reload register allocator.

It wasn't "undocumented" before?  Why the change to a mask bit btw?

> +mpower9-dform-scalar
> +Target Report Mask(P9_DFORM_SCALAR) Var(rs6000_isa_flags)
> +Use/do not use scalar register+offset memory instructions added in ISA 3.0.
> +
> +mpower9-dform-vector
> +Target Report Mask(P9_DFORM_VECTOR) Var(rs6000_isa_flags)
> +Use/do not use vector register+offset memory instructions added in ISA 3.0.
> +
>  mpower9-dform
> -Target Undocumented Mask(P9_DFORM) Var(rs6000_isa_flags)
> -Use/do not use vector and scalar instructions added in ISA 3.0.
> +Target Report Var(TARGET_P9_DFORM_BOTH) Init(-1) Save
> +Use/do not use register+offset memory instructions added in ISA 3.0.

These should probably all be undocumented, though (they're not something
users should use).

> +/* Return true if the ADDR is an acceptiable address for a quad memory
^ spelling

> +   if (((addr_mask & RELOAD_REG_QUAD_OFFSET) == 0)
> +   || !quad_address_p (addr, mode, false))

You can lose some parens here, i.e.

+ if ((addr_mask & RELOAD_REG_QUAD_OFFSET) == 0
+ || !quad_address_p (addr, mode, false))


Segher

Re: [PATCH] [FIX PR c/48116] -Wreturn-type does not work as advertised

2016-05-04 Thread Joseph Myers

On Mon, 11 Apr 2016, Prasad Ghangal wrote:

> Hi!
> 
> This is proposed patch for
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48116 (-Wreturn-type does
> not work as advertised)

I think that this is actually a documentation bug and should be fixed by 
changing the documentation, not by changing the code.  That is, the patch 
 that added the 
assertion about return with a value from functions returning void was 
mistaken: if the expression is non-void, the diagnostic is mandatory, and 
if it's a void expression, this is a legitimate, long-standing extension 
to ISO C, taken from ISO C++, and should be documented as such and not 
warned for except with -pedantic.  So there should be two documentation 
changes: (a) correct the description of -Wreturn-type and (b) explicitly 
document return of void expressions from functions returning void as a GNU 
C extension taken from ISO C++.
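
A minimal example of the void-expression case described above (the function
names are invented for illustration).  GCC accepts this without -pedantic;
returning a non-void expression from a void function is the case where the
diagnostic is mandatory, so it is not shown here.

```c
static int calls;

/* A function returning void.  */
void
log_call (void)
{
  calls++;
}

/* Returning a void expression from a function returning void: an
   extension to ISO C, taken from ISO C++.  */
void
wrapper (void)
{
  return log_call ();
}

/* Accessor so callers can observe the side effect.  */
int
call_count (void)
{
  return calls;
}
```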

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Arnaud Charlet

> 2016-05-04  Samuel Thibault  
> 
> * s-osinte-gnu.ads: Make Get_Page_Size return int, and make it use
> getpagesize instead of __getpagesize.
> 
> --- a/src/gcc/ada/s-osinte-gnu.ads
> +++ b/src/gcc/ada/s-osinte-gnu.ads
> @@ -344,10 +344,9 @@ package System.OS_Interface is
> --  returns the stack base of the specified thread. Only call this
> function
> --  when Stack_Base_Available is True.
>  
> -   --  From: /usr/include/i386-gnu/bits/shm.h __getpagesize or
> getpagesize??
> -   function Get_Page_Size return size_t;
> -   function Get_Page_Size return Address;
> -   pragma Import (C, Get_Page_Size, "__getpagesize");
> +   --  From: /usr/include/i386-gnu/bits/shm.h
> +   function Get_Page_Size return int;
> +   pragma Import (C, Get_Page_Size, "getpagesize");
> --  Returns the size of a page
>  
> --  From /usr/include/i386-gnu/bits/mman.h

Yes, something like the above would be needed to bring the old patch on par
with GCC 6. Above patch is OK with me, assuming testing is OK.

Arno

Re: [PATCH] Allow xmm16-xmm31 in sse2_movq128

2016-05-04 Thread Kirill Yukhin

On 03 May 13:28, Jakub Jelinek wrote:
> Hi!
> 
> Another insn where we can just unconditionally use v constraint instead of x
> - for V2DImode HARD_REGNO_MODE_OK will only allow it for AVX512VL for the
> ext regs, the insn is actually already available in AVX512F, but probably
> not worth spending too much time on this to allow it even for xmm16-xmm31
> for -mavx512f -mno-avx512vl.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK

--
Thanks, K
> 
> 2016-05-03  Jakub Jelinek  
> 
>   * config/i386/sse.md (sse2_movq128): Use v constraint instead of x.
> 
>   * gcc.target/i386/avx512vl-vmovq-1.c: New test.
> 
> --- gcc/config/i386/sse.md.jj 2016-05-03 00:12:09.210351372 +0200
> +++ gcc/config/i386/sse.md2016-05-03 10:04:41.560546790 +0200
> @@ -1076,10 +1076,10 @@ (define_insn "_store_mask"
> (set_attr "mode" "")])
>  
>  (define_insn "sse2_movq128"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x")
> +  [(set (match_operand:V2DI 0 "register_operand" "=v")
>   (vec_concat:V2DI
> (vec_select:DI
> - (match_operand:V2DI 1 "nonimmediate_operand" "xm")
> + (match_operand:V2DI 1 "nonimmediate_operand" "vm")
>   (parallel [(const_int 0)]))
> (const_int 0)))]
>"TARGET_SSE2"
> --- gcc/testsuite/gcc.target/i386/avx512vl-vmovq-1.c.jj   2016-05-03 
> 10:09:20.930749746 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512vl-vmovq-1.c  2016-05-03 
> 10:10:40.673665926 +0200
> @@ -0,0 +1,16 @@
> +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
> +/* { dg-options "-O2 -mavx512vl" } */
> +
> +#include 
> +
> +void
> +foo (__m128i x, __m128i *y)
> +{
> +  register __m128i a __asm ("xmm16");
> +  a = x;
> +  asm volatile ("" : "+v" (a));
> +  a = _mm_move_epi64 (a);
> +  asm volatile ("" : "+v" (a));
> +  a = _mm_move_epi64 (*y);
> +  asm volatile ("" : "+v" (a));
> +}
> 
>   Jakub

Re: [PATCH] Better location info for "incomplete type" error msg (PR c/70756)

2016-05-04 Thread Jason Merrill

On Wed, May 4, 2016 at 9:00 AM, Marek Polacek  wrote:
> On Tue, May 03, 2016 at 08:05:47PM -0400, Jason Merrill wrote:
>> Looks good.
>>
>> But I don't see a C++ testcase; can the test go into c-c++-common?
>
> Sadly, no.  As of now, the patch doesn't improve things for C++ (?).  Seems
> we'd need to pass better locations down to pointer_int_sum / size_in_bytes.
> It cascades :(.

Sure.  But can you fix that, too, while you're thinking about it?
Passing the location to cp_pointer_int_sum and pointer_diff seems
pretty simple.

Jason

Re: C/C++ PATCH to add -Wdangling-else option

2016-05-04 Thread Marek Polacek

On Wed, May 04, 2016 at 03:39:19PM +, Joseph Myers wrote:
> On Wed, 4 May 2016, Marek Polacek wrote:
> 
> > On Tue, Apr 26, 2016 at 03:03:25PM +0200, Bernd Schmidt wrote:
> > > On 04/26/2016 02:39 PM, Jakub Jelinek wrote:
> > > > I support that change, and -Wparentheses will still enable this, it just
> > > > gives more fine-grained control and be in line with what clang does.
> > > > 
> > > > Bernd, how much are you against this change?
> > > 
> > > > Don't really care that much, I just don't quite see the point. Don't let me
> > > > stop you though.
> > 
> > So Joseph, what do you think about this patch? :)
> 
> I support adding the option for more fine-grained control of these 
> warnings.

Thanks.  I'll commit the patch then.

Marek

Re: C/C++ PATCH to add -Wdangling-else option

2016-05-04 Thread Joseph Myers

On Wed, 4 May 2016, Marek Polacek wrote:

> On Tue, Apr 26, 2016 at 03:03:25PM +0200, Bernd Schmidt wrote:
> > On 04/26/2016 02:39 PM, Jakub Jelinek wrote:
> > > I support that change, and -Wparentheses will still enable this, it just
> > > gives more fine-grained control and be in line with what clang does.
> > > 
> > > Bernd, how much are you against this change?
> > 
> > Don't really care that much, I just don't quite see the point. Don't let me
> > stop you though.
> 
> So Joseph, what do you think about this patch? :)

I support adding the option for more fine-grained control of these 
warnings.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] MIPS: Ensure that lo_sums do not contain an unaligned symbol

2016-05-04 Thread Andrew Bennett

Hi,

In MIPS (and similarly for other RISC architectures), loading the absolute
address of an object requires a two-instruction sequence: one instruction to
load the high part of the object's address, and one instruction to load the
low part.  Typically the result of the high-part calculation is used by only
one instruction that loads the low part of the address.  However, in certain
situations (for example when loading or storing double-word values) the
result of computing the high part of the address can be used by multiple
instructions that load the low parts of an address at different offsets.
Let's show this with an example C program.

struct
{
  short s;
  unsigned long long l;
} h;

void foo (void)
{
  h.l = 0;
}

When this is compiled for MIPS it produces the following assembly:

        lui     $2,%hi(h+8)
        sw      $0,%lo(h+8)($2)
        jr      $31
        sw      $0,%lo(h+12)($2)

        ...

        .globl  h
        .section        .bss,"aw",@nobits
        .align  3
        .type   h, @object
        .size   h, 16
h:
        .space  16


Notice here that the high part of the address of object h is loaded into
register $2, and this is then used as part of the low-part calculation by the
two sw instructions, which have different offsets.  In MIPS the value of a
low-part calculation is treated as a signed value.  It is therefore valid to
use the result of a high-part calculation with multiple low-part calculations
containing different offsets, so long as adding the result of the high part to
each of the sign-extended low parts gives valid addresses.

However, if we add the packed attribute to the h structure, the fields of the
structure will not be naturally aligned and we can break the previous
condition.  Let's explain this in more detail with the following C program.

struct __attribute__((packed))
{
 short s;
 unsigned long long l;
} h;

void foo (void)
{
 h.l = 0;
}

When this is compiled for MIPS it produces the following assembly:

        lui     $2,%hi(h)
        addiu   $4,$2,%lo(h+2)
        addiu   $3,$2,%lo(h+6)
        swl     $0,3($4)
        swr     $0,%lo(h+2)($2)
        swl     $0,3($3)
        jr      $31
        swr     $0,%lo(h+6)($2)

        ...

        .globl  h
        .section        .bss,"aw",@nobits
        .align  2
        .type   h, @object
        .size   h, 10
h:
        .space  10


There are two things to highlight here.  Firstly, the alignment of the h
structure has decreased from 8 bytes to 4 bytes.  Secondly, we have a low-part
calculation which adds an offset of 6 to the address of the h structure, which
is greater than its alignment.

When the MIPS linker resolves a HI relocation (i.e. %hi(h)) it finds the next
LO relocation (i.e. %lo(h+2)) in the relocation table and, using the
information from both of these relocations, computes the object's address and
extracts its high part.  Then, when the MIPS linker resolves a LO relocation,
it adds the offset to the object's address and extracts the low part.

Let's assume that object h has an address of 0x80007ffc.  When the MIPS linker
resolves the value of the HI relocation for object h, it will also use the
value of the LO relocation for object h with an offset of 2.  The high-part
value is therefore:

HIGH (0x80007ffc + 2) = HIGH (0x80007ffe) = 0x8000


Then the MIPS linker resolves the value of the LO relocation for object h with
an offset of 2:

LO (0x80007ffc + 2) = LO (0x80007ffe) = 0x7ffe


Finally the MIPS linker resolves the value of the LO relocation for object h
with an offset of 6:

LO (0x80007ffc + 6) = LO (0x80008002) = 0x8002

In MIPS the value of a LO relocation is treated as a signed value, so when the
program is run the address of h+6 will be 0x7fff8002 when it should be
0x80008002.
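
The arithmetic above can be reproduced with a few lines of C.  This is only a
sketch: mips_hi, mips_lo and mips_effective are made-up helper names, and they
model the usual %hi/%lo split (high half pre-adjusted by 0x8000, low half
sign-extended), not the linker's actual code.

```c
#include <stdint.h>

/* %hi(addr): upper 16 bits, adjusted so that adding the sign-extended
   low half of the same address gives addr back.  */
uint32_t
mips_hi (uint32_t addr)
{
  return (addr + 0x8000u) >> 16;
}

/* %lo(addr): lower 16 bits, interpreted as a signed 16-bit value.  */
int32_t
mips_lo (uint32_t addr)
{
  int32_t lo = (int32_t) (addr & 0xffffu);
  if (lo >= 0x8000)
    lo -= 0x10000;
  return lo;
}

/* Address formed at run time when the base register holds
   %hi(HI_OF) << 16 and the instruction offset is %lo(LO_OF).  */
uint32_t
mips_effective (uint32_t hi_of, uint32_t lo_of)
{
  return (mips_hi (hi_of) << 16) + (uint32_t) mips_lo (lo_of);
}
```

With h = 0x80007ffc, pairing %hi(h+2) with %lo(h+2) gives 0x80007ffe as
expected, but pairing %hi(h+2) with %lo(h+6) gives 0x7fff8002.  For an
8-byte-aligned h (say 0x80007ff8), %hi(h+8) paired with %lo(h+12) still
yields the right address, because h+8 and h+12 cannot straddle a 0x8000
boundary.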



To fix this issue I have changed the mips_split_symbol function in the case
when it generates a set of instructions to compute the high and low part of a
symbol's address.  If the symbol is unaligned, the low-part calculation is
forced into a register.  I have also added a condition to
mips_classify_address that prevents lo_sums from being used if they contain an
unaligned symbol.  This stops GCC from trying to merge the result of a lo_sum
that is currently in a register back into an address calculation.

I have tested the patch on the mips-mti-elf toolchain and there have been no
regressions.

The patch and ChangeLog are below.


Ok to commit?

Many thanks,



Andrew



gcc/
* config/mips/mips.c (mips_valid_lo_sum_p): New function.
(mips_classify_address): Call mips_valid_lo_sum_p to check
if we have a valid lo_sum.
(mips_split_symbol): Force the lo_sum to a register if
mips_valid_lo_sum_p is false.

testsuite/
* gcc.target/mips/hi-lo-reloc-offset1: New test.
* gcc.target/mips/hi-lo-reloc-offset2: New test.
*

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Samuel Thibault

Samuel Thibault, on Wed 04 May 2016 17:29:48 +0200, wrote:
> The gcc-6 build failed. I see that one of the changes is:
> 
> -   --  From: /usr/include/unistd.h __getpagesize or getpagesize??
> -   function Get_Page_Size return int;
> +   --  From: /usr/include/i386-gnu/bits/shm.h __getpagesize or getpagesize??
> +   function Get_Page_Size return size_t;
> +   function Get_Page_Size return Address;
> 
> Why use size_t and Address?  Other OSes use int, and the prototype for
> getpagesize is returning int.
> 
> Also, don't use the __ versions of the glibc functions, they are
> internal aliases, the API is without __.

I.e. the proposed change below.

Samuel


2016-05-04  Samuel Thibault  

* s-osinte-gnu.ads: Make Get_Page_Size return int, and make it use
getpagesize instead of __getpagesize.

--- a/src/gcc/ada/s-osinte-gnu.ads
+++ b/src/gcc/ada/s-osinte-gnu.ads
@@ -344,10 +344,9 @@ package System.OS_Interface is
--  returns the stack base of the specified thread. Only call this function
--  when Stack_Base_Available is True.
 
-   --  From: /usr/include/i386-gnu/bits/shm.h __getpagesize or getpagesize??
-   function Get_Page_Size return size_t;
-   function Get_Page_Size return Address;
-   pragma Import (C, Get_Page_Size, "__getpagesize");
+   --  From: /usr/include/i386-gnu/bits/shm.h
+   function Get_Page_Size return int;
+   pragma Import (C, Get_Page_Size, "getpagesize");
--  Returns the size of a page
 
--  From /usr/include/i386-gnu/bits/mman.h

Re: Please include ada-hurd.diff upstream (try2)

2016-05-04 Thread Samuel Thibault

Hello Svante,

The gcc-6 build failed. I see that one of the changes is:

-   --  From: /usr/include/unistd.h __getpagesize or getpagesize??
-   function Get_Page_Size return int;
+   --  From: /usr/include/i386-gnu/bits/shm.h __getpagesize or getpagesize??
+   function Get_Page_Size return size_t;
+   function Get_Page_Size return Address;

Why use size_t and Address?  Other OSes use int, and the prototype for
getpagesize is returning int.

Also, don't use the __ versions of the glibc functions, they are
internal aliases, the API is without __.

Samuel

Re: [RS6000] out-of-line exit register restore funcs

2016-05-04 Thread Segher Boessenkool

On Wed, May 04, 2016 at 01:45:28PM +0930, Alan Modra wrote:
> This fixes the regression from gcc-4.5 for -m32 -Os shown by
> gcc.target/powerpc/savres.c:s_r31.  Bootstrap and regression tests
> on powerpc64le-linux and powerpc64-linux in progress.  OK assuming no
> regressions?
> 
>   * config/rs6000/rs6000.c (rs6000_savres_strategy): Don't use
>   out-of-line gpr restore for one or two regs if that would add
>   a save of lr.

Okay for trunk.  Thanks,


Segher

Re: [PATCH v2] gcov: Runtime configurable destination output

2016-05-04 Thread Jan Hubicka

> On 04/29/16 11:08, Aaron Conole wrote:
> 
> >Perhaps I've poorly explained what I want. I want to be able to pipe
> >gcov error messages to a different file for post-processing / reporting
> >elsewhere. I don't want them mixed with the application's messages. Do
> >you think this kind of generic flexibility is not a good thing, when it
> >comes at such little cost?
> 
> Thanks for clarifying your rationale.  I'm not convinced, but I'm
> not (yet) saying no.
> 
> Jan, do you have any thoughts?

I can imagine this being useful - if your application is writing output to
stderr during its training run, it may be quite disturbing to have gcov
diagnostics randomly mixed in (in particular, I have myself run into cases
of missing the diagnostics).  So I am fine with the feature.

Honza
> 
> nathan

Re: [PATCH v2] gcov: Runtime configurable destination output

2016-05-04 Thread Nathan Sidwell


On 04/29/16 11:08, Aaron Conole wrote:


Perhaps I've poorly explained what I want. I want to be able to pipe
gcov error messages to a different file for post-processing / reporting
elsewhere. I don't want them mixed with the application's messages. Do
you think this kind of generic flexibility is not a good thing, when it
comes at such little cost?


Thanks for clarifying your rationale.  I'm not convinced, but I'm not (yet) 
saying no.


Jan, do you have any thoughts?

nathan

Fix regrename compare-debug issue

2016-05-04 Thread Bernd Schmidt

When scanning addresses inside a debug insn, we shouldn't use normal 
base/index classes. This shows as a compare-debug issue on Alpha, where 
INDEX_REG_CLASS is NO_REGS, and this prevented a chain from being 
renamed with debugging turned on.


Uros has reported that this patch resolves the issues he was seeing on 
Alpha, and I've bootstrapped and tested it on x86_64-linux. Ok?



Bernd
	* regrename.c (base_reg_class_for_rename): New static function.
	(scan_rtx_address, scan_rtx): Use it instead of base_reg_class.

Index: gcc/regrename.c
===
--- gcc/regrename.c	(revision 235808)
+++ gcc/regrename.c	(working copy)
@@ -1238,6 +1238,19 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc,
 }
 }
 
+/* A wrapper around base_reg_class which returns ALL_REGS if INSN is a
+   DEBUG_INSN.  The arguments MODE, AS, CODE and INDEX_CODE are as for
+   base_reg_class.  */
+
+static reg_class
+base_reg_class_for_rename (rtx_insn *insn, machine_mode mode, addr_space_t as,
+			   rtx_code code, rtx_code index_code)
+{
+  if (DEBUG_INSN_P (insn))
+return ALL_REGS;
+  return base_reg_class (mode, as, code, index_code);
+}
+
 /* Adapted from find_reloads_address_1.  CL is INDEX_REG_CLASS or
BASE_REG_CLASS depending on how the register is being considered.  */
 
@@ -1343,12 +1356,16 @@ scan_rtx_address (rtx_insn *insn, rtx *l
 	  }
 
 	if (locI)
-	  scan_rtx_address (insn, locI, INDEX_REG_CLASS, action, mode, as);
+	  {
+	reg_class iclass = DEBUG_INSN_P (insn) ? ALL_REGS : INDEX_REG_CLASS;
+	scan_rtx_address (insn, locI, iclass, action, mode, as);
+	  }
 	if (locB)
-	  scan_rtx_address (insn, locB,
-			base_reg_class (mode, as, PLUS, index_code),
-			action, mode, as);
-
+	  {
+	reg_class bclass = base_reg_class_for_rename (insn, mode, as, PLUS,
+			  index_code);
+	scan_rtx_address (insn, locB, bclass, action, mode, as);
+	  }
 	return;
   }
 
@@ -1366,10 +1383,13 @@ scan_rtx_address (rtx_insn *insn, rtx *l
   break;
 
 case MEM:
-  scan_rtx_address (insn, &XEXP (x, 0),
-			base_reg_class (GET_MODE (x), MEM_ADDR_SPACE (x),
-	MEM, SCRATCH),
-			action, GET_MODE (x), MEM_ADDR_SPACE (x));
+  {
+	reg_class bclass = base_reg_class_for_rename (insn, GET_MODE (x),
+		  MEM_ADDR_SPACE (x),
+		  MEM, SCRATCH);
+	scan_rtx_address (insn, &XEXP (x, 0), bclass, action, GET_MODE (x),
+			  MEM_ADDR_SPACE (x));
+  }
   return;
 
 case REG:
@@ -1416,10 +1436,14 @@ scan_rtx (rtx_insn *insn, rtx *loc, enum
   return;
 
 case MEM:
-  scan_rtx_address (insn, &XEXP (x, 0),
-			base_reg_class (GET_MODE (x), MEM_ADDR_SPACE (x),
-	MEM, SCRATCH),
-			action, GET_MODE (x), MEM_ADDR_SPACE (x));
+  {
+	reg_class bclass = base_reg_class_for_rename (insn, GET_MODE (x),
+		  MEM_ADDR_SPACE (x),
+		  MEM, SCRATCH);
+
+	scan_rtx_address (insn, &XEXP (x, 0), bclass, action, GET_MODE (x),
+			  MEM_ADDR_SPACE (x));
+  }
   return;
 
 case SET:

Improve pure/const propagation across interposable function with non-interposable aliases

2016-05-04 Thread Jan Hubicka

Hi,
the API dealing with aliases is somewhat flawed by the fact that it makes a
difference between the main symbol (the defining function) and its aliases.
There is nothing special about the main symbol.  This is fixed by this patch.

Bootstrapped/regtested x86_64-linux, will commit it shortly.
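
To illustrate the intended behaviour, here is a self-contained toy model in
C.  It is a sketch only: the toy_* names, the three-level availability enum
and the recursive walk are assumptions made for this example and are not the
cgraph API.  The point it shows is that the main symbol is gated by the same
availability test as its aliases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for availability levels; only the ordering matters.  */
enum toy_availability { TOY_NOT_AVAILABLE, TOY_INTERPOSABLE, TOY_AVAILABLE };

struct toy_symbol
{
  enum toy_availability avail;
  const struct toy_symbol *const *aliases; /* NULL-terminated, may be NULL.  */
};

typedef bool (*toy_callback) (const struct toy_symbol *, void *);

/* Call CB on SYM and on its aliases.  When INCLUDE_OVERWRITABLE is false,
   every interposable symbol is skipped, including SYM itself.  Returns
   true as soon as CB returns true.  */
static bool
toy_call_for_symbol_and_aliases (const struct toy_symbol *sym,
				 toy_callback cb, void *data,
				 bool include_overwritable)
{
  assert (sym != NULL);
  if (include_overwritable || sym->avail > TOY_INTERPOSABLE)
    if (cb (sym, data))
      return true;
  if (sym->aliases)
    for (size_t i = 0; sym->aliases[i]; i++)
      if (toy_call_for_symbol_and_aliases (sym->aliases[i], cb, data,
					   include_overwritable))
	return true;
  return false;
}

/* Example callback: count the symbols visited.  */
static bool
toy_count (const struct toy_symbol *sym, void *data)
{
  (void) sym;
  ++*(int *) data;
  return false;
}
```

For an interposable main symbol with one non-interposable alias, a walk with
include_overwritable set to false visits only the alias, while a walk with it
set to true visits both.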

Honza

* cgraph.c (cgraph_node::call_for_symbol_thunks_and_aliases):
Check availability on NODE, too.
* cgraph.h (symtab_node::call_for_symbol_and_aliases): Likewise.
(cgraph_node::call_for_symbol_and_aliases): Likewise.
(varpool_node::call_for_symbol_and_aliases): Likewise.
* ipa-pure-const.c (add_new_function): Analyze all bodies.
(propagate_pure_const): Propagate across interposable functions, too.
(skip_function_for_local_pure_const): Do not skip interposable bodies
with aliases.
(pass_local_pure_const::execute): Update.

* gcc.dg/ipa/pure-const-3.c: New testcase.
Index: cgraph.c
===
--- cgraph.c(revision 235839)
+++ cgraph.c(working copy)
@@ -2289,7 +2289,7 @@ cgraph_node::can_be_local_p (void)
 }
 
 /* Call callback on cgraph_node, thunks and aliases associated to cgraph_node.
-   When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+   When INCLUDE_OVERWRITABLE is false, overwritable symbols are
skipped.  When EXCLUDE_VIRTUAL_THUNKS is true, virtual thunks are
skipped.  */
 bool
@@ -2301,9 +2301,14 @@ cgraph_node::call_for_symbol_thunks_and_
 {
   cgraph_edge *e;
   ipa_ref *ref;
+  enum availability avail = AVAIL_AVAILABLE;
 
-  if (callback (this, data))
-return true;
+  if (include_overwritable
+  || (avail = get_availability ()) > AVAIL_INTERPOSABLE)
+{
+  if (callback (this, data))
+return true;
+}
   FOR_EACH_ALIAS (this, ref)
 {
   cgraph_node *alias = dyn_cast <cgraph_node *> (ref->referring);
@@ -2314,7 +2319,7 @@ cgraph_node::call_for_symbol_thunks_and_
 exclude_virtual_thunks))
  return true;
 }
-  if (get_availability () <= AVAIL_INTERPOSABLE)
+  if (avail <= AVAIL_INTERPOSABLE)
 return false;
   for (e = callers; e; e = e->next_caller)
 if (e->caller->thunk.thunk_p
Index: cgraph.h
===
--- cgraph.h(revision 235837)
+++ cgraph.h(working copy)
@@ -3096,8 +3096,7 @@ symtab_node::get_availability (symtab_no
 }
 
 /* Call calback on symtab node and aliases associated to this node.
-   When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
-   skipped. */
+   When INCLUDE_OVERWRITABLE is false, overwritable symbols are skipped. */
 
 inline bool
 symtab_node::call_for_symbol_and_aliases (bool (*callback) (symtab_node *,
@@ -3105,15 +3104,19 @@ symtab_node::call_for_symbol_and_aliases
  void *data,
  bool include_overwritable)
 {
-  if (callback (this, data))
-return true;
+  if (include_overwritable
+  || get_availability () > AVAIL_INTERPOSABLE)
+{
+  if (callback (this, data))
+return true;
+}
   if (has_aliases_p ())
 return call_for_symbol_and_aliases_1 (callback, data, include_overwritable);
   return false;
 }
 
 /* Call callback on function and aliases associated to the function.
-   When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+   When INCLUDE_OVERWRITABLE is false, overwritable symbols are
skipped.  */
 
 inline bool
@@ -3122,15 +3125,19 @@ cgraph_node::call_for_symbol_and_aliases
  void *data,
  bool include_overwritable)
 {
-  if (callback (this, data))
-return true;
+  if (include_overwritable
+  || get_availability () > AVAIL_INTERPOSABLE)
+{
+  if (callback (this, data))
+return true;
+}
   if (has_aliases_p ())
 return call_for_symbol_and_aliases_1 (callback, data, include_overwritable);
   return false;
 }
 
 /* Call calback on varpool symbol and aliases associated to varpool symbol.
-   When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
+   When INCLUDE_OVERWRITABLE is false, overwritable symbols are
skipped. */
 
 inline bool
@@ -3139,8 +3146,12 @@ varpool_node::call_for_symbol_and_aliase
   void *data,
   bool include_overwritable)
 {
-  if (callback (this, data))
-return true;
+  if (include_overwritable
+  || get_availability () > AVAIL_INTERPOSABLE)
+{
+  if (callback (this, data))
+return true;
+}
   if (has_aliases_p ())
 return call_for_symbol_and_aliases_1 (callback, data, include_overwritable);
   return false;
Index: ipa-pure-const.c
===
--- ipa-pure-const.c(revision

Re: [PATCH 12/18] haifa-sched.c: make insn_queue[] a vec

2016-05-04 Thread Bernd Schmidt


On 05/03/2016 02:41 PM, Trevor Saunders wrote:


I guess the usual tool for that is contrib/compare-all-tests? is there a
simpler one?


Not sure. I have a collection of .i files (from building things like gcc 
and the kernel with -save-temps) and a script that compiles them all, 
then I just use diff to look at interesting changes. Doesn't make sense 
to post the whole thing since it's rather specific to my setup, but it's 
basically


CFLAGS="-quiet -O2 -fno-reorder-blocks -fno-schedule-insns2"
y=/your/output/directory
mkdir -p $y
cp cc1 $y
( cd /local/data; find $FINDCMD -name '*.i' -print0 ) | nice -n15 xargs -n1 -0 dirname | parallel "cd $y; mkdir -p {}"
( cd /local/data; find $FINDCMD -name '*.i' -print0 ) | nice -n15 parallel -j10% -v -0 "./cc1 $CFLAGS $2 /local/data/{} -o $y/{.}.s"


The CFLAGS are chosen to avoid scheduling/reordering differences that 
most of the time aren't actually interesting when not testing things 
like scheduler patches.



Bernd

Re: Enabling -frename-registers?

2016-05-04 Thread Wilco Dijkstra

Bernd Schmidt wrote:
> On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote:
>> On ARM / AArch32 I haven't seen any performance data yet - the one place we 
>> are concerned 
>> about the impact is on Thumb2 code size as regrename may end up 
>> inadvertently putting more 
>> things in high registers. 
>
> In theory at least arm_preferred_rename_class is designed to make the  
> opposite happen. Bernd  

I do not see that working unfortunately - Thumb-2 codesize increases by a few
percent even with -Os.  This is primarily due to replacing a low register
with IP, which often changes a 16-bit instruction like:

movs r2, #8

into a 32-bit one:

mov ip, #8

This will also affect other targets with multiple instruction sizes. So I
think it should check the size of the new instruction patterns and only
accept a rename if it is not larger (certainly with -Os).
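
As a rough sketch of such a check, the following C fragment models only the
"move 8-bit immediate" case from the example above.  The toy_* functions are
hypothetical and are not part of regrename or the ARM backend; the size model
assumes the 16-bit encoding is available only for r0-r7 and ignores IT blocks
and the special registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Encoded size in bytes of a Thumb-2 move of an 8-bit immediate into
   register REGNO, under the simplifying assumption that the 16-bit
   form needs a low register (r0-r7).  */
static int
toy_mov_imm8_size (int regno)
{
  assert (regno >= 0 && regno <= 15);
  return regno <= 7 ? 2 : 4;
}

/* Accept a rename from OLD_REGNO to NEW_REGNO only if the instruction
   does not get larger.  */
static bool
toy_rename_ok_for_size (int old_regno, int new_regno)
{
  return toy_mov_imm8_size (new_regno) <= toy_mov_imm8_size (old_regno);
}
```

Under this model, renaming r2 to ip (r12) is rejected, while renaming r2 to
r3 or ip to r2 is accepted.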

Also when people claim they can't see any benefit, did they check the
codesize difference on SPEC2006?  On AArch64 codesize reduced uniformly due
to fewer moves (and in a few cases significantly so). I expect that to be
true for other RISC targets. Simply put, reduced codesize at no performance
loss = gain.

Wilco

Re: C, C++: Fix PR 69733 (bad location for ignored qualifiers warning)

2016-05-04 Thread Bernd Schmidt


On 04/25/2016 10:18 PM, Joseph Myers wrote:

On Fri, 22 Apr 2016, Bernd Schmidt wrote:


+/* Returns the smallest location != UNKNOWN_LOCATION in LOCATIONS,
+   considering only those c_declspec_words found in LIST, which
+   must be terminated by cdw_number_of_elements.  */
+
+static location_t
+smallest_type_quals_location (const location_t* locations,
+ c_declspec_word *list)


I'd expect list to be a pointer to const...


@@ -6101,6 +6122,18 @@ grokdeclarator (const struct c_declarato
   qualify the return type, not the function type.  */
if (type_quals)
  {
+   enum c_declspec_word ignored_quals_list[] =
+ {
+   cdw_const, cdw_volatile, cdw_restrict, cdw_address_space,
+   cdw_number_of_elements
+ };


  ... and ignored_quals_list to be static const here.


How's this? Fully retested on x86_64-linux.
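
For readers who want to try the selection logic in isolation, here is a
standalone C sketch of the same idea.  The toy_* type, enum and function
names are stand-ins invented for this example (location_t and
c_declspec_word are not available outside GCC), and 0 is assumed to play the
role of UNKNOWN_LOCATION.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int toy_location;
#define TOY_UNKNOWN_LOCATION 0u

/* Indices into the locations array; TOY_END terminates a list.  */
enum toy_word { TOY_CONST, TOY_VOLATILE, TOY_RESTRICT, TOY_END };

/* Return the smallest known location among the entries of LOCATIONS
   selected by LIST, or TOY_UNKNOWN_LOCATION if none is known.  */
static toy_location
toy_smallest_location (const toy_location *locations,
		       const enum toy_word *list)
{
  toy_location best = TOY_UNKNOWN_LOCATION;
  assert (locations != NULL && list != NULL);
  for (; *list != TOY_END; list++)
    {
      toy_location cand = locations[*list];
      if (best == TOY_UNKNOWN_LOCATION
	  || (cand != TOY_UNKNOWN_LOCATION && cand < best))
	best = cand;
    }
  return best;
}
```

With locations of 30 for const, unknown for volatile and 12 for restrict,
the function returns 12; with all entries unknown it returns the unknown
value, which is why the caller needs its own fallbacks.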


Bernd
c/
	PR c++/69733
	* c-decl.c (smallest_type_quals_location): New static function.
	(grokdeclarator): Try to find the correct location for an ignored
	qualifier.
cp/
	PR c++/69733
	* decl.c (grokdeclarator): Try to find the correct location for an
	ignored qualifier.
testsuite/
	PR c++/69733
	* c-c++-common/pr69733.c: New test.
	* gcc.target/i386/pr69733.c: New test.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c	(revision 235808)
+++ gcc/c/c-decl.c	(working copy)
@@ -5321,6 +5321,27 @@ warn_defaults_to (location_t location, i
   va_end (ap);
 }
 
+/* Returns the smallest location != UNKNOWN_LOCATION in LOCATIONS,
+   considering only those c_declspec_words found in LIST, which
+   must be terminated by cdw_number_of_elements.  */
+
+static location_t
+smallest_type_quals_location (const location_t* locations,
+			  const c_declspec_word *list)
+{
+  location_t loc = UNKNOWN_LOCATION;
+  while (*list != cdw_number_of_elements)
+{
+  location_t newloc = locations[*list];
+  if (loc == UNKNOWN_LOCATION
+	  || (newloc != UNKNOWN_LOCATION && newloc < loc))
+	loc = newloc;
+  list++;
+}
+
+  return loc;
+}
+
 /* Given declspecs and a declarator,
determine the name and type of the object declared
and construct a ..._DECL node for it.
@@ -6142,6 +6163,18 @@ grokdeclarator (const struct c_declarato
 	   qualify the return type, not the function type.  */
 	if (type_quals)
 	  {
+		const enum c_declspec_word ignored_quals_list[] =
+		  {
+		cdw_const, cdw_volatile, cdw_restrict, cdw_address_space,
+		cdw_number_of_elements
+		  };
+		location_t specs_loc
+		  = smallest_type_quals_location (declspecs->locations,
+		  ignored_quals_list);
+		if (specs_loc == UNKNOWN_LOCATION)
+		  specs_loc = declspecs->locations[cdw_typedef];
+		if (specs_loc == UNKNOWN_LOCATION)
+		  specs_loc = loc;
 		/* Type qualifiers on a function return type are
 		   normally permitted by the standard but have no
 		   effect, so give a warning at -Wreturn-type.
@@ -6149,10 +6182,10 @@ grokdeclarator (const struct c_declarato
 		   function definitions in ISO C; GCC used to used
 		   them for noreturn functions.  */
 		if (VOID_TYPE_P (type) && really_funcdef)
-		  pedwarn (loc, 0,
+		  pedwarn (specs_loc, 0,
 			   "function definition has qualified void return type");
 		else
-		  warning_at (loc, OPT_Wignored_qualifiers,
+		  warning_at (specs_loc, OPT_Wignored_qualifiers,
 			   "type qualifiers ignored on function return type");
 
 		type = c_build_qualified_type (type, type_quals);
Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c	(revision 235808)
+++ gcc/cp/decl.c	(working copy)
@@ -10065,8 +10065,15 @@ grokdeclarator (const cp_declarator *dec
 	if (type_quals != TYPE_UNQUALIFIED)
 	  {
 		if (SCALAR_TYPE_P (type) || VOID_TYPE_P (type))
-		  warning (OPT_Wignored_qualifiers,
-			   "type qualifiers ignored on function return type");
+		  {
+		location_t loc;
+		loc = smallest_type_quals_location (type_quals,
+			declspecs->locations);
+		if (loc == UNKNOWN_LOCATION)
+		  loc = declspecs->locations[ds_type_spec];
+		warning_at (loc, OPT_Wignored_qualifiers, "type "
+"qualifiers ignored on function return type");
+		  }
 		/* We now know that the TYPE_QUALS don't apply to the
 		   decl, but to its return type.  */
 		type_quals = TYPE_UNQUALIFIED;
Index: gcc/testsuite/c-c++-common/pr69733.c
===
--- gcc/testsuite/c-c++-common/pr69733.c	(revision 0)
+++ gcc/testsuite/c-c++-common/pr69733.c	(working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-W -fdiagnostics-show-caret" } */
+
+typedef const double cd;
+double val;
+
+const double val0() {return val;} /* { dg-warning "qualifiers ignored" } */
+/* { dg-begin-multiline-output "" }
+ const double val0() {return val;}
+ ^
+{

[MIPS, committed] microMIPS testsuite cleanup

2016-05-04 Thread Moore, Catherine


2016-05-04  Kwok Cheung Yeung  

* gcc.target/mips/mips16-attributes.c: Skip if -mmicromips
flag is present.

Index: gcc.target/mips/mips16-attributes.c
===
--- gcc.target/mips/mips16-attributes.c (revision 235880)
+++ gcc.target/mips/mips16-attributes.c (working copy)
@@ -3,6 +3,7 @@
function.  */
 /* { dg-do run } */
 /* { dg-options "(-mips16)" } */
+/* { dg-skip-if "" { *-*-* } { "-mmicromips" } { "" } } */

 #include

Re: [PATCH][genrecog] Fix warning about potentially uninitialised use of label

2016-05-04 Thread Jeff Law


On 05/03/2016 10:28 AM, Kyrill Tkachov wrote:

> After experimenting a bit, I note that the warning goes away when I
> compile with -O2.
> In the cross compiler build I'm doing genrecog.c is compiled with -O1,
> which exhibits the warning.
> So I suppose DOM/VRP does catch, but only at the appropriate
> optimisation level.

That makes sense.  Thanks for digging further into this.  I think this 
should be considered a non-issue.


Jeff

[PING*2][PATCH] DWARF: add abstract origin links on lexical blocks DIEs

2016-05-04 Thread Pierre-Marie de Rodat

Ping for the patch submitted at 
. It applies 
just fine on the current trunk and still bootstraps and regtests 
successfully on x86_64-linux.


Thank you in advance,

--
Pierre-Marie de Rodat

Re: Enabling -frename-registers?

2016-05-04 Thread Ramana Radhakrishnan

On Wed, May 4, 2016 at 2:37 PM, Bernd Schmidt  wrote:
> On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote:
>>
>> On ARM / AArch32 I haven't seen any performance data yet - the one
>> place we are concerned about the impact is on Thumb2 code size as
>> regrename may end up inadvertently putting more things in high
>> registers.
>
>
> In theory at least arm_preferred_rename_class is designed to make the
> opposite happen.

Indeed, yes - I'd forgotten that hook - but yes we should see if
something sticks out.

regards
Ramana
>
>
> Bernd

Re: Enabling -frename-registers?

2016-05-04 Thread Bernd Schmidt


On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote:

On ARM / AArch32 I haven't seen any performance data yet - the one
place we are concerned about the impact is on Thumb2 code size as
regrename may end up inadvertently putting more things in high
registers.


In theory at least arm_preferred_rename_class is designed to make the 
opposite happen.



Bernd

Re: [arm-embedded][PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs

2016-05-04 Thread Thomas Preudhomme

On Friday 29 April 2016 16:07:23 Kyrill Tkachov wrote:

> 
> Ok for trunk.
> Thanks,
> Kyrill

Committed with the following obvious fix:


> 
> >>> diff --git a/gcc/config.gcc b/gcc/config.gcc
> >>> index 59aee2c..be3c720 100644
> >>> --- a/gcc/config.gcc
> >>> +++ b/gcc/config.gcc
> >>> @@ -3772,38 +3772,40 @@ case "${target}" in
> >>> 
> >>>   # Add extra multilibs
> >>>   if test "x$with_multilib_list" != x; then
> >>>   
> >>>   arm_multilibs=`echo $with_multilib_list | sed -e
> >>> 
> >>> 's/,/ /g'`
> >>> - for arm_multilib in ${arm_multilibs}; do
> >>> - case ${arm_multilib} in
> >>> - aprofile)
> >>> + case ${arm_multilibs} in
> >>> + aprofile)
> >>> 
> >>>   # Note that arm/t-aprofile is a
> >>>   # stand-alone make file fragment to be
> >>>   # used only with itself.  We do not
> >>>   # specifically use the
> >>>   # TM_MULTILIB_OPTION framework
> >>> 
> >>> because
> >>> 
> >>>   # this shorthand is more
> >>> 
> >>> - # pragmatic. Additionally it is only
> >>> - # designed to work without any
> >>> - # with-cpu, with-arch with-mode
> >>> + # pragmatic.
> >>> + tmake_profile_file="arm/t-aprofile"
> >>> + ;;
> >>> + default)
> >>> + ;;
> >>> + *)
> >>> + echo "Error: --with-multilib-
> >>> list=${with_multilib_list} not supported." 1>&2
> >>> + exit 1
> >>> + ;;
> >>> + esac
> >>> +
> >>> + if test "x${tmake_profile_file}" != x ; then
> >>> + # arm/t-aprofile is only designed to work
> >>> + # without any with-cpu, with-arch, with-
> >>> mode,
> >>> 
> >>>   # with-fpu or with-float options.
> >>> 
> >>> - if test "x$with_arch" != x \
> >>> - || test "x$with_cpu" != x \
> >>> - || test "x$with_float" != x \
> >>> - || test "x$with_fpu" != x \
> >>> - || test "x$with_mode" != x ;
> >>> then
> >>> - echo "Error: You cannot use
> >>> any of --with-arch/cpu/fpu/float/mode with
> >>> --with-multilib-list=aprofile"
> >>> 1>&2
> >>> - exit 1
> >>> - fi
> >>> - tmake_file="${tmake_file}
> >>> arm/t-aprofile"
> >>> - break
> >>> - ;;
> >>> - default)
> >>> - ;;
> >>> - *)
> >>> - echo "Error: --with-multilib-
> >>> list=${with_multilib_list} not supported." 1>&2
> >>> - exit 1
> >>> - ;;
> >>> - esac
> >>> - done
> >>> + if test "x$with_arch" != x \
> >>> + || test "x$with_cpu" != x \
> >>> + || test "x$with_float" != x \
> >>> + || test "x$with_fpu" != x \
> >>> + || test "x$with_mode" != x ; then
> >>> + echo "Error: You cannot use any of --
> >>> with-arch/cpu/fpu/float/mode with --with-multilib-list=${arm_multilib}"
> >>> 1>&2

Use ${with_multilib_list} instead of ${arm_multilib} which is not in scope 
here.

Best regards,

Thomas

Re: Enabling -frename-registers?

2016-05-04 Thread Ramana Radhakrishnan

On 04/05/16 11:26, Eric Botcazou wrote:
>> Given how many latent bugs it has shown up I think that alone would make
>> it valuable to have enabled at -O2.
> 
> It might be worthwhile to test it on embedded architectures because modern 
> x86 
> and PowerPC processors are probably not very sensitive to this kind of tweaks.
> 

On ARM / AArch32 I haven't seen any performance data yet - the one place we are 
concerned about the impact is on Thumb2 code size as regrename may end up 
inadvertently putting more things in high registers. We'll check on that 
separately, having been away recently I haven't done any recent measurements 
with CSiBe - I'll try and get that done this week. Our bots have been a bit too 
flaky recently for me to say this with any certainty at the minute.

On AArch64 (thanks to Wilco for some benchmarking), we've generally seen a 
small upside in our benchmarks (a couple of proprietary suites that I cannot 
name and SPEC2k(6)) aside from one major regression which is arguably an issue 
in the backend pattern for that particular intrinsic (aes) and would have been 
visible with -frename-regs anyway ! That needs to be fixed irrespective of 
turning this on by default. Wilco tells me that on SPEC2k6 we see a code size 
improvement by removing quite a lot of redundant moves though the performance 
difference appears to be in the noise on SPEC2k6. It does appear to look 
positive for aarch64 at first glance.

regards
Ramana

Re: C/C++ PATCH to add -Wdangling-else option

2016-05-04 Thread Marek Polacek

On Tue, Apr 26, 2016 at 03:03:25PM +0200, Bernd Schmidt wrote:
> On 04/26/2016 02:39 PM, Jakub Jelinek wrote:
> > I support that change, and -Wparentheses will still enable this, it just
> > gives more fine-grained control and be in line with what clang does.
> > 
> > Bernd, how much are you against this change?
> 
> Don't really care that much, I just don't quite see the point. Don't let me
> stop you though.

So Joseph, what do you think about this patch? :)

Marek

[RS6000] Stop regrename twiddling with split-stack prologue

2016-05-04 Thread Alan Modra

Bootstrap and regression tested powerpc64le-linux.  Fixes 771 Go
testsuite regressions.  OK to apply everywhere?

The alternative of adding all parameter regs used by cfun to the
__morestack CALL_INSN_FUNCTION_USAGE and uses for cfun return value
regs seems overkill when all we need to do is protect a very small
sequence of insns.

PR target/70947
* config/rs6000/rs6000.c (rs6000_expand_split_stack_prologue): Stop
regrename modifying insns saving lr before __morestack call.
* config/rs6000/rs6000.md (split_stack_return): Similarly for
insns restoring lr after __morestack call.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ceb3705..0660427 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -27970,6 +27970,11 @@ rs6000_expand_split_stack_prologue (void)
   const0_rtx, const0_rtx));
   call_fusage = NULL_RTX;
   use_reg (&call_fusage, r12);
+  /* Say the call uses r0, even though it doesn't, to stop regrename
+ from twiddling with the insns saving lr, trashing args for cfun.
+ The insns restoring lr are similarly protected by making
+ split_stack_return use r0.  */
+  use_reg (&call_fusage, r0);
   add_function_usage_to (insn, call_fusage);
   emit_insn (gen_frame_load (r0, r1, info->lr_save_offset));
   insn = emit_move_insn (lr, r0);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a6f219c..87e7879 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12587,8 +12587,10 @@
(set_attr "indexed" "no")])
 
 ;; A return instruction which the middle-end doesn't see.
+;; Use r0 to stop regrename twiddling with lr restore insns emitted
+;; after the call to __morestack.
 (define_insn "split_stack_return"
-  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_RETURN)]
+  [(unspec_volatile [(use (reg:SI 0))] UNSPECV_SPLIT_STACK_RETURN)]
   ""
   "blr"
   [(set_attr "type" "jmpreg")])

-- 
Alan Modra
Australia Development Lab, IBM

[MIPS,committed] Update MIPS P5600 processor definition to avoid IMADD

2016-05-04 Thread Matthew Fortune

The P5600 processor has a penalty for using integer multiply-add similar to
the 74k so mark it to avoid the instruction by default.

Committed as r235873.

Matthew

gcc/

* config/mips/mips-cpus.def (p5600): Avoid IMADD by default.
Clean up p5600 comments.

---
 gcc/ChangeLog | 5 +
 gcc/config/mips/mips-cpus.def | 8 +++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 623b269..ea32ba5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-06-04  Matthew Fortune  
+
+   * config/mips/mips-cpus.def (p5600): Avoid IMADD by default.
+   Clean up p5600 comments.
+
 2016-05-04  Richard Biener  
 
* match.pd: Add BIT_FIELD_REF canonicalizations and vector
diff --git a/gcc/config/mips/mips-cpus.def b/gcc/config/mips/mips-cpus.def
index 17034f2..5df9807 100644
--- a/gcc/config/mips/mips-cpus.def
+++ b/gcc/config/mips/mips-cpus.def
@@ -44,10 +44,7 @@ MIPS_CPU ("mips4", PROCESSOR_R1, 4, 0)
isn't tuned to a specific processor.  */
 MIPS_CPU ("mips32", PROCESSOR_4KC, 32, PTF_AVOID_BRANCHLIKELY)
 MIPS_CPU ("mips32r2", PROCESSOR_74KF2_1, 33, PTF_AVOID_BRANCHLIKELY)
-/* mips32r3 is micromips hense why it uses the M4K processor.
-   mips32r5 should use the p5600 processor, but there is no definition 
-   for this yet, so in the short term we will use the same processor entry 
-   as mips32r2.  */
+/* mips32r3 is micromips hense why it uses the M4K processor.  */
 MIPS_CPU ("mips32r3", PROCESSOR_M4K, 34, PTF_AVOID_BRANCHLIKELY)
 MIPS_CPU ("mips32r5", PROCESSOR_P5600, 36, PTF_AVOID_BRANCHLIKELY)
 MIPS_CPU ("mips32r6", PROCESSOR_I6400, 37, 0)
@@ -150,7 +147,8 @@ MIPS_CPU ("1004kf1_1", PROCESSOR_24KF1_1, 33, 0)
 MIPS_CPU ("interaptiv", PROCESSOR_24KF2_1, 33, 0)
 
 /* MIPS32 Release 5 processors.  */
-MIPS_CPU ("p5600", PROCESSOR_P5600, 36, PTF_AVOID_BRANCHLIKELY)
+MIPS_CPU ("p5600", PROCESSOR_P5600, 36, PTF_AVOID_BRANCHLIKELY
+   | PTF_AVOID_IMADD)
 MIPS_CPU ("m5100", PROCESSOR_M5100, 36, PTF_AVOID_BRANCHLIKELY)
 MIPS_CPU ("m5101", PROCESSOR_M5100, 36, PTF_AVOID_BRANCHLIKELY)
 
-- 
2.2.1

Re: [RS6000] Align .toc section

2016-05-04 Thread David Edelsohn

On Wed, May 4, 2016 at 1:07 AM, Alan Modra  wrote:
> Lack of any .toc section alignment causes kexec and kdump failure
> when linking without the usual linker script.  This of course is
> really a kexec-tools error, now fixed, but it is also true that .toc
> ought to always be word aligned.
>
> Bootstrapped and regression tested powerpc64le-linux and
> powerpc64-linux.  OK to apply?
>
> * config/rs6000/rs6000.c (rs6000_elf_output_toc_section_asm_op):
> Align .toc.

Okay.

Thanks, David

Re: [PATCH] Better location info for "incomplete type" error msg (PR c/70756)

2016-05-04 Thread Marek Polacek

On Tue, May 03, 2016 at 08:05:47PM -0400, Jason Merrill wrote:
> Looks good.
> 
> But I don't see a C++ testcase; can the test go into c-c++-common?

Sadly, no.  As of now, the patch doesn't improve things for C++ (?).  Seems
we'd need to pass better locations down to pointer_int_sum / size_in_bytes.
It cascades :(.

Marek

Re: [RS6000] Simplify sysv4.h TARGET_TOC

2016-05-04 Thread David Edelsohn

On Wed, May 4, 2016 at 1:28 AM, Alan Modra  wrote:
> We can use the TARGET_* defines here.  There isn't any reason to use
> the underlying variable and masks.  (The only reason I'm aware of to
> use them is when a target config file redefines some TARGET_* macro,
> say to 0 or 1, but you need to report an error in override_options if
> a user selects a command line option that attempts to change the
> value.)  Also, TARGET_RELOCATABLE implies TARGET_MINIMAL_TOC, so there
> is no need to test TARGET_RELOCATABLE.
>
> Bootstrapped and regression tested powerpc64le-linux and
> powerpc64-linux.  OK to apply?
>
> * config/rs6000/sysv4.h (TARGET_TOC): Simplify.
>
> diff --git a/gcc/config/rs6000/sysv4.h b/gcc/config/rs6000/sysv4.h
> index a4009c3..46d2b4b 100644
> --- a/gcc/config/rs6000/sysv4.h
> +++ b/gcc/config/rs6000/sysv4.h
> @@ -40,10 +40,8 @@
>  #undef ASM_DEFAULT_SPEC
>  #define ASM_DEFAULT_SPEC "-mppc"
>
> -#define TARGET_TOC  ((rs6000_isa_flags & OPTION_MASK_64BIT) \
> -|| ((rs6000_isa_flags  \
> - & (OPTION_MASK_RELOCATABLE\
> -| OPTION_MASK_MINIMAL_TOC))\
> +#define TARGET_TOC  (TARGET_64BIT \
> +|| (TARGET_MINIMAL_TOC \
>  && flag_pic > 1)   \
>  || DEFAULT_ABI != ABI_V4)

Okay.

Thanks, David
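
For readers following the TARGET_TOC simplification, here is a rough standalone sketch of why dropping the TARGET_RELOCATABLE test is safe once TARGET_RELOCATABLE implies TARGET_MINIMAL_TOC.  The Options struct and both functions are hypothetical stand-ins for the rs6000 macros, not GCC code; the loop just checks every flag combination that respects the implication.

```cpp
#include <cassert>

// Hypothetical stand-ins for the rs6000 option bits; not the real GCC macros.
struct Options {
  bool is_64bit;     // models TARGET_64BIT
  bool relocatable;  // models TARGET_RELOCATABLE
  bool minimal_toc;  // models TARGET_MINIMAL_TOC
  int flag_pic;      // models flag_pic (0, 1 or 2)
  bool abi_is_v4;    // models DEFAULT_ABI == ABI_V4
};

// Shape of the old definition: tests both RELOCATABLE and MINIMAL_TOC.
bool target_toc_old(const Options& o) {
  return o.is_64bit
         || ((o.relocatable || o.minimal_toc) && o.flag_pic > 1)
         || !o.abi_is_v4;
}

// Shape of the simplified definition: only MINIMAL_TOC is tested.
bool target_toc_new(const Options& o) {
  return o.is_64bit || (o.minimal_toc && o.flag_pic > 1) || !o.abi_is_v4;
}

// True if both forms agree on every combination where
// relocatable implies minimal_toc.
bool forms_agree_under_implication() {
  for (int bits = 0; bits < 16; ++bits)
    for (int pic = 0; pic <= 2; ++pic) {
      Options o{(bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0, pic,
                (bits & 8) != 0};
      if (o.relocatable && !o.minimal_toc)
        continue;  // combination ruled out by the implication
      if (target_toc_old(o) != target_toc_new(o))
        return false;
    }
  return true;
}
```

The two forms only differ for relocatable without minimal_toc, which is exactly the combination the implication rules out.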

Re: [RS6000] Correct PIC_OFFSET_TABLE_REGNUM

2016-05-04 Thread David Edelsohn

On Wed, May 4, 2016 at 1:30 AM, Alan Modra  wrote:
> Leaving this as r30 results in pic_offset_table_rtx of (reg 30)
> for -m64, which is completely bogus.  Various rtl analysis predicate
> functions treat pic_offset_table_rtx specially..
>
> Bootstrapped etc.  OK to apply?
>
> * config/rs6000/rs6000.h (PIC_OFFSET_TABLE_REGNUM): Correct.
>
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 230ca43..9647106 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2050,7 +2050,10 @@ do {   
>\
> to allocate such a register (if necessary).  */
>
>  #define RS6000_PIC_OFFSET_TABLE_REGNUM 30
> -#define PIC_OFFSET_TABLE_REGNUM (flag_pic ? RS6000_PIC_OFFSET_TABLE_REGNUM : 
> INVALID_REGNUM)
> +#define PIC_OFFSET_TABLE_REGNUM \
> +  (TARGET_TOC ? TOC_REGISTER   \
> +   : flag_pic ? RS6000_PIC_OFFSET_TABLE_REGNUM \
> +   : INVALID_REGNUM)
>
>  #define TOC_REGISTER (TARGET_MINIMAL_TOC ? RS6000_PIC_OFFSET_TABLE_REGNUM : 
> 2)

Okay.

Thanks, David
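
As an illustration of the PIC_OFFSET_TABLE_REGNUM change, the nested conditional in the patch can be modelled as plain functions.  This is only a sketch: the constant names are invented here, and kInvalidReg is an assumed stand-in for INVALID_REGNUM.  It shows the old definition picking r30 whenever flag_pic is set, while the new one picks r2 when a TOC is in use without -mminimal-toc.

```cpp
#include <cassert>

// Register numbers taken from the patch context; names are hypothetical.
constexpr unsigned kPicReg = 30;       // RS6000_PIC_OFFSET_TABLE_REGNUM
constexpr unsigned kTocRegDefault = 2; // TOC_REGISTER without minimal TOC
constexpr unsigned kInvalidReg = ~0u;  // assumed stand-in for INVALID_REGNUM

// Models TOC_REGISTER.
unsigned toc_register(bool minimal_toc) {
  return minimal_toc ? kPicReg : kTocRegDefault;
}

// Models the old PIC_OFFSET_TABLE_REGNUM: depends on flag_pic only.
unsigned pic_reg_old(bool flag_pic) {
  return flag_pic ? kPicReg : kInvalidReg;
}

// Models the new PIC_OFFSET_TABLE_REGNUM: the TOC register wins when
// TARGET_TOC is set, otherwise fall back to the old behaviour.
unsigned pic_reg_new(bool target_toc, bool minimal_toc, bool flag_pic) {
  return target_toc ? toc_register(minimal_toc)
         : flag_pic ? kPicReg
                    : kInvalidReg;
}
```

With target_toc set and minimal_toc clear (the usual -m64 case), pic_reg_new returns 2 where pic_reg_old would have returned 30.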

[PATCH] Move BIT_FIELD_REF folding to match.pd

2016-05-04 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-05-04  Richard Biener  

* match.pd: Add BIT_FIELD_REF canonicalizations and vector
constructor simplifications.
* fold-const.c (fold_ternary_loc): Remove duplicate functionality
here.
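
A small standalone sketch of the "low-parts can be reduced to integral conversions" case may help when reading the pattern below.  It is a byte-level model only (no BITS_BIG_ENDIAN, no PDP endian), and every name in it is made up for illustration.  The point is that reading the low part at bit position 0 on little-endian, or at (source precision - result precision) on big-endian, gives the same value as a truncating conversion.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<unsigned char, 4>;

// Lay out a 32-bit value in memory in the requested byte order.
Bytes to_memory(std::uint32_t v, bool big_endian) {
  Bytes m{};
  for (unsigned i = 0; i < 4; ++i) {
    unsigned shift = big_endian ? 8 * (3 - i) : 8 * i;
    m[i] = static_cast<unsigned char>(v >> shift);
  }
  return m;
}

// Read size_bits bits at bit offset pos_bits (both multiples of 8) from
// memory and interpret them in the same byte order.
std::uint32_t bit_field_ref(const Bytes& m, unsigned size_bits,
                            unsigned pos_bits, bool big_endian) {
  unsigned nbytes = size_bits / 8;
  unsigned first = pos_bits / 8;
  std::uint32_t r = 0;
  for (unsigned i = 0; i < nbytes; ++i) {
    unsigned shift = big_endian ? 8 * (nbytes - 1 - i) : 8 * i;
    r |= static_cast<std::uint32_t>(m[first + i]) << shift;
  }
  return r;
}

// Bit position of the low part, mirroring the condition in the pattern.
unsigned low_part_pos(bool big_endian, unsigned src_prec, unsigned dst_prec) {
  return big_endian ? src_prec - dst_prec : 0;
}
```

For either byte order, bit_field_ref (to_memory (x, be), 16, low_part_pos (be, 32, 16), be) equals (uint16_t) x in this model.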

Index: gcc/match.pd
===
*** gcc/match.pd.orig   2016-05-03 15:15:58.002994741 +0200
--- gcc/match.pd2016-05-04 09:35:57.656880256 +0200
*** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
*** 3256,3258 
--- 3256,3354 
 WARN_STRICT_OVERFLOW_COMPARISON);
}
(cmp @0 { res; })
+ 
+ /* Canonicalizations of BIT_FIELD_REFs.  */
+ 
+ (simplify
+  (BIT_FIELD_REF @0 @1 @2)
+  (switch
+   (if (TREE_CODE (TREE_TYPE (@0)) == COMPLEX_TYPE
+&& tree_int_cst_equal (@1, TYPE_SIZE (TREE_TYPE (TREE_TYPE (@0)
+(switch
+ (if (integer_zerop (@2))
+  (view_convert (realpart @0)))
+ (if (tree_int_cst_equal (@2, TYPE_SIZE (TREE_TYPE (TREE_TYPE (@0)
+  (view_convert (imagpart @0)
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+&& INTEGRAL_TYPE_P (type)
+/* A bit-field-ref that referenced the full argument can be stripped.  
*/
+&& ((compare_tree_int (@1, TYPE_PRECISION (TREE_TYPE (@0))) == 0
+   && integer_zerop (@2))
+  /* Low-parts can be reduced to integral conversions.
+ ???  The following doesn't work for PDP endian.  */
+  || (BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
+  /* Don't even think about BITS_BIG_ENDIAN.  */
+  && TYPE_PRECISION (TREE_TYPE (@0)) % BITS_PER_UNIT == 0
+  && TYPE_PRECISION (type) % BITS_PER_UNIT == 0
+  && compare_tree_int (@2, (BYTES_BIG_ENDIAN
+? (TYPE_PRECISION (TREE_TYPE (@0))
+   - TYPE_PRECISION (type))
+: 0)) == 0)))
+(convert @0
+ 
+ /* Simplify vector extracts.  */
+ 
+ (simplify
+  (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2)
+  (if (VECTOR_TYPE_P (TREE_TYPE (@0))
+   && (types_match (type, TREE_TYPE (TREE_TYPE (@0)))
+   || (VECTOR_TYPE_P (type)
+ && types_match (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0))
+   (with
+{
+  tree ctor = (TREE_CODE (@0) == SSA_NAME
+ ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0);
+  tree eltype = TREE_TYPE (TREE_TYPE (ctor));
+  unsigned HOST_WIDE_INT width = tree_to_uhwi (TYPE_SIZE (eltype));
+  unsigned HOST_WIDE_INT n = tree_to_uhwi (@1);
+  unsigned HOST_WIDE_INT idx = tree_to_uhwi (@2);
+}
+(if (n != 0
+   && (idx % width) == 0
+   && (n % width) == 0
+   && ((idx + n) / width) <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (ctor)))
+ (with
+  {
+idx = idx / width;
+n = n / width;
+/* Constructor elements can be subvectors.  */
+unsigned HOST_WIDE_INT k = 1;
+if (CONSTRUCTOR_NELTS (ctor) != 0)
+  {
+tree cons_elem = TREE_TYPE (CONSTRUCTOR_ELT (ctor, 0)->value);
+  if (TREE_CODE (cons_elem) == VECTOR_TYPE)
+k = TYPE_VECTOR_SUBPARTS (cons_elem);
+}
+  }
+  (switch
+   /* We keep an exact subset of the constructor elements.  */
+   (if ((idx % k) == 0 && (n % k) == 0)
+(if (CONSTRUCTOR_NELTS (ctor) == 0)
+ { build_constructor (type, NULL); }
+   (with
+{
+  idx /= k;
+  n /= k;
+}
+(if (n == 1)
+ (if (idx < CONSTRUCTOR_NELTS (ctor))
+  { CONSTRUCTOR_ELT (ctor, idx)->value; }
+  { build_zero_cst (type); })
+ {
+   vec *vals;
+   vec_alloc (vals, n);
+   for (unsigned i = 0;
+i < n && idx + i < CONSTRUCTOR_NELTS (ctor); ++i)
+ CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
+ CONSTRUCTOR_ELT (ctor, idx + i)->value);
+   build_constructor (type, vals);
+ }
+   /* The bitfield references a single constructor element.  */
+   (if (idx + n <= (idx / k + 1) * k)
+(switch
+ (if (CONSTRUCTOR_NELTS (ctor) <= idx / k)
+{ build_zero_cst (type); })
+   (if (n == k)
+{ CONSTRUCTOR_ELT (ctor, idx / k)->value; })
+   (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; }
+  @1 { bitsize_int ((idx % k) * width); })
Index: gcc/fold-const.c
===
*** gcc/fold-const.c.orig   2016-05-03 14:20:49.209109405 +0200
--- gcc/fold-const.c2016-05-03 16:17:58.209422083 +0200
*** fold_ternary_loc (location_t loc, enum t
*** 11729,11737 
gcc_unreachable ();
  
  case BIT_FIELD_REF:
!   if ((TREE_CODE (arg0) ==

Re: [PATCH 0/3] Simplify the simple_return handling

2016-05-04 Thread Bernd Schmidt


On 05/04/2016 02:10 AM, Segher Boessenkool wrote:
> Is this sufficient explanation, is it okay with the fprintf's fixed?

Yeah, I suppose. From looking at some of the examples I have here I 
think there's still room for doubt whether all the alignment choices 
make perfect sense, but it's probably not something your patchkit is 
doing wrong.



Bernd

[patch] libstdc++/70940 Start fixing polymorphic memory resources

2016-05-04 Thread Jonathan Wakely


This fixes the first errors noted in the PR, so that resource_adaptor
doesn't rely on anything that isn't guaranteed by the Allocator
requirements, and __null_memory_resource::do_is_equal doesn't have a
missing return.

More work is needed to solve the alignment bugs, which I'll work on
soon.

Tested x86_64-linux, committed to trunk.

I'll backport these fixes to gcc-6 as well once they're complete.
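
To illustrate the resource_adaptor part of the fix: the Allocator requirements do not promise a default constructor, so a rebound allocator has to be built from the stored one.  Below is a minimal sketch under that assumption.  CountingAlloc and allocate_bytes_via are hypothetical names made up for this example and are not part of libstdc++.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical stateful allocator with no default constructor.
template <typename T>
struct CountingAlloc {
  using value_type = T;
  int* allocations;  // state shared between copies and rebound copies

  explicit CountingAlloc(int* counter) noexcept : allocations(counter) {}

  template <typename U>
  CountingAlloc(const CountingAlloc<U>& other) noexcept
      : allocations(other.allocations) {}

  T* allocate(std::size_t n) {
    ++*allocations;
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAlloc<T>& a, const CountingAlloc<U>& b) noexcept {
  return a.allocations == b.allocations;
}
template <typename T, typename U>
bool operator!=(const CountingAlloc<T>& a, const CountingAlloc<U>& b) noexcept {
  return !(a == b);
}

// Allocate and free `bytes` chars through a rebound copy of `alloc`.
template <typename Alloc>
void allocate_bytes_via(const Alloc& alloc, std::size_t bytes) {
  using Rebound =
      typename std::allocator_traits<Alloc>::template rebind_alloc<char>;
  using Ptr = typename std::allocator_traits<Rebound>::pointer;
  Rebound rebound(alloc);  // built from the stored allocator; Rebound()
                           // would not compile for CountingAlloc
  Ptr p = rebound.allocate(bytes);
  rebound.deallocate(p, bytes);
}
```

The pointer type is taken from allocator_traits rather than assumed to be a raw pointer, which mirrors the do_deallocate change in the diff.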


commit 0b7946fb5b97b6c0beffb1326e3e455a12b8b99f
Author: Jonathan Wakely 
Date:   Wed May 4 12:23:15 2016 +0100

libstdc++/70940 Start fixing polymorphic memory resources

	PR libstdc++/70940
	* include/experimental/memory_resource
	(__resource_adaptor_imp::do_allocate): Do not default-construct
	rebound allocator.
	(__resource_adaptor_imp::do_deallocate): Likewise. Use
	allocator_traits to get pointer type.
	(__null_memory_resource::do_allocate): Remove unused parameters.
	(__null_memory_resource::do_deallocate): Likewise.
	(__null_memory_resource::do_is_equal): Likewise. Add return statement.
	* testsuite/experimental/type_erased_allocator/1.cc: Combine with ...
	* testsuite/experimental/type_erased_allocator/1_neg.cc: This, and
	move to ...
	* testsuite/experimental/memory_resource/1.cc: Here.
	* testsuite/experimental/memory_resource/null_memory_resource.cc: New.
	* testsuite/experimental/memory_resource/resource_adaptor.cc: New.

diff --git a/libstdc++-v3/include/experimental/memory_resource b/libstdc++-v3/include/experimental/memory_resource
index ccdf5e6..ea8afb8 100644
--- a/libstdc++-v3/include/experimental/memory_resource
+++ b/libstdc++-v3/include/experimental/memory_resource
@@ -282,7 +282,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	size_t __new_size = _S_aligned_size(__bytes,
 	_S_supported(__alignment) ?
 	__alignment : _S_max_align);
-	return _Aligned_alloc().allocate(__new_size);
+	return _Aligned_alloc(_M_alloc).allocate(__new_size);
   }
 
   virtual void
@@ -292,9 +292,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	size_t __new_size = _S_aligned_size(__bytes,
 	_S_supported(__alignment) ?
 	__alignment : _S_max_align);
-	_Aligned_alloc().deallocate(static_cast(__p),
-__new_size);
+	using _Ptr = typename allocator_traits<_Aligned_alloc>::pointer;
+	_Aligned_alloc(_M_alloc).deallocate(static_cast<_Ptr>(__p),
+	__new_size);
   }
 
   virtual bool
@@ -306,8 +306,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 private:
   // Calculate Aligned Size
-  // Returns a size that is larger than or equal to __size and divided by
-  // __alignment, where __alignment is required to be the power of 2.
+  // Returns a size that is larger than or equal to __size and divisible
+  // by __alignment, where __alignment is required to be the power of 2.
   static size_t
   _S_aligned_size(size_t __size, size_t __alignment)
   { return ((__size - 1)|(__alignment - 1)) + 1; }
@@ -342,16 +342,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
 protected:
   void*
-  do_allocate(size_t __bytes, size_t __alignment)
+  do_allocate(size_t, size_t)
   { std::__throw_bad_alloc(); }
 
   void
-  do_deallocate(void* __p, size_t __bytes, size_t __alignment)
+  do_deallocate(void*, size_t, size_t) noexcept
   { }
 
   bool
   do_is_equal(const memory_resource& __other) const noexcept
-  { }
+  { return this == &__other; }
 
   friend memory_resource* null_memory_resource() noexcept;
 };
diff --git a/libstdc++-v3/testsuite/experimental/memory_resource/1.cc b/libstdc++-v3/testsuite/experimental/memory_resource/1.cc
new file mode 100644
index 000..38cbd27
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/memory_resource/1.cc
@@ -0,0 +1,161 @@
+// { dg-options "-std=gnu++14" }
+
+// Copyright (C) 2015-2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+using std::experimental::pmr::polymorphic_allocator;
+using std::experimental::pmr::memory_resource;
+using std::experimental::pmr::new_delete_resource;
+using std::experimental::pmr::get_default_resource;
+using std::experimental::pmr::set_default_resource;
+
+struct A
+{
+  A() { ++ctor_count; }
+

RE: [PATCH : RL78] Disable interrupts during hardware multiplication routines

2016-05-04 Thread Kaushik Phatak


Hi Nick,
I have modified and updated this patch as per your comments.
Apologies, as it has taken me a while to get back to this.
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00702.html

>> +/* Structure for G13 MDUC registers.  */ struct mduc_reg_type {
>> +  unsigned int address;
>> +  enum machine_mode mode;
>> +  bool is_volatile;

>> +struct mduc_reg_type  mduc_regs[NUM_OF_MDUC_REGS] =
>> +  {{0xf00e8, QImode, true},

> If the is_volatile field is true for all members of this array, why bother 
> having it at all ?

I have got rid of the unnecessary volatile field here.

>> +check_mduc_usage ()
>Add a void type to the declaration.

Done.

> You should have a blank line between the end of the variable 
> declarations and the start of the code.

Done.

>> > +  if (get_attr_is_g13_muldiv_insn (insn) == IS_G13_MULDIV_INSN_YES)
> I am not sure - but it might be safer to check INSN_P(insn) first

Added an INSN_P check here.

>> > +if (mduc_regs[i].mode == QImode)
>> +{
> Indentation.

Hopefully this is fixed. Some issue in editor when moving from tabs to spaces.

>> +  emit_insn (gen_movqi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
>> +}
>In the else case you are using gen_movqi to move an HImode value...
>Also you could simplify the above code like this:

Fixed this, also used a simpler logic as suggested.

>> > +fs = fs + NUM_OF_MDUC_REGS * 2;
>> if (fs > 254 * 3)
> No - this is wrong.  "fs" is the amount of extra space needed in the

Sorry, I think I misread this. I have added code at the top of the prologue
which will update cfun->machine->framesize.

>> +#define NUM_OF_MDUC_REGS 6
> Why define this here ?  It is only ever used in rl78.c and it can be 
> computed automatically by applying the ARRAY_SIZE macro

I have removed this macro and used ARRAY_SIZE to compute the size instead.
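
For reference, a minimal sketch of the ARRAY_SIZE approach, written as a standalone program rather than rl78.c code.  The Mode enum is a stand-in for machine_mode, the table addresses are placeholders, and mduc_save_bytes is a hypothetical helper modelling the "two bytes per saved register" frame adjustment discussed above.

```cpp
#include <cassert>
#include <cstddef>

// Element count derived from the array's own type, so no separate
// NUM_OF_MDUC_REGS constant can drift out of sync with the table.
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

enum Mode { QImode, HImode };  // stand-in for machine_mode

struct mduc_reg_type {
  unsigned int address;
  Mode mode;
};

// Placeholder addresses; only the number of entries matters here.
static const mduc_reg_type mduc_regs[] = {
  {0x1000, QImode},
  {0x1002, HImode},
  {0x1004, HImode},
  {0x1006, HImode},
  {0x1008, HImode},
  {0x100a, HImode},
};

// Hypothetical helper: extra frame bytes needed to save every register,
// assuming two bytes are reserved per entry.
std::size_t mduc_save_bytes() {
  return ARRAY_SIZE (mduc_regs) * 2;
}
```

Adding or removing a table entry then changes the computed size without touching any other definition.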

>> +msave-mduc-in-interrupts
>> +Target Mask(SAVE_MDUC_REGISTERS)
>> +Stores the MDUC registers in interrupt handlers for G13 target.
>>
>> +mno-save-mduc-in-interrupts
>> +Target RejectNegative Mask(NO_SAVE_MDUC_REGISTERS)
>> +Does not save the MDUC registers in interrupt handlers for G13 target.

>This looks wrong.  Surely you only need the msave-mduc-in-interrupts
>definition.  That will automatically allow -mno-save-mduc-in-interrupts, 
>since it does not have the RejectNegative attribute.  Also there is no 
>need to have two separate target mask bits.  Just SAVE_MDUC_REGISTERS 
>will do.

Well, the earlier idea was to save the MDUC registers by default for G13
targets.  Hence the '-mno-' was introduced, but I can go with your suggestion
if it reduces any confusion.

>> +++ gcc/doc/invoke.texi(working copy)
> You should also add the name of the new option to the Machine Dependent

Added the new option to the list.

>> +@item -msave-mduc-in-interrupts
>Still not quite right.  The last sentence should be:
>  The MDUC registers will only be saved

Updated the last line of the manual as per your suggestion.

>> My review comment is still outstanding. - from Mike Stump

The current RL78 ABI does not contain specific information about these registers
from the G13 variant of the RL78 target. We can try and request Renesas to add
information about the same along with the option required for this.
Nick, do you have any thoughts on this? (assuming this version of patch is
closer to acceptance)

The patch is regression tested for "-msim -mg13 -msave-mduc-in-interrupts".

Best Regards,
Kaushik 

gcc/ChangeLog
2016-05-04  Kaushik Phatak  

* config/rl78/rl78.c (rl78_expand_prologue): Save the MDUC related
registers in all interrupt handlers if necessary.
(rl78_option_override): Add warning.
(MUST_SAVE_MDUC_REGISTERS): New macro.
(rl78_expand_epilogue): Restore the MDUC registers if necessary.
* config/rl78/rl78.c (check_mduc_usage): New function.
* config/rl78/rl78.c (mduc_regs): New structure to hold MDUC register 
data.
* config/rl78/rl78.md (is_g13_muldiv_insn): New attribute.
* config/rl78/rl78.md (mulsi3_g13): Add is_g13_muldiv_insn attribute.
* config/rl78/rl78.md (udivmodsi4_g13): Add is_g13_muldiv_insn 
attribute.
* config/rl78/rl78.md (mulhi3_g13): Add is_g13_muldiv_insn attribute.
* config/rl78/rl78.opt (msave-mduc-in-interrupts): New option.
* doc/invoke.texi (RL78 Options): Add -msave-mduc-in-interrupts.


Index: gcc/config/rl78/rl78.c
===
--- gcc/config/rl78/rl78.c  (revision 235865)
+++ gcc/config/rl78/rl78.c  (working copy)
@@ -76,6 +76,21 @@
   "sp", "ap", "psw", "es", "cs"
 };
 
+/* Structure for G13 MDUC registers.  */
+struct mduc_reg_type
+{
+  unsigned int address;
+  enum machine_mode mode;
+};
+
+struct mduc_reg_type  mduc_regs[] =
+  {{0xf00e8, QImode},
+   {0x0, HImode},
+   {0x2, HImode},
+   {0xf2224, HImode},
+   {0xf00e0, HImode},
+   {0xf00e2, HImode}};
+
 struct GTY(())

Re: Enabling -frename-registers?

2016-05-04 Thread Eric Botcazou

> Given how many latent bugs it has shown up I think that alone would make
> it valuable to have enabled at -O2.

It might be worthwhile to test it on embedded architectures because modern x86 
and PowerPC processors are probably not very sensitive to this kind of tweak.

-- 
Eric Botcazou

Re: Update GCC 6 release page

2016-05-04 Thread Gerald Pfeifer

On Tue, 3 May 2016, Damian Rouson wrote:
> The patch below expands the list of new Fortran support for the GCC 6 
> Release Series Changes, New Features, Fixes page at 
> https://gcc.gnu.org/gcc-6/changes.html.  Please let me know
> whether this is acceptable and will be applied.

Based on FY's approval I applied this patch on your behalf,
just wrapping a long line and adding two missing s.

Thanks for noting those additions and even preparing a patch!

Gerald

PS: Below the updated patch as applied.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.80
@@ -335,6 +336,13 @@
 
 Fortran
   
+Fortran 2008 SUBMODULE support.
+Fortran 2015 EVENT_TYPE, EVENT_POST,
+EVENT_WAIT, and EVENT_QUERY
+support.
+Improved support for Fortran 2003 deferred-length character
+variables.
+Improved support for OpenMP and OpenACC.
 The MATMUL intrinsic is now inlined for straightforward
   cases if front-end optimization is active.  The maximum size for
   inlining can be set to n with the

Re: Enabling -frename-registers?

2016-05-04 Thread Bernd Schmidt


On 05/04/2016 12:03 PM, Eric Botcazou wrote:
>> The IBM LTC team has tested the benefit of -frename-registers at -O2
>> and sees no net benefit for PowerPC -- some benchmarks improve
>> slightly but others degrade slightly (a few percent).  You mentioned
>> no overall benefit for x86.  Although you mentioned benefit for
>> Itanium, it is not a primary nor secondary architecture target for GCC
>> and continues to have limited adoption.  Andreas also reported a
>> bootstrap comparison failure for Itanium due to the change.
>
> ...which had nothing to do with -frename-registers but was a latent issue in
> the speculation support of the scheduler, see the audit trail.

Yeah, I'd forgotten to add this to the list of issues found. Also, 
Alan's morestack representation issue.


Given how many latent bugs it has shown up I think that alone would make 
it valuable to have enabled at -O2.



Bernd

Re: Enabling -frename-registers?

2016-05-04 Thread Eric Botcazou

> The IBM LTC team has tested the benefit of -frename-registers at -O2
> and sees no net benefit for PowerPC -- some benchmarks improve
> slightly but others degrade slightly (a few percent).  You mentioned
> no overall benefit for x86.  Although you mentioned benefit for
> Itanium, it is not a primary nor secondary architecture target for GCC
> and continues to have limited adoption.  Andreas also reported a
> bootstrap comparison failure for Itanium due to the change.

...which had nothing to do with -frename-registers but was a latent issue in 
the speculation support of the scheduler, see the audit trail.

-- 
Eric Botcazou

Re: [testuite,AArch64] Make scan for 'br' more robust

2016-05-04 Thread Christophe Lyon

On 4 May 2016 at 10:43, Kyrill Tkachov  wrote:
>
> Hi Christophe,
>
>
> On 02/05/16 12:50, Christophe Lyon wrote:
>>
>> Hi,
>>
>> I've noticed a "regression" of AArch64's noplt_3.c in the gcc-6-branch
>> because my validation script adds the branch name to gcc/REVISION.
>>
>> As a result scan-assembler-times "br" also matched "gcc-6-branch",
>> hence the failure.
>>
>> The small attached patch replaces "br" by "br\t" to fix the problem.
>>
>> I've also made a similar change to tail_indirect_call_1 although the
>> problem did not happen for this test because it uses scan-assembler
>> instead of scan-assembler-times. I think it's better to make it more
>> robust too.
>>
>> OK?
>>
>> Christophe
>
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/noplt_3.c
> b/gcc/testsuite/gcc.target/aarch64/noplt_3.c
> index ef6e65d..a382618 100644
> --- a/gcc/testsuite/gcc.target/aarch64/noplt_3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/noplt_3.c
> @@ -16,5 +16,5 @@ cal_novalue (int a)
>dec (a);
>  }
>  -/* { dg-final { scan-assembler-times "br" 2 } } */
> +/* { dg-final { scan-assembler-times "br\t" 2 } } */
>  /* { dg-final { scan-assembler-not "b\t" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
> b/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
> index 4759d20..e863323 100644
> --- a/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
> @@ -3,7 +3,7 @@
>   typedef void FP (int);
>  -/* { dg-final { scan-assembler "br" } } */
> +/* { dg-final { scan-assembler "br\t" } } */
>
> Did you mean to make this scan-assembler-times as well?
>

I kept the changes minimal, but you are right, it would be more robust
as attached.

OK for trunk and gcc-6 branch?

Thanks

Christophe

> Kyrill
>
>
>
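
For anyone curious about the failure mode, here is a standalone sketch of it.  It uses std::regex rather than the Tcl regexps DejaGnu actually uses, and the assembler text (including the .ident string) is a made-up example.  It only shows that a bare "br" pattern also matches inside "gcc-6-branch", while "br" followed by a tab does not.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count non-overlapping matches of `pattern` in `text`, roughly what a
// scan-assembler-times style check does.
std::size_t count_matches(const std::string& text, const std::string& pattern) {
  const std::regex re(pattern);
  return static_cast<std::size_t>(
      std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                    std::sregex_iterator()));
}

// Hypothetical assembler output: two indirect branches plus an .ident
// line that happens to contain the branch name.
std::string sample_asm() {
  return "\tbr\tx1\n"
         "\tbr\tx2\n"
         "\t.ident\t\"GCC: (GNU) 6.1.1 [gcc-6-branch revision 0]\"\n";
}
```

On this sample, count_matches (sample_asm (), "br") finds three matches and count_matches (sample_asm (), "br\t") finds the intended two.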
2016-05-04  Christophe Lyon  

* gcc.target/aarch64/noplt_3.c: Scan for "br\t".
* gcc.target/aarch64/tail_indirect_call_1.c: Scan for "br\t",
"blr\t" and switch to scan-assembler-times.
diff --git a/gcc/testsuite/gcc.target/aarch64/noplt_3.c 
b/gcc/testsuite/gcc.target/aarch64/noplt_3.c
index ef6e65d..a382618 100644
--- a/gcc/testsuite/gcc.target/aarch64/noplt_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/noplt_3.c
@@ -16,5 +16,5 @@ cal_novalue (int a)
   dec (a);
 }
 
-/* { dg-final { scan-assembler-times "br" 2 } } */
+/* { dg-final { scan-assembler-times "br\t" 2 } } */
 /* { dg-final { scan-assembler-not "b\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c 
b/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
index 4759d20..de8f12d 100644
--- a/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
@@ -3,8 +3,8 @@
 
 typedef void FP (int);
 
-/* { dg-final { scan-assembler "br" } } */
-/* { dg-final { scan-assembler-not "blr" } } */
+/* { dg-final { scan-assembler-times "br\t" 2 } } */
+/* { dg-final { scan-assembler-not "blr\t" } } */
 void
 f1 (FP fp, int n)
 {

Re: Enabling -frename-registers?

2016-05-04 Thread Alan Modra

On Wed, May 04, 2016 at 11:08:47AM +0200, Bernd Schmidt wrote:
> On 05/04/2016 11:05 AM, Alan Modra wrote:
> >I agree it's good to find these things..  Another nasty bug to add to
> >the list is complete breakage of gccgo on powerpc64le.  I see register
> >renaming around the prologue call to __morestack, which trashes
> >function arguments.
> 
> How does this come about, is there something incorrect about the RTL
> representation of __morestack?

That's highly likely.  call_fusage is just r12, its non-standard
parameter.

-- 
Alan Modra
Australia Development Lab, IBM

Re: Enabling -frename-registers?

2016-05-04 Thread Bernd Schmidt


On 05/04/2016 11:05 AM, Alan Modra wrote:
> I agree it's good to find these things..  Another nasty bug to add to
> the list is complete breakage of gccgo on powerpc64le.  I see register
> renaming around the prologue call to __morestack, which trashes
> function arguments.


How does this come about, is there something incorrect about the RTL 
representation of __morestack?



Bernd

Re: Enabling -frename-registers?

2016-05-04 Thread Alan Modra

On Wed, May 04, 2016 at 12:52:47AM +0200, Bernd Schmidt wrote:
> I must say I find the argumentation about the fallout not compelling. It's a
> normal consequence of development work, and by enabling it at -O2, we have
> found:
>  * a Linux kernel bug
>  * a rs6000 testsuite bug
>  * some i386.md issues that can cause performance problems
>  * and a compare-debug problem in regrename itself.
> All of these are _good_ things. If we don't want to run into such issues
> we'll have to cease all development work.

I agree it's good to find these things..  Another nasty bug to add to
the list is complete breakage of gccgo on powerpc64le.  I see register
renaming around the prologue call to __morestack, which trashes
function arguments.

Dump of assembler code for function main:
   0x100014e0 <+0>: lis r2,4098
   0x100014e4 <+4>: addir2,r2,-18176
   0x100014e8 <+8>: ld  r0,-28736(r13)
   0x100014ec <+12>:addir12,r1,-16496
   0x100014f0 <+16>:nop
   0x100014f4 <+20>:cmpld   cr7,r12,r0
   0x100014f8 <+24>:blt cr7,0x10001550 
   0x100014fc <+28>:mflrr5
   0x10001500 <+32>:std r30,-16(r1)
   0x10001504 <+36>:std r31,-8(r1)
   0x10001508 <+40>:li  r8,0
   0x1000150c <+44>:nop
   0x10001510 <+48>:ld  r10,-32664(r2)
   0x10001514 <+52>:nop
   0x10001518 <+56>:ld  r6,-32672(r2)
   0x1000151c <+60>:std r5,16(r1)
   0x10001520 <+64>:stdur1,-112(r1)
   0x10001524 <+68>:stb r8,0(r6)
   0x10001528 <+72>:lbz r9,0(r10)
   0x1000152c <+76>:cmpwi   r9,0
   0x10001530 <+80>:beq 0x1000156c 
   0x10001534 <+84>:li  r3,0
   0x10001538 <+88>:addir1,r1,112
   0x1000153c <+92>:ld  r0,16(r1)
   0x10001540 <+96>:ld  r30,-16(r1)
   0x10001544 <+100>:   ld  r31,-8(r1)
   0x10001548 <+104>:   mtlrr0
   0x1000154c <+108>:   blr
=> 0x10001550 <+112>:   mflrr3  # argc trashed
   0x10001554 <+116>:   std r3,16(r1)
   0x10001558 <+120>:   bl  0x10001818 <__morestack>
   0x1000155c <+124>:   ld  r4,16(r1)
   0x10001560 <+128>:   mtlrr4
   0x10001564 <+132>:   blr
   0x10001568 <+136>:   b   0x100014fc 

__morestack has a non-standard calling convention.  After buying more
stack, it calls its caller!  In this case that means a call to
0x10001568.  So regrename can't use r3 at 0x10001550 as it is a
parameter passing reg for main.  The use of r4 at 0x1000155c would
also be wrong for a function that returns a value in r4.

-- 
Alan Modra
Australia Development Lab, IBM

[Ping ^ 2] Re: [ARM] Add support for overflow add, sub, and neg operations

2016-05-04 Thread Michael Collison


Ping. Previous Patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01472.html

Re: [testuite,AArch64] Make scan for 'br' more robust

2016-05-04 Thread Kyrill Tkachov



Hi Christophe,

On 02/05/16 12:50, Christophe Lyon wrote:
> Hi,
>
> I've noticed a "regression" of AArch64's noplt_3.c in the gcc-6-branch
> because my validation script adds the branch name to gcc/REVISION.
>
> As a result scan-assembler-times "br" also matched "gcc-6-branch",
> hence the failure.
>
> The small attached patch replaces "br" by "br\t" to fix the problem.
>
> I've also made a similar change to tail_indirect_call_1 although the
> problem did not happen for this test because it uses scan-assembler
> instead of scan-assembler-times. I think it's better to make it more
> robust too.
>
> OK?
>
> Christophe


diff --git a/gcc/testsuite/gcc.target/aarch64/noplt_3.c 
b/gcc/testsuite/gcc.target/aarch64/noplt_3.c
index ef6e65d..a382618 100644
--- a/gcc/testsuite/gcc.target/aarch64/noplt_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/noplt_3.c
@@ -16,5 +16,5 @@ cal_novalue (int a)
   dec (a);
 }
 
-/* { dg-final { scan-assembler-times "br" 2 } } */
+/* { dg-final { scan-assembler-times "br\t" 2 } } */
 /* { dg-final { scan-assembler-not "b\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c 
b/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
index 4759d20..e863323 100644
--- a/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/tail_indirect_call_1.c
@@ -3,7 +3,7 @@
 
 typedef void FP (int);
 
-/* { dg-final { scan-assembler "br" } } */
+/* { dg-final { scan-assembler "br\t" } } */

Did you mean to make this scan-assembler-times as well?

Kyrill

Re: Enabling -frename-registers?

2016-05-04 Thread Richard Biener

On Wed, May 4, 2016 at 12:52 AM, Bernd Schmidt  wrote:
> On 05/03/2016 11:26 PM, David Edelsohn wrote:
>>
>> Optimizations enabled by default at -O2 should show an overall net
>> benefit -- that is the general justification that we have used in the
>> past.  I request that this change be reverted until more compelling
>> evidence of benefit is presented.
>
>
> Shrug. Done. I was going to look at adding some more smarts to it to clean
> up after the register allocator; I guess I shan't bother.
>
> I must say I find the argumentation about the fallout not compelling. It's a
> normal consequence of development work, and by enabling it at -O2, we have
> found:
>  * a Linux kernel bug
>  * a rs6000 testsuite bug
>  * some i386.md issues that can cause performance problems
>  * and a compare-debug problem in regrename itself.
> All of these are _good_ things. If we don't want to run into such issues
> we'll have to cease all development work.
> I'll still submit final versions of the fixes for the i386 and compare-debug
> issues.

I agree with you here, enabling at -O2 was fine.  Reverting will have
regressed PR38825 again (curiously also some SSE scheduling issue).
Thus I wonder if regrename could be integrated with sched2 itself.

Richard.

>
> Bernd

Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-05-04 Thread Richard Biener

On Tue, 3 May 2016, Bernd Edlinger wrote:

> On 28.04.2016 09:09, Richard Biener wrote:
> >
> > As said elsewhere the main reason for all of this is to make the
> > in-tree builds work better for newer archs that are not happy with
> > the versions provided by download_prerequisites.  This should come
> > with a documentation adjustment that the only tested in-tree
> > versions are those downloaded by download_prerequisites.
> 
> That patch is installed now, and so far nothing bad has happened.
> 
> > Please address updating the minimum supported _installed_ version
> > separately (in fact I do maintain a patch to disable stuff to be
> > able to go back to even older mpfr versions ... :/).
> >
> > SLES 11 ships with mpfr 2.3.2, mpc 0.8 and gmp 4.2.3 while SLES 12
> > and openSUSE Leap have gmp 5.1.3, mpfr 3.1.2 and mpc 1.0.2.
> 
> Of course updating to a more recent gmp version is not the most
> important thing in the world, and I am not trying to make gcc emit
> an error when compiling gmp 5.1.3, but I think emitting a warning
> for this code would be fair.
> 
> So here is the next step.
> 
> This patch raises the installed gmp version to 6.0.0 or higher,
> the installed mpfr version to 3.1.1 or higher and the
> installed mpc version to 0.9 or higher.  So Jakub's currently
> installed versions will be fine, even a bit older versions
> will work, but unfortunately gmp 5.1.3 will not work.
> 
> I also attached a sketch of what I'd like to propose on the
> libstdc++ list, once we can rely on the gmp.h not to trigger
> these new warnings in cstddef.  With the gmp 5.1.3 or earlier
> one of these warnings made the boot-strap fail in stage2.
> 
> Boot-strapped and reg-tested on x86_64-linux-gnu.
> Is it OK for trunk?

No, I don't see a compelling reason to force the minimum installed
version to be higher than today.

Richard.

Re: Fix tree-inlinine ICE with uninitializer return value

2016-05-04 Thread Richard Biener

On Tue, 3 May 2016, Jan Hubicka wrote:

> Hi,
> the code path handling the case where the callee is missing a return statement
> but the call statement has an LHS has been broken in tree-inline since
> anonymous SSA_NAMEs were introduced.  This code is not used because all those
> inlines are disabled by gimple_check_call_matching_types, but since I would
> like to drop the checks it seems a good idea to fix the code path rather than
> dropping it.  This copies what Jakub's code does when redirecting to
> unreachable.
> 
> Bootstrapped/regtested x86_64-linux, OK?

LGTM.  [we should eventually simply allow "anonymous" default defs
which would simply be not registered with the var->default-def map
but just be marked SSA_NAME_IS_DEFAULT_DEF and having a GIMPLE_NOP def.
Not sure where having this would break today, maybe some checking bits
in verify-ssa are not happy with such default defs]

Thanks,
Richard.

> Honza
> 
>   * tree-inline.c (expand_call_inline): Fix path inlining function
>   with no return statement.
> Index: tree-inline.c
> ===
> --- tree-inline.c (revision 235839)
> +++ tree-inline.c (working copy)
> @@ -4708,7 +4708,7 @@ expand_call_inline (basic_block bb, gimp
>   {
> tree name = gimple_call_lhs (stmt);
> tree var = SSA_NAME_VAR (name);
> -   tree def = ssa_default_def (cfun, var);
> +   tree def = var ? ssa_default_def (cfun, var) : NULL;
>  
> if (def)
>   {
> @@ -4719,6 +4719,11 @@ expand_call_inline (basic_block bb, gimp
>   }
> else
>   {
> +   if (!var)
> + {
> +   tree var = create_tmp_reg_fn (cfun, TREE_TYPE (name), NULL);
> +   SET_SSA_NAME_VAR_OR_IDENTIFIER (name, var);
> + }
> /* Otherwise make this variable undefined.  */
> gsi_remove (&stmt_gsi, true);
> set_ssa_default_def (cfun, var, name);
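
The control flow discussed above can be modelled outside of GCC.  The
following is only a toy sketch in plain C, not GCC code: toy_var,
toy_name, toy_create_tmp and toy_make_undefined are hypothetical stand-ins
invented for illustration, and the real tree, SSA_NAME and default-def
machinery is considerably more involved.  It shows the two cases the thread
is about: a name whose variable already has a default definition, and an
anonymous name that has no variable at all.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GCC's trees; none of these are real GCC types.  */
struct toy_name;
struct toy_var { const char *label; struct toy_name *default_def; };
struct toy_name { struct toy_var *var; int is_default_def; };

/* Hypothetical helper: hand out a fresh temporary variable from a
   small static pool.  */
static struct toy_var toy_pool[4];
static size_t toy_pool_used;

static struct toy_var *
toy_create_tmp (void)
{
  assert (toy_pool_used < sizeof toy_pool / sizeof toy_pool[0]);
  struct toy_var *v = &toy_pool[toy_pool_used++];
  v->label = "tmp";
  v->default_def = NULL;
  return v;
}

/* Make NAME undefined.  Returns the existing default definition to copy
   from, or NULL if NAME itself became the default definition.  */
static struct toy_name *
toy_make_undefined (struct toy_name *name)
{
  struct toy_var *var = name->var;

  /* Guard the lookup: an anonymous name has no variable to query.  */
  struct toy_name *def = var ? var->default_def : NULL;
  if (def)
    return def;

  if (!var)
    {
      /* Give the anonymous name a variable, and update the outer VAR so
         that the registration below uses it.  */
      var = toy_create_tmp ();
      name->var = var;
    }

  var->default_def = name;
  name->is_default_def = 1;
  return NULL;
}
```

Calling toy_make_undefined on a name whose var field is NULL attaches a
fresh temporary and registers the name as that temporary's default
definition; calling it on a second name that shares an already registered
variable returns the first name instead.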

[SH][committed] Add support for additional SH2A post-inc/pre-dec addressing modes

2016-05-04 Thread Oleg Endo

Hi,

The attached patch adds support for the following SH2A addressing
modes:
mov.b   @-Rm,R0
mov.w   @-Rm,R0
mov.l   @-Rm,R0
mov.b   R0,@Rn+
mov.w   R0,@Rn+
mov.l   R0,@Rn+
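
For readers less familiar with these modes, the following plain C sketch
shows the kind of source-level access pattern that corresponds to them.
The function names fill_forward and sum_backward are made up for
illustration and are not part of the patch.  Whether the compiler really
selects the instructions named in the comments depends on the target
options, the optimization level and register allocation, so treat the
mapping as an assumption rather than a guarantee.

```c
#include <stddef.h>

/* Store with post-increment: each iteration writes through DST and then
   advances it, which is the access shape of "mov.l R0,@Rn+".  */
void
fill_forward (int *dst, size_t n, int value)
{
  while (n-- > 0)
    *dst++ = value;
}

/* Load with pre-decrement: END points one past the last element; each
   iteration steps back first and then reads, which is the access shape
   of "mov.l @-Rm,R0".  */
long
sum_backward (const int *end, size_t n)
{
  long sum = 0;
  while (n-- > 0)
    sum += *--end;
  return sum;
}
```

Compiling such functions for an SH2A target and inspecting the assembly
output is one way to check which addressing modes were actually chosen.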

The patch also tweaks the post-inc/pre-dec addressing mode usage on
non-SH2A targets.  CSiBE shows a total code size reduction of 1568 bytes
(-0.047074 %) for non-SH2A and a code size increase of +247 (+0.007505 %)
for SH2A.

The code size increase on SH2A is due to increased register
usage/pressure.  Where before displacement modes were used (reg +
disp), it now tends to use post-inc stores.  To do that, the address
register (often the stack pointer) is often copied into another
register first.  Hopefully this issue can be improved later by the AMS
optimization.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235859.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/predicates.md (post_inc_mem, pre_dec_mem): New predicates.
* config/sh/sh-protos.h (sh_find_set_of_reg): Return null result if
result.set_rtx is null instead of aborting.
* config/sh/sh.h (USE_LOAD_POST_INCREMENT, USE_STORE_PRE_DECREMENT):
Always enable.
(USE_LOAD_PRE_DECREMENT, USE_STORE_POST_INCREMENT): Enable for SH2A.
* config/sh/sh.md (*extend<mode>si2_predec, *mov<mode>_load_predec,
*mov<mode>_store_postinc): New patterns.
diff --git a/gcc/config/sh/predicates.md b/gcc/config/sh/predicates.md
index 3e69d88..b582637 100644
--- a/gcc/config/sh/predicates.md
+++ b/gcc/config/sh/predicates.md
@@ -230,6 +230,18 @@
(match_test "sh_disp_addr_displacement (op)
 		<= sh_max_mov_insn_displacement (GET_MODE (op), false)")))
 
+;; Returns true if OP is a post-increment addressing mode memory reference.
+(define_predicate "post_inc_mem"
+  (and (match_code "mem")
+   (match_code "post_inc" "0")
+   (match_code "reg" "00")))
+
+;; Returns true if OP is a pre-decrement addressing mode memory reference.
+(define_predicate "pre_dec_mem"
+  (and (match_code "mem")
+   (match_code "pre_dec" "0")
+   (match_code "reg" "00")))
+
 ;; Returns 1 if the operand can be used in an SH2A movu.{b|w} insn.
 (define_predicate "zero_extend_movu_operand"
   (and (ior (match_operand 0 "displacement_mem_operand")
diff --git a/gcc/config/sh/sh-protos.h b/gcc/config/sh/sh-protos.h
index ea7e847..c47e2ea 100644
--- a/gcc/config/sh/sh-protos.h
+++ b/gcc/config/sh/sh-protos.h
@@ -224,8 +224,12 @@ sh_find_set_of_reg (rtx reg, rtx_insn* insn, F stepfunc,
 	}
 }
 
-  if (result.set_src != NULL)
-gcc_assert (result.insn != NULL && result.set_rtx != NULL);
+  /* If the searched reg is found inside a (mem (post_inc:SI (reg))), set_of
+ will return NULL and set_rtx will be NULL.
+ In this case report a 'not found'.  result.insn will always be non-null
+ at this point, so no need to check it.  */
+  if (result.set_src != NULL && result.set_rtx == NULL)
+result.set_src = NULL;
 
   return result;
 }
diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index 60c6250..16b4a8e 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1307,12 +1307,10 @@ struct sh_args {
 #define HAVE_POST_INCREMENT  TARGET_SH1
 #define HAVE_PRE_DECREMENT   TARGET_SH1
 
-#define USE_LOAD_POST_INCREMENT(mode)((mode == SImode || mode == DImode) \
-	  ? 0 : TARGET_SH1)
-#define USE_LOAD_PRE_DECREMENT(mode) 0
-#define USE_STORE_POST_INCREMENT(mode)   0
-#define USE_STORE_PRE_DECREMENT(mode)((mode == SImode || mode == DImode) \
-	  ? 0 : TARGET_SH1)
+#define USE_LOAD_POST_INCREMENT(mode) TARGET_SH1
+#define USE_LOAD_PRE_DECREMENT(mode) TARGET_SH2A
+#define USE_STORE_POST_INCREMENT(mode) TARGET_SH2A
+#define USE_STORE_PRE_DECREMENT(mode) TARGET_SH1
 
 /* If a memory clear move would take CLEAR_RATIO or more simple
move-instruction pairs, we will do a setmem instead.  */
diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 2d9502b..2a8fbc8 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -4820,6 +4820,15 @@
   [(set_attr "type" "load")
(set_attr "length" "2,2,4")])
 
+;; The pre-dec and post-inc mems must be captured by the '<' and '>'
+;; constraints, otherwise wrong code might get generated.
+(define_insn "*extend<mode>si2_predec"
+  [(set (match_operand:SI 0 "arith_reg_dest" "=z")
+	(sign_extend:SI (match_operand:QIHI 1 "pre_dec_mem" "<")))]
+  "TARGET_SH2A"
+  "mov.<bw>	%1,%0"
+  [(set_attr "type" "load")])
+
 ;; The *_snd patterns will take care of other QImode/HImode addressing
 ;; modes than displacement addressing.  They must be defined _after_ the
 ;; displacement addressing patterns.  Otherwise the displacement addressing
@@ -5261,6 +5270,22 @@
   prepare_move_operands (operands, <MODE>mode);
 })
 
+;; The pre-dec and post-inc mems must be captured by the '<' and '>'
+;; constraints, otherwise wrong code might get generated.
+(define_insn "*mov<mode>_load_predec"
+

Re: Update GCC 6 release page

2016-05-04 Thread FX

> The patch below expands the list of new Fortran support for the GCC 6 Release 
> Series Changes, New Features, Fixes page at 
> https://gcc.gnu.org/gcc-6/changes.html.  Please let me know whether this is 
> acceptable and will be applied.

Looks OK to me. I think you can apply it, and if someone wants to add or fix 
something they can commit that as an additional change.

Thanks,
FX

Update GCC 6 release page

2016-05-04 Thread Damian Rouson

The patch below expands the list of new Fortran support for the GCC 6 Release 
Series Changes, New Features, Fixes page at 
https://gcc.gnu.org/gcc-6/changes.html.  Please let me know whether this is 
acceptable and will be applied.

Damian

--- original.html   2016-05-03 22:25:23.0 -0700
+++ update.html 2016-05-03 22:42:21.0 -0700
@@ -366,6 +366,10 @@
 
 Fortran
   
+Fortran 2008 SUBMODULE support.
+Fortran 2015 EVENT_TYPE, EVENT_POST, EVENT_WAIT, and EVENT_QUERY support.
+Improved support for Fortran 2003 deferred-length character variables.
+Improved support for OpenMP and OpenACC.
 The MATMUL intrinsic is now inlined for straightforward
   cases if front-end optimization is active.  The maximum size for
   inlining can be set to n with the
