Re: Move DECL_INIT_PRIORITY/FINI_PRIORITY to symbol table

2014-06-23 Thread Jan Hubicka
 Jan Hubicka hubi...@ucw.cz writes:
 
  this patch moves the init and fini priorities to the symbol table instead of trees.
  They are already stored in on-the-side hash tables, but the hash tables are now
  maintained by the symbol table.  This is needed for correctness with LTO.
 
 This breaks gcc.dg/initpri3.c.  The constructor and destructor are
 miscompiled to unconditionally call abort.

Sorry for that.  This is caused by lto-cgraph using set_decl_init_priority when
it should not.  I am sure I updated this once already.  This patch fixes the
problem.

Bootstrapped/regtested x86_64-linux, committed.

* lto-cgraph.c (lto_output_node, input_node): Set/get init/fini priority
directly.
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 211881)
+++ lto-cgraph.c(working copy)
@@ -558,9 +558,9 @@ lto_output_node (struct lto_simple_outpu
 }
   streamer_write_hwi_stream (ob->main_stream, node->profile_id);
   if (DECL_STATIC_CONSTRUCTOR (node->decl))
-    streamer_write_hwi_stream (ob->main_stream, DECL_INIT_PRIORITY (node->decl));
+    streamer_write_hwi_stream (ob->main_stream, node->get_init_priority ());
   if (DECL_STATIC_DESTRUCTOR (node->decl))
-    streamer_write_hwi_stream (ob->main_stream, DECL_FINI_PRIORITY (node->decl));
+    streamer_write_hwi_stream (ob->main_stream, node->get_fini_priority ());
 }
 
 /* Output the varpool NODE to OB. 
@@ -1215,9 +1216,9 @@ input_node (struct lto_file_decl_data *f
     node->alias_target = get_alias_symbol (node->decl);
   node->profile_id = streamer_read_hwi (ib);
   if (DECL_STATIC_CONSTRUCTOR (node->decl))
-    SET_DECL_INIT_PRIORITY (node->decl, streamer_read_hwi (ib));
+    node->set_init_priority (streamer_read_hwi (ib));
   if (DECL_STATIC_DESTRUCTOR (node->decl))
-    SET_DECL_FINI_PRIORITY (node->decl, streamer_read_hwi (ib));
+    node->set_fini_priority (streamer_read_hwi (ib));
   return node;
 }
 


[PATCH, 1/10] two hooks for conditional compare (ccmp)

2014-06-23 Thread Zhenqiang Chen
Hi,

The patch adds two hooks for backends to generate conditional compare
instructions.

* gen_ccmp_first is for the first compare.
* gen_ccmp_next is for the following compares.

The patch is separated from
https://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html.

And the original discussion about the hooks was in thread:

https://gcc.gnu.org/ml/gcc-patches/2013-10/msg02601.html

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* doc/md.texi (ccmp): Add description about conditional compare
instruction pattern.
(TARGET_GEN_CCMP_FIRST): Define.
(TARGET_GEN_CCMP_NEXT): Define.
* doc/tm.texi.in (TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): New.
* target.def (gen_ccmp_first, gen_ccmp_next): Add two new hooks.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index e17ffca..988c288 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6216,6 +6216,42 @@ A typical @code{ctrap} pattern looks like
   @dots{})
 @end smallexample

+@cindex @code{ccmp} instruction pattern
+@item @samp{ccmp}
+Conditional compare instruction.  Operands 2 and 5 are RTLs which perform
+two comparisons.  Operand 1 is AND or IOR, which operates on the results of
+operands 2 and 5.
+A recursive method is used to support more than two compares, e.g.
+
+  CC0 = CMP (a, b);
+  CC1 = CCMP (NE (CC0, 0), CMP (e, f));
+  ...
+  CCn = CCMP (NE (CCn-1, 0), CMP (...));
+
+Two target hooks are used to generate conditional compares.  GEN_CCMP_FIRST
+is used to generate the first CMP, and GEN_CCMP_NEXT is used to generate the
+following CCMPs.  Operand 1 is AND or IOR.  Operand 3 is the result of
+GEN_CCMP_FIRST or a previous GEN_CCMP_NEXT.  Operand 2 is NE.
+Operands 4, 5 and 6 form another compare expression.
+
+A typical CCMP pattern looks like
+
+@smallexample
+(define_insn "*ccmp_and_ior"
+  [(set (match_operand 0 "dominant_cc_register" "")
+        (compare
+         (match_operator 1
+          (match_operator 2 "comparison_operator"
+           [(match_operand 3 "dominant_cc_register")
+            (const_int 0)])
+          (match_operator 4 "comparison_operator"
+           [(match_operand 5 "register_operand")
+            (match_operand 6 "compare_operand")]))
+         (const_int 0)))]
+  ""
+  @dots{})
+@end smallexample
+
 @cindex @code{prefetch} instruction pattern
 @item @samp{prefetch}
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c272630..93f7c74 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11021,6 +11021,23 @@ This target hook is required only when the
target has several different
 modes and they have different conditional execution capability, such as ARM.
 @end deftypefn

+@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (int @var{code},
rtx @var{op0}, rtx @var{op1})
+This function emits a comparison insn for the first of a sequence of
+ conditional comparisons.  It returns a comparison expression appropriate
+ for passing to @code{gen_ccmp_next} or to @code{cbranch_optab}.
+ @code{unsignedp} is used when converting @code{op0} and @code{op1}'s mode.
+@end deftypefn
+
+@deftypefn {Target Hook} rtx TARGET_GEN_CCMP_NEXT (rtx @var{prev},
int @var{cmp_code}, rtx @var{op0}, rtx @var{op1}, int @var{bit_code})
+This function emits a conditional comparison within a sequence of
+ conditional comparisons.  The @code{prev} expression is the result of a
+ prior call to @code{gen_ccmp_first} or @code{gen_ccmp_next}.  It may return
+ @code{NULL} if the combination of @code{prev} and this comparison is
+ not supported, otherwise the result must be appropriate for passing to
+ @code{gen_ccmp_next} or @code{cbranch_optab}.  @code{bit_code}
+ is AND or IOR, the operation that combines the two compares.
+@end deftypefn
+
 @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned
@var{nunroll}, struct loop *@var{loop})
 This target hook returns a new value for the number of times @var{loop}
 should be unrolled. The parameter @var{nunroll} is the number of times
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index dd72b98..e49f8f5 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8107,6 +8107,10 @@ build_type_attribute_variant (@var{mdecl},

 @hook TARGET_HAVE_CONDITIONAL_EXECUTION

+@hook TARGET_GEN_CCMP_FIRST
+
+@hook TARGET_GEN_CCMP_NEXT
+
 @hook TARGET_LOOP_UNROLL_ADJUST

 @defmac POWI_MAX_MULTS
diff --git a/gcc/target.def b/gcc/target.def
index e455211..6bbb907 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2320,6 +2320,27 @@ modes and they have different conditional
execution capability, such as ARM.,
  bool, (void),
  default_have_conditional_execution)

+DEFHOOK
+(gen_ccmp_first,
+ This function emits a comparison insn for the first of a sequence of\n\
+ conditional comparisons.  It returns a comparison expression appropriate\n\
+ for passing to @code{gen_ccmp_next} or to @code{cbranch_optab}.\n\
+ @code{unsignedp} is used when converting @code{op0} and @code{op1}'s mode.,
+ rtx, (int code, rtx op0, rtx op1),
+ NULL)
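To connect the hooks to source code: they let a chain of compares joined by `&&`/`||` be evaluated entirely in the condition flags — one CMP followed by CCMPs — instead of a branch per compare. A target-independent illustration (the function is ours, not from the patch):

```c
#include <assert.h>

/* On a ccmp-capable target this condition can expand as roughly:
     CC0 = CMP (a, b)
     CC1 = CCMP (NE (CC0, 0), CMP (c, d))
     CC2 = CCMP (NE (CC1, 0), CMP (e, f))
   with no intermediate branches.  */
int
all_increasing (int a, int b, int c, int d, int e, int f)
{
  return a < b && c < d && e < f;
}
```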

[PATCH, 2/10] prepare ccmp

2014-06-23 Thread Zhenqiang Chen
Hi,

The patch makes several functions global, which will be used when
expanding ccmp instructions.

The other change in this patch is to check CCMP when turning code into
jumpy sequence.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* cfgexpand.c (expand_gimple_cond): Check conditional compare.
* expmed.c (emit_cstore): Make it global.
* expmed.h: #include insn-codes.h.
(emit_cstore): New prototype.
* expr.c (expand_operands): Make it global.
* expr.h (expand_operands): New prototype.
* optabs.c (get_rtx_code): Make it global and return CODE for
BIT_AND_EXPR and BIT_IOR_EXPR.
* optabs.h (get_rtx_code): New prototype.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index e8cd87f..a32e1b3 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2095,9 +2095,10 @@ expand_gimple_cond (basic_block bb, gimple stmt)
  op0 = gimple_assign_rhs1 (second);
  op1 = gimple_assign_rhs2 (second);
}
- /* If jumps are cheap turn some more codes into
-jumpy sequences.  */
- else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4)
+ /* If jumps are cheap and the target does not support conditional
+compare, turn some more codes into jumpy sequences.  */
+ else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4
+   && (targetm.gen_ccmp_first == NULL))
{
  if ((code2 == BIT_AND_EXPR
       && TYPE_PRECISION (TREE_TYPE (op0)) == 1
diff --git a/gcc/expmed.c b/gcc/expmed.c
index e76b6fc..c8d63a9 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -5105,7 +5105,7 @@ expand_and (enum machine_mode mode, rtx op0, rtx
op1, rtx target)
 }

 /* Helper function for emit_store_flag.  */
-static rtx
+rtx
 emit_cstore (rtx target, enum insn_code icode, enum rtx_code code,
 enum machine_mode mode, enum machine_mode compare_mode,
 int unsignedp, rtx x, rtx y, int normalizep,
diff --git a/gcc/expmed.h b/gcc/expmed.h
index 4d01d1f..a567bad 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -20,6 +20,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef EXPMED_H
 #define EXPMED_H 1

+#include "insn-codes.h"
+
 enum alg_code {
   alg_unknown,
   alg_zero,
@@ -665,4 +667,9 @@ convert_cost (enum machine_mode to_mode, enum
machine_mode from_mode,
 }

 extern int mult_by_coeff_cost (HOST_WIDE_INT, enum machine_mode, bool);
+
+extern rtx emit_cstore (rtx target, enum insn_code icode, enum rtx_code code,
+enum machine_mode mode, enum machine_mode compare_mode,
+int unsignedp, rtx x, rtx y, int normalizep,
+enum machine_mode target_mode);
 #endif
diff --git a/gcc/expr.c b/gcc/expr.c
index 512c024..04cf56e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -146,8 +146,6 @@ static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target
(const_tree, const_tree);

 static int is_aligning_offset (const_tree, const_tree);
-static void expand_operands (tree, tree, rtx, rtx*, rtx*,
-enum expand_modifier);
 static rtx reduce_to_bit_field_precision (rtx, rtx, tree);
 static rtx do_store_flag (sepops, rtx, enum machine_mode);
 #ifdef PUSH_ROUNDING
@@ -7496,7 +7494,7 @@ convert_tree_comp_to_rtx (enum tree_code tcode,
int unsignedp)
The value may be stored in TARGET if TARGET is nonzero.  The
MODIFIER argument is as documented by expand_expr.  */

-static void
+void
 expand_operands (tree exp0, tree exp1, rtx target, rtx *op0, rtx *op1,
 enum expand_modifier modifier)
 {
diff --git a/gcc/expr.h b/gcc/expr.h
index 6a1d3ab..66ca82f 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -787,4 +787,6 @@ extern bool categorize_ctor_elements (const_tree,
HOST_WIDE_INT *,
by EXP.  This does not include any offset in DECL_FIELD_BIT_OFFSET.  */
 extern tree component_ref_field_offset (tree);

+extern void expand_operands (tree, tree, rtx, rtx*, rtx*,
+enum expand_modifier);
 #endif /* GCC_EXPR_H */
diff --git a/gcc/optabs.c b/gcc/optabs.c
index ca1c194..25aff1a 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6453,7 +6453,7 @@ gen_cond_trap (enum rtx_code code, rtx op1, rtx
op2, rtx tcode)
 /* Return rtx code for TCODE. Use UNSIGNEDP to select signed
or unsigned operation code.  */

-static enum rtx_code
+enum rtx_code
 get_rtx_code (enum tree_code tcode, bool unsignedp)
 {
   enum rtx_code code;
@@ -6503,6 +6503,12 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
   code = LTGT;
   break;

+case BIT_AND_EXPR:
+  code = AND;
+  break;
+case BIT_IOR_EXPR:
+  code = IOR;
+  break;
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 089b15a..61be4e2 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -91,6 
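A self-contained sketch of the get_rtx_code change — the enums below are stand-ins for GCC's tree_code and rtx_code, defined here only so the example compiles on its own:

```c
#include <assert.h>

enum my_tree_code { T_BIT_AND_EXPR, T_BIT_IOR_EXPR, T_LT_EXPR };
enum my_rtx_code  { R_AND, R_IOR, R_LT, R_LTU, R_UNKNOWN };

/* Mirrors the patch: BIT_AND_EXPR/BIT_IOR_EXPR now map to AND/IOR
   instead of falling into gcc_unreachable; comparison codes keep
   honoring unsignedness.  */
static enum my_rtx_code
my_get_rtx_code (enum my_tree_code tcode, int unsignedp)
{
  switch (tcode)
    {
    case T_BIT_AND_EXPR:
      return R_AND;
    case T_BIT_IOR_EXPR:
      return R_IOR;
    case T_LT_EXPR:
      return unsignedp ? R_LTU : R_LT;
    default:
      return R_UNKNOWN;
    }
}
```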

[PATCH, 3/10] skip swapping operands used in ccmp

2014-06-23 Thread Zhenqiang Chen
Hi,

Swapping operands in a ccmp will lead to illegal instructions. So the
patch disables it in simplify_while_replacing.

The patch is separated from
https://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html.

To keep things clean, the patch adds two new files, ccmp.{c,h}, to hold all new
ccmp-related functions.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* Makefile.in: Add ccmp.o
* ccmp.c: New file.
* ccmp.h: New file.
* recog.c (simplify_while_replacing): Check ccmp_insn_p.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 5587b75..8757a30 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1169,6 +1169,7 @@ OBJS = \
builtins.o \
caller-save.o \
calls.o \
+   ccmp.o \
cfg.o \
cfganal.o \
cfgbuild.o \
diff --git a/gcc/ccmp.c b/gcc/ccmp.c
new file mode 100644
index 000..665c2a5
--- /dev/null
+++ b/gcc/ccmp.c
@@ -0,0 +1,62 @@
+/* Conditional compare related functions
+   Copyright (C) 2014-2014 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "regs.h"
+#include "expr.h"
+#include "optabs.h"
+#include "tree-iterator.h"
+#include "basic-block.h"
+#include "tree-ssa-alias.h"
+#include "internal-fn.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+#include "gimple.h"
+#include "gimple-ssa.h"
+#include "tree-ssanames.h"
+#include "target.h"
+#include "common/common-target.h"
+#include "df.h"
+#include "tree-ssa-live.h"
+#include "tree-outof-ssa.h"
+#include "cfgexpand.h"
+#include "tree-phinodes.h"
+#include "ssa-iterators.h"
+#include "expmed.h"
+#include "ccmp.h"
+
+bool
+ccmp_insn_p (rtx object)
+{
+  rtx x = PATTERN (object);
+  if (targetm.gen_ccmp_first
+      && GET_CODE (x) == SET
+      && GET_CODE (XEXP (x, 1)) == COMPARE
+      && (GET_CODE (XEXP (XEXP (x, 1), 0)) == IOR
+	  || GET_CODE (XEXP (XEXP (x, 1), 0)) == AND))
+return true;
+  return false;
+}
+
diff --git a/gcc/ccmp.h b/gcc/ccmp.h
new file mode 100644
index 000..7e139aa
--- /dev/null
+++ b/gcc/ccmp.h
@@ -0,0 +1,25 @@
+/* Conditional compare related functions.
+   Copyright (C) 2014-2014 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_CCMP_H
+#define GCC_CCMP_H
+
+extern bool ccmp_insn_p (rtx);
+
+#endif  /* GCC_CCMP_H  */
diff --git a/gcc/recog.c b/gcc/recog.c
index 8d10a4f..b53a28c 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "df.h"
 #include "insn-codes.h"
+#include "ccmp.h"

 #ifndef STACK_PUSH_CODE
 #ifdef STACK_GROWS_DOWNWARD
@@ -577,7 +578,8 @@ simplify_while_replacing (rtx *loc, rtx to, rtx object,
   enum rtx_code code = GET_CODE (x);
   rtx new_rtx = NULL_RTX;

-  if (SWAPPABLE_OPERANDS_P (x)
+  /* Do not swap compares in conditional compare instruction.  */
+  if (SWAPPABLE_OPERANDS_P (x) && !ccmp_insn_p (object)
      && swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1)))
 {
   validate_unshare_change (object, loc,


[PATCH, 4/10] expand ccmp

2014-06-23 Thread Zhenqiang Chen
Hi,

This patch includes the main logic to expand ccmp instructions.

In the patch,
  * ccmp_candidate_p is used to identify the CCMP candidate
  * expand_ccmp_expr is the main entry, which calls expand_ccmp_expr_1
to expand CCMP.
  * expand_ccmp_expr_1 uses a recursive algorithm to expand CCMP.
It calls gen_ccmp_first and gen_ccmp_next to generate CCMP instructions.

During expanding, we must make sure that no instruction can clobber the
CC reg except the compares.  So clobber_cc_p and check_clobber_cc are
introduced to do the check.

  * If the final result is not used in a COND_EXPR (checked by function
used_in_cond_stmt_p), the cstorecc4 pattern is used to store the CC to a
general register.

Bootstrapped with no make check regressions on x86-64.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* ccmp.c (ccmp_candidate_p, used_in_cond_stmt_p, check_clobber_cc,
clobber_cc_p, expand_ccmp_next, expand_ccmp_expr_1, expand_ccmp_expr):
New functions to expand ccmp.
* ccmp.h (expand_ccmp_expr): New prototype.
* expr.c: #include ccmp.h
(expand_expr_real_1): Try to expand ccmp.

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index 665c2a5..97b3910 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -47,6 +47,262 @@ along with GCC; see the file COPYING3.  If not see
 #include expmed.h
 #include ccmp.h

+/* The following functions expand conditional compare (CCMP) instructions.
+   Here is a short description of the overall algorithm:
+ * ccmp_candidate_p is used to identify the CCMP candidate
+
+ * expand_ccmp_expr is the main entry, which calls expand_ccmp_expr_1
+   to expand CCMP.
+
+ * expand_ccmp_expr_1 uses a recursive algorithm to expand CCMP.
+   It calls two target hooks gen_ccmp_first and gen_ccmp_next to generate
+   CCMP instructions.
+- gen_ccmp_first expands the first compare in CCMP.
+- gen_ccmp_next expands the following compares.
+
+   During expanding, we must make sure that no instruction can clobber the
+   CC reg except the compares.  So clobber_cc_p and check_clobber_cc are
+   introduced to do the check.
+
+ * If the final result is not used in a COND_EXPR (checked by function
+   used_in_cond_stmt_p), it calls cstorecc4 pattern to store the CC to a
+   general register.  */
+
+/* Check whether G is a potential conditional compare candidate.  */
+static bool
+ccmp_candidate_p (gimple g)
+{
+  tree rhs = gimple_assign_rhs_to_tree (g);
+  tree lhs, op0, op1;
+  gimple gs0, gs1;
+  enum tree_code tcode, tcode0, tcode1;
+  tcode = TREE_CODE (rhs);
+
+  if (tcode != BIT_AND_EXPR && tcode != BIT_IOR_EXPR)
+return false;
+
+  lhs = gimple_assign_lhs (g);
+  op0 = TREE_OPERAND (rhs, 0);
+  op1 = TREE_OPERAND (rhs, 1);
+
+  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
+  || !has_single_use (lhs))
+return false;
+
+  gs0 = get_gimple_for_ssa_name (op0);
+  gs1 = get_gimple_for_ssa_name (op1);
+  if (!gs0 || !gs1 || !is_gimple_assign (gs0) || !is_gimple_assign (gs1)
+      /* g, gs0 and gs1 must be in the same basic block, since the current
+	 stage is out-of-ssa.  We cannot guarantee correctness when forwarding
+	 gs0 and gs1 into g without dataflow analysis.  */
+      || gimple_bb (gs0) != gimple_bb (gs1)
+      || gimple_bb (gs0) != gimple_bb (g))
+    return false;
+
+  if (!(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0)))
+	|| POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0))))
+      || !(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))
+	   || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))))
+    return false;
+
+  tcode0 = gimple_assign_rhs_code (gs0);
+  tcode1 = gimple_assign_rhs_code (gs1);
+  if (TREE_CODE_CLASS (tcode0) == tcc_comparison
+      && TREE_CODE_CLASS (tcode1) == tcc_comparison)
+    return true;
+  if (TREE_CODE_CLASS (tcode0) == tcc_comparison
+      && ccmp_candidate_p (gs1))
+    return true;
+  else if (TREE_CODE_CLASS (tcode1) == tcc_comparison
+	   && ccmp_candidate_p (gs0))
+    return true;
+  /* We skip ccmp_candidate_p (gs1) && ccmp_candidate_p (gs0) since
+     there is no way to set the CC flag.  */
+  return false;
+}
+
+/* Check whether EXP is used in a GIMPLE_COND statement or not.  */
+static bool
+used_in_cond_stmt_p (tree exp)
+{
+  bool expand_cond = false;
+  imm_use_iterator ui;
+  gimple use_stmt;
+  FOR_EACH_IMM_USE_STMT (use_stmt, ui, exp)
+if (gimple_code (use_stmt) == GIMPLE_COND)
+  {
+   tree op1 = gimple_cond_rhs (use_stmt);
+   if (integer_zerop (op1))
+ expand_cond = true;
+   BREAK_FROM_IMM_USE_STMT (ui);
+  }
+  return expand_cond;
+}
+
+/* If SETTER clobbers the CC reg, set DATA to TRUE.  */
+static void
+check_clobber_cc (rtx reg, const_rtx setter, void *data)
+{
+  if (GET_CODE (setter) == CLOBBER && GET_MODE (reg) == CCmode)
+    *(bool *) data = true;
+}
+
+/* 
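In source terms, the shape ccmp_candidate_p accepts looks like the following: a bitwise AND/IOR whose operands are comparison results defined in the same basic block, each used once (example ours, not from the patch):

```c
#include <assert.h>

/* After gimplification this is roughly
     t1 = a < b;  t2 = c != d;  r = t1 & t2;
   i.e. a BIT_AND_EXPR over two single-use comparison results -- the
   candidate shape.  Note the `&`, not `&&`: there is no short-circuit
   branch, so the whole condition can live in the flags.  */
int
ccmp_shaped (int a, int b, int c, int d)
{
  return (a < b) & (c != d);
}
```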

[PATCH, 5/10] aarch64: add ccmp operand predicate

2014-06-23 Thread Zhenqiang Chen
Hi,

The patch defines the ccmp operand predicates for AArch64.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* config/aarch64/aarch64-protos.h (aarch64_uimm5): New prototype.
* config/aarch64/constraints.md (Usn): Immediate for ccmn.
* config/aarch64/predicates.md (aarch64_ccmp_immediate): New.
(aarch64_ccmp_operand): New.
* config/aarch64/aarch64.c (aarch64_uimm5): New function.

diff --git a/gcc/config/aarch64/aarch64-protos.h
b/gcc/config/aarch64/aarch64-protos.h
index c4f75b3..997ff50 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -246,6 +246,8 @@ void aarch64_init_expanders (void);
 void aarch64_print_operand (FILE *, rtx, char);
 void aarch64_print_operand_address (FILE *, rtx);

+bool aarch64_uimm5 (HOST_WIDE_INT);
+
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f2968ff..ecf88f9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9566,6 +9566,13 @@ aarch64_expand_movmem (rtx *operands)
   return true;
 }

+/* Return true if val can be encoded as a 5-bit unsigned immediate.  */
+bool
+aarch64_uimm5 (HOST_WIDE_INT val)
+{
+  return (val & (HOST_WIDE_INT) 0x1f) == val;
+}
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
diff --git a/gcc/config/aarch64/constraints.md
b/gcc/config/aarch64/constraints.md
index 807d0b1..bb6a8a1 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -89,6 +89,11 @@
   (and (match_code "const_int")
	(match_test "(unsigned HOST_WIDE_INT) ival < 32")))

+(define_constraint "Usn"
+ "A constant that can be used with a CCMN operation (once negated)."
+ (and (match_code "const_int")
+      (match_test "aarch64_uimm5 (-ival)")))
+
 (define_constraint "Usd"
   "@internal
   A constraint that matches an immediate shift constant in DImode."
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 2702a3c..dd35714 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -30,6 +30,15 @@
   (ior (match_code "symbol_ref")
	(match_operand 0 "register_operand")))

+(define_predicate "aarch64_ccmp_immediate"
+  (and (match_code "const_int")
+       (ior (match_test "aarch64_uimm5 (INTVAL (op))")
+	    (match_test "aarch64_uimm5 (-INTVAL (op))"))))
+
+(define_predicate "aarch64_ccmp_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_ccmp_immediate")))
+
 (define_predicate "aarch64_simd_register"
   (and (match_code "reg")
	(ior (match_test "REGNO_REG_CLASS (REGNO (op)) == FP_LO_REGS")
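aarch64_uimm5 is a plain mask test — the ccmp immediate field holds five unsigned bits. A standalone version with the same logic:

```c
#include <assert.h>

/* True iff val fits in a 5-bit unsigned immediate, i.e. 0 <= val <= 31.
   (val & 0x1f) == val fails for negatives and for anything >= 32.  */
static int
uimm5_p (long long val)
{
  return (val & 0x1f) == val;
}
```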


[PATCH, 6/10] aarch64: add ccmp CC mode

2014-06-23 Thread Zhenqiang Chen
Hi,

The patch adds a set of CC modes for AArch64, similar to those for ARM.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* config/aarch64/aarch64-modes.def: Define new CC modes for ccmp.
* config/aarch64/aarch64.c (aarch64_get_condition_code_1):
New prototype.
(aarch64_get_condition_code): Call aarch64_get_condition_code_1.
(aarch64_get_condition_code_1): New function to handle ccmp CC mode.
* config/aarch64/predicates.md (ccmp_cc_register): New.

diff --git a/gcc/config/aarch64/aarch64-modes.def
b/gcc/config/aarch64/aarch64-modes.def
index 1d2cc76..71fd2f0 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -25,6 +25,16 @@ CC_MODE (CC_ZESWP); /* zero-extend LHS (but swap to
make it RHS).  */
 CC_MODE (CC_SESWP); /* sign-extend LHS (but swap to make it RHS).  */
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
+CC_MODE (CC_DNE);
+CC_MODE (CC_DEQ);
+CC_MODE (CC_DLE);
+CC_MODE (CC_DLT);
+CC_MODE (CC_DGE);
+CC_MODE (CC_DGT);
+CC_MODE (CC_DLEU);
+CC_MODE (CC_DLTU);
+CC_MODE (CC_DGEU);
+CC_MODE (CC_DGTU);

 /* Vector modes.  */
 VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ecf88f9..e5ede6e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3460,6 +3460,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
 }

 static unsigned
+aarch64_get_condition_code_1 (enum machine_mode, enum rtx_code);
+
+static unsigned
 aarch64_get_condition_code (rtx x)
 {
   enum machine_mode mode = GET_MODE (XEXP (x, 0));
@@ -3467,7 +3470,12 @@ aarch64_get_condition_code (rtx x)

   if (GET_MODE_CLASS (mode) != MODE_CC)
 mode = SELECT_CC_MODE (comp_code, XEXP (x, 0), XEXP (x, 1));
+  return aarch64_get_condition_code_1 (mode, comp_code);
+}

+static unsigned
+aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code comp_code)
+{
   switch (mode)
 {
 case CCFPmode:
@@ -3490,6 +3498,27 @@ aarch64_get_condition_code (rtx x)
}
   break;

+case CC_DNEmode:
+  return comp_code == NE ? AARCH64_NE : AARCH64_EQ;
+case CC_DEQmode:
+  return comp_code == NE ? AARCH64_EQ : AARCH64_NE;
+case CC_DGEmode:
+  return comp_code == NE ? AARCH64_GE : AARCH64_LT;
+case CC_DLTmode:
+  return comp_code == NE ? AARCH64_LT : AARCH64_GE;
+case CC_DGTmode:
+  return comp_code == NE ? AARCH64_GT : AARCH64_LE;
+case CC_DLEmode:
+  return comp_code == NE ? AARCH64_LE : AARCH64_GT;
+case CC_DGEUmode:
+  return comp_code == NE ? AARCH64_CS : AARCH64_CC;
+case CC_DLTUmode:
+  return comp_code == NE ? AARCH64_CC : AARCH64_CS;
+case CC_DGTUmode:
+  return comp_code == NE ? AARCH64_HI : AARCH64_LS;
+case CC_DLEUmode:
+  return comp_code == NE ? AARCH64_LS : AARCH64_HI;
+
 case CCmode:
   switch (comp_code)
{
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index dd35714..ab02fd0 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -39,6 +39,23 @@
   (ior (match_operand 0 "register_operand")
	(match_operand 0 "aarch64_ccmp_immediate")))

+(define_special_predicate "ccmp_cc_register"
+  (and (match_code "reg")
+       (and (match_test "REGNO (op) == CC_REGNUM")
+	    (ior (match_test "mode == GET_MODE (op)")
+		 (match_test "mode == VOIDmode
+			      && (GET_MODE (op) == CC_DNEmode
+				  || GET_MODE (op) == CC_DEQmode
+				  || GET_MODE (op) == CC_DLEmode
+				  || GET_MODE (op) == CC_DLTmode
+				  || GET_MODE (op) == CC_DGEmode
+				  || GET_MODE (op) == CC_DGTmode
+				  || GET_MODE (op) == CC_DLEUmode
+				  || GET_MODE (op) == CC_DLTUmode
+				  || GET_MODE (op) == CC_DGEUmode
+				  || GET_MODE (op) == CC_DGTUmode)")))))
+
 (define_predicate "aarch64_simd_register"
   (and (match_code "reg")
	(ior (match_test "REGNO_REG_CLASS (REGNO (op)) == FP_LO_REGS")
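The idea behind the CC_D* modes is that each one pins down a condition and its inverse: testing the flags with NE yields the encoded condition, testing with EQ yields the inverse. A sketch using our own enums (not GCC's) for two of the modes:

```c
#include <assert.h>

enum my_ccmode { CCMODE_DGE, CCMODE_DLT };
enum my_cond   { COND_GE, COND_LT };

/* Mirrors the CC_DGEmode/CC_DLTmode cases of
   aarch64_get_condition_code_1: comp_code NE selects the encoded
   condition, EQ (comp_is_ne == 0) selects its inverse.  */
static enum my_cond
cond_from_mode (enum my_ccmode mode, int comp_is_ne)
{
  if (mode == CCMODE_DGE)
    return comp_is_ne ? COND_GE : COND_LT;
  return comp_is_ne ? COND_LT : COND_GE;
}
```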


[PATCH, 7/10] aarch64: add function to output ccmp insn

2014-06-23 Thread Zhenqiang Chen
Hi,

The patch adds three helper functions to output ccmp instructions.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* config/aarch64/aarch64-protos.h (aarch64_output_ccmp): New prototype.
* config/aarch64/aarch64.c (aarch64_code_to_nzcv): New function.
(aarch64_mode_to_condition_code): New function.
(aarch64_output_ccmp): New function.

diff --git a/gcc/config/aarch64/aarch64-protos.h
b/gcc/config/aarch64/aarch64-protos.h
index 997ff50..ff1a0f4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -247,6 +247,7 @@ void aarch64_print_operand (FILE *, rtx, char);
 void aarch64_print_operand_address (FILE *, rtx);

 bool aarch64_uimm5 (HOST_WIDE_INT);
+const char* aarch64_output_ccmp (rtx *, bool, int);

 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e5ede6e..5fe4826 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9602,6 +9602,137 @@ aarch64_uimm5 (HOST_WIDE_INT val)
   return (val & (HOST_WIDE_INT) 0x1f) == val;
 }

+/* N Z C V.  */
+#define AARCH64_CC_V 1
+#define AARCH64_CC_C (1 << 1)
+#define AARCH64_CC_Z (1 << 2)
+#define AARCH64_CC_N (1 << 3)
+
+static unsigned int
+aarch64_code_to_nzcv (enum rtx_code code, bool inverse)
+{
+  switch (code)
+{
+case NE: /* NE, Z == 0.  */
+  return inverse ? AARCH64_CC_Z : 0;
+case EQ: /* EQ, Z == 1.  */
+  return inverse ? 0 : AARCH64_CC_Z;
+case LE: /* LE, !(Z == 0 && N == V).  */
+  return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_Z;
+case GT: /* GT, Z == 0 && N == V.  */
+  return inverse ? AARCH64_CC_Z : AARCH64_CC_N | AARCH64_CC_V;
+case LT: /* LT, N != V.  */
+  return inverse ? AARCH64_CC_N | AARCH64_CC_V : AARCH64_CC_N;
+case GE: /* GE, N == V.  */
+  return inverse ? AARCH64_CC_N : AARCH64_CC_N | AARCH64_CC_V;
+case LEU: /* LS, !(C == 1 && Z == 0).  */
+  return inverse ? AARCH64_CC_C : AARCH64_CC_Z;
+case GTU: /* HI, C == 1 && Z == 0.  */
+  return inverse ? AARCH64_CC_Z : AARCH64_CC_C;
+case LTU: /* CC, C == 0.  */
+  return inverse ? AARCH64_CC_C : 0;
+case GEU: /* CS, C == 1.  */
+  return inverse ? 0 : AARCH64_CC_C;
+default:
+  gcc_unreachable ();
+  return 0;
+}
+}
+
+static unsigned
+aarch64_mode_to_condition_code (enum machine_mode mode, bool inverse)
+{
+  switch (mode)
+{
+case CC_DNEmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, EQ)
+  : aarch64_get_condition_code_1 (CCmode, NE);
+case CC_DEQmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, NE)
+  : aarch64_get_condition_code_1 (CCmode, EQ);
+case CC_DLEmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, GT)
+  : aarch64_get_condition_code_1 (CCmode, LE);
+case CC_DGTmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, LE)
+  : aarch64_get_condition_code_1 (CCmode, GT);
+case CC_DLTmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, GE)
+  : aarch64_get_condition_code_1 (CCmode, LT);
+case CC_DGEmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, LT)
+  : aarch64_get_condition_code_1 (CCmode, GE);
+case CC_DLEUmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, GTU)
+  : aarch64_get_condition_code_1 (CCmode, LEU);
+case CC_DGTUmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, LEU)
+  : aarch64_get_condition_code_1 (CCmode, GTU);
+case CC_DLTUmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, GEU)
+  : aarch64_get_condition_code_1 (CCmode, LTU);
+case CC_DGEUmode:
+  return inverse ? aarch64_get_condition_code_1 (CCmode, LTU)
+  : aarch64_get_condition_code_1 (CCmode, GEU);
+default:
+  gcc_unreachable ();
+}
+}
+
+const char *
+aarch64_output_ccmp (rtx *operands, bool is_and, int which_alternative)
+{
+  char buf[32];
+  rtx cc = operands[0];
+  enum rtx_code code = GET_CODE (operands[5]);
+  unsigned char nzcv = aarch64_code_to_nzcv (code, is_and);
+  enum machine_mode mode = GET_MODE (cc);
+  unsigned int cond_code = aarch64_mode_to_condition_code (mode, !is_and);
+
+  gcc_assert (GET_MODE (operands[2]) == SImode
+ || GET_MODE (operands[2]) == DImode);
+
+  if (GET_MODE (operands[2]) == SImode)
+switch (which_alternative)
+  {
+  case 0:
+   snprintf (buf, sizeof (buf), "ccmp\t%%w2, %%w3, #%u, %s",
+ nzcv, aarch64_condition_codes[cond_code]);
+   break;
+  case 1:
+   snprintf (buf, sizeof (buf), "ccmp\t%%w2, #%%3, #%u, %s",
+ nzcv, 
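The NZCV immediate tells a ccmp which flag values to install when its input condition is false; aarch64_code_to_nzcv picks flags that make the (possibly inverted) comparison read as intended. A self-contained sketch of a subset of that mapping, using the same bit layout as the patch:

```c
#include <assert.h>

#define CC_V 1
#define CC_C (1 << 1)
#define CC_Z (1 << 2)
#define CC_N (1 << 3)

enum my_code { C_NE, C_EQ, C_GE, C_LT };

/* Subset of aarch64_code_to_nzcv: returns the flag bits to set so the
   condition (or its inverse, when INVERSE is nonzero) holds.  */
static unsigned
code_to_nzcv (enum my_code code, int inverse)
{
  switch (code)
    {
    case C_NE: return inverse ? CC_Z : 0;            /* NE: Z == 0.  */
    case C_EQ: return inverse ? 0 : CC_Z;            /* EQ: Z == 1.  */
    case C_GE: return inverse ? CC_N : CC_N | CC_V;  /* GE: N == V.  */
    case C_LT: return inverse ? CC_N | CC_V : CC_N;  /* LT: N != V.  */
    }
  return 0;
}
```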

[PATCH, 9/10] aarch64: generate conditional compare instructions

2014-06-23 Thread Zhenqiang Chen
Hi,

The patch implements the two hooks for AArch64 to generate ccmp instructions.

Bootstrap and no make check regression on qemu.

OK for trunk?

Thanks!
-Zhenqiang
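For context, a chained integer comparison is the kind of source these hooks target; a minimal, hypothetical example (not taken from the patch or its testsuite) is sketched below. With the ccmp hooks in place, an AArch64 compiler can emit cmp + ccmp + cset for this function instead of two separate compare-and-branch sequences.

```c
/* Hypothetical illustration only: with the gen_ccmp_first/gen_ccmp_next
   hooks, the two comparisons below can be fused into a single flag-setting
   sequence (cmp w0, #10; ccmp w1, #20, ...; cset w0, ...) on AArch64.  */
int
both_in_range (int a, int b)
{
  return a > 10 && b < 20;
}
```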

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* config/aarch64/aarch64.c (aarch64_code_to_ccmode): New function.
(aarch64_convert_mode, aarch64_convert_mode): New functions.
(aarch64_gen_ccmp_first, aarch64_gen_ccmp_next): New functions.
(TARGET_GEN_CCMP_FIRST, TARGET_GEN_CCMP_NEXT): Define the two hooks.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4e8d55b..6f08e38 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9601,6 +9601,137 @@ aarch64_uimm5 (HOST_WIDE_INT val)
   return (val & (HOST_WIDE_INT) 0x1f) == val;
 }

+static enum machine_mode
+aarch64_code_to_ccmode (enum rtx_code code)
+{
+  switch (code)
+{
+case NE:
+  return CC_DNEmode;
+case EQ:
+  return CC_DEQmode;
+case LE:
+  return CC_DLEmode;
+case LT:
+  return CC_DLTmode;
+case GE:
+  return CC_DGEmode;
+case GT:
+  return CC_DGTmode;
+case LEU:
+  return CC_DLEUmode;
+case LTU:
+  return CC_DLTUmode;
+case GEU:
+  return CC_DGEUmode;
+case GTU:
+  return CC_DGTUmode;
+default:
+  return CCmode;
+}
+}
+
+static bool
+aarch64_convert_mode (rtx* op0, rtx* op1, int unsignedp)
+{
+  enum machine_mode mode;
+
+  mode = GET_MODE (*op0);
+  if (mode == VOIDmode)
+mode = GET_MODE (*op1);
+
+  if (mode == QImode || mode == HImode)
+{
+  *op0 = convert_modes (SImode, mode, *op0, unsignedp);
+  *op1 = convert_modes (SImode, mode, *op1, unsignedp);
+}
+  else if (mode != SImode && mode != DImode)
+return false;
+
+  return true;
+}
+
+static rtx
+aarch64_gen_ccmp_first (int code, rtx op0, rtx op1)
+{
+  enum machine_mode mode;
+  rtx cmp, target;
+  int unsignedp = code == LTU || code == LEU || code == GTU || code == GEU;
+
+  mode = GET_MODE (op0);
+  if (mode == VOIDmode)
+mode = GET_MODE (op1);
+
+  if (mode == VOIDmode)
+return NULL_RTX;
+
+  /* Make sure op0 and op1 are legal operands for cmp.  */
+  if (!register_operand (op0, GET_MODE (op0)))
+op0 = force_reg (mode, op0);
+  if (!aarch64_plus_operand (op1, GET_MODE (op1)))
+op1 = force_reg (mode, op1);
+
+  if (!aarch64_convert_mode (op0, op1, unsignedp))
+return NULL_RTX;
+
+  mode = aarch64_code_to_ccmode ((enum rtx_code) code);
+  if (mode == CCmode)
+return NULL_RTX;
+
+  cmp = gen_rtx_fmt_ee (COMPARE, CCmode, op0, op1);
+  target = gen_rtx_REG (mode, CC_REGNUM);
+  emit_insn (gen_rtx_SET (VOIDmode, gen_rtx_REG (CCmode, CC_REGNUM), cmp));
+  return target;
+}
+
+static rtx
+aarch64_gen_ccmp_next (rtx prev, int cmp_code, rtx op0, rtx op1, int bit_code)
+{
+  rtx cmp0, cmp1, target, bit_op;
+  enum machine_mode mode;
+  int unsignedp = cmp_code == LTU || cmp_code == LEU
+ || cmp_code == GTU || cmp_code == GEU;
+
+  mode = GET_MODE (op0);
+  if (mode == VOIDmode)
+mode = GET_MODE (op1);
+
+  if (mode == VOIDmode)
+return NULL_RTX;
+
+  /* Give up if the operand is illegal since force_reg will introduce
+ additional overhead.  */
+  if (!register_operand (op0, GET_MODE (op0))
+  || !aarch64_ccmp_operand (op1, GET_MODE (op1)))
+return NULL_RTX;
+
+  if (!aarch64_convert_mode (op0, op1, unsignedp))
+return NULL_RTX;
+
+  mode = aarch64_code_to_ccmode ((enum rtx_code) cmp_code);
+  if (mode == CCmode)
+return NULL_RTX;
+
+  cmp1 = gen_rtx_fmt_ee ((enum rtx_code) cmp_code, SImode, op0, op1);
+
+  cmp0 = gen_rtx_fmt_ee (NE, SImode, prev, const0_rtx);
+
+  bit_op = gen_rtx_fmt_ee ((enum rtx_code) bit_code, SImode, cmp0, cmp1);
+
+  /* Generate insn to match ccmp_and/ccmp_ior.  */
+  target = gen_rtx_REG (mode, CC_REGNUM);
+  emit_insn (gen_rtx_SET (VOIDmode, target,
+  gen_rtx_fmt_ee (COMPARE, VOIDmode,
+  bit_op, const0_rtx)));
+  return target;
+}
+
+#undef TARGET_GEN_CCMP_FIRST
+#define TARGET_GEN_CCMP_FIRST aarch64_gen_ccmp_first
+
+#undef TARGET_GEN_CCMP_NEXT
+#define TARGET_GEN_CCMP_NEXT aarch64_gen_ccmp_next
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost


[PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov

2014-06-23 Thread Zhenqiang Chen
Hi,

The patch enhances ifcvt to handle conditional compare instructions
(ccmp) so that they work with cmov. For ccmp, ALLOW_CC_MODE is set to
TRUE when calling canonicalize_condition, and the backend does not
need to generate an additional compare (CC, 0) for it.

Bootstrap and no check regression on qemu.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* config/aarch64/aarch64.md (mov<mode>cc): Handle ccmp_cc.
* ifcvt.c: #include ccmp.h.
(struct noce_if_info): Add a new field ccmp_p.
(noce_emit_cmove): Allow ccmp condition.
(noce_get_alt_condition): Call canonicalize_condition with ccmp_p.
(noce_get_condition): Set ALLOW_CC_MODE to TRUE for ccmp.
(noce_process_if_block): Set ccmp_p for ccmp.

testsuite/ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* gcc.target/aarch64/ccmn-csel-1.c: New testcase.
* gcc.target/aarch64/ccmn-csel-2.c: New testcase.
* gcc.target/aarch64/ccmn-csel-3.c: New testcase.
* gcc.target/aarch64/ccmp-csel-1.c: New testcase.
* gcc.target/aarch64/ccmp-csel-2.c: New testcase.
* gcc.target/aarch64/ccmp-csel-3.c: New testcase.
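The actual testcase contents are not shown in this thread; a hedged sketch of the shape such a ccmp-csel test presumably takes is below: a compound condition feeding a conditional move, which the enhanced ifcvt can now turn into cmp + ccmp + csel instead of branches.

```c
/* Hypothetical example resembling a ccmp-csel testcase (names and body
   are assumptions, not the committed test): a compound condition selecting
   between two values, if-convertible to cmp + ccmp + csel on AArch64.  */
int
sel (int a, int b, int x, int y)
{
  return (a > 3 && b < 7) ? x : y;
}
```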

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fcc5559..82cc561 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2459,15 +2459,19 @@
	   (match_operand:ALLI 3 "register_operand" "")))]
  ""
   {
-rtx ccreg;
 enum rtx_code code = GET_CODE (operands[1]);

 if (code == UNEQ || code == LTGT)
   FAIL;

-ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
- XEXP (operands[1], 1));
-operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+    if (!ccmp_cc_register (XEXP (operands[1], 0),
+			   GET_MODE (XEXP (operands[1], 0))))
+  {
+   rtx ccreg;
+   ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
+XEXP (operands[1], 1));
+   operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+  }
   }
 )

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 2ca2278..8ee1266 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -43,6 +43,7 @@
 #include vec.h
 #include pointer-set.h
 #include dbgcnt.h
+#include ccmp.h

 #ifndef HAVE_conditional_move
 #define HAVE_conditional_move 0
@@ -786,6 +787,9 @@ struct noce_if_info

   /* Estimated cost of the particular branch instruction.  */
   int branch_cost;
+
+  /* The COND is a conditional compare or not.  */
+  bool ccmp_p;
 };

 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1407,9 +1411,16 @@ noce_emit_cmove (struct noce_if_info *if_info,
rtx x, enum rtx_code code,
   end_sequence ();
 }

-  /* Don't even try if the comparison operands are weird.  */
-  if (! general_operand (cmp_a, GET_MODE (cmp_a))
-  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
+  /* Don't even try if the comparison operands are weird
+ except conditional compare.  */
+  if (if_info->ccmp_p)
+{
+  if (!(GET_MODE_CLASS (GET_MODE (cmp_a)) == MODE_CC
+   || GET_MODE_CLASS (GET_MODE (cmp_b)) == MODE_CC))
+   return NULL_RTX;
+}
+  else if (! general_operand (cmp_a, GET_MODE (cmp_a))
+  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
 return NULL_RTX;

 #if HAVE_conditional_move
@@ -1849,7 +1860,7 @@ noce_get_alt_condition (struct noce_if_info
*if_info, rtx target,
 }

   cond = canonicalize_condition (if_info->jump, cond, reverse,
-				 earliest, target, false, true);
+				 earliest, target, if_info->ccmp_p, true);
   if (! cond || ! reg_mentioned_p (target, cond))
 return NULL;

@@ -2300,6 +2311,7 @@ noce_get_condition (rtx jump, rtx *earliest,
bool then_else_reversed)
 {
   rtx cond, set, tmp;
   bool reverse;
+  int allow_cc_mode = false;

   if (! any_condjump_p (jump))
 return NULL_RTX;
@@ -2333,10 +2345,21 @@ noce_get_condition (rtx jump, rtx *earliest,
bool then_else_reversed)
   return cond;
 }

+  /* For conditional compare, set ALLOW_CC_MODE to TRUE.  */
+  if (targetm.gen_ccmp_first)
+{
+  rtx prev = prev_nonnote_nondebug_insn (jump);
+  if (prev
+	  && NONJUMP_INSN_P (prev)
+	  && BLOCK_FOR_INSN (prev) == BLOCK_FOR_INSN (jump)
+	  && ccmp_insn_p (prev))
+   allow_cc_mode = true;
+}
+
   /* Otherwise, fall back on canonicalize_condition to do the dirty
  work of manipulating MODE_CC values and COMPARE rtx codes.  */
   tmp = canonicalize_condition (jump, cond, reverse, earliest,
-   NULL_RTX, false, true);
+   NULL_RTX, allow_cc_mode, true);

   /* We don't handle side-effects in the condition, like handling
  REG_INC notes and making sure no duplicate conditions are emitted.  */
@@ -2577,6 

[PATCH, 8/10] aarch64: ccmp insn patterns

2014-06-23 Thread Zhenqiang Chen
Hi,

The patch adds two insn patterns for ccmp instructions.

The cbranchcc4 expander is introduced to generate an optimized
conditional branch without an additional compare against the result of ccmp.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

* config/aarch64/aarch64.md (cbranchcc4): New.
(*ccmp_and, *ccmp_ior): New.
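As an illustration (my own example, not part of the patch), the *ccmp_ior pattern targets OR-combined conditions such as the one below, which on AArch64 can be emitted as cmp + ccmp using the inverse condition rather than two branches.

```c
/* Hypothetical illustration of the source shape matched (after
   expansion) by the *ccmp_ior pattern: an OR of two comparisons.  */
int
either (int a, int b)
{
  return a == 3 || b == 5;
}
```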

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a4d8887..c25d940 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -230,6 +230,52 @@
   
 )
+(define_expand "cbranchcc4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "aarch64_comparison_operator"
+	       [(match_operand 1 "cc_register" "")
+		(const_int 0)])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  ""
+  "")
+
+(define_insn "*ccmp_and"
+  [(set (match_operand 6 "ccmp_cc_register" "")
+	(compare
+	 (and:SI
+	  (match_operator 4 "aarch64_comparison_operator"
+	   [(match_operand 0 "ccmp_cc_register" "")
+	    (match_operand 1 "aarch64_plus_operand" "")])
+	  (match_operator 5 "aarch64_comparison_operator"
+	   [(match_operand:GPI 2 "register_operand" "r,r,r")
+	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn")]))
+	 (const_int 0)))]
+  ""
+  {
+    return aarch64_output_ccmp (operands, true, which_alternative);
+  }
+  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
+)
+
+(define_insn "*ccmp_ior"
+  [(set (match_operand 6 "ccmp_cc_register" "")
+	(compare
+	 (ior:SI
+	  (match_operator 4 "aarch64_comparison_operator"
+	   [(match_operand 0 "ccmp_cc_register" "")
+	    (match_operand 1 "aarch64_plus_operand" "")])
+	  (match_operator 5 "aarch64_comparison_operator"
+	   [(match_operand:GPI 2 "register_operand" "r,r,r")
+	    (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn")]))
+	 (const_int 0)))]
+  ""
+  {
+    return aarch64_output_ccmp (operands, false, which_alternative);
+  }
+  [(set_attr "type" "alus_reg,alus_imm,alus_imm")]
+)
+
 (define_insn *condjump
   [(set (pc) (if_then_else (match_operator 0 aarch64_comparison_operator
[(match_operand 1 cc_register ) (const_int 0)])


Re: [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov

2014-06-23 Thread Andrew Pinski
On Mon, Jun 23, 2014 at 12:01 AM, Zhenqiang Chen
zhenqiang.c...@linaro.org wrote:
 Hi,

 The patch enhances ifcvt to handle conditional compare instruction
 (ccmp) to make it work with cmov. For ccmp, ALLOW_CC_MODE is set to
 TRUE when calling canonicalize_condition. And the backend does not
 need to generate additional compare (CC, 0) for it.

 Bootstrap and no check regression on qemu.

 OK for trunk?

 Thanks!
 -Zhenqiang

 ChangeLog:
 2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * config/aarch64/aarch64.md (movmodecc): Handle ccmp_cc.
 * ifcvt.c: #include ccmp.h.
 (struct noce_if_info): Add a new field ccmp_p.
 (noce_emit_cmove): Allow ccmp condition.
 (noce_get_alt_condition): Call canonicalize_condition with ccmp_p.
 (noce_get_condition): Set ALLOW_CC_MODE to TRUE for ccmp.
 (noce_process_if_block): Set ccmp_p for ccmp.

 testsuite/ChangeLog:
 2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * gcc.target/aarch64/ccmn-csel-1.c: New testcase.
 * gcc.target/aarch64/ccmn-csel-2.c: New testcase.
 * gcc.target/aarch64/ccmn-csel-3.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-1.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-2.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-3.c: New testcase.

 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index fcc5559..82cc561 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -2459,15 +2459,19 @@
(match_operand:ALLI 3 register_operand )))]

{
 -rtx ccreg;
  enum rtx_code code = GET_CODE (operands[1]);

  if (code == UNEQ || code == LTGT)
FAIL;

 -ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 - XEXP (operands[1], 1));
 -operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +if (!ccmp_cc_register (XEXP (operands[1], 0),
 +  GET_MODE (XEXP (operands[1], 0
 +  {
 +   rtx ccreg;
 +   ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 +XEXP (operands[1], 1));
 +   operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +  }
}
  )



You should do the same thing for the FP one.  The change to aarch64.md
is exactly the same patch which I came up with.

For the rest, I actually have a late phi-opt pass which does the
conversion into COND_EXPR.  That is, I don't change ifcvt at all.

And then I needed two more patches after that to get conditional
compares to work with cmov's.
The following patch which fixes up expand_cond_expr_using_cmove to
handle CCmode correctly:
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7989,7 +7989,9 @@ expand_cond_expr_using_cmove (tree treeop0
ATTRIBUTE_UNUSED,
   op00 = expand_normal (treeop0);
   op01 = const0_rtx;
   comparison_code = NE;
-  comparison_mode = TYPE_MODE (TREE_TYPE (treeop0));
+  comparison_mode = GET_MODE (op00);
+  if (comparison_mode == VOIDmode)
+   comparison_mode = TYPE_MODE (TREE_TYPE (treeop0));
 }

   if (GET_MODE (op1) != mode)


--- CUT ---
And then this one to have ccmp to be expanded from the tree level:
index cfc4a16..056e9b0 100644 (file)
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9300,26 +9300,36 @@ used_in_cond_stmt_p (tree exp)
   imm_use_iterator ui;
   gimple use_stmt;
   FOR_EACH_IMM_USE_STMT (use_stmt, ui, exp)
-if (gimple_code (use_stmt) == GIMPLE_COND)
-  {
-   tree op1 = gimple_cond_rhs (use_stmt);
-   /* TBD: If we can convert all
-   _Bool t;
+{
+  if (gimple_code (use_stmt) == GIMPLE_COND)
+   {
+ tree op1 = gimple_cond_rhs (use_stmt);
+ /* TBD: If we can convert all
+ _Bool t;

-   if (t == 1)
- goto bb 3;
-   else
- goto bb 4;
-  to
-   if (t != 0)
- goto bb 3;
-   else
- goto bb 4;
-  we can remove the following check.  */
-   if (integer_zerop (op1))
- expand_cond = true;
-   BREAK_FROM_IMM_USE_STMT (ui);
-  }
+ if (t == 1)
+   goto bb 3;
+ else
+   goto bb 4;
+to
+ if (t != 0)
+   goto bb 3;
+ else
+   goto bb 4;
+we can remove the following check.  */
+ if (integer_zerop (op1))
+   expand_cond = true;
+ BREAK_FROM_IMM_USE_STMT (ui);
+   }
+  /* a = EXP ? b : c is also an use in conditional
+ statement. */
+  else if (gimple_code (use_stmt) == GIMPLE_ASSIGN
+	   && gimple_expr_code (use_stmt) == COND_EXPR)
+   {
+ if (gimple_assign_rhs1 (use_stmt) == exp)
+   expand_cond = true;
+   }
+}
   return expand_cond;
 }

Thanks,
Andrew Pinski


 diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
 

Re: [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov

2014-06-23 Thread Andrew Pinski
On Mon, Jun 23, 2014 at 12:09 AM, Andrew Pinski pins...@gmail.com wrote:
 On Mon, Jun 23, 2014 at 12:01 AM, Zhenqiang Chen
 zhenqiang.c...@linaro.org wrote:
 Hi,

 The patch enhances ifcvt to handle conditional compare instruction
 (ccmp) to make it work with cmov. For ccmp, ALLOW_CC_MODE is set to
 TRUE when calling canonicalize_condition. And the backend does not
 need to generate additional compare (CC, 0) for it.

 Bootstrap and no check regression on qemu.

 OK for trunk?

 Thanks!
 -Zhenqiang

 ChangeLog:
 2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * config/aarch64/aarch64.md (movmodecc): Handle ccmp_cc.
 * ifcvt.c: #include ccmp.h.
 (struct noce_if_info): Add a new field ccmp_p.
 (noce_emit_cmove): Allow ccmp condition.
 (noce_get_alt_condition): Call canonicalize_condition with ccmp_p.
 (noce_get_condition): Set ALLOW_CC_MODE to TRUE for ccmp.
 (noce_process_if_block): Set ccmp_p for ccmp.

 testsuite/ChangeLog:
 2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * gcc.target/aarch64/ccmn-csel-1.c: New testcase.
 * gcc.target/aarch64/ccmn-csel-2.c: New testcase.
 * gcc.target/aarch64/ccmn-csel-3.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-1.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-2.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-3.c: New testcase.


I forgot to make a mention that the following code does not catch:
char foo_c (char a, signed char b)
{
  if (a  9  b  -20)
return 0;
  else
return 1;
}

Where you need to define a cstorecc4.  I can submit the patch which
adds this pattern if you want.
Note with the cstorecc4 defined you will need the following patch too
to fix up return statements which are not expecting a different mode:
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 789c8b6..cb8b922 100644 (file)
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3079,6 +3079,11 @@ expand_value_return (rtx val)
   else
 mode = promote_function_mode (type, old_mode, unsignedp, funtype, 1);

+  /* If the mode of val is not VOID mode (that is val is not a constant),
+ use it for the old mode.  */
+  if (mode != BLKmode && GET_MODE (val) != VOIDmode)
+   old_mode = GET_MODE(val);
+
   if (mode != old_mode)
val = convert_modes (mode, old_mode, val, unsignedp);


Thanks,
Andrew



 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index fcc5559..82cc561 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -2459,15 +2459,19 @@
(match_operand:ALLI 3 register_operand )))]

{
 -rtx ccreg;
  enum rtx_code code = GET_CODE (operands[1]);

  if (code == UNEQ || code == LTGT)
FAIL;

 -ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 - XEXP (operands[1], 1));
 -operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +if (!ccmp_cc_register (XEXP (operands[1], 0),
 +  GET_MODE (XEXP (operands[1], 0
 +  {
 +   rtx ccreg;
 +   ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 +XEXP (operands[1], 1));
 +   operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +  }
}
  )



 You should do the same thing for the FP one.  The change to aarch64.md
 is exactly the same patch which I came up with.

 For the rest I actually I have a late phi-opt pass which does the
 conversion into COND_EXPR.  That is I don't change ifcvt at all.

 And then I needed two more patches after that to get conditional
 compares to work with cmov's.
 The following patch which fixes up expand_cond_expr_using_cmove to
 handle CCmode correctly:
 --- a/gcc/expr.c
 +++ b/gcc/expr.c
 @@ -7989,7 +7989,9 @@ expand_cond_expr_using_cmove (tree treeop0
 ATTRIBUTE_UNUSED,
op00 = expand_normal (treeop0);
op01 = const0_rtx;
comparison_code = NE;
 -  comparison_mode = TYPE_MODE (TREE_TYPE (treeop0));
 +  comparison_mode = GET_MODE (op00);
 +  if (comparison_mode == VOIDmode)
 +   comparison_mode = TYPE_MODE (TREE_TYPE (treeop0));
  }

if (GET_MODE (op1) != mode)


 --- CUT ---
 And then this one to have ccmp to be expanded from the tree level:
 index cfc4a16..056e9b0 100644 (file)
 --- a/gcc/expr.c
 +++ b/gcc/expr.c
 @@ -9300,26 +9300,36 @@ used_in_cond_stmt_p (tree exp)
imm_use_iterator ui;
gimple use_stmt;
FOR_EACH_IMM_USE_STMT (use_stmt, ui, exp)
 -if (gimple_code (use_stmt) == GIMPLE_COND)
 -  {
 -   tree op1 = gimple_cond_rhs (use_stmt);
 -   /* TBD: If we can convert all
 -   _Bool t;
 +{
 +  if (gimple_code (use_stmt) == GIMPLE_COND)
 +   {
 + tree op1 = gimple_cond_rhs (use_stmt);
 + /* TBD: If we can convert all
 + _Bool t;

 -   if (t == 1)
 -  

Re: calloc = malloc + memset

2014-06-23 Thread Jakub Jelinek
On Tue, Jun 03, 2014 at 04:00:17PM +0200, Marc Glisse wrote:
 Ping?

Ok for trunk, sorry for the delay.

Jakub


Re: [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov

2014-06-23 Thread Zhenqiang Chen
On 23 June 2014 15:09, Andrew Pinski pins...@gmail.com wrote:
 On Mon, Jun 23, 2014 at 12:01 AM, Zhenqiang Chen
 zhenqiang.c...@linaro.org wrote:
 Hi,

 The patch enhances ifcvt to handle conditional compare instruction
 (ccmp) to make it work with cmov. For ccmp, ALLOW_CC_MODE is set to
 TRUE when calling canonicalize_condition. And the backend does not
 need to generate additional compare (CC, 0) for it.

 Bootstrap and no check regression on qemu.

 OK for trunk?

 Thanks!
 -Zhenqiang

 ChangeLog:
 2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * config/aarch64/aarch64.md (movmodecc): Handle ccmp_cc.
 * ifcvt.c: #include ccmp.h.
 (struct noce_if_info): Add a new field ccmp_p.
 (noce_emit_cmove): Allow ccmp condition.
 (noce_get_alt_condition): Call canonicalize_condition with ccmp_p.
 (noce_get_condition): Set ALLOW_CC_MODE to TRUE for ccmp.
 (noce_process_if_block): Set ccmp_p for ccmp.

 testsuite/ChangeLog:
 2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * gcc.target/aarch64/ccmn-csel-1.c: New testcase.
 * gcc.target/aarch64/ccmn-csel-2.c: New testcase.
 * gcc.target/aarch64/ccmn-csel-3.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-1.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-2.c: New testcase.
 * gcc.target/aarch64/ccmp-csel-3.c: New testcase.

 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index fcc5559..82cc561 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -2459,15 +2459,19 @@
(match_operand:ALLI 3 register_operand )))]

{
 -rtx ccreg;
  enum rtx_code code = GET_CODE (operands[1]);

  if (code == UNEQ || code == LTGT)
FAIL;

 -ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 - XEXP (operands[1], 1));
 -operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +if (!ccmp_cc_register (XEXP (operands[1], 0),
 +  GET_MODE (XEXP (operands[1], 0
 +  {
 +   rtx ccreg;
 +   ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 +XEXP (operands[1], 1));
 +   operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +  }
}
  )



 You should do the same thing for the FP one.  The change to aarch64.md
 is exactly the same patch which I came up with.

Thanks for the comments.

For AARCH64, we can mix INT and FP compares. But FP compare would be
slower than INT compare.

CMP
FCCMP

FCMP
CCMP

FCMP
FCCMP

I do not have enough resources to collect benchmark results to show
they are worthwhile, so the patches do not handle FP at all.  If you
consider CCMP for FP worthwhile, I will work out a separate patch to
support it.  Or you can share your patches.

Thanks!
-Zhenqiang

 For the rest I actually I have a late phi-opt pass which does the
 conversion into COND_EXPR.  That is I don't change ifcvt at all.

 And then I needed two more patches after that to get conditional
 compares to work with cmov's.

Thanks. Any patch to improve ccmp is welcome.

-Zhenqiang

 The following patch which fixes up expand_cond_expr_using_cmove to
 handle CCmode correctly:
 --- a/gcc/expr.c
 +++ b/gcc/expr.c
 @@ -7989,7 +7989,9 @@ expand_cond_expr_using_cmove (tree treeop0
 ATTRIBUTE_UNUSED,
op00 = expand_normal (treeop0);
op01 = const0_rtx;
comparison_code = NE;
 -  comparison_mode = TYPE_MODE (TREE_TYPE (treeop0));
 +  comparison_mode = GET_MODE (op00);
 +  if (comparison_mode == VOIDmode)
 +   comparison_mode = TYPE_MODE (TREE_TYPE (treeop0));
  }

if (GET_MODE (op1) != mode)


 --- CUT ---
 And then this one to have ccmp to be expanded from the tree level:
 index cfc4a16..056e9b0 100644 (file)
 --- a/gcc/expr.c
 +++ b/gcc/expr.c
 @@ -9300,26 +9300,36 @@ used_in_cond_stmt_p (tree exp)
imm_use_iterator ui;
gimple use_stmt;
FOR_EACH_IMM_USE_STMT (use_stmt, ui, exp)
 -if (gimple_code (use_stmt) == GIMPLE_COND)
 -  {
 -   tree op1 = gimple_cond_rhs (use_stmt);
 -   /* TBD: If we can convert all
 -   _Bool t;
 +{
 +  if (gimple_code (use_stmt) == GIMPLE_COND)
 +   {
 + tree op1 = gimple_cond_rhs (use_stmt);
 + /* TBD: If we can convert all
 + _Bool t;

 -   if (t == 1)
 - goto bb 3;
 -   else
 - goto bb 4;
 -  to
 -   if (t != 0)
 - goto bb 3;
 -   else
 - goto bb 4;
 -  we can remove the following check.  */
 -   if (integer_zerop (op1))
 - expand_cond = true;
 -   BREAK_FROM_IMM_USE_STMT (ui);
 -  }
 + if (t == 1)
 +   goto bb 3;
 + else
 +   goto bb 4;
 +to
 + if (t != 0)
 +   goto bb 3;
 +  

Re: [PATCH, 10/10] aarch64: Handle ccmp in ifcvt to make it work with cmov

2014-06-23 Thread pinskia


 On Jun 23, 2014, at 12:37 AM, Zhenqiang Chen zhenqiang.c...@linaro.org 
 wrote:
 
 On 23 June 2014 15:09, Andrew Pinski pins...@gmail.com wrote:
 On Mon, Jun 23, 2014 at 12:01 AM, Zhenqiang Chen
 zhenqiang.c...@linaro.org wrote:
 Hi,
 
 The patch enhances ifcvt to handle conditional compare instruction
 (ccmp) to make it work with cmov. For ccmp, ALLOW_CC_MODE is set to
 TRUE when calling canonicalize_condition. And the backend does not
 need to generate additional compare (CC, 0) for it.
 
 Bootstrap and no check regression on qemu.
 
 OK for trunk?
 
 Thanks!
 -Zhenqiang
 
 ChangeLog:
 2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org
 
* config/aarch64/aarch64.md (movmodecc): Handle ccmp_cc.
* ifcvt.c: #include ccmp.h.
(struct noce_if_info): Add a new field ccmp_p.
(noce_emit_cmove): Allow ccmp condition.
(noce_get_alt_condition): Call canonicalize_condition with ccmp_p.
(noce_get_condition): Set ALLOW_CC_MODE to TRUE for ccmp.
(noce_process_if_block): Set ccmp_p for ccmp.
 
 testsuite/ChangeLog:
 2014-06-23  Zhenqiang Chen  zhenqiang.c...@linaro.org
 
* gcc.target/aarch64/ccmn-csel-1.c: New testcase.
* gcc.target/aarch64/ccmn-csel-2.c: New testcase.
* gcc.target/aarch64/ccmn-csel-3.c: New testcase.
* gcc.target/aarch64/ccmp-csel-1.c: New testcase.
* gcc.target/aarch64/ccmp-csel-2.c: New testcase.
* gcc.target/aarch64/ccmp-csel-3.c: New testcase.
 
 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index fcc5559..82cc561 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -2459,15 +2459,19 @@
   (match_operand:ALLI 3 register_operand )))]
   
   {
 -rtx ccreg;
 enum rtx_code code = GET_CODE (operands[1]);
 
 if (code == UNEQ || code == LTGT)
   FAIL;
 
 -ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 - XEXP (operands[1], 1));
 -operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +if (!ccmp_cc_register (XEXP (operands[1], 0),
 +  GET_MODE (XEXP (operands[1], 0
 +  {
 +   rtx ccreg;
 +   ccreg = aarch64_gen_compare_reg (code, XEXP (operands[1], 0),
 +XEXP (operands[1], 1));
 +   operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
 +  }
   }
 )
 
 
 
 You should do the same thing for the FP one.  The change to aarch64.md
 is exactly the same patch which I came up with.
 
 Thanks for the comments.
 
 For AARCH64, we can mix INT and FP compares. But FP compare would be
 slower than INT compare.

One point is that this is not about FP compares but rather about moving an
FP register; the FP pattern is used for that.  So something like this would
fail/ICE:
double f(double a, double b, int c, int d)
{
  return c10d20?a:b;
}

Thanks,
Andrew



 
 CMP
 FCCMP
 
 FCMP
 CCMP
 
 FCMP
 FCCMP
 
 I have no enough resource to collect benchmark results to approve them
 valuable. So the patches did not handle FP at all. If you had approved
 CCMP for FP valuable, I will work out a separate patch to support it.
 Or you can share your patches.

I need to l

 
 Thanks!
 -Zhenqiang
 
 For the rest I actually I have a late phi-opt pass which does the
 conversion into COND_EXPR.  That is I don't change ifcvt at all.
 
 And then I needed two more patches after that to get conditional
 compares to work with cmov's.
 
 Thanks. Any patch to improve ccmp is welcome.
 
 -Zhenqiang
 
 The following patch which fixes up expand_cond_expr_using_cmove to
 handle CCmode correctly:
 --- a/gcc/expr.c
 +++ b/gcc/expr.c
 @@ -7989,7 +7989,9 @@ expand_cond_expr_using_cmove (tree treeop0
 ATTRIBUTE_UNUSED,
   op00 = expand_normal (treeop0);
   op01 = const0_rtx;
   comparison_code = NE;
 -  comparison_mode = TYPE_MODE (TREE_TYPE (treeop0));
 +  comparison_mode = GET_MODE (op00);
 +  if (comparison_mode == VOIDmode)
 +   comparison_mode = TYPE_MODE (TREE_TYPE (treeop0));
 }
 
   if (GET_MODE (op1) != mode)
 
 
 --- CUT ---
 And then this one to have ccmp to be expanded from the tree level:
 index cfc4a16..056e9b0 100644 (file)
 --- a/gcc/expr.c
 +++ b/gcc/expr.c
 @@ -9300,26 +9300,36 @@ used_in_cond_stmt_p (tree exp)
   imm_use_iterator ui;
   gimple use_stmt;
   FOR_EACH_IMM_USE_STMT (use_stmt, ui, exp)
 -if (gimple_code (use_stmt) == GIMPLE_COND)
 -  {
 -   tree op1 = gimple_cond_rhs (use_stmt);
 -   /* TBD: If we can convert all
 -   _Bool t;
 +{
 +  if (gimple_code (use_stmt) == GIMPLE_COND)
 +   {
 + tree op1 = gimple_cond_rhs (use_stmt);
 + /* TBD: If we can convert all
 + _Bool t;
 
 -   if (t == 1)
 - goto bb 3;
 -   else
 - goto bb 4;
 -  to
 -   if (t != 0)
 - goto bb 3;
 -   else
 - 

Re: [PATCH] Trust TREE_ADDRESSABLE

2014-06-23 Thread Richard Biener
On Mon, 23 Jun 2014, Jan Hubicka wrote:

   On Fri, 13 Jun 2014, Jan Hubicka wrote:
   
 
 When you extract the address and use it.  For example when you
 do auto-parallelization and outline a part of your function it
 passes arrays as addresses.
 
 Or if you start to introduce address induction variables like
 the vectorizer or IVOPTs does.

I see, nothing really done by current early/IPA optimizers and in those 
cases
we also want to set TREE_ADDRESSABLE bit, too I suppose.
Do you think I should make patch for setting the NOVOPS bits in ipa 
code?
   
   No, please don't introduce new users of NOVOPS (it's a quite broken
   hack - it's sth like a 'const' function with side-effects so we should
   have instead used 'const' and some kind of volatile flag).  We're
   not using NOVOPS much and that's good (I think handling of such
   function calls are somewhat broken).
  
  I meant DECL_NONALIASED.  I will test the patch and lets see.
 
 Hi,
 this patch adds the discussed code to set DECL_NOALIASED so we get better AA
 with partitioning.  We probably also can sed DECL_NOALIASED for variables
 whose address is passed only to external calls that do not capture the
 parameters (i.e. memset).
 
 I suppose I can teach ipa-ref code about this, but will need a bit of
 extra infrastructure for that, since currently REF_ADDR does not associate
 any information about these.
 
 Martin, this is related to your controlled uses.  What do you think about 
 adding
 stable UIDs into ipa_ref datastructure and then adding a vector into cgraph 
 edges
 that describe what REFs are directly used as parameters of a given callsite?
 It will take some work to maintain these, but we will be able to remove them 
 when
 call site or parameter was eliminated in a more general way.
 
 I suppose we could also use these to associate REFs with given use in the
 satement or constructor (i.e. have pointer to statement as well as pointer to
 specific use within the statement). With this we will be able to redirect
 references same way as we redirect callgraph edges now.  This is something I
 need to for the ipa-visibility optimizations.

I don't like this very much.  It's fragile and it will be very hard to
detect bugs caused by it.

Please don't spread uses of the DECL_NONALIASED hack.

If we are only concerned about LTO I'd rather have an in_lto_p check
in may_be_aliased and trust TREE_ADDRESSABLE there.

Richard.

 Honza
 
 Bootstrapped/regtested and lto-bootstrapped x86_64-linux, will commit it 
 shortly.
 
   * ipa.c (clear_addressable_bit): Set also DECL_NONALIASED.
   (ipa_discover_readonly_nonaddressable_var): Compute also NONALIASED.
 Index: ipa.c
 ===
 --- ipa.c (revision 211881)
 +++ ipa.c (working copy)
 @@ -669,6 +669,10 @@ clear_addressable_bit (varpool_node *vno
  {
   vnode->address_taken = false;
   TREE_ADDRESSABLE (vnode->decl) = 0;
+  /* Set also non-aliased bit.  In LTO, when program is partitioned, we no
+     longer trust TREE_ADDRESSABLE for TREE_PUBLIC variables and then
+     DECL_NONALIASED is useful to improve code.  */
+  DECL_NONALIASED (vnode->decl) = 1;
return false;
  }
  
 @@ -690,6 +694,7 @@ ipa_discover_readonly_nonaddressable_var
FOR_EACH_VARIABLE (vnode)
  if (!vnode-alias
(TREE_ADDRESSABLE (vnode-decl)
 + || !DECL_NONALIASED (vnode-decl)
   || !vnode-writeonly
   || !TREE_READONLY (vnode-decl)))
{
 @@ -703,8 +708,8 @@ ipa_discover_readonly_nonaddressable_var
 continue;
   if (!address_taken)
 {
 - if (TREE_ADDRESSABLE (vnode-decl)  dump_file)
 -   fprintf (dump_file,  %s (non-addressable), vnode-name ());
 + if ((TREE_ADDRESSABLE (vnode-decl) || !DECL_NONALIASED 
 (vnode-decl))  dump_file)
 +   fprintf (dump_file,  %s (non-addressable non-aliased), 
 vnode-name ());
   varpool_for_node_and_aliases (vnode, clear_addressable_bit, NULL, 
 true);
 }
   if (!address_taken  !written
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


RE: [PATCH] Change default for --param allow-...-data-races to off

2014-06-23 Thread Bernd Edlinger
Hi,


On Fri, 20 Jun 2014 13:44:18, Martin Jambor wrote:

 Hi,

 On Thu, Jun 19, 2014 at 06:18:47PM +0200, Bernd Edlinger wrote:
 Hi,

 from a recent discussion on g...@gcc.gnu.org I have learned that the default 
 of
 --param allow-store-data-races is still 1, and it is causing problems.
 Therefore I would like to suggest to change the default of this option to 0.

 I was about to propose a similar patch but I intended to leave the
 parameter set to one when -Ofast is specified so that benchmarks are
 not hurt by this and as a nice pointer for people exploring our
 options to really squeeze out 100% performance (which would of course
 mean documenting it too).


Well actually, I am not sure if we ever wanted to have a race condition here.
Have you seen any impact of --param allow-store-data-races on any benchmark?


Thanks
Bernd.

 Thanks,

 Martin


 Boot-strapped and regression tested on x86_64-linux-gnu.
 Ok for trunk?


 Thanks
 Bernd.


 gcc/ChangeLog:
 2014-06-19 Bernd Edlinger bernd.edlin...@hotmail.de

 Set default for --param allow-...-data-races to off.
 * params.def (PARAM_ALLOW_LOAD_DATA_RACES,
 PARAM_ALLOW_STORE_DATA_RACES, PARAM_ALLOW_PACKED_LOAD_DATA_RACES,
 PARAM_ALLOW_PACKED_STORE_DATA_RACES): Set default to off.

 testsuite/ChangeLog:
 2014-06-19 Bernd Edlinger bernd.edlin...@hotmail.de

 Adjust to new default for --param allow-...-data-races.
 * c-c++-common/cxxbitfields-3.c: Adjust.
 * c-c++-common/cxxbitfields-6.c: Adjust.
 * c-c++-common/simulate-thread/bitfields-1.c: Adjust.
 * c-c++-common/simulate-thread/bitfields-2.c: Adjust.
 * c-c++-common/simulate-thread/bitfields-3.c: Adjust.
 * c-c++-common/simulate-thread/bitfields-4.c: Adjust.
 * g++.dg/simulate-thread/bitfields.C: Adjust.
 * g++.dg/simulate-thread/bitfields-2.C: Adjust.
 * gcc.dg/lto/pr52097_0.c: Adjust.
 * gcc.dg/simulate-thread/speculative-store.c: Adjust.
 * gcc.dg/simulate-thread/speculative-store-2.c: Adjust.
 * gcc.dg/simulate-thread/speculative-store-3.c: Adjust.
 * gcc.dg/simulate-thread/speculative-store-4.c: Adjust.
 * gcc.dg/simulate-thread/strict-align-global.c: Adjust.
 * gcc.dg/simulate-thread/subfields.c: Adjust.
 * gcc.dg/tree-ssa/20050314-1.c: Adjust.



  

Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-06-23 Thread Marcus Shawcroft
On 19 June 2014 14:12, James Greenhalgh james.greenha...@arm.com wrote:

 This has been sitting waiting for comment for a while now. If we do need a
 mechanism to describe individual costs for alternatives, it will need to be
 applied to all the existing uses in aarch64.md/aarch64-simd.md. I think
 solving that problem (if we need to) is a separate patch, and shouldn't
 prevent this one from going in.

Agreed. OK /Marcus


Re: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Richard Biener
On Fri, Jun 20, 2014 at 12:41 PM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, June 10, 2014 5:05 PM

 Backports are welcome - please post a patch.


 Sorry for the delay. Here you are:

 diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
 new file mode 100644
 index 000..d3b54a8
 --- /dev/null
 +++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c
 @@ -0,0 +1,35 @@
 +#ifdef __UINT64_TYPE__
 +typedef __UINT64_TYPE__ uint64_t;
 +#else
 +typedef unsigned long long uint64_t;
 +#endif
 +
 +#ifndef __SIZEOF_INT128__
 +#define __int128 long long
 +#endif
 +
 +/* Some version of bswap optimization would ICE when analyzing a mask constant
 +   too big for an HOST_WIDE_INT (PR61375).  */
 +
 +__attribute__ ((noinline, noclone)) uint64_t
 +uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2)
 +{
 +  __int128 mask = (__int128)0x << 56;
 +  return ((in1 & mask) >> 56) | in2;
 +}
 +
 +int
 +main (int argc)
 +{
 +  __int128 in = 1;
 +#ifdef __SIZEOF_INT128__
 +  in <<= 64;
 +#endif
 +  if (sizeof (uint64_t) * __CHAR_BIT__ != 64)
 +    return 0;
 +  if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128)
 +    return 0;
 +  if (uint128_central_bitsi_ior (in, 2) != 0x102)
 +    __builtin_abort ();
 +  return 0;
 +}
 diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
 index 9ff857c..9d64205 100644
 --- a/gcc/tree-ssa-math-opts.c
 +++ b/gcc/tree-ssa-math-opts.c
 @@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit)
    if (n->size % BITS_PER_UNIT != 0)
      return NULL_TREE;
    n->size /= BITS_PER_UNIT;
 +  if (n->size > (int)sizeof (unsigned HOST_WIDEST_INT))
 +    return NULL_TREE;
    n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 :
            (unsigned HOST_WIDEST_INT)0x08070605 << 32 | 0x04030201);
 
 @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit)
        type_size = TYPE_PRECISION (gimple_expr_type (stmt));
        if (type_size % BITS_PER_UNIT != 0)
          return NULL_TREE;
 +      if (type_size > (int)HOST_BITS_PER_WIDEST_INT)
 +        return NULL_TREE;
 
        if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT)))
          {

 Ok for GCC 4.8 and GCC 4.9 branches?

Ok.

Thanks,
Richard.

 Best regards,

 Thomas




Re: [PATCH] Trust TREE_ADDRESSABLE

2014-06-23 Thread Martin Jambor
Hi,

On Mon, Jun 23, 2014 at 04:55:36AM +0200, Jan Hubicka wrote:
   On Fri, 13 Jun 2014, Jan Hubicka wrote:
   
 
 When you extract the address and use it.  For example when you
 do auto-parallelization and outline a part of your function it
 passes arrays as addresses.
 
 Or if you start to introduce address induction variables like
 the vectorizer or IVOPTs does.

 I see, nothing really done by current early/IPA optimizers, and in those
 cases we also want to set the TREE_ADDRESSABLE bit too, I suppose.
 Do you think I should make a patch for setting the NOVOPS bits in ipa
 code?
   
   No, please don't introduce new users of NOVOPS (it's a quite broken
   hack - it's sth like a 'const' function with side-effects so we should
   have instead used 'const' and some kind of volatile flag).  We're
   not using NOVOPS much and that's good (I think handling of such
   function calls are somewhat broken).
  
  I meant DECL_NONALIASED.  I will test the patch and lets see.
 
 Hi,
 this patch adds the discussed code to set DECL_NONALIASED so we get better AA
 with partitioning.  We probably also can set DECL_NONALIASED for variables
 whose address is passed only to external calls that do not capture the
 parameters (i.e. memset).
 
 I suppose I can teach ipa-ref code about this, but will need a bit of
 extra infrastructure for that, since currently REF_ADDR does not associate
 any information about these.
 
 Martin, this is related to your controlled uses.  What do you think about
 adding stable UIDs into the ipa_ref datastructure and then adding a vector
 into cgraph edges that describes which REFs are directly used as parameters
 of a given callsite?  It will take some work to maintain these, but we will
 be able to remove them in a more general way when a call site or parameter
 is eliminated.

I'm still recovering from getting up at six in the morning today so I
may be a bit slow but: the big patch already assigns (per-function)
UIDs to interesting DECLs and then maintains this information in jump
functions.  The only advantage of reference UIDs I can think of now is
that we would stop treating all references from and to same things as
equal (because currently we delete the first one we find).  Is that
what you want to achieve?

And by the way, if we add support for nocapture calls like memeset
that you described above, the big ipa-prop noescape patch will
actually directly calculate the nonaliased flag.  Perhaps it should be
even called that and not noescape.


  I suppose we could also use these to associate REFs with a given use in the
  statement or constructor (i.e. have a pointer to the statement as well as a
  pointer to the specific use within the statement).  With this we will be
  able to redirect references the same way as we redirect callgraph edges now.
  This is something I need for the ipa-visibility optimizations.

I see.  I will think about this some more (and will be happy to chat on IRC).
Thanks,

Martin


 
 Honza
 
 Bootstrapped/regtested and lto-bootstrapped x86_64-linux, will commit it 
 shortly.
 
   * ipa.c (clear_addressable_bit): Set also DECL_NONALIASED.
   (ipa_discover_readonly_nonaddressable_var): Compute also NONALIASED.
 Index: ipa.c
  ===================================================================
 --- ipa.c (revision 211881)
 +++ ipa.c (working copy)
  @@ -669,6 +669,10 @@ clear_addressable_bit (varpool_node *vno
   {
     vnode->address_taken = false;
     TREE_ADDRESSABLE (vnode->decl) = 0;
  +  /* Set also non-aliased bit.  In LTO, when program is partitioned, we no longer
  +     trust TREE_ADDRESSABLE for TREE_PUBLIC variables and then DECL_NONALIASED is
  +     useful to improve code.  */
  +  DECL_NONALIASED (vnode->decl) = 1;
     return false;
   }
   
  @@ -690,6 +694,7 @@ ipa_discover_readonly_nonaddressable_var
     FOR_EACH_VARIABLE (vnode)
       if (!vnode->alias
           && (TREE_ADDRESSABLE (vnode->decl)
  +            || !DECL_NONALIASED (vnode->decl)
               || !vnode->writeonly
               || !TREE_READONLY (vnode->decl)))
         {
  @@ -703,8 +708,8 @@ ipa_discover_readonly_nonaddressable_var
             continue;
           if (!address_taken)
             {
  -            if (TREE_ADDRESSABLE (vnode->decl) && dump_file)
  -              fprintf (dump_file, " %s (non-addressable)", vnode->name ());
  +            if ((TREE_ADDRESSABLE (vnode->decl) || !DECL_NONALIASED (vnode->decl)) && dump_file)
  +              fprintf (dump_file, " %s (non-addressable non-aliased)", vnode->name ());
               varpool_for_node_and_aliases (vnode, clear_addressable_bit, NULL, true);
             }
           if (!address_taken && !written


Re: [PATCH] Remove bogus include path with in-tree cloog

2014-06-23 Thread Richard Biener
On Fri, Jun 20, 2014 at 6:52 PM, Bernd Edlinger
bernd.edlin...@hotmail.de wrote:
 Hi,

 I have noticed there is a minor flaw with the include path when cloog is 
 installed in-tree.

 That is, the cloog include directory is added twice, first with an absolute
 path, and then again with a relative path, but with one ../ too few, so it
 is useless when compiling sources in the gcc directory.

 For example, if I call ../gcc-4.10-20140608/configure, the following is added 
 to each
 invocation of xg++:
 -I/absolute_path/gcc-4.10-20140608/cloog/include 
 -I../gcc-4.10-20140608/cloog/include
 The attached patch removes the bogus relative include path for in-tree 
 cloog/include.

 Boot-strapped and regression-tested on x86_64-linux-gnu.
 OK for trunk?

Ok.

Thanks,
Richard.


 Thanks
 Bernd.



Re: [PATCH][AArch64] Fix some saturating math NEON intrinsics types

2014-06-23 Thread Marcus Shawcroft
On 20 June 2014 15:14, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 Sure, but it depends on
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html.
 Is it ok to backport that one as well?

This can be backported as well.
/Marcus


Re: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Jakub Jelinek
On Mon, Jun 23, 2014 at 10:18:16AM +0200, Richard Biener wrote:
  --- a/gcc/tree-ssa-math-opts.c
  +++ b/gcc/tree-ssa-math-opts.c
  @@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit)
     if (n->size % BITS_PER_UNIT != 0)
       return NULL_TREE;
     n->size /= BITS_PER_UNIT;
  +  if (n->size > (int)sizeof (unsigned HOST_WIDEST_INT))
  +    return NULL_TREE;

This looks wrong: while the bswap pass is guarded with a BITS_PER_UNIT == 8
check (i.e. target), you don't know whether HOST_BITS_PER_CHAR is 8.
I'd move the test before the division by BITS_PER_UNIT, and compare
against HOST_BITS_PER_WIDEST_INT.

    n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 :
            (unsigned HOST_WIDEST_INT)0x08070605 << 32 | 0x04030201);
 
  @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit)
      type_size = TYPE_PRECISION (gimple_expr_type (stmt));
      if (type_size % BITS_PER_UNIT != 0)
        return NULL_TREE;
  +   if (type_size > (int)HOST_BITS_PER_WIDEST_INT)
  +     return NULL_TREE;
 
      if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT)))
        {

Similarly here.

BTW, the formatting is wrong too, the (int) cast should be followed by space.

Jakub


Re: [PATCH] Fix forwporp pattern (T)(P + A) - (T)P - (T)A

2014-06-23 Thread Richard Biener
On Sun, Jun 22, 2014 at 9:14 AM, Bernd Edlinger
bernd.edlin...@hotmail.de wrote:
 Hi,

 I noticed that several testcases in the GMP-4.3.2 test suite are failing now 
 which
 did not happen with GCC 4.9.0.  I debugged the first one, mpz/convert, and 
 found
 the file mpn/generic/get_str.c was miscompiled.

 mpn/get_str.c.132t.dse2:
   pretmp_183 = (sizetype) chars_per_limb_80;
   pretmp_184 = -pretmp_183;
   _23 = chars_per_limb_80 + 4294967295;
   _68 = (sizetype) _23;
   _28 = _68 + pretmp_184;

 mpn/get_str.c.133t.forwprop4:
   _28 = 4294967295;


 That is wrong, because chars_per_limb is unsigned, and it is not zero.
 So the right result should be -1.  This makes the loop termination in that
 function fail.

 The reason for this is in this check-in:

 r210807 | ebotcazou | 2014-05-22 16:32:56 +0200 (Thu, 22 May 2014) | 3 lines

 * tree-ssa-forwprop.c (associate_plusminus): Extend (T)(P + A) - (T)P
 - (T)A transformation to integer types.


 Because it implicitly assumes that integer overflow is not allowed with all 
 types,
 including unsigned int.

Hmm?  But the transform is correct if overflow wraps.  And it's correct if
overflow is undefined as well, as (T)A is always well-defined (implementation
defined) if it is a truncation.

So we match the above and try to transform it to (T)P + (T)A - (T)P.  That's
wrong if the conversion is extending, I think.

Richard.



 The attached patch fixes these regressions, and because the reasoning depends
 on the TYPE_OVERFLOW_UNDEFINED attribute, a strict overflow warning has to be
 emitted here, at least for widening conversions.


 Boot-strapped and regression-tested on x86_64-linux-gnu with all languages, 
 including Ada.
 OK for trunk?

+ if (!TYPE_SATURATING (TREE_TYPE (a))

this is already tested at the very beginning of the function.

+     && !FLOAT_TYPE_P (TREE_TYPE (a))
+     && !FIXED_POINT_TYPE_P (TREE_TYPE (a))

likewise.

+     || (!POINTER_TYPE_P (TREE_TYPE (p))
+         && INTEGRAL_TYPE_P (TREE_TYPE (a))
+         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (a)))

INTEGRAL_TYPE_P types are always !POINTER_TYPE_P.







 Thanks
 Bernd.



Re: [PATCH, PR61554] ICE during CCP

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 7:32 AM, Chung-Lin Tang clt...@codesourcery.com wrote:
 Hi Richard,

 In this change:
 https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01278.html

 where substitute_and_fold() was changed to use a dom walker, the calls
 to purge dead EH edges during the walk can alter the dom-tree, and have
 chaotic results; the testcase in PR 61554 has some blocks traversed
 twice during the walk, causing the segfault during CCP.

 The patch records the to-be-purged-for-dead-EH blocks in a similar
 manner like stmts_to_remove, and processes it after the walk. (another
 possible method would be using a bitmap to record the BBs + calling
 gimple_purge_all_dead_eh_edges...)

Oops.

 Bootstrapped and tested on x86_64-linux, is this okay for trunk?

Can you please use a bitmap and use gimple_purge_all_dead_eh_edges
like tree-ssa-pre.c does?

Also please add the reduced testcase from the PR to g++.dg/torture.

Ok with those changes.

Thanks,
Richard.

 Thanks,
 Chung-Lin

 2014-06-23  Chung-Lin Tang  clt...@codesourcery.com

 PR tree-optimization/61554
 * tree-ssa-propagate.c (substitute_and_fold_dom_walker):
 Add 'vec<basic_block> bbs_to_purge_dead_eh_edges' member,
 properly update constructor/destructor.
 (substitute_and_fold_dom_walker::before_dom_children):
 Remove call to gimple_purge_dead_eh_edges, add bb to
 bbs_to_purge_dead_eh_edges instead.
 (substitute_and_fold): Call gimple_purge_dead_eh_edges for
 bbs recorded in bbs_to_purge_dead_eh_edges.


Re: [PATCH] Change default for --param allow-...-data-races to off

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 10:02 AM, Bernd Edlinger
bernd.edlin...@hotmail.de wrote:
 Hi,


 On Fri, 20 Jun 2014 13:44:18, Martin Jambor wrote:

 Hi,

 On Thu, Jun 19, 2014 at 06:18:47PM +0200, Bernd Edlinger wrote:
 Hi,

 from a recent discussion on g...@gcc.gnu.org I have learned that the 
 default of
 --param allow-store-data-races is still 1, and it is causing problems.
 Therefore I would like to suggest to change the default of this option to 0.

 I was about to propose a similar patch but I intended to leave the
 parameter set to one when -Ofast is specified so that benchmarks are
 not hurt by this and as a nice pointer for people exploring our
 options to really squeeze out 100% performance (which would of course
 mean documenting it too).


 Well actually, I am not sure if we ever wanted to have a race condition here.
 Have you seen any impact of --param allow-store-data-races on any benchmark?

It's trivial to write one.  The only pass that checks the param is
tree loop invariant motion, and it does that when it applies store motion.
Register pressure is increased by a factor of two.

So I'd agree that we might want to disable this again for -Ofast.

As nothing tests for the PACKED variants nor for the LOAD variant
I'd rather remove those.  Claiming we don't create races for those
when you disable it via the param is simply not true.

Thanks,
Richard.


 Thanks
 Bernd.

 Thanks,

 Martin


 Boot-strapped and regression tested on x86_64-linux-gnu.
 Ok for trunk?


 Thanks
 Bernd.


 gcc/ChangeLog:
 2014-06-19 Bernd Edlinger bernd.edlin...@hotmail.de

 Set default for --param allow-...-data-races to off.
 * params.def (PARAM_ALLOW_LOAD_DATA_RACES,
 PARAM_ALLOW_STORE_DATA_RACES, PARAM_ALLOW_PACKED_LOAD_DATA_RACES,
 PARAM_ALLOW_PACKED_STORE_DATA_RACES): Set default to off.

 testsuite/ChangeLog:
 2014-06-19 Bernd Edlinger bernd.edlin...@hotmail.de

 Adjust to new default for --param allow-...-data-races.
 * c-c++-common/cxxbitfields-3.c: Adjust.
 * c-c++-common/cxxbitfields-6.c: Adjust.
 * c-c++-common/simulate-thread/bitfields-1.c: Adjust.
 * c-c++-common/simulate-thread/bitfields-2.c: Adjust.
 * c-c++-common/simulate-thread/bitfields-3.c: Adjust.
 * c-c++-common/simulate-thread/bitfields-4.c: Adjust.
 * g++.dg/simulate-thread/bitfields.C: Adjust.
 * g++.dg/simulate-thread/bitfields-2.C: Adjust.
 * gcc.dg/lto/pr52097_0.c: Adjust.
 * gcc.dg/simulate-thread/speculative-store.c: Adjust.
 * gcc.dg/simulate-thread/speculative-store-2.c: Adjust.
 * gcc.dg/simulate-thread/speculative-store-3.c: Adjust.
 * gcc.dg/simulate-thread/speculative-store-4.c: Adjust.
 * gcc.dg/simulate-thread/strict-align-global.c: Adjust.
 * gcc.dg/simulate-thread/subfields.c: Adjust.
 * gcc.dg/tree-ssa/20050314-1.c: Adjust.






RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Thomas Preud'homme
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Monday, June 23, 2014 4:37 PM
 
 On Mon, Jun 23, 2014 at 10:18:16AM +0200, Richard Biener wrote:
   --- a/gcc/tree-ssa-math-opts.c
   +++ b/gcc/tree-ssa-math-opts.c
    @@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit)
       if (n->size % BITS_PER_UNIT != 0)
         return NULL_TREE;
       n->size /= BITS_PER_UNIT;
    +  if (n->size > (int)sizeof (unsigned HOST_WIDEST_INT))
    +    return NULL_TREE;
  
  This looks wrong: while the bswap pass is guarded with a BITS_PER_UNIT == 8
  check (i.e. target), you don't know whether HOST_BITS_PER_CHAR is 8.
  I'd move the test before the division by BITS_PER_UNIT, and compare
  against HOST_BITS_PER_WIDEST_INT.

I may misunderstand you, but I don't think there is a problem here, because we
just check whether we can create a value on the host with as many bytes as the
value on the target. The value on the host is different, with each byte being a
number from 1 to SIZE, SIZE being the number of bytes on the target. So this
would fail only if the target value has so many bytes that this number of bytes
cannot be represented in a HOST_WIDEST_INT.

 
      n->n = (sizeof (HOST_WIDEST_INT) < 8 ? 0 :
              (unsigned HOST_WIDEST_INT)0x08070605 << 32 | 0x04030201);
   
    @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit)
        type_size = TYPE_PRECISION (gimple_expr_type (stmt));
        if (type_size % BITS_PER_UNIT != 0)
          return NULL_TREE;
    +   if (type_size > (int)HOST_BITS_PER_WIDEST_INT)
    +     return NULL_TREE;
   
        if (type_size / BITS_PER_UNIT < (int)(sizeof (HOST_WIDEST_INT)))
          {
 
 Similarly here.

I agree that here the test is not correct, as we look at the number of bits on
the host, which should be enough to count the number of bytes on the target. To
reflect the intent better, we should first compute the number of bytes that
type_size forms and then compare that to the size in bytes of HOST_WIDEST_INT.

I'll rework the patch in this direction.

 
 BTW, the formatting is wrong too, the (int) cast should be followed by space.

Right, but note that I merely followed the current style in this file. There are
many more occurences of this style mistake in this file. Do you want me to
fix this one anyway?

Best regards,

Thomas




Re: [GSoC] [match-and-simplify] check for capture index

2014-06-23 Thread Richard Biener
On Wed, Jun 18, 2014 at 3:07 PM, Prathamesh Kulkarni
bilbotheelffri...@gmail.com wrote:
 Put a check for capture index.

 * genmatch.c (parse_capture): Add condition to check capture index.
  (capture_max): New constant.
  (stdlib.h): Include.

I'd rather record the maximum seen capture index and remove that
fixed-size everywhere ...

Thanks,
Richard.

 Thanks and Regards,
 Prathamesh


Re: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Jakub Jelinek
On Mon, Jun 23, 2014 at 04:50:49PM +0800, Thomas Preud'homme wrote:
  Sent: Monday, June 23, 2014 4:37 PM
  
  On Mon, Jun 23, 2014 at 10:18:16AM +0200, Richard Biener wrote:
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
 @@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit)
    if (n->size % BITS_PER_UNIT != 0)
      return NULL_TREE;
    n->size /= BITS_PER_UNIT;
 +  if (n->size > (int)sizeof (unsigned HOST_WIDEST_INT))
 +    return NULL_TREE;
  
   This looks wrong: while the bswap pass is guarded with a BITS_PER_UNIT == 8
   check (i.e. target), you don't know whether HOST_BITS_PER_CHAR is 8.
   I'd move the test before the division by BITS_PER_UNIT, and compare
   against HOST_BITS_PER_WIDEST_INT.
 
  I may misunderstand you, but I don't think there is a problem here, because we
  just check whether we can create a value on the host with as many bytes as the
  value on the target. The value on the host is different, with each byte being
  a number from 1 to SIZE, SIZE being the number of bytes on the target. So this
  would fail only if the target value has so many bytes that this number of
  bytes cannot be represented in a HOST_WIDEST_INT.

Host could e.g. in theory have CHAR_BIT 32, while target BITS_PER_UNIT 8
(otherwise bswap pass would give up).  sizeof (unsigned HOST_WIDE_INT) could
very well be 2 in that case.

Anyway, another option is to also not run the bswap pass if CHAR_BIT != 8,
e.g. fold-const.c does something similar when deciding if VIEW_CONVERT_EXPR
of constants can be safely folded.

  BTW, the formatting is wrong too, the (int) cast should be followed by 
  space.
 
 Right, but note that I merely followed the current style in this file. There 
 are
 many more occurences of this style mistake in this file. Do you want me to
 fix this one anyway?

Just don't introduce new formatting issues and on lines you touch anyway
also fix formatting issues.

Jakub


RE: [PATCH][MIPS] Enable load-load/store-store bonding

2014-06-23 Thread Sameera Deshpande
Hi Richard,

Thanks for your comments. I am working on the review comments, and will share 
the reworked patch soon.
However, here is clarification on some of the issues raised.

  +  if (TARGET_FIX_24K && TUNE_P5600)
  +    error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
  +
 /* Save the base compression state and process flags as though we
were generating uncompressed code.  */
 mips_base_compression_flags = TARGET_COMPRESSION;
 
 Although it's a bit of an odd combination, we need to accept
 -mfix-24k -mtune=p5600 and continue to implement the 24K workarounds.
 The idea is that a distributor can build for a common base architecture,
 add -mfix- options for processors that might run the code, and add
 -mtune= for the processor that's most of interest optimisation-wise.
 
 We should just make the pairing of stores conditional on !TARGET_FIX_24K.
We had offline discussion based on your comment. There is additional view on 
the same.
Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining ISAs do not 
support P5600. 
For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is implemented 
separately, and hence, as you suggested, P5600 Ld-ST bonding optimization 
should not be enabled for them.
So, is it fine if I emit an error for any ISA other than mips32r2, mips32r3 and 
mips32r5 when P5600 is enabled, or should the compilation continue with a 
warning and P5600 disabled?
Also, the optimization will be enabled only if !TARGET_FIX_24K && 
!TARGET_MICROMIPS, as suggested by you.

  +
  +#define ENABLE_LD_ST_PAIRING \
  +  (TARGET_ENABLE_LD_ST_PAIRING  TUNE_P5600)
 
 The patch requires -mld-st-pairing to be passed explicitly even for
 -mtune=p5600.  Is that because it's not a consistent enough win for us to
 enable it by default?  It sounded from the description like it should be an
 improvement more often than not.
 
 We should allow pairing even without -mtune=p5600.
Performance testing for this patch is not yet done. 
If the patch proves beneficial in most of the testcases (which we believe it 
will on P5600), we will enable this optimization by default for P5600 - in 
which case this option can be removed.

 
 Are QImodes not paired in the same way?  If so, it'd be worth adding a
 comment above the define_mode_iterator saying that QI is deliberately
 excluded.
The P5600 datasheet mentions bonding of load/stores in HI, SI, SF and DF modes 
only. Hence QI mode is excluded. I will add the comment on the iterator.

- Thanks and regards,
   Sameera D.



RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Thomas Preud'homme
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Monday, June 23, 2014 4:59 PM
 
 Host could e.g. in theory have CHAR_BIT 32, while target BITS_PER_UNIT 8
 (otherwise bswap pass would give up).  sizeof (unsigned HOST_WIDE_INT)
 could
 very well be 2 in that case.

In this case the pass would skip any value of more than 2 bytes. However, 
although the original comments on struct symbolic_number imply that there is 
a mapping between host bytes (the bytes of the symbolic number) and target 
bytes, that isn't the case, since do_shift_rotate () shifts the symbolic 
number by quantities of BYTES_PER_UNIT instead of CHAR_BIT. Also there are 
quite a few 8s here and there. Although not a problem in practice, the mix of 
8 and BITS_PER_UNIT does not look very good. I guess a quick review would be 
in order. Of course, with regards to the backport, the mix of 8 and 
BITS_PER_UNIT should be left as is, and only confusion about how to represent 
a target value in a host type should be fixed, if any.

I'll come back to you whenever this is done.

Best regards,

Thomas





Re: [PATCH] Fix 61565 -- cmpelim vs non-call exceptions

2014-06-23 Thread Ramana Radhakrishnan



On 20/06/14 21:28, Richard Henderson wrote:

There aren't too many users of the cmpelim pass, and previously they were all
small embedded targets without an FPU.

I'm a bit surprised that Ramana decided to enable this pass for aarch64, as
that target is not so limited as the block comment for the pass describes.
Honestly, whatever is being deleted here ought to have been found earlier,
either via combine or cse.  We ought to find out why any changes are made
during this pass for aarch64.


Agreed - Going back and looking at my notes, I remember seeing a 
difference in code generation with the elimination of a number of 
compares that prompted me to turn this on in a number of benchmarks. I 
don't remember double-checking why CSE hadn't removed them at that time. 
This also probably explains why the equivalent patch for ARM and Thumb2 
hasn't shown demonstrable differences.


Investigating this pass for Thumb1 may be interesting.




That said, this PR does demonstrate a bug in the handling of fp comparisons in
the presence of -fnon-call-exceptions, so I go ahead and fix that regardless of
what we do with the aarch64 port longer term.


Fixing it properly in the pass makes sense and sorry about the breakage.

I can't look into this immediately but one of us will pick this up.


regards
Ramana



Bootstrap still in progress, but the original testcase is resolved.


r~



Re: [GSoC] Addition of ISL AST generation to Graphite

2014-06-23 Thread Sebastian Pop
Please add a FIXME note in graphite_regenerate_ast_isl saying that
this is not yet a full implementation of the code generator with ISL
ASTs.
It would be useful to make the current graphite_regenerate_ast_isl
work by calling graphite_regenerate_ast_cloog and adding the FIXME
note above saying that we rely on the CLooG code generator until we
implement the ISL AST parsing.

There is also a minor code style issue in:

isl_set * context_isl = isl_set_params (isl_set_copy (scop->context));

please remove the space after *:

isl_set *context_isl = isl_set_params (isl_set_copy (scop->context));

Otherwise the two patches look good to me.

Thanks,
Sebastian

On Wed, Jun 18, 2014 at 12:04 PM, Tobias Grosser tob...@grosser.es wrote:
 On 18/06/2014 21:00, Roman Gareev wrote:

 These patches add ISL AST generation to graphite, which can be chosen
 by the fgraphite-code-generator=[isl|cloog] switch. The first patch
 makes initial renaming of gloog and gloog_error to
 graphite_regenerate_ast_cloog and graphite_regenerate_error,
 respectively. The second one adds new files with generation of ISL
 AST, new switch, new testcase that checks that the dump is generated.

 Is it fine for trunk?


 I went over this from the graphite side and it looks fine. However,
 as I did not commit for a while to gcc, it would be great if someone else
 could have a look.

 Cheers,
 Tobias



RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT

2014-06-23 Thread Thomas Preud'homme
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 
  However,
  although the original comments on struct symbolic_number imply that
  there is a mapping between host bytes (the bytes of the symbolic number)
  and target bytes, it isn't the case, since do_shift_rotate () shifts the
  symbolic number by quantities of BYTES_PER_UNIT instead of CHAR_BIT.

My bad, the comment can be understood both ways.

Best regards,

Thomas




Re: [GSoC] Addition of ISL AST generation to Graphite

2014-06-23 Thread Tobias Grosser
Thanks Sebastian for the review! It is good to see you again on the 
mailing list!


On 23/06/2014 11:29, Sebastian Pop wrote:

Please add a FIXME note in graphite_regenerate_ast_isl saying that
this is not yet a full implementation of the code generator with ISL
ASTs.
It would be useful to make the current graphite_regenerate_ast_isl
working by calling graphite_regenerate_ast_cloog and adding the fixme
note above saying that we rely on the cloog code generator until we
implement the ISL AST parsing.


I would prefer to avoid this for the following two reasons:

1) Having a clear internal compiler error on unimplemented features
   allows us to easily understand which parts are working and which
   ones are not.

2) Going forward we will implement the isl ast generation step by step.
   To my understanding there is no reasonable way to do parts of the
   ast generation with isl and parts with CLooG. Developing such a
   hybrid generation seems not useful at all.

Instead of developing the ast code generation step-by-step in the 
graphite source tree, we could develop it outside of gcc and only commit 
a single large patch performing the switch. I personally prefer the
incremental development approach.


There is also a minor code style issue in:

isl_set * context_isl = isl_set_params (isl_set_copy (scop->context));

please remove the space after *:

isl_set *context_isl = isl_set_params (isl_set_copy (scop->context));

Otherwise the two patches look good to me.


Cheers,
Tobias



Re: [PATCH][MIPS] Enable load-load/store-store bonding

2014-06-23 Thread Richard Sandiford
Sameera Deshpande sameera.deshpa...@imgtec.com writes:
  +  if (TARGET_FIX_24K && TUNE_P5600)
  +error ("unsupported combination: %s", "-mtune=p5600 -mfix-24k");
  +
 /* Save the base compression state and process flags as though we
were generating uncompressed code.  */
 mips_base_compression_flags = TARGET_COMPRESSION;
 
 Although it's a bit of an odd combination, we need to accept
 -mfix-24k -mtune=p5600 and continue to implement the 24k workarounds.
 The idea is that a distributor can build for a common base architecture,
 add -mfix- options for processors that might run the code, and add -mtune= for
 the processor that's most of interest optimisation-wise.
 
 We should just make the pairing of stores conditional on !TARGET_FIX_24K.
 We had offline discussion based on your comment. There is additional
 view on the same.
 Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining ISAs
 do not support P5600.
 For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
 implemented separately, and hence, as you suggested, P5600 Ld-ST bonding
 optimization should not be enabled for them.
 So, is it fine if I emit error for any ISAs other than mips32r2,
 mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
 continue by emitting warning and disabling P5600?

No, the point is that we have two separate concepts: ISA and optimisation
target.  -mipsN and -march=N control the ISA (which instructions are
available) and -mtune=M controls optimisation decisions within the
constraints of that N, such as scheduling and the cost of things like
multiplication and division.

E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS
II-compatible code, optimise it for p5600, but make sure that 24k
workarounds are used.  The code would run correctly on any MIPS
II-compatible processor without known errata and also on the 24k.

  +
  +#define ENABLE_LD_ST_PAIRING \
  +  (TARGET_ENABLE_LD_ST_PAIRING && TUNE_P5600)
 
 The patch requires -mld-st-pairing to be passed explicitly even for
 -mtune=p5600.  Is that because it's not a consistent enough win for us to
 enable it by default?  It sounded from the description like it should be an
 improvement more often that not.
 
 We should allow pairing even without -mtune=p5600.
 Performance testing for this patch is not yet done. 
 If the patch proves beneficial in most of the testcases (which we
 believe will do on P5600) we will enable this optimization by default
 for P5600 - in which case this option can be removed.

OK.  Sending the patch for comments before performance testing is fine,
but I think it'd be better to commit the patch only after the testing
is done, since otherwise the patch might need to be tweaked.

I don't see any problem with keeping the option in case people want to
experiment with it.  I just think the patch should only go in once it
can be enabled by default for p5600.  I.e. the option would exist to
turn off the pairing.

Not having the option is fine too of course.

 Are QImodes not paired in the same way?  If so, it'd be worth adding a
 comment above the define_mode_iterator saying that QI is deliberately
 excluded.
 The P5600 datasheet mentions bonding of load/stores in HI, SI, SF and DF
 modes only. Hence QI mode is excluded. I will add the comment on the
 iterator.

Thanks.

Richard


[PATCH]Enable elimination of IV use with unsigned type candidate

2014-06-23 Thread Bin Cheng
Hi,
For below simplified case:

#define LEN (32000)
__attribute__((aligned(16))) float a[LEN],b[LEN];

int foo (int M)
{
  for (int i = 0; i < M; i++)
a[i+M] = a[i] + b[i];
}

Compiling it with command like:
$ aarch64-elf-gcc -O3 -S foo.c -o foo.S -std=c99

The assembly code of vectorized loop is in below form:
mov  x1, 0
mov  w2, 0
.L4:
ldr q0, [x1, x3]
add w2, w2, 1
ldr q1, [x1, x4]
cmp w2, w5
fadd    v0.4s, v0.4s, v1.4s
str q0, [x6, x1]
add x1, x1, 16
bcc .L4

Induction variable w2 is unnecessary and can be eliminated with x1.  This is
safe because x1 will never overflow during all iterations of the loop.  The
optimal assembly should be like:
mov  x1, 0
.L4:
ldr q0, [x1, x2]
ldr q1, [x1, x4]
fadd    v0.4s, v0.4s, v1.4s
str q0, [x5, x1]
add x1, x1, 16
cmp x1, x3
bcc .L4

This case can be handled if we do a more complex overflow check on unsigned
types in function iv_elimination_compare_lt.
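
At the source level, the elimination amounts to trading the integer counter
for a comparison on the address IV itself; a minimal C sketch of the two
equivalent loop shapes (hypothetical helper names, not GCC internals):

```c
/* Counter-based loop: i is the extra induction variable that ivopts
   would like to eliminate.  */
static int sum_counter (const int *x, int m)
{
  int s = 0;
  for (int i = 0; i < m; i++)
    s += x[i];
  return s;
}

/* Pointer-based form after IV elimination: the loop test compares the
   pointer candidate directly instead of a separate counter.  */
static int sum_pointer (const int *x, int m)
{
  int s = 0;
  for (const int *p = x, *end = x + m; p < end; p++)
    s += *p;
  return s;
}
```

The pointer form is only valid when p cannot wrap before reaching end,
which is exactly the condition the stricter overflow check has to prove.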

Also there is another blocker for the transformation, function
number_of_iterations_lt calls fold_build2 to build folded form of
may_be_zero, while iv_elimination_compare_lt only handles simple form tree
expressions.  It's possible to make iv_elimination_compare_lt undo the
transformation on may_be_zero, but I found it difficult for cases
involving signed/unsigned conversion like test case loop-41.c.  Since I think
there is no obvious benefit to fold may_be_zero here (somehow because the
two operands are already in folded forms), this patch just calls build2_loc
instead.

This optimization is picked up by patch B, but patch A is necessary since
the check in iv_elimination_compare_lt of two aff_trees isn't enough when
two different types (one signed, the other unsigned) are involved.  I
have to use tree comparison here instead.  Consider the simple case below:

Analyzing # of iterations of loop 5
  exit condition [1, + , 1](no_overflow) <= i_1
  bounds on difference of bases: -3 ... 1
  result:
zero if i_1 + 1 < 1
# of iterations (unsigned int) i_1, bounded by 2
  number of iterations (unsigned int) i_1; zero if i_1 + 1 < 1

use 0
  compare
  in statement if (S.7_9 > i_1)

  at position 
  type integer(kind=4)
  base 1
  step 1
  is a biv
  related candidates 

candidate 0 (important)
  var_before ivtmp.28
  var_after ivtmp.28
  incremented at end
  type unsigned int
  base 0
  step 1

When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in
iv_elimination_compare_lt are like below with i_1 of signed type:
B: i_1 + 1
A: 0
niter->niter:  (unsigned int)i_1

Apparently, (B-A-1) is i_1, which doesn't equal (unsigned int)i_1.
Without this patch, the two are considered equal.

Note that the induction variable IS necessary on 32 bit systems since
otherwise there is type overflow.

These two patches fix the mentioned problem.
They pass bootstrap and regression test on x86_64/x86/aarch64/arm, so any
comments?

Thanks,
bin

PATCH A)

2014-06-23  Bin Cheng  bin.ch...@arm.com

* tree-ssa-loop-ivopts.c (iv_elimination_compare_lt): Check number
of iteration using tree comparison.

PATCH B)

2014-06-23  Bin Cheng  bin.ch...@arm.com

* tree-ssa-loop-niter.c (number_of_iterations_lt): Build unfolded
form of may_be_zero.
* tree-ssa-loop-ivopts.c (iv_nowrap_period)
(nowrap_cand_for_loop_niter_p): New functions.
(period_greater_niter_exit): New function refactored from
may_eliminate_iv.
(iv_elimination_compare_lt): New parameter.  Relax overflow check.
Handle special forms may_be_zero expression.
(may_eliminate_iv): Call period_greater_niter_exit.  Pass new
argument for iv_elimination_compare_lt.

gcc/testsuite/ChangeLog
2014-06-23  Bin Cheng  bin.ch...@arm.com

* gcc.dg/tree-ssa/loop-40.c: New test.
* gcc.dg/tree-ssa/loop-41.c: New test.

Index: gcc/testsuite/gcc.dg/tree-ssa/loop-41.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-41.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-41.c (revision 0)
@@ -0,0 +1,38 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+typedef long unsigned int size_t;
+extern float a[100], b[100];
+
+int foo (int M, int l)
+{
+  unsigned ivtmp = 0, niters, _37, _38, bnd;
+  size_t _67, _1;
+  float *vectp_a, *vectp_b, *vectp_a2;
+  float vect__6, vect__7, vect__8;
+
+  _38 = (unsigned int)l;
+  bnd = _38 + 1;
+
+  _1 = (size_t) M;
+  _67 = _1 * 4;
+  vectp_a = a; vectp_b = b; vectp_a2 = a + _67;
+
+  do
+{
+  vect__6 = *vectp_a;
+  vect__7 = *vectp_b;
+  vect__8 = vect__6 + vect__7;
+  *vectp_a = vect__8;
+  vectp_a = vectp_a + 4;
+  vectp_b = vectp_b + 4;
+  vectp_a2 = vectp_a2 + 4;
+  

[PATCH 4.9 ARM] Backport r210219: Neon Intrinsics TLC - remove ML

2014-06-23 Thread Alan Lawrence
As for 4.8, I'm intending to backport the ZIP/UZP/TRN fix for ARM big-endian in 
r211369 of mainline. That patches arm_neon.h, so again we need to remove the 
OCAML code by which that file is autogenerated...ok?


--Alan

commit e83cb5fff3687316ff391e9e7a8c65df2d35c880
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Mon Jun 23 11:02:03 2014 +0100

Backport r210219 from mainline: Neon intrinsics TLC - remove ML

	2014-05-08  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

	* config/arm/arm_neon.h: Update comment.
	* config/arm/neon-docgen.ml: Delete.
	* config/arm/neon-gen.ml: Delete.
	* doc/arm-neon-intrinsics.texi: Update comment.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 37a6e61..cd36b1d 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1,5 +1,4 @@
-/* ARM NEON intrinsics include file. This file is generated automatically
-   using neon-gen.ml.  Please do not edit manually.
+/* ARM NEON intrinsics include file.
 
Copyright (C) 2006-2014 Free Software Foundation, Inc.
Contributed by CodeSourcery.
diff --git a/gcc/config/arm/neon-docgen.ml b/gcc/config/arm/neon-docgen.ml
deleted file mode 100644
index 5788a53..000
--- a/gcc/config/arm/neon-docgen.ml
+++ /dev/null
@@ -1,424 +0,0 @@
-(* ARM NEON documentation generator.
-
-   Copyright (C) 2006-2014 Free Software Foundation, Inc.
-   Contributed by CodeSourcery.
-
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify it under
-   the terms of the GNU General Public License as published by the Free
-   Software Foundation; either version 3, or (at your option) any later
-   version.
-
-   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or
-   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-   for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with GCC; see the file COPYING3.  If not see
-   <http://www.gnu.org/licenses/>.
-
-   This is an O'Caml program.  The O'Caml compiler is available from:
-
- http://caml.inria.fr/
-
-   Or from your favourite OS's friendly packaging system. Tested with version
-   3.09.2, though other versions will probably work too.
-
-   Compile with:
- ocamlc -c neon.ml
- ocamlc -o neon-docgen neon.cmo neon-docgen.ml
-
-   Run with:
- /path/to/neon-docgen /path/to/gcc/doc/arm-neon-intrinsics.texi
-*)
-
-open Neon
-
-(* The combined ops and reinterp table.  *)
-let ops_reinterp = reinterp @ ops
-
-(* Helper functions for extracting things from the ops table.  *)
-let single_opcode desired_opcode () =
-  List.fold_left (fun got_so_far ->
-  fun row ->
-match row with
-  (opcode, _, _, _, _, _) ->
-if opcode = desired_opcode then row :: got_so_far
-   else got_so_far
- ) [] ops_reinterp
-
-let multiple_opcodes desired_opcodes () =
-  List.fold_left (fun got_so_far ->
-  fun desired_opcode ->
-(single_opcode desired_opcode ()) @ got_so_far)
- [] desired_opcodes
-
-let ldx_opcode number () =
-  List.fold_left (fun got_so_far ->
-  fun row ->
-match row with
-  (opcode, _, _, _, _, _) ->
-match opcode with
-  Vldx n | Vldx_lane n | Vldx_dup n when n = number ->
-row :: got_so_far
-  | _ -> got_so_far
- ) [] ops_reinterp
-
-let stx_opcode number () =
-  List.fold_left (fun got_so_far ->
-  fun row ->
-match row with
-  (opcode, _, _, _, _, _) ->
-match opcode with
-  Vstx n | Vstx_lane n when n = number ->
-row :: got_so_far
-  | _ -> got_so_far
- ) [] ops_reinterp
-
-let tbl_opcode () =
-  List.fold_left (fun got_so_far ->
-  fun row ->
-match row with
-  (opcode, _, _, _, _, _) ->
-match opcode with
-  Vtbl _ -> row :: got_so_far
-  | _ -> got_so_far
- ) [] ops_reinterp
-
-let tbx_opcode () =
-  List.fold_left (fun got_so_far ->
-  fun row ->
-match row with
-  (opcode, _, _, _, _, _) ->
-match opcode with
-  Vtbx _ -> row :: got_so_far
-  | _ -> got_so_far
- ) [] ops_reinterp
-
-(* The groups of intrinsics.  *)
-let intrinsic_groups =
-  [ "Addition", single_opcode Vadd;
-"Multiplication", single_opcode Vmul;
-

RE: [PATCH,MIPS] MIPS64r6 support

2014-06-23 Thread Matthew Fortune
Richard Sandiford rdsandif...@googlemail.com writes:
 Sorry for the slow review.

And my slow response :-)

 Matthew Fortune matthew.fort...@imgtec.com writes:
  The initial support for MIPS64r6 is intentionally minimal to make
 review
  easier. Performance enhancements and use of new MIPS64r6 features will
  be introduced separately. The current patch makes no attempt to
  get the testsuite appropriately configured for MIPS64r6 as the
 existing
  structure relies on each MIPS revision being a superset of the
 previous.
  Recommendations on how to rework the mips.exp logic to cope with this
  would be appreciated.
 
 Could you give an example of the kind of thing you mean?

You have actually covered the cases I was concerned about below. The
problem cases are those tests that already have an isa/isa_rev = ...
 
 If tests really do need r5 or earlier, we should enforce that in the
 dg-options.  E.g. for conditional moves we should add an isa_rev
 limit to the existing tests and add new ones with isa_rev=6.

OK. Steve has actually been working on this in parallel to the
review and has taken this approach.
 
 I suppose we'll need a way of specifying an isa_rev range, say
 isa_rev=2-5.  That should be a fairly localised change though.

There appear to be about 9 tests that are not fixed by educating mips.exp
about flags which are not supported on R6. Steve has initially dealt with
these via forbid_cpu=mips.*r6 but I guess it would be cleaner to try and
support an isa_rev range. I'll see we can club together enough tcl skills
to write it :-)

 Maybe just change the comment to:
 
   An address suitable for a @code{prefetch} instruction, or for any
 other
 instruction with the same addressing mode as @code{prefetch}.
 
 perhaps going on to say what the microMIPS, r6 and other cases are,
 if you think that's better.
 
 You need to update md.texi too; it isn't yet automatic.

Thanks. I've omitted the detail about what the cases are as the code
speaks for itself.

  (if_then_else (match_test "TARGET_MICROMIPS")
   (match_test "umips_12bit_offset_address_p (op, mode)")
  -(match_test "mips_address_insns (op, mode, false)")))
  +(if_then_else (match_test "ISA_HAS_PREFETCH_9BIT")
  +   (match_test "mipsr6_9bit_offset_address_p (op, mode)")
  +   (match_test "mips_address_insns (op, mode, false)"))))

 Please use (cond ...) instead.

It seems I cannot use cond in a predicate expression, so I've had to
leave it as is.
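
For reference, the 9-bit offset form the new predicate accepts reduces to a
signed-range test; the following is a hypothetical scalar sketch (the real
mipsr6_9bit_offset_address_p operates on RTL addresses and also validates
the base register and address form):

```c
#include <stdbool.h>

/* A 9-bit signed immediate covers offsets -256 .. 255.  This mirrors
   the range test only; address-form checks are omitted.  */
static bool offset_fits_9bit (long offset)
{
  return offset >= -256 && offset <= 255;
}
```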

  diff --git a/gcc/config/mips/linux.h b/gcc/config/mips/linux.h
  index e539422..751623f 100644
  --- a/gcc/config/mips/linux.h
  +++ b/gcc/config/mips/linux.h
  @@ -18,8 +18,9 @@ along with GCC; see the file COPYING3.  If not see
   http://www.gnu.org/licenses/.  */
 
    #define GLIBC_DYNAMIC_LINKER \
   -  "%{mnan=2008:/lib/ld-linux-mipsn8.so.1;:/lib/ld.so.1}"
   +  "%{mnan=2008|mips32r6|mips64r6:/lib/ld-linux-mipsn8.so.1;:/lib/ld.so.1}"
 
    #undef UCLIBC_DYNAMIC_LINKER
    #define UCLIBC_DYNAMIC_LINKER \
   -  "%{mnan=2008:/lib/ld-uClibc-mipsn8.so.0;:/lib/ld-uClibc.so.0}"
   +  "%{mnan=2008|mips32r6|mips64r6:/lib/ld-uClibc-mipsn8.so.0; \
   +  :/lib/ld-uClibc.so.0}"
 
 Rather than update all the specs like this, I think we should force
 -mnan=2008 onto the command line for r6 using DRIVER_SELF_SPECS.
 See e.g. MIPS_ISA_SYNCI_SPEC.

I agree this could be simpler and your comment has made me realise the
implementation in the patch is wrong for configurations like
mipsisa32r6-unknown-linux-gnu. The issue for both the current patch and
your suggestion is that they rely on MIPS_ISA_LEVEL_SPEC having been
applied but this only happens in the vendor triplets. The --with-arch*
options used with mips-unknown-linux-gnu would be fine as they place
an arch option on the command line.

If I add MIPS_ISA_LEVEL_SPEC to the DRIVER_SELF_SPECS generic
definition in mips.h then I believe that would fix the problem. Any new
spec I add for R6/nan setting would also need adding to the generic
DRIVER_SELF_SPECS in mips.h and any vendor definitions of
DRIVER_SELF_SPECS.

  diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-
 protos.h
  index 0b32a70..9560506 100644
  --- a/gcc/config/mips/mips-protos.h
  +++ b/gcc/config/mips/mips-protos.h 
  @@ -4186,6 +4230,46 @@ mips_rtx_costs (rtx x, int code, int
 outer_code, int opno ATTRIBUTE_UNUSED,
  }
 *total = mips_zero_extend_cost (mode, XEXP (x, 0));
 return false;
  +case TRUNCATE:
  +  /* Costings for highpart multiplies.  */
  +  if (ISA_HAS_R6MUL
  +   && (GET_CODE (XEXP (x, 0)) == ASHIFTRT
  +  || GET_CODE (XEXP (x, 0)) == LSHIFTRT)
  +   && CONST_INT_P (XEXP (XEXP (x, 0), 1))
  +   && ((INTVAL (XEXP (XEXP (x, 0), 1)) == 32
  +&& GET_MODE (XEXP (x, 0)) == DImode)
  +  || (ISA_HAS_R6DMUL
  + && INTVAL (XEXP (XEXP (x, 0), 1)) == 64
  + && GET_MODE (XEXP (x, 0)) == TImode))
  +   && GET_CODE (XEXP (XEXP (x, 0), 0)) == MULT
  +  ((GET_CODE 

[patch libgcc]: Fix PR libgcc/61585 Subscript-out-of-range in unwind-seh.c

2014-06-23 Thread Kai Tietz
Hi,

this fixes a potential out-of-bounds access in unwind-seh's
_Unwind_GetGR/_Unwind_SetGR functions.

ChangeLog

2014-06-23  Kai Tietz  kti...@redhat.com

PR libgcc/61585
* unwind-seh.c (_Unwind_GetGR): Check for proper
index range.
(_Unwind_SetGR): Likewise.


I will apply this patch after successful build and test.

Kai

Index: unwind-seh.c
===
--- unwind-seh.c(Revision 211888)
+++ unwind-seh.c(Arbeitskopie)
@@ -79,7 +79,7 @@ struct _Unwind_Context
 _Unwind_Word
 _Unwind_GetGR (struct _Unwind_Context *c, int index)
 {
-  if (index < 0 || index > 2)
+  if (index < 0 || index >= 2)
 abort ();
   return c->reg[index];
 }
@@ -89,7 +89,7 @@ _Unwind_GetGR (struct _Unwind_Context *c, int inde
 void
 _Unwind_SetGR (struct _Unwind_Context *c, int index, _Unwind_Word val)
 {
-  if (index < 0 || index > 2)
+  if (index < 0 || index >= 2)
 abort ();
   c->reg[index] = val;
 }
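
The off-by-one is easy to see in isolation: the SEH context stores exactly
two registers, so the only valid indices are 0 and 1. A standalone sketch of
the old and fixed range checks (hypothetical names, mirroring the patch):

```c
#include <stdbool.h>

/* The SEH unwind context saves exactly two registers.  */
struct mini_context { unsigned long reg[2]; };

/* Old check: accepted index == 2, one past the end of reg[].  */
static bool index_ok_old (int index) { return !(index < 0 || index > 2); }

/* Fixed check: rejects everything outside 0..1.  */
static bool index_ok_new (int index) { return !(index < 0 || index >= 2); }
```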


Re: Fortran OpenMP UDR fixes, nested handling fixes etc.

2014-06-23 Thread Jakub Jelinek
On Sat, Jun 21, 2014 at 10:28:41AM +0200, Tobias Burnus wrote:
 Jakub Jelinek wrote:
 Bootstrap/regtest pending, does this look ok?
 
 Except for the module/resolved issues discussed elsewhere, it look good to
 me.

So, either we need something like the following patch (incremental), or
another possibility for the problem is not do the value.function.name
related change in module.c in the UDR patch, and instead fix up the UDR
combiner/initializer expressions when they are loaded from module
(change  name to NULL only in the UDR combiner/initializer expressions,
where they shouldn't be resolved yet).  Or make sure value.function.name
is set to non-NULL when resolving all intrinsic function calls, rather than
just for a subset of them.

With this patch it seems to pass bootstrap/regtest.

2014-06-21  Jakub Jelinek  ja...@redhat.com

* resolve.c (resolve_function): If value.function.isym is non-NULL,
consider it already resolved.
* module.c (fix_mio_expr): Likewise.
* trans-openmp.c (gfc_trans_omp_array_reduction_or_udr): Don't
initialize value.function.isym.

--- gcc/fortran/resolve.c.jj2014-06-20 23:31:49.0 +0200
+++ gcc/fortran/resolve.c   2014-06-21 20:07:39.708099045 +0200
@@ -2887,7 +2887,8 @@ resolve_function (gfc_expr *expr)
 
   /* See if function is already resolved.  */
 
-  if (expr->value.function.name != NULL)
+  if (expr->value.function.name != NULL
+  || expr->value.function.isym != NULL)
 {
   if (expr-ts.type == BT_UNKNOWN)
expr-ts = sym-ts;
--- gcc/fortran/module.c.jj 2014-06-20 23:31:49.0 +0200
+++ gcc/fortran/module.c2014-06-23 08:53:50.488662314 +0200
@@ -3173,7 +3173,8 @@ fix_mio_expr (gfc_expr *e)
    && !e->symtree->n.sym->attr.dummy)
e->symtree = ns_st;
 }
-  else if (e->expr_type == EXPR_FUNCTION && e->value.function.name)
+  else if (e->expr_type == EXPR_FUNCTION
+   && (e->value.function.name || e->value.function.isym))
 {
   gfc_symbol *sym;
 
--- gcc/fortran/trans-openmp.c.jj   2014-06-20 23:31:49.0 +0200
+++ gcc/fortran/trans-openmp.c  2014-06-23 11:53:02.932495166 +0200
@@ -1417,7 +1417,6 @@ gfc_trans_omp_array_reduction_or_udr (tr
   e4->expr_type = EXPR_FUNCTION;
   e4->where = where;
   e4->symtree = symtree4;
-  e4->value.function.isym = gfc_find_function (iname);
   e4->value.function.actual = gfc_get_actual_arglist ();
   e4->value.function.actual->expr = e3;
   e4->value.function.actual->next = gfc_get_actual_arglist ();

Jakub


Re: [PATCH][AArch64] Fix some saturating math NEON intrinsics types

2014-06-23 Thread Kyrill Tkachov


On 23/06/14 09:26, Marcus Shawcroft wrote:

On 20 June 2014 15:14, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:


Sure, but it depends on
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html.
Is it ok to backport that one as well?

This can be backported as well.
/Marcus


Thanks, I've backported to 4.9 the above mentioned 
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html patch as r211889.


Kyrill




Re: [PATCH][AArch64] Fix some saturating math NEON intrinsics types

2014-06-23 Thread Kyrill Tkachov


On 23/06/14 11:52, Kyrill Tkachov wrote:

On 23/06/14 09:26, Marcus Shawcroft wrote:

On 20 June 2014 15:14, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:


Sure, but it depends on
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html.
Is it ok to backport that one as well?

This can be backported as well.
/Marcus

Thanks, I've backported to 4.9 the above mentioned
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html patch as r211889.


The backport for this patch itself is in testing...


Kyrill








[patch] fix std::chrono::duration literals

2014-06-23 Thread Jonathan Wakely

This fixes errors when using 0s or 1'000s to create durations.

Tested x86_64-linux, committed to trunk.

commit c57e645222ac9bf476ef5b6414773b5f4e2b86fc
Author: Jonathan Wakely jwak...@redhat.com
Date:   Sun Jun 22 18:06:42 2014 +0100

	* include/bits/parse_numbers.h (_Number_help): Fix divide-by-zero.
	* include/std/chrono (_Checked_integral_constant): Allow zero.
	* testsuite/20_util/duration/literals/values.cc: Test non-positive
	values and digit separators.

diff --git a/libstdc++-v3/include/bits/parse_numbers.h b/libstdc++-v3/include/bits/parse_numbers.h
index a29d127..f46c59c 100644
--- a/libstdc++-v3/include/bits/parse_numbers.h
+++ b/libstdc++-v3/include/bits/parse_numbers.h
@@ -190,10 +190,11 @@ namespace __parse_int
   using __digit = _Digit<_Base, _Dig>;
   using __valid_digit = typename __digit::__valid;
   using __next = _Number_help<_Base,
-  _Pow / (_Base * __valid_digit::value),
+  __valid_digit::value ? _Pow / _Base : _Pow,
   _Digs...>;
   using type = __ull_constant<_Pow * __digit::value + __next::type::value>;
-  static_assert((type::value / _Pow) == __digit::value, "overflow");
+  static_assert((type::value / _Pow) == __digit::value,
+		"integer literal does not fit in unsigned long long");
 };
 
  template<unsigned _Base, unsigned long long _Pow, char _Dig>
@@ -214,7 +215,6 @@ namespace __parse_int
 { };
 
 //--
-//  This _Parse_int is the same 'level' as the old _Base_dispatch.
 
  template<char... _Digs>
 struct _Parse_int;
diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 39ad5e3..88eaa16 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -791,7 +791,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
   struct _Checked_integral_constant
   : integral_constant<_Rep, static_cast<_Rep>(_Val)>
   {
-	static_assert(_Checked_integral_constant::value > 0
+	static_assert(_Checked_integral_constant::value >= 0
 		   && _Checked_integral_constant::value == _Val,
 		  "literal value cannot be represented by duration type");
   };
diff --git a/libstdc++-v3/testsuite/20_util/duration/literals/values.cc b/libstdc++-v3/testsuite/20_util/duration/literals/values.cc
index f55c32f..ce86358 100644
--- a/libstdc++-v3/testsuite/20_util/duration/literals/values.cc
+++ b/libstdc++-v3/testsuite/20_util/duration/literals/values.cc
@@ -56,6 +56,12 @@ test03()
   VERIFY( workday == std::chrono::hours(8) );
   auto fworkday = 8.0h;
+  VERIFY( (fworkday == std::chrono::duration<long double, std::ratio<3600,1>>(8.0L)) );
+  auto immediate = 0s;
+  VERIFY( immediate == std::chrono::seconds(0) );
+  auto minute_ago = -1min;
+  VERIFY( minute_ago == std::chrono::minutes(-1) );
+  auto separated = 1'000'000s;
+  VERIFY( separated == std::chrono::seconds(1'000'000) );
 }
 
 int

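With the fix applied, the literal forms exercised by the new tests work as
expected; a small usage sketch (assumes a C++14 compiler providing
std::chrono_literals):

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Zero, negative, and digit-separated duration literals, as exercised
// by the updated values.cc test.
inline std::chrono::seconds immediate()  { return 0s; }
inline std::chrono::minutes minute_ago() { return -1min; }  // unary minus on 1min
inline std::chrono::seconds separated()  { return 1'000'000s; }
```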

[PATCH] Fix for PR 61561

2014-06-23 Thread Marat Zakirov
Hi all,

Here's my new patch for PR 61561
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61561).
It fixes an ICE that appeared due to the lack of QI/HI patterns in arm.md
for the stack pointer register.
Reg. tested on arm-v7. 

--Marat


arm.diff
Description: Binary data


Re: [C/C++ PATCH] Add -Wlogical-not-parentheses (PR c/49706)

2014-06-23 Thread Marek Polacek
On Sun, Jun 22, 2014 at 10:33:57PM +0200, Gerald Pfeifer wrote:
 On Mon, 2 Jun 2014, Marek Polacek wrote:
  * c-typeck.c (parser_build_binary_op): Warn when logical not is used
  on the left hand side operand of a comparison. 
 
 This...
 
  +/* Warn about logical not used on the left hand side operand of a 
  comparison.
 
 ...and this...
 
  +  warning_at (location, OPT_Wlogical_not_parentheses,
  + "logical not is only applied to the left hand side of "
  + "comparison");
 
 ...does not appear consistent with the actual warning.
 
 Why does that warning say is _ONLY_ applied to the left hand side?
 
 Based on the message, I naively assumed that the code should not warn
 about
 
   int same(int a, int b) {
 return !a == !b;
   }
 
 alas this is not the case.  (Code like this occurs in Wine where
 bool types are emulated and !!a or a comparison like above ensure
 that those emulated bools are normalized to either 0 or 1.)
 
 
 I understand there is ambiguity in cases like
 
   return !a == b;
 
 where the warning would be approriately worded and the programmer
 might have intended !(a == b).
 
 
 I do recommend to either omit only from the text of the warning
 or not warn for cases where ! occurs on both sides of the comparison
 (and keep the text as is).

I think the latter is better; incidentally, g++ doesn't warn either.
The following one liner makes cc1 behave as cc1plus.  Thanks for the
report.
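
The ambiguity being warned about is easy to demonstrate: with only the left
operand negated, the parsed reading and the likely intended reading can
disagree. A small C sketch:

```c
/* For a = 2, b = 1, the two readings of !a == b diverge:
   (!a) == b  evaluates 0 == 1, i.e. false;
   !(a == b)  evaluates !0,     i.e. true.  */
static int as_parsed (int a, int b)   { return !a == b; }   /* (!a) == b */
static int as_intended (int a, int b) { return !(a == b); }
```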

Regtested/bootstrapped on x86_64.  Joseph, is this ok?

2014-06-23  Marek Polacek  pola...@redhat.com

* c-typeck.c (parser_build_binary_op): Don't call
warn_logical_not_parentheses if the RHS is TRUTH_NOT_EXPR.

* c-c++-common/pr49706-2.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 63bd65e..0764630 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -3402,7 +3402,8 @@ parser_build_binary_op (location_t location, enum tree_code code,
   code1, arg1.value, code2, arg2.value);
 
   if (warn_logical_not_paren
-   && code1 == TRUTH_NOT_EXPR)
+   && code1 == TRUTH_NOT_EXPR
+   && code2 != TRUTH_NOT_EXPR)
 warn_logical_not_parentheses (location, code, arg1.value, arg2.value);
 
   /* Warn about comparisons against string literals, with the exception
diff --git gcc/testsuite/c-c++-common/pr49706-2.c gcc/testsuite/c-c++-common/pr49706-2.c
index e69de29..09cc9eb 100644
--- gcc/testsuite/c-c++-common/pr49706-2.c
+++ gcc/testsuite/c-c++-common/pr49706-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-Wlogical-not-parentheses" } */
+
+/* Test that we don't warn if both operands of the comparison
+   are negated.  */
+
+#ifndef __cplusplus
+#define bool _Bool
+#endif
+
+bool r;
+
+int
+same (int a, int b)
+{
+  r = !a == !b;
+  r = !!a == !!b;
+  r = !!a == !b;
+  r = !a == !!b;
+}

Marek


Re: [patch] Update libstdc++ FAQ and ensure stable anchors in HTML docs

2014-06-23 Thread Jonathan Wakely

On 22/06/14 22:40 +0200, Gerald Pfeifer wrote:

On Mon, 9 Jun 2014, Jonathan Wakely wrote:

This fairly tedious patch refreshes the FAQ, including adding some
notes saying This answer is old and probably no longer relevant to
several answers referring to problems in the GCC 3.x era.


I'm wondering, should those old items be removed?  GCC 3.x is
really a looong while ago (and I did check some of those items).


Yes, that would probably be better. I considered doing that but didn't
want to spend the time to decide which ones to cull.

Suggestions are welcome.

A FAQ that contains many NAFAQs (not actually frequently asked
questions) may actually be less useful and paint a more negative
picture.


Good point.



Re: [RFC][ARM] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook

2014-06-23 Thread Jay Foad
On 2 May 2014 10:04, Kugan kugan.vivekanandara...@linaro.org wrote:
 Thanks for spotting it. Here is the updated patch that changes it to
 ARM_FE_*.

 +2014-05-02  Kugan Vivekanandarajah  kug...@linaro.org
 +
 +   * config/arm/arm.c (TARGET_ATOMIC_ASSIGN_EXPAND_FENV): New define.
 +   (arm_builtins) : Add ARM_BUILTIN_GET_FPSCR and ARM_BUILTIN_SET_FPSCR.
 +   (bdesc_2arg) : Add description for builtins __builtins_arm_set_fpscr
 +   and __builtins_arm_get_fpscr.

s/__builtins/__builtin/g

 +   (arm_init_builtins) : Initialize builtins __builtins_arm_set_fpscr and
 +   __builtins_arm_get_fpscr.

s/__builtins/__builtin/g

This doesn't match the code, which initializes builtins ...ldfscr
and ...stfscr (with no p in fscr).

 +   (arm_expand_builtin) : Expand builtins __builtins_arm_set_fpscr and
 +   __builtins_arm_ldfpscr.

s/__builtins/__builtin/g

Did you mean and __builtin_arm_get_fpscr?

 +#define FP_BUILTIN(L, U) \
 +  {0, CODE_FOR_##L, "__builtin_arm_" #L, ARM_BUILTIN_##U, \
 +   UNKNOWN, 0},
 +
 +  FP_BUILTIN (set_fpscr, GET_FPSCR)
 +  FP_BUILTIN (get_fpscr, SET_FPSCR)
 +#undef FP_BUILTIN

This looks like a typo: you have mapped set->GET and get->SET.
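
As an aside, the macro builds the builtin name via stringizing and token
pasting: #L produces a string literal that concatenates with the prefix,
while ## pastes tokens to form identifiers. A minimal standalone sketch of
the naming idiom (hypothetical, outside GCC):

```c
#include <string.h>

/* #L stringizes the macro argument; adjacent string literals concatenate. */
#define BUILTIN_NAME(L) "__builtin_arm_" #L

static const char *get_name = BUILTIN_NAME (get_fpscr);
static const char *set_name = BUILTIN_NAME (set_fpscr);
```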

Jay.


Re: [PATCH, PR 61540] Do not ICE on impossible devirtualization

2014-06-23 Thread James Greenhalgh
On Thu, Jun 19, 2014 at 12:49:55PM +0100, Martin Jambor wrote:
 Hi,
 
 On Wed, Jun 18, 2014 at 06:12:34PM +0200, Bernhard Reutner-Fischer wrote:
  On 18 June 2014 10:24:16 Martin Jambor mjam...@suse.cz wrote:
  
  @@ -3002,10 +3014,8 @@ try_make_edge_direct_virtual_call (struct
  cgraph_edge *ie,
  
 if (target)
   {
  -#ifdef ENABLE_CHECKING
  -  gcc_assert (possible_polymorphic_call_target_p
  -   (ie, cgraph_get_node (target)));
  -#endif
  +  if (!possible_polymorphic_call_target_p (ie, cgraph_get_node 
  (target)))
  +  return ipa_make_edge_direct_to_target (ie, target);
 return ipa_make_edge_direct_to_target (ie, target);
   }
  
  The above looks odd. You return the same thing both conditionally
  and unconditionally?
  
 
 You are obviously right, apparently I was too tired to attempt to work
 that night.  Thanks, for spotting it.  The following patch has this
 corrected and it also passes bootstrap and testing on x86_64-linux on
 both the trunk and the 4.9 branch. OK for both?
 
 Thanks,
 
 Martin

Hi Martin,

This new test fails for test variants with -fPIC. ( trunk and gcc-4_9-branch )
I've confirmed this on ARM, AArch64 and x86_64.

  FAIL: g++.dg/ipa/pr61540.C -std=gnu++11  scan-ipa-dump cp "Type inconsident devirtualization"
  FAIL: g++.dg/ipa/pr61540.C -std=gnu++1y  scan-ipa-dump cp "Type inconsident devirtualization"
  FAIL: g++.dg/ipa/pr61540.C -std=gnu++98  scan-ipa-dump cp "Type inconsident devirtualization"

I don't understand enough of this area to be more helpful, but I've
attached the relevant dump and the generated assembly for x86_64 for this
command:

./cc1plus ../../src/gcc/gcc/testsuite/g++.dg/ipa/pr61540.C -fmessage-length=0 
-std=gnu++98 -O3 -fno-early-inlining -fdump-ipa-cp -o foo.s -fPIC

Let me know if you need any patches testing, or if there is anything else
I can help with.

Thanks,
James
.file   pr61540.C
.section
.text.unlikely._ZN3top4topfEv,axG,@progbits,_ZN3top4topfEv,comdat
.align 2
.LCOLDB0:
.section
.text._ZN3top4topfEv,axG,@progbits,_ZN3top4topfEv,comdat
.LHOTB0:
.align 2
.p2align 4,,15
.weak   _ZN3top4topfEv
.type   _ZN3top4topfEv, @function
_ZN3top4topfEv:
.LFB3:
.cfi_startproc
rep; ret
.cfi_endproc
.LFE3:
.size   _ZN3top4topfEv, .-_ZN3top4topfEv
.section
.text.unlikely._ZN3top4topfEv,axG,@progbits,_ZN3top4topfEv,comdat
.LCOLDE0:
.section
.text._ZN3top4topfEv,axG,@progbits,_ZN3top4topfEv,comdat
.LHOTE0:
.section
.text.unlikely._ZN12intermediate4topfEv,axG,@progbits,_ZN12intermediate4topfEv,comdat
.align 2
.LCOLDB1:
.section
.text._ZN12intermediate4topfEv,axG,@progbits,_ZN12intermediate4topfEv,comdat
.LHOTB1:
.align 2
.p2align 4,,15
.weak   _ZN12intermediate4topfEv
.type   _ZN12intermediate4topfEv, @function
_ZN12intermediate4topfEv:
.LFB4:
.cfi_startproc
xorl%eax, %eax
ret
.cfi_endproc
.LFE4:
.size   _ZN12intermediate4topfEv, .-_ZN12intermediate4topfEv
.section
.text.unlikely._ZN12intermediate4topfEv,axG,@progbits,_ZN12intermediate4topfEv,comdat
.LCOLDE1:
.section
.text._ZN12intermediate4topfEv,axG,@progbits,_ZN12intermediate4topfEv,comdat
.LHOTE1:
.section.text.unlikely,ax,@progbits
.LCOLDB2:
.text
.LHOTB2:
.p2align 4,,15
.globl  _Z4testR3top
.type   _Z4testR3top, @function
_Z4testR3top:
.LFB6:
.cfi_startproc
subq$24, %rsp
.cfi_def_cfa_offset 32
movq(%rdi), %rax
movq(%rax), %rax
cmpq_ZN3top4topfEv@GOTPCREL(%rip), %rax
jne .L7
.L4:
movq_ZTV6child2@GOTPCREL(%rip), %rax
movq%rsp, %rdi
addq$16, %rax
movq%rax, (%rsp)
call_Z4testR3top@PLT
addq$24, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L7:
.cfi_restore_state
call*%rax
jmp .L4
.cfi_endproc
.LFE6:
.size   _Z4testR3top, .-_Z4testR3top
.section.text.unlikely
.LCOLDE2:
.text
.LHOTE2:
.section.text.unlikely
.LCOLDB3:
.section.text.startup,ax,@progbits
.LHOTB3:
.p2align 4,,15
.globl  main
.type   main, @function
main:
.LFB16:
.cfi_startproc
subq$24, %rsp
.cfi_def_cfa_offset 32
movq_ZTV6child1@GOTPCREL(%rip), %rax
movq%rsp, %rdi
addq$16, %rax
movq%rax, (%rsp)
call_Z4testR3top@PLT
xorl%eax, %eax
addq$24, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE16:
.size   main, .-main
.section.text.unlikely
.LCOLDE3:
.section.text.startup
.LHOTE3:
  

Re: [GSoC] Addition of ISL AST generation to Graphite

2014-06-23 Thread Roman Gareev
Thank you for the review!

--
   Cheers, Roman Gareev


ChangeLog_entry1
Description: Binary data


ChangeLog_entry2
Description: Binary data


patch1
Description: Binary data


patch2
Description: Binary data


Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-06-23 Thread Kyrill Tkachov

Hi James,


On 19/06/14 14:12, James Greenhalgh wrote:

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4 
100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"
  
   (define_insn "*addsi3_aarch64"

[(set
-(match_operand:SI 0 "register_operand" "=rk,rk,rk")
+(match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
  (plus:SI
- (match_operand:SI 1 "register_operand" "%rk,rk,rk")
- (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
+ (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
+ (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]

@
add\\t%w0, %w1, %2
add\\t%w0, %w1, %w2
+  add\\t%0.2s, %1.2s, %2.2s
sub\\t%w0, %w1, #%n2
-  [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
+  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
  )
  
Minor nit, you should set the simd attribute to yes for the added 
alternative to make sure it doesn't get selected when !TARGET_SIMD


Thanks,
Kyrill



[PATCH][match-and-simplify] Remove now dead code

2014-06-23 Thread Richard Biener

This removes the non-DT matching code.

Committed.

Richard.

2014-06-23  Richard Biener  rguent...@suse.de

* genmatch.c (operand::gen_gimple_match): Remove.
(predicate::gen_gimple_match): Likewise.
(expr::gen_gimple_match): Likewise.
(c_expr::gen_gimple_match): Likewise.
(capture::gen_gimple_match): Likewise.
(write_nary_simplifiers): Remove.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 211891)
+++ gcc/genmatch.c  (working copy)
@@ -204,7 +204,6 @@ struct operand {
   enum op_type { OP_PREDICATE, OP_EXPR, OP_CAPTURE, OP_C_EXPR };
   operand (enum op_type type_) : type (type_) {}
   enum op_type type;
-  virtual void gen_gimple_match (FILE *f, const char *, const char * = NULL) = 
0;
   virtual void gen_gimple_transform (FILE *f, const char *, const char *) = 0;
 };
 
@@ -212,7 +211,6 @@ struct predicate : public operand
 {
   predicate (const char *ident_) : operand (OP_PREDICATE), ident (ident_) {}
   const char *ident;
-  virtual void gen_gimple_match (FILE *f, const char *, const char *);
   virtual void gen_gimple_transform (FILE *, const char *, const char *) { 
gcc_unreachable (); }
 };
 
@@ -230,7 +228,6 @@ struct expr : public operand
   void append_op (operand *op) { ops.safe_push (op); }
   e_operation *operation;
   vec<operand *> ops;
-  virtual void gen_gimple_match (FILE *f, const char *, const char *);
   virtual void gen_gimple_transform (FILE *f, const char *, const char *);
 };
 
@@ -243,7 +240,6 @@ struct c_expr : public operand
   vec<cpp_token> code;
   unsigned nr_stmts;
   char *fname;
-  virtual void gen_gimple_match (FILE *, const char *, const char *) { 
gcc_unreachable (); }
   virtual void gen_gimple_transform (FILE *f, const char *, const char *);
 };
 
@@ -253,7 +249,6 @@ struct capture : public operand
   : operand (OP_CAPTURE), where (where_), what (what_) {}
   const char *where;
   operand *what;
-  virtual void gen_gimple_match (FILE *f, const char *, const char *);
   virtual void gen_gimple_transform (FILE *f, const char *, const char *);
 };
 
@@ -467,145 +462,6 @@ gen_gimple_match_fail (FILE *f, const ch
 }
 
 void
-predicate::gen_gimple_match (FILE *f, const char *op, const char *label)
-{
-  fprintf (f, if (!%s (%s)) , ident, op);
-  gen_gimple_match_fail (f, label);
-}
-
-void
-expr::gen_gimple_match (FILE *f, const char *name, const char *label)
-{
-  if (operation->op->kind == id_base::CODE)
-{
-  operator_id *op = static_cast <operator_id *> (operation->op);
-  /* The GIMPLE variant.  */
-  fprintf (f, if (TREE_CODE (%s) == SSA_NAME)\n, name);
-  fprintf (f,   {\n);
-  fprintf (f, gimple def_stmt = SSA_NAME_DEF_STMT (%s);\n, name);
-  fprintf (f, if (!is_gimple_assign (def_stmt)\n);
-  if (op-code == NOP_EXPR
- || op-code == CONVERT_EXPR)
-   fprintf (f, || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code 
(def_stmt))) );
-  else
-   fprintf (f, || gimple_assign_rhs_code (def_stmt) != %s) ,  
op-id);
-  gen_gimple_match_fail (f, label);
-  if (op-code == REALPART_EXPR
- || op-code == IMAGPART_EXPR
- || op-code == VIEW_CONVERT_EXPR
- || op-code == BIT_FIELD_REF)
-   {
- fprintf (f, tree rhs = gimple_assign_rhs1 (def_stmt);\n);
- for (unsigned i = 0; i  ops.length (); ++i)
-   {
- fprintf (f,{\n);
- fprintf (f,  tree op = TREE_OPERAND (rhs, %d);\n, i);
- fprintf (f,  if (valueize  TREE_CODE (op) == SSA_NAME)\n);
- fprintf (f,{\n);
- fprintf (f,  op = valueize (op);\n);
- fprintf (f,  if (!op) );
- gen_gimple_match_fail (f, label);
- fprintf (f,}\n);
- ops[i]-gen_gimple_match (f, op, label);
- fprintf (f,}\n);
-   }
-   }
-  else
-   {
- for (unsigned i = 0; i  ops.length (); ++i)
-   {
- fprintf (f,{\n);
- fprintf (f,  tree op = gimple_assign_rhs%d (def_stmt);\n, i 
+ 1);
- fprintf (f,  if (valueize  TREE_CODE (op) == SSA_NAME)\n);
- fprintf (f,{\n);
- fprintf (f,  op = valueize (op);\n);
- fprintf (f,  if (!op) );
- gen_gimple_match_fail (f, label);
- fprintf (f,}\n);
- ops[i]-gen_gimple_match (f, op, label);
- fprintf (f,}\n);
-   }
-   }
-  fprintf (f,   }\n);
-  /* The GENERIC variant.  */
-  fprintf (f, else if (TREE_CODE (%s) == %s)\n, name, op-id);
-  fprintf (f,   {\n);
-  for (unsigned i = 0; i  ops.length (); ++i)
-   {
- fprintf (f,{\n);
- fprintf (f,  tree op_ = %s;\n, name);
- fprintf (f,  tree op = TREE_OPERAND (op_, %d);\n, i);
- fprintf (f,  if (valueize  

[GSoC][match-and-simplify] mark some more operators as commutative

2014-06-23 Thread Prathamesh Kulkarni
* match.pd: Mark operators in some bitwise and plus-minus
patterns to be commutative.

Thanks and Regards,
Prathamesh
Index: gcc/match.pd
===
--- gcc/match.pd	(revision 211893)
+++ gcc/match.pd	(working copy)
@@ -138,7 +138,7 @@ along with GCC; see the file COPYING3.
   (minus (plus @0 @1) @1)
   @0)
 (match_and_simplify
-  (plus (minus @0 @1) @1)
+  (plus:c (minus @0 @1) @1)
   @0)
 /* (CST +- A) +- CST -> CST' +- A.  */
 /* match_and_simplify handles constant folding for us so we can
@@ -176,7 +176,7 @@ along with GCC; see the file COPYING3.
 
 /* A - (A +- B) -> -+ B */
 (match_and_simplify
-  (minus @0 (plus @0 @1))
+  (minus @0 (plus:c @0 @1))
   (negate @0))
 
 (match_and_simplify
@@ -285,25 +285,25 @@ along with GCC; see the file COPYING3.
 
 /* x & ~x -> 0 */
 (match_and_simplify
-  (bit_and @0 (bit_not @0))
+  (bit_and:c @0 (bit_not @0))
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   { build_int_cst (type, 0); })
 
 /* ~x & ~y -> ~(x | y) */
 (match_and_simplify
-  (bit_and (bit_not @0) (bit_not @1))
+  (bit_and:c (bit_not @0) (bit_not @1))
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   (bit_not (bit_ior @0 @1)))
 
 /* ~x | ~y -> ~(x & y) */
 (match_and_simplify
-  (bit_ior (bit_not @0) (bit_not @1))
+  (bit_ior:c (bit_not @0) (bit_not @1))
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   (bit_not (bit_and @0 @1)))
 
 /* x & (~x | y) -> y & x */
 (match_and_simplify
-  (bit_and @0 (bit_ior (bit_not @0) @1))
+  (bit_and:c @0 (bit_ior:c (bit_not @0) @1))
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   (bit_and @1 @0))
 
@@ -320,25 +320,25 @@ along with GCC; see the file COPYING3.
 
 /* (x | y) & x -> x */
 (match_and_simplify
-  (bit_and (bit_ior @0 @1) @0)
+  (bit_and:c (bit_ior:c @0 @1) @0)
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   @0)
 
 /* (x & y) | x -> x */
 (match_and_simplify
-  (bit_ior (bit_and @0 @1) @0)
+  (bit_ior:c (bit_and:c @0 @1) @0)
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   @0)
 
 /* (~x | y) & x -> x & y */
 (match_and_simplify
-  (bit_and (bit_ior (bit_not @0) @1) @0)
+  (bit_and:c (bit_ior:c (bit_not @0) @1) @0)
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   (bit_and @0 @1))
 
 /* (~x & y) | x -> x | y */
 (match_and_simplify
-  (bit_ior (bit_and (bit_not @0) @1) @0)
+  (bit_ior:c (bit_and:c (bit_not @0) @1) @0)
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   (bit_ior @0 @1))
 
@@ -350,7 +350,7 @@ along with GCC; see the file COPYING3.
 
 /* ((a & b) & ~a) -> 0 */
 (match_and_simplify
-  (bit_and (bit_and @0 @1) (bit_not @0))
+  (bit_and:c (bit_and:c @0 @1) (bit_not @0))
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   { build_int_cst (type, 0); })
 


RE: [PATCH] Change default for --param allow-...-data-races to off

2014-06-23 Thread Bernd Edlinger
Hi Martin,


 Well actually, I am not sure if we ever wanted to have a race condition here.
 Have you seen any impact of --param allow-store-data-races on any benchmark?

 It's trivial to write one. The only pass that checks the param is
 tree loop invariant motion, and it does that when it applies store motion.
 Register pressure is increased by a factor of two.

 So I'd agree that we might want to disable this again for -Ofast.

 As nothing tests for the PACKED variants nor for the LOAD variant
 I'd rather remove those. Claiming we don't create races for those
 when you disable it via the param is simply not true.

 Thanks,
 Richard.


OK, please go ahead with your patch.

Thanks
Bernd.
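For readers following the thread, the kind of store data race that store motion can introduce looks roughly like this (a hand-written sketch of the transformation, not actual GCC output): hoisting the store out of the loop makes the write unconditional, which is what the --param guards against.

```c
int flag_set;  /* shared with other threads */

/* Original: flag_set is only written when some element is nonzero.  */
void
count_original (int *data, int n)
{
  for (int i = 0; i < n; i++)
    if (data[i])
      flag_set = 1;
}

/* After store motion (sketch): the store to flag_set becomes
   unconditional, racing with concurrent readers/writers even when
   no element of data is nonzero.  */
void
count_store_motion (int *data, int n)
{
  int tmp = flag_set;
  for (int i = 0; i < n; i++)
    if (data[i])
      tmp = 1;
  flag_set = tmp;  /* unconditional store - the potential race */
}
```

Single-threaded, both versions compute the same result; the difference only matters when another thread accesses flag_set concurrently.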
  

Re: [GSoC] Addition of ISL AST generation to Graphite

2014-06-23 Thread Roman Gareev
 It seems the patch1/patch2 files you attach have the Content-Type:
 application/octet-stream. This makes it impossible to view them inline.
 Could you send them as text files? Just calling them patch1.patch or
 patch1.txt should make this work.

Yes, sure.
diff --git a/gcc/graphite-clast-to-gimple.c b/gcc/graphite-clast-to-gimple.c
index 9ac9b67..49b7bc6 100644
--- a/gcc/graphite-clast-to-gimple.c
+++ b/gcc/graphite-clast-to-gimple.c
@@ -109,7 +109,7 @@ value_max (mpz_t res, mpz_t v1, mpz_t v2)
 
 /* This flag is set when an error occurred during the translation of
CLAST to Gimple.  */
-static bool gloog_error;
+static bool graphite_regenerate_error;
 
 /* Verifies properties that GRAPHITE should maintain during translation.  */
 
@@ -363,7 +363,7 @@ max_precision_type (tree type1, tree type2)
 
   if (precision  BITS_PER_WORD)
 {
-  gloog_error = true;
+  graphite_regenerate_error = true;
   return integer_type_node;
 }
 
@@ -373,7 +373,7 @@ max_precision_type (tree type1, tree type2)
 
   if (!type)
 {
-  gloog_error = true;
+  graphite_regenerate_error = true;
   return integer_type_node;
 }
 
@@ -456,7 +456,7 @@ clast_to_gcc_expression (tree type, struct clast_expr *e, 
ivs_params_p ip)
if (!POINTER_TYPE_P (type))
  return fold_build2 (MULT_EXPR, type, cst, name);
 
-   gloog_error = true;
+   graphite_regenerate_error = true;
return cst;
  }
  }
@@ -535,7 +535,7 @@ type_for_interval (mpz_t bound_one, mpz_t bound_two)
 
   if (precision  BITS_PER_WORD)
 {
-  gloog_error = true;
+  graphite_regenerate_error = true;
   return integer_type_node;
 }
 
@@ -558,7 +558,7 @@ type_for_interval (mpz_t bound_one, mpz_t bound_two)
 
   if (!type)
 {
-  gloog_error = true;
+  graphite_regenerate_error = true;
   return integer_type_node;
 }
 
@@ -1112,7 +1112,7 @@ translate_clast_user (struct clast_user_stmt *stmt, edge 
next_e,
 
   build_iv_mapping (iv_map, stmt, ip);
   next_e = copy_bb_and_scalar_dependences (GBB_BB (gbb), ip->region,
-  next_e, iv_map, gloog_error);
+  next_e, iv_map, graphite_regenerate_error);
   iv_map.release ();
 
   new_bb = next_e-src;
@@ -1488,7 +1488,7 @@ build_cloog_union_domain (scop_p scop, int 
nb_scattering_dims)
   return union_domain;
 }
 
-/* Return the options that will be used in GLOOG.  */
+/* Return the options that will be used in graphite_regenerate_ast_cloog.  */
 
 static CloogOptions *
 set_cloog_options (void)
@@ -1503,7 +1503,7 @@ set_cloog_options (void)
   /* Enable complex equality spreading: removes dummy statements
  (assignments) in the generated code which repeats the
  substitution equations for statements.  This is useless for
- GLooG.  */
+ graphite_regenerate_ast_cloog.  */
   options->esp = 1;
 
   /* Silence CLooG to avoid failing tests due to debug output to stderr.  */
@@ -1663,7 +1663,7 @@ debug_generated_program (scop_p scop)
 */
 
 bool
-gloog (scop_p scop, bb_pbb_htab_type bb_pbb_mapping)
+graphite_regenerate_ast_cloog (scop_p scop, bb_pbb_htab_type bb_pbb_mapping)
 {
   auto_vec<tree, 10> newivs;
   loop_p context_loop;
@@ -1674,7 +1674,7 @@ gloog (scop_p scop, bb_pbb_htab_type bb_pbb_mapping)
   struct ivs_params ip;
 
   timevar_push (TV_GRAPHITE_CODE_GEN);
-  gloog_error = false;
+  graphite_regenerate_error = false;
 
   params_index.create (10);
 
@@ -1714,7 +1714,7 @@ gloog (scop_p scop, bb_pbb_htab_type bb_pbb_mapping)
   recompute_all_dominators ();
   graphite_verify ();
 
-  if (gloog_error)
+  if (graphite_regenerate_error)
 set_ifsese_condition (if_region, integer_zero_node);
 
   free (if_region->true_region);
@@ -1739,6 +1739,6 @@ gloog (scop_p scop, bb_pbb_htab_type bb_pbb_mapping)
   num_no_dependency);
 }
 
-  return !gloog_error;
+  return !graphite_regenerate_error;
 }
 #endif
diff --git a/gcc/graphite-clast-to-gimple.h b/gcc/graphite-clast-to-gimple.h
index fc5a679..615cae8 100644
--- a/gcc/graphite-clast-to-gimple.h
+++ b/gcc/graphite-clast-to-gimple.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_GRAPHITE_CLAST_TO_GIMPLE_H
 #define GCC_GRAPHITE_CLAST_TO_GIMPLE_H
 
+#include graphite-htab.h
+
 extern CloogState *cloog_state;
 
 /* Data structure for CLooG program representation.  */
@@ -30,14 +32,7 @@ struct cloog_prog_clast {
   struct clast_stmt *stmt;
 };
 
-/* Stores BB's related PBB.  */
-
-struct bb_pbb_def
-{
-  basic_block bb;
-  poly_bb_p pbb;
-};
-
+extern bool graphite_regenerate_ast_cloog (scop_p, bb_pbb_htab_type);
 extern void debug_clast_stmt (struct clast_stmt *);
 extern void print_clast_stmt (FILE *, struct clast_stmt *);
 
diff --git a/gcc/graphite-htab.h b/gcc/graphite-htab.h
index d67dd0c..9f31fac 100644
--- a/gcc/graphite-htab.h
+++ b/gcc/graphite-htab.h
@@ -22,7 +22,14 @@ along with 

[GSoC][match-and-simplify] Remove gen_gimple_match_fail

2014-06-23 Thread Prathamesh Kulkarni
* genmatch.c (gen_gimple_match_fail): Remove.
  (expr::gen_gimple_transform): Remove call to gen_gimple_match_fail.
Change fprintf (f, "  if (!res) ") to fprintf (f, "  if (!res) return false;\n")

Thanks and Regards,
Prathamesh
Index: gcc/genmatch.c
===
--- gcc/genmatch.c	(revision 211893)
+++ gcc/genmatch.c	(working copy)
@@ -452,15 +452,6 @@ commutate (operand *op)
 
 /* Code gen off the AST.  */
 
-static void
-gen_gimple_match_fail (FILE *f, const char *label)
-{
-  if (!label)
-    fprintf (f, "return NULL_TREE;\n");
-  else
-    fprintf (f, "goto %s;\n", label);
-}
-
 void
 expr::gen_gimple_transform (FILE *f, const char *label, const char *dest)
 {
@@ -481,8 +472,7 @@ expr::gen_gimple_transform (FILE *f, con
   for (unsigned i = 0; i  ops.length (); ++i)
    fprintf (f, ", ops[%u]", i);
   fprintf (f, ", seq, valueize);\n");
-  fprintf (f, "  if (!res) ");
-  gen_gimple_match_fail (f, label);
+  fprintf (f, "  if (!res) return false;\n");
   fprintf (f, }\n);
   fprintf (f,   else\n);
   fprintf (f, res = gimple_build (seq, UNKNOWN_LOCATION, %s, 


Re: [PATCH] Fix 61565 -- cmpelim vs non-call exceptions

2014-06-23 Thread Richard Henderson
On 06/23/2014 02:29 AM, Ramana Radhakrishnan wrote:
 
 
 On 20/06/14 21:28, Richard Henderson wrote:
 There aren't too many users of the cmpelim pass, and previously they were all
 small embedded targets without an FPU.

 I'm a bit surprised that Ramana decided to enable this pass for aarch64, as
 that target is not so limited as the block comment for the pass describes.
 Honestly, whatever is being deleted here ought to have been found earlier,
 either via combine or cse.  We ought to find out why any changes are made
 during this pass for aarch64.
 
 Agreed - Going back and looking at my notes I remember seeing a difference in
 code generation with the elimination of a number of compares that prompted me
 to turn this on in a number of benchmarks. I don't remember double checking 
 why
 CSE hadn't removed that at that time. This also probably explains the
 equivalent patch for ARM and Thumb2 hasn't shown demonstrable differences.
 
 Investigating this pass for Thumb1 may be interesting.

Isn't it true that thumb1 has only adds r,r,#i not add r,r,#i?
Lack of an addition that doesn't clobber the flags is one of the two
reasons why you'd want to enable the cmpelim pass.

Although for thumb1, can't you achieve a no-clobber addition with an
IT block with an always condition?  Seems like a good use for the addptr
pattern that Andreas Krebbel recently added for s390.  That might let you
make thumb1 look more like the rest of the arm port and emit separate compare
and branch instructions from the start.


r~


Re: [GSoC][match-and-simplify] mark some more operators as commutative

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 3:32 PM, Prathamesh Kulkarni
bilbotheelffri...@gmail.com wrote:
 * match.pd: Mark operators in some bitwise and plus-minus
 patterns to be commutative.

 /* A - (A +- B) -> -+ B */
 (match_and_simplify
-  (minus @0 (plus @0 @1))
+  (minus @0 (plus:c @0 @1))
   (negate @0))

seems pointless

 /* ~x & ~y -> ~(x | y) */
 (match_and_simplify
-  (bit_and (bit_not @0) (bit_not @1))
+  (bit_and:c (bit_not @0) (bit_not @1))
   if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   (bit_not (bit_ior @0 @1)))

likewise.

I have removed the pointless ones and committed the patch.

Richard.

 Thanks and Regards,
 Prathamesh


Re: [patch i386]: Combine memory and indirect jump

2014-06-23 Thread Richard Henderson
On 06/20/2014 02:59 PM, Kai Tietz wrote:
 So I suggest following change of passes.def:
 
 Index: passes.def
 ===
 --- passes.def  (Revision 211850)
 +++ passes.def  (Arbeitskopie)
 @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_rtl_dse2);
   NEXT_PASS (pass_stack_adjustments);
   NEXT_PASS (pass_jump2);
 - NEXT_PASS (pass_peephole2);
   NEXT_PASS (pass_if_after_reload);
   NEXT_PASS (pass_regrename);
   NEXT_PASS (pass_cprop_hardreg);
 @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fast_rtl_dce);
   NEXT_PASS (pass_duplicate_computed_gotos);
   NEXT_PASS (pass_reorder_blocks);
 + NEXT_PASS (pass_peephole2);
   NEXT_PASS (pass_branch_target_load_optimize2);
   NEXT_PASS (pass_leaf_regs);
   NEXT_PASS (pass_split_before_sched2);

Looks good to me.  I guess just keep an eye out for bug reports for other ports.


r~


Re: [GSoC][match-and-simplify] mark some more operators as commutative

2014-06-23 Thread Marc Glisse

On Mon, 23 Jun 2014, Richard Biener wrote:


On Mon, Jun 23, 2014 at 3:32 PM, Prathamesh Kulkarni
bilbotheelffri...@gmail.com wrote:

* match.pd: Mark operators in some bitwise and plus-minus
patterns to be commutative.


/* A - (A +- B) -> -+ B */
(match_and_simplify
-  (minus @0 (plus @0 @1))
+  (minus @0 (plus:c @0 @1))
  (negate @0))

seems pointless


Why? a-(a+b) and a-(b+a) are both wanted and don't appear elsewhere in the 
file, no? Should simplify to (negate @1) though.


--
Marc Glisse
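Marc's point is easy to spot-check: for any a and b, a - (a + b) equals -b, i.e. the simplification should be (negate @1), not (negate @0). Using unsigned arithmetic so that wraparound is defined:

```c
#include <assert.h>

/* a - (a + b) folds to -b for any a, b; unsigned so that any
   intermediate wraparound is well-defined.  */
static unsigned
sub_of_sum (unsigned a, unsigned b)
{
  return a - (a + b);
}
```

The same identity holds with the operands of the inner plus swapped, which is why marking it `:c` is not pointless for the corrected form.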


Re: [PATCH] Fix arrays in rtx.u + add minor rtx verification

2014-06-23 Thread Richard Henderson
On 06/20/2014 01:42 PM, Marek Polacek wrote:
 2014-06-20  Marek Polacek  pola...@redhat.com
 
   * genpreds.c (verify_rtx_codes): New function.
   (main): Call it.
   * rtl.h (RTX_FLD_WIDTH, RTX_HWINT_WIDTH): Define.
   (struct rtx_def): Use them.

Looks pretty good.  Just a few nits.

 +static void
 +verify_rtx_codes (void)
 +{
 +  unsigned int i, j;
 +
 +  for (i = 0; i < NUM_RTX_CODE; i++)
 +if (strchr (GET_RTX_FORMAT (i), 'w') == NULL)
 +  {
 + if (strlen (GET_RTX_FORMAT (i)) > RTX_FLD_WIDTH)
 +   internal_error ("%s format %s longer than RTX_FLD_WIDTH %d\n",
 +   GET_RTX_NAME (i), GET_RTX_FORMAT (i),
 +   (int) RTX_FLD_WIDTH);
 +  }
 +else
 +  {
 + const size_t len = strlen (GET_RTX_FORMAT (i));

The strlen result is used in both arms of the if.  Tidier to hoist it, I think.

 + for (j = 0; j < len; j++)
 +   if (GET_RTX_FORMAT (i)[j] != 'w')
 + internal_error ("%s format %s should contain only w, but "
 + "has %c\n", GET_RTX_NAME (i), GET_RTX_FORMAT (i),
 + GET_RTX_FORMAT (i)[j]);

The loop is strspn.  Perhaps tidier as

  const size_t spn = strspn (GET_RTX_FORMAT (i), "w");
  if (spn != len)
internal_error (...);


r~
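For reference, the strspn form Richard suggests works because strspn returns the length of the initial segment consisting only of characters from the given set; comparing that against the full length verifies that every character is 'w'. A standalone sketch:

```c
#include <string.h>

/* Returns nonzero iff FMT consists only of 'w' characters, which is
   exactly what comparing strspn's result against strlen checks.  */
static int
format_is_all_w (const char *fmt)
{
  size_t len = strlen (fmt);
  return strspn (fmt, "w") == len;
}
```

Note the empty string passes trivially (strspn and strlen both return 0), which matches the behavior of the original character-by-character loop.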



Re: [PATCH] Fix forwprop pattern (T)(P + A) - (T)P -> (T)A

2014-06-23 Thread Eric Botcazou
 I noticed that several testcases in the GMP-4.3.2 test suite are failing now
 which did not happen with GCC 4.9.0.  I debugged the first one,
 mpz/convert, and found the file mpn/generic/get_str.c was miscompiled.
 
 mpn/get_str.c.132t.dse2:
   pretmp_183 = (sizetype) chars_per_limb_80;
   pretmp_184 = -pretmp_183;
   _23 = chars_per_limb_80 + 4294967295;
   _68 = (sizetype) _23;
   _28 = _68 + pretmp_184;
 
 mpn/get_str.c.133t.forwprop4:
   _28 = 4294967295;
 
 
 That is wrong, because chars_per_limb is unsigned, and it is not zero.
 So the right result should be -1.  This makes the loop termination in that
 function fail.

Can't we compute the right result in this case?  4294967295 is almost -1.

 The attached patch fixes these regressions, and because the reasoning
 depends on the TYPE_OVERFLOW_UNDEFINED attribute, a strict overflow warning
 has to be emitted here, at least for widening conversions.

Saturating, floating-point and fixed-point types are already excluded here, 
see the beginning of the function. 

-- 
Eric Botcazou


RE: [PATCH] Fix forwprop pattern (T)(P + A) - (T)P -> (T)A

2014-06-23 Thread Bernd Edlinger
Hi,

On Mon, 23 Jun 2014 10:40:53, Richard Biener wrote:

 On Sun, Jun 22, 2014 at 9:14 AM, Bernd Edlinger
 bernd.edlin...@hotmail.de wrote:
 Hi,

 I noticed that several testcases in the GMP-4.3.2 test suite are failing now 
 which
 did not happen with GCC 4.9.0. I debugged the first one, mpz/convert, and 
 found
 the file mpn/generic/get_str.c was miscompiled.

 mpn/get_str.c.132t.dse2:
 pretmp_183 = (sizetype) chars_per_limb_80;
 pretmp_184 = -pretmp_183;
 _23 = chars_per_limb_80 + 4294967295;
 _68 = (sizetype) _23;
 _28 = _68 + pretmp_184;

 mpn/get_str.c.133t.forwprop4:
 _28 = 4294967295;


 That is wrong, because chars_per_limb is unsigned, and it is not zero.
 So the right result should be -1. This makes the loop termination in that
 function fail.

 The reason for this is in this check-in:

 r210807 | ebotcazou | 2014-05-22 16:32:56 +0200 (Thu, 22 May 2014) | 3 lines

 * tree-ssa-forwprop.c (associate_plusminus): Extend (T)(P + A) - (T)P
 -> (T)A transformation to integer types.


 Because it implicitly assumes that integer overflow is not allowed with all 
 types,
 including unsigned int.

 Hmm? But the transform is correct if overflow wraps. And it's correct if
 overflow is undefined as well, as (T)A is always well-defined (implementation
 defined) if it is a truncation.


we have no problem when the cast from (P + A) to T is a truncation, except if
the add operation P + A is saturating.
 
 So we match the above and try to transform it to (T)P + (T)A - (T)P. That's
 wrong if the conversion is extending I think.


Yes, in a way.  But OTOH, Eric's test case opt37.adb fails if we simply punt
here.

Fortunately, with opt37.adb, P and A are signed 32-bit integers, and T is 
size_t (64 bit)
and because the overflow of P + A causes undefined behaviour, we can assume that
P + A _did_ not overflow, and therefore the transformation (T)(P + A) == (T)P + 
(T)A
is correct (but needs a strict overflow warning), and we still can use the 
pattern
(T)P + (T)A - (T)P - (T)A in this case.

But we cannot use this transformation, as the attached test case demonstrates
when P + A is done in unsigned integers, because the result of the addition is
different if it is done in unsigned int with allowed overflow, or in long 
without
overflow.
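A concrete illustration of the problem case, mirroring the get_str.c failure: when the inner addition is done in a 32-bit unsigned type and wraps, widening the operands first gives a different result, so (T)(P + A) cannot be rewritten as (T)P + (T)A. This is a standalone sketch, not the GCC-internal code:

```c
#include <assert.h>
#include <stdint.h>

/* p + 4294967295 (i.e. p - 1) computed in 32 bits, then widened:
   the wraparound happens before the conversion.  */
static uint64_t
widen_after_add (uint32_t p)
{
  uint32_t sum = p + 4294967295u;  /* wraps in 32 bits */
  return (uint64_t) sum;
}

/* The transform under discussion (invalid for wrapping unsigned
   arithmetic): widen first, then add - no wraparound occurs.  */
static uint64_t
widen_before_add (uint32_t p)
{
  return (uint64_t) p + 4294967295u;
}
```

With signed operands, overflow would be undefined behavior, which is why the transform remains justifiable there; with unsigned operands the two orderings are simply different computations.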

 Richard.



 The attached patch fixes these regressions, and because the reasoning depends
 on the TYPE_OVERFLOW_UNDEFINED attribute, a strict overflow warning has to be
 emitted here, at least for widening conversions.


 Boot-strapped and regression-tested on x86_64-linux-gnu with all languages, 
 including Ada.
 OK for trunk?

 + if (!TYPE_SATURATING (TREE_TYPE (a))

 this is already tested at the very beginning of the function.


We have done TYPE_SATURATING (TREE_TYPE (rhs1)) that refers to T,
but I am concerned about the inner addition operation here,
and if it is done in a saturating way.


 +  && !FLOAT_TYPE_P (TREE_TYPE (a))
 +  && !FIXED_POINT_TYPE_P (TREE_TYPE (a))

 likewise.

Well, maybe this cannot happen, because if we have P + A, computed in float,
and T an integer type, then probably CONVERT_EXPR_CODE_P (def_code)
will not match, because def_code is FIX_TRUNC_EXPR in that case?

OTOH it does not hurt to check that, because A's type may be quite different
from rhs1's type.


 + || (!POINTER_TYPE_P (TREE_TYPE (p))
 +  && INTEGRAL_TYPE_P (TREE_TYPE (a))
 +  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (a)))

 INTEGRAL_TYPE_P are always !POINTER_TYPE_P.



We come here, either because P + A is a POINTER_PLUS_EXPR or because P + A is a 
PLUS_EXPR.

In the first case, P's type is POINTER_TYPE_P and A's type is INTEGRAL_TYPE_P
so this should not check the TYPE_OVERFLOW_UNDEFINED, but instead
the POINTER_TYPE_OVERFLOW_UNDEFINED.

Also with undefined pointer wraparound, we can exploit that in the same way as 
with
signed integers.

But I am concerned whether (T)A is always the same thing as (T)(void*)A.
I'd say, yes, if TYPE_UNSIGNED (TREE_TYPE (p)) == TYPE_UNSIGNED (TREE_TYPE (a))
or if A is a constant, and it is positive.



Thanks
Bernd.

  

[AArch64] Implement some vca*_f[32,64] intrinsics

2014-06-23 Thread Kyrill Tkachov

Hi all,

This patch implements some absolute compare intrinsics in arm_neon.h.

Execution tests are added.
Tested aarch64-none-elf, aarch64_be-none-elf, bootstrapped on aarch64 linux

Ok for trunk?

2014-06-23  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/arm_neon.h (vcage_f64): New intrinsic.
(vcagt_f64): Likewise.
(vcale_f64): Likewise.
(vcaled_f64): Likewise.
(vcales_f32): Likewise.
(vcalt_f64): Likewise.
(vcaltd_f64): Likewise.
(vcalts_f32): Likewise.

2014-06-23  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* gcc.target/aarch64/simd/vcage_f64.c: New test.
* gcc.target/aarch64/simd/vcagt_f64.c: Likewise.
* gcc.target/aarch64/simd/vcale_f64.c: Likewise.
* gcc.target/aarch64/simd/vcaled_f64.c: Likewise.
* gcc.target/aarch64/simd/vcales_f32.c: Likewise.
* gcc.target/aarch64/simd/vcalt_f64.c: Likewise.
* gcc.target/aarch64/simd/vcaltd_f64.c: Likewise.
* gcc.target/aarch64/simd/vcalts_f32.c: Likewise.

commit f87169afe8bd853b4aa5ab3ed5e270de0cb95461
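The scalar semantics these intrinsics implement (absolute compare producing an all-ones or all-zeros mask) can be modelled in plain C. The names below are illustrative reference functions for the scalar forms, not the intrinsics themselves:

```c
#include <math.h>
#include <stdint.h>

/* |a| >= |b| ? all-ones : 0 - models the "absolute compare
   greater-or-equal" family (vcage*/
/* and, operands swapped, vcale*).  */
static uint32_t
ref_vcages_f32 (float a, float b)
{
  return fabsf (a) >= fabsf (b) ? UINT32_MAX : 0;
}

/* |a| > |b| ? all-ones : 0 - models the strict "absolute compare
   greater-than" family (vcagt* and, operands swapped, vcalt*).  */
static uint32_t
ref_vcagts_f32 (float a, float b)
{
  return fabsf (a) > fabsf (b) ? UINT32_MAX : 0;
}
```

The vector variants apply the same predicate lane-wise, producing a -1/0 mask per lane, which is why the scalar intrinsics in the patch return `... ? -1 : 0`.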
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Thu Jun 19 09:37:54 2014 +0100

[AArch64] Implement vc* intrinsics

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 0ff6996..8fa469a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -13823,6 +13823,12 @@ vaesimcq_u8 (uint8x16_t data)
 
 /* vcage  */
 
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vcage_f64 (float64x1_t __a, float64x1_t __b)
+{
+  return vabs_f64 (__a) >= vabs_f64 (__b);
+}
+
 __extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 vcages_f32 (float32_t __a, float32_t __b)
 {
@@ -13867,6 +13873,12 @@ vcagt_f32 (float32x2_t __a, float32x2_t __b)
   return vabs_f32 (__a) > vabs_f32 (__b);
 }
 
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vcagt_f64 (float64x1_t __a, float64x1_t __b)
+{
+  return vabs_f64 (__a) > vabs_f64 (__b);
+}
+
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcagtq_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -13893,6 +13905,24 @@ vcale_f32 (float32x2_t __a, float32x2_t __b)
   return vabs_f32 (__a) <= vabs_f32 (__b);
 }
 
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vcale_f64 (float64x1_t __a, float64x1_t __b)
+{
+  return vabs_f64 (__a) <= vabs_f64 (__b);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcaled_f64 (float64_t __a, float64_t __b)
+{
+  return __builtin_fabs (__a) <= __builtin_fabs (__b) ? -1 : 0;
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcales_f32 (float32_t __a, float32_t __b)
+{
+  return __builtin_fabsf (__a) <= __builtin_fabsf (__b) ? -1 : 0;
+}
+
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcaleq_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -13913,6 +13943,18 @@ vcalt_f32 (float32x2_t __a, float32x2_t __b)
   return vabs_f32 (__a) < vabs_f32 (__b);
 }
 
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vcalt_f64 (float64x1_t __a, float64x1_t __b)
+{
+  return vabs_f64 (__a) < vabs_f64 (__b);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+vcaltd_f64 (float64_t __a, float64_t __b)
+{
+  return __builtin_fabs (__a) < __builtin_fabs (__b) ? -1 : 0;
+}
+
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vcaltq_f32 (float32x4_t __a, float32x4_t __b)
 {
@@ -13925,6 +13967,12 @@ vcaltq_f64 (float64x2_t __a, float64x2_t __b)
   return vabsq_f64 (__a) < vabsq_f64 (__b);
 }
 
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcalts_f32 (float32_t __a, float32_t __b)
+{
+  return __builtin_fabsf (__a) < __builtin_fabsf (__b) ? -1 : 0;
+}
+
 /* vceq - vector.  */
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vcage_f64.c b/gcc/testsuite/gcc.target/aarch64/simd/vcage_f64.c
new file mode 100644
index 000..67dc8ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vcage_f64.c
@@ -0,0 +1,42 @@
+/* Test the vcage_f64 AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -O3" } */
+
+#include <arm_neon.h>
+#include <stdio.h>
+
+#define SIZE 6
+
+extern void abort (void);
+
+volatile float64_t in[SIZE] = { -10.4, -3.14, 0.0, 1.5, 5.3, 532.3 };
+
+int
+main (void)
+{
+  uint64_t expected;
+  uint64_t actual;
+  float64x1_t arg1, arg2;
+  int i, j;
+
+  for (i = 0; i < SIZE; ++i)
+   for (j = 0; j < SIZE; ++j)
+ {
+expected = __builtin_fabs (in[i]) >= __builtin_fabs (in[j]) ? -1 : 0;
+arg1 = (float64x1_t) { in[i] };
+arg2 = (float64x1_t) { in[j] };
+actual = vget_lane_u64 (vcage_f64 (arg1, arg2), 0);
+
+if (actual != expected)
+  {
+fprintf (stderr, "Expected: %ld, got %ld\n", expected, actual);
+   

Re: [patch i386]: Combine memory and indirect jump

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 4:13 PM, Richard Henderson r...@redhat.com wrote:
 On 06/20/2014 02:59 PM, Kai Tietz wrote:
 So I suggest following change of passes.def:

 Index: passes.def
 ===
 --- passes.def  (Revision 211850)
 +++ passes.def  (Arbeitskopie)
 @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_rtl_dse2);
   NEXT_PASS (pass_stack_adjustments);
   NEXT_PASS (pass_jump2);
 - NEXT_PASS (pass_peephole2);
   NEXT_PASS (pass_if_after_reload);
   NEXT_PASS (pass_regrename);
   NEXT_PASS (pass_cprop_hardreg);
 @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fast_rtl_dce);
   NEXT_PASS (pass_duplicate_computed_gotos);
   NEXT_PASS (pass_reorder_blocks);
 + NEXT_PASS (pass_peephole2);
   NEXT_PASS (pass_branch_target_load_optimize2);
   NEXT_PASS (pass_leaf_regs);
   NEXT_PASS (pass_split_before_sched2);

 Looks good to me.  I guess just keep an eye out for bug reports for other 
 ports.

Maybe put a comment here because it looks like a random placement to me
which would be obvious to revert.  peepholing before if-after-reload sounds
good anyway.

Did you test effect on code-generation of this change on other targets?

Btw, there is now no DCE after peephole2?  Is peephole2 expected to
cleanup after itself?

Richard.


 r~


Re: [GSoC][match-and-simplify] Remove gen_gimple_match_fail

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 3:43 PM, Prathamesh Kulkarni
bilbotheelffri...@gmail.com wrote:
 * genmatch.c (gen_gimple_match_fail): Remove.
   (expr::gen_gimple_transform): Remove call to gen_gimple_match_fail.
 Change fprintf (f, "if (!res)") to fprintf (f, "if (!res) return 
 false;\n")

Thanks, committed.

Richard.

 Thanks and Regards,
 Prathamesh


Re: [GSoC][match-and-simplify] mark some more operators as commutative

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 4:23 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 23 Jun 2014, Richard Biener wrote:

 On Mon, Jun 23, 2014 at 3:32 PM, Prathamesh Kulkarni
 bilbotheelffri...@gmail.com wrote:

 * match.pd: Mark operators in some bitwise and plus-minus
 patterns to be commutative.


  /* A - (A +- B) -> -+ B */
 (match_and_simplify
 -  (minus @0 (plus @0 @1))
 +  (minus @0 (plus:c @0 @1))
   (negate @0))

 seems pointless


 Why? a-(a+b) and a-(b+a) are both wanted and don't appear elsewhere in the
 file, no? Should simplify to (negate @1) though.

Ah, indeed.  So here commutation doesn't work because of correctness.

Richard.

 --
 Marc Glisse


[PATCH] Don't segv on __atomic_store (PR c/61553)

2014-06-23 Thread Marek Polacek
We ICEd on the following testcase since the void type has a NULL
TYPE_SIZE_UNIT.  I took Andrew's patch from gcc@ ML and added
a testcase.

Regtested/bootstrapped on x86_64-linux, ok for trunk?

2014-06-23  Marek Polacek  pola...@redhat.com
Andrew MacLeod  amacl...@redhat.com

PR c/61553
* c-common.c (get_atomic_generic_size): Don't segfault if the
type doesn't have a size.

* c-c++-common/pr61553.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 077263e..087f036 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -10471,7 +10471,8 @@ get_atomic_generic_size (location_t loc, tree function,
function);
  return 0;
}
-  size = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (type)));
+  tree type_size = TYPE_SIZE_UNIT (TREE_TYPE (type));
+  size = type_size ? tree_to_uhwi (type_size) : 0;
   if (size != size_0)
{
   error_at (loc, "size mismatch in argument %d of %qE", x + 1,
diff --git gcc/testsuite/c-c++-common/pr61553.c 
gcc/testsuite/c-c++-common/pr61553.c
index e69de29..fa97e94 100644
--- gcc/testsuite/c-c++-common/pr61553.c
+++ gcc/testsuite/c-c++-common/pr61553.c
@@ -0,0 +1,8 @@
+/* PR c/61553 */
+/* { dg-do compile } */
+
+void
+foo (char *s)
+{
+  __atomic_store (s, (void *) 0, __ATOMIC_SEQ_CST);
+}

Marek


Re: [PATCH] Fix arrays in rtx.u + add minor rtx verification

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 4:25 PM, Richard Henderson r...@redhat.com wrote:
 On 06/20/2014 01:42 PM, Marek Polacek wrote:
 2014-06-20  Marek Polacek  pola...@redhat.com

   * genpreds.c (verify_rtx_codes): New function.
   (main): Call it.
   * rtl.h (RTX_FLD_WIDTH, RTX_HWINT_WIDTH): Define.
   (struct rtx_def): Use them.

 Looks pretty good.  Just a few nits.

 +static void
 +verify_rtx_codes (void)
 +{
 +  unsigned int i, j;
 +
 +  for (i = 0; i < NUM_RTX_CODE; i++)
 +if (strchr (GET_RTX_FORMAT (i), 'w') == NULL)
 +  {
 + if (strlen (GET_RTX_FORMAT (i)) > RTX_FLD_WIDTH)
 +   internal_error ("%s format %s longer than RTX_FLD_WIDTH %d\n",
 +   GET_RTX_NAME (i), GET_RTX_FORMAT (i),
 +   (int) RTX_FLD_WIDTH);
 +  }
 +else
 +  {
 + const size_t len = strlen (GET_RTX_FORMAT (i));

 The strlen result is used in both arms of the if.  Tidier to hoist it, I 
 think.

 + for (j = 0; j < len; j++)
 +   if (GET_RTX_FORMAT (i)[j] != 'w')
 + internal_error ("%s format %s should contain only w, but "
 + "has %c\n", GET_RTX_NAME (i), GET_RTX_FORMAT (i),
 + GET_RTX_FORMAT (i)[j]);

 The loop is strspn.  Perhaps tidier as

   const size_t spn = strspn (GET_RTX_FORMAT (i), "w");
   if (spn != len)
 internal_error (...);

Note that RTX_HWINT_WIDTH is wrong because of CONST_WIDE_INT.

Also I don't like increasing the array sizes - this is wrong in the same
way as [1] is.

Can we instead refactor expmed.c to avoid allocating rtx_def directly?
Like by using rtx in init_expmed_rtl and allocating from an obstack
(or not care and GC-allocate anyway).

Richard.


 r~



Re: [C/C++ PATCH] Add -Wlogical-not-parentheses (PR c/49706)

2014-06-23 Thread Joseph S. Myers
On Mon, 23 Jun 2014, Marek Polacek wrote:

 I think the latter is better, incidentally, g++ doesn't warn either.
 The following one liner makes cc1 behave as cc1plus.  Thanks for the
 report.
 
 Regtested/bootstrapped on x86_64.  Joseph, is this ok?
 
 2014-06-23  Marek Polacek  pola...@redhat.com
 
   * c-typeck.c (parser_build_binary_op): Don't call
   warn_logical_not_parentheses if the RHS is TRUTH_NOT_EXPR.
 
   * c-c++-common/pr49706-2.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch ARM/testsuite 00/22] Neon intrinsics executable tests

2014-06-23 Thread Christophe Lyon
On 11 June 2014 00:03, Ramana Radhakrishnan ramana@googlemail.com wrote:
 On Thu, Jun 5, 2014 at 11:04 PM, Christophe Lyon
 christophe.l...@linaro.org wrote:
 This is patch series is a more complete version of the patch I sent
 some time ago:
 https://gcc.gnu.org/ml/gcc-patches/2013-10/msg00624.html

 I have created a series of patches to help review.  The 1st one adds
 some documentation, the common .h files defining helpers used in the
 actual tests, and two real tests (vaba and vld1) to show how the
 various macros are used.

 The next patches add other tests (grouped when they use a common
 framework).

 Looking at the .exp file, you'll notice that the tests are performed twice:
 * once using c-torture-execute to make sure they execute correctly
   under various levels of optimization. In this case dejagnu
   directives embedded in each .c test file are ignored.

 * once using gcc-dg-runtest, which enables compiling with various
   optimization levels and scanning the generated assembly for some
   code sequences. Currently, only the vadd test contains some
   scan-assembler-times directives, as an example. We can add such
   directives to other tests later.


 Regarding the results of these tests on target
 arm-none-linux-gnueabihf, note that:
 * vclz tests currently fail at optimization levels starting with -O1
 * vqadd test fails when compiled with -Os
 * vadd scan-assembler fails for vadd.i64 (because the compiler uses
   core registers instead of Neon ones. Not sure if this should be
   considered as a bug or if the test should be changed)
 * this gives 1164 PASS and 18 FAIL


 I am a bit ambivalent between getting folks to add scan-assembler
 tests here and worrying between this and getting the behaviour
 correct. Additionally if you add the complexity of scanning for
 aarch64 as well this starts getting messy.

 At this point I'm going to wait to see if any of the testsuite
 maintainers step in and comment and if not I'll start looking at this
 properly early next week.

 regards
 Ramana


Hi Ramana,

Did you have time to look at this patch series?

Thanks



 I have not looked at the results in detail on other arm* and aarch64*
 targets, but there are some other failures.

 I have many more tests to convert (currently 40 done, 96 remain), and
 my plan is to work on the rest once this set has been accepted.

 As of the ChangeLog entry, this patch only adds new files in
 testsuite/gcc.target/arm/neon-intrinsics (which is new too).

 OK for trunk?

 Thanks,

 Christophe.

 Christophe Lyon (22):
   Neon intrinsics execution tests initial framework.
   Add unary operators: vabs and vneg.
   Add binary operators: vadd, vand, vbic, veor, vorn, vorr, vsub.
   Add comparison operators: vceq, vcge, vcgt, vcle and vclt.
   Add comparison operators with floating-point operands: vcage, vcagt,
   vcale and cvalt.
   Add unary saturating operators: vqabs and vqneg.
   Add binary saturating operators: vqadd, vqsub.
   Add vabal tests.
   Add vabd tests.
   Add vabdl tests.
   Add vaddhn tests.
   Add vaddl tests.
   Add vaddw tests.
   Add vbsl tests.
   Add vclz tests.
   Add vdup and vmov tests.
   Add vld1_dup tests.
   Add vld2/vld3/vld4 tests.
   Add vld2_lane, vld3_lane and vld4_lane tests.
   Add vmul tests.
   Add vshl tests.
   Add vuzp and vzip tests.


[PATCH] gcc: fix segfault from calling free on non-malloc'd area

2014-06-23 Thread Paul Gortmaker
We see the following on a 32bit gcc installed on 64 bit host:

  Reading symbols from ./i586-pokymllib32-linux-gcc...done.
  (gdb) run
  Starting program: 
x86-pokymllib32-linux/lib32-gcc/4.9.0-r0/image/usr/bin/i586-pokymllib32-linux-gcc

  Program received signal SIGSEGV, Segmentation fault.
  0xf7e957e0 in free () from /lib/i386-linux-gnu/libc.so.6
  (gdb) bt
  #0  0xf7e957e0 in free () from /lib/i386-linux-gnu/libc.so.6
  #1  0x0804b73c in set_multilib_dir () at gcc-4.9.0/gcc/gcc.c:7827
  #2  main (argc=1, argv=0xd504) at gcc-4.9.0/gcc/gcc.c:6688
  (gdb)

The problem arises because the check on whether we are using
the internal string "." or an allocated one is reversed.
We should be calling free() when the string is not equal to
the internal "." string.

Signed-off-by: Paul Gortmaker paul.gortma...@windriver.com
---

[Found and fixed on gcc-4.9.0 but applies to git/master too]

 gcc/gcc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 6870a840e1b7..a580975a7057 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -7822,7 +7822,7 @@ set_multilib_dir (void)
 }
 
   if (multilib_dir == NULL && multilib_os_dir != NULL
-      && strcmp (multilib_os_dir, ".") == 0)
+      && strcmp (multilib_os_dir, ".") != 0)
 {
   free (CONST_CAST (char *, multilib_os_dir));
   multilib_os_dir = NULL;
-- 
1.9.1






Re: [C++ Patch] PR 33101

2014-06-23 Thread Jason Merrill

On 06/22/2014 10:42 AM, Paolo Carlini wrote:

I think the below would be most of it. Today, however, I did some
archeology, noticed that we would essentially revert to pre-PR9278
behavior (thus, per its audit trail, make again unhappy some people in
the template metaprogramming world?!? Was that known to Core when DR577
got resolved?)


It seems to me that the metaprogrammers wanted the typedef to be 
accepted, so this would make them happier.


The patch is OK.

Jason



PR61583, stage2 and stage3 compare failure due to value range loss

2014-06-23 Thread Alan Modra
This fixes a bootstrap compare failure on current mainline and 4.9
branch configured with --disable-checking, caused by losing value
range info when outputting debug info.  Lack of value range info leads
to loop bounds not being calculated, which in turn means a "j < n"
test is not converted to "j != n".  Details in the PR.

Bootstrapped and regression tested powerpc64-linux, and committed
with Jakub's approval.

gcc/
PR bootstrap/61583
* tree-vrp.c (remove_range_assertions): Do not set is_unreachable
to zero on debug statements.
gcc/testsuite/
* gcc.dg/pr61583.c: New.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 211886)
+++ gcc/tree-vrp.c  (working copy)
@@ -6523,8 +6523,9 @@ remove_range_assertions (void)
  }
else
  {
+   if (!is_gimple_debug (gsi_stmt (si)))
+ is_unreachable = 0;
gsi_next (si);
-   is_unreachable = 0;
  }
   }
 }
Index: gcc/testsuite/gcc.dg/pr61583.c
===
--- gcc/testsuite/gcc.dg/pr61583.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr61583.c  (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcompare-debug" } */
+
+void
+f1 (int n, int b)
+{
+  extern void f2 (int);
+  int j;
+
+  if (b)
+n = 1;
+
+  if (n < 1)
+__builtin_unreachable ();
+
+  for (j = 0; j < n; j++)
+f2 (j);
+}

-- 
Alan Modra
Australia Development Lab, IBM


[Patch, AArch64] Restructure arm_neon.h vector types' implementation.

2014-06-23 Thread Tejas Belagod


Hi,

Here is a patch that restructures neon builtins to use vector types based on 
standard base types. We previously defined arm_neon.h's neon vector 
types(int8x8_t) using gcc's front-end vector extensions. We now move away from 
that and use types built internally(e.g. __Int8x8_t). These internal types names 
are defined by the AAPCS64 and we build arm_neon.h's public vector types over 
these internal types. e.g.


  typedef __Int8x8_t int8x8_t;

as opposed to

  typedef __builtin_aarch64_simd_qi int8x8_t
__attribute__ ((__vector_size__ (8)));

Impact on mangling:

This patch does away with these builtin scalar types that the vector types were 
based on. These were previously used to look up mangling names. We now use the 
internal vector type names(e.g. __Int8x8_t) to lookup mangling for the 
arm_neon.h-exported vector types. There are a few internal scalar 
types (__builtin_aarch64_simd_oi etc.) that are needed to efficiently implement 
some NEON Intrinsics. These will be declared in the back-end and registered in 
the front-end and aarch64-specific builtin types, but are not user-visible. 
These, along with a few scalar __builtin types that aren't user-visible will 
have implementation-defined mangling. Because we don't have strong-typing across 
all builtins yet, we still have to maintain the old builtin scalar types - they 
will be removed once we move over to a strongly-typed builtin system implemented 
by the qualifier infrastructure.


Marc Glisse's patch in this thread exposed this issue 
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00618.html. I've tested my patch 
with the change that his patch introduced, and it seems to work fine - 
specifically these two lines:


+  for (tree t = registered_builtin_types; t; t = TREE_CHAIN (t))
+emit_support_tinfo_1 (TREE_VALUE (t));

Regression tested on aarch64-none-elf. OK for trunk?

Thanks,
Tejas.

gcc/Changelog

2014-06-23  Tejas Belagod  tejas.bela...@arm.com

* config/aarch64/aarch64-builtins.c (aarch64_build_scalar_type): Remove.
(aarch64_scalar_builtin_types, aarch64_simd_type, aarch64_simd_types,
 aarch64_mangle_builtin_scalar_type, aarch64_mangle_builtin_vector_type,
 aarch64_mangle_builtin_type, aarch64_simd_builtin_std_type,
 aarch64_lookup_simd_builtin_type, aarch64_simd_builtin_type,
 aarch64_init_simd_builtin_types,
 aarch64_init_simd_builtin_scalar_types): New.
(aarch64_init_simd_builtins): Refactor.
(aarch64_fold_builtin): Remove redundant defn.
(aarch64_init_crc32_builtins): Use aarch64_simd_builtin_std_type.
* config/aarch64/aarch64-simd-builtin-types.def: New.
* config/aarch64/t-aarch64: Add aarch64-simd-builtin-types.def
dependency.
* config/aarch64/aarch64-protos.h (aarch64_mangle_builtin_type): Export.
* config/aarch64/aarch64-simd-builtins.def: Remove duplicates.
* config/aarch64/aarch64.c (aarch64_simd_mangle_map): Remove.
(aarch64_mangle_type): Refactor.
* config/aarch64/arm_neon.h: Declare vector types based on internal
type-names.diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index a94ef52..1119f33 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -471,256 +471,331 @@ static GTY(()) tree 
aarch64_builtin_decls[AARCH64_BUILTIN_MAX];
 #define NUM_DREG_TYPES 6
 #define NUM_QREG_TYPES 6
 
-/* Return a tree for a signed or unsigned argument of either
-   the mode specified by MODE, or the inner mode of MODE.  */
-tree
-aarch64_build_scalar_type (enum machine_mode mode,
-  bool unsigned_p,
-  bool poly_p)
-{
-#undef INT_TYPES
-#define INT_TYPES \
-  AARCH64_TYPE_BUILDER (QI) \
-  AARCH64_TYPE_BUILDER (HI) \
-  AARCH64_TYPE_BUILDER (SI) \
-  AARCH64_TYPE_BUILDER (DI) \
-  AARCH64_TYPE_BUILDER (EI) \
-  AARCH64_TYPE_BUILDER (OI) \
-  AARCH64_TYPE_BUILDER (CI) \
-  AARCH64_TYPE_BUILDER (XI) \
-  AARCH64_TYPE_BUILDER (TI) \
-
-/* Statically declare all the possible types we might need.  */
-#undef AARCH64_TYPE_BUILDER
-#define AARCH64_TYPE_BUILDER(X) \
-  static tree X##_aarch64_type_node_p = NULL; \
-  static tree X##_aarch64_type_node_s = NULL; \
-  static tree X##_aarch64_type_node_u = NULL;
-
-  INT_TYPES
-
-  static tree float_aarch64_type_node = NULL;
-  static tree double_aarch64_type_node = NULL;
-
-  gcc_assert (!VECTOR_MODE_P (mode));
-
-/* If we've already initialised this type, don't initialise it again,
-   otherwise ask for a new type of the correct size.  */
-#undef AARCH64_TYPE_BUILDER
-#define AARCH64_TYPE_BUILDER(X) \
-  case X##mode: \
-if (unsigned_p) \
-  return (X##_aarch64_type_node_u \
- ? X##_aarch64_type_node_u \
- : X##_aarch64_type_node_u \
- = make_unsigned_type (GET_MODE_PRECISION (mode))); \
-else if (poly_p) \
-   return (X##_aarch64_type_node_p \
- 

Re: calloc = malloc + memset

2014-06-23 Thread Marc Glisse

On Mon, 23 Jun 2014, Jakub Jelinek wrote:


Ok for trunk, sorry for the delay.


Thanks. Richard has moved the passes a bit since then, but I still have 
exactly one spot where the testsuite is ok :-) I need strlen to be after 
dom (for calloc.C) and before vrp (for several strlenopt-*.c). I'll commit 
it tomorrow if there aren't any comments on the pass placement.


2014-06-24  Marc Glisse  marc.gli...@inria.fr

PR tree-optimization/57742
gcc/
* tree-ssa-strlen.c (get_string_length): Ignore malloc.
(handle_builtin_malloc, handle_builtin_memset): New functions.
(strlen_optimize_stmt): Call them.
* passes.def: Move strlen after loop+dom but before vrp.
gcc/testsuite/
* g++.dg/tree-ssa/calloc.C: New testcase.
* gcc.dg/tree-ssa/calloc-1.c: Likewise.
* gcc.dg/tree-ssa/calloc-2.c: Likewise.
* gcc.dg/strlenopt-9.c: Adapt.

--
Marc GlisseIndex: gcc/passes.def
===
--- gcc/passes.def  (revision 211886)
+++ gcc/passes.def  (working copy)
@@ -179,21 +179,20 @@ along with GCC; see the file COPYING3.
 DOM and erroneous path isolation should be due to degenerate PHI nodes.
 So rather than run the full propagators, run a specialized pass which
 only examines PHIs to discover const/copy propagation
 opportunities.  */
   NEXT_PASS (pass_phi_only_cprop);
   NEXT_PASS (pass_dse);
   NEXT_PASS (pass_reassoc);
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt);
-  NEXT_PASS (pass_strlen);
   NEXT_PASS (pass_ccp);
   /* After CCP we rewrite no longer addressed locals into SSA
 form if possible.  */
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_optimize_bswap);
   NEXT_PASS (pass_split_crit_edges);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
   NEXT_PASS (pass_asan);
@@ -232,20 +231,21 @@ along with GCC; see the file COPYING3.
  NEXT_PASS (pass_loop_prefetch);
  NEXT_PASS (pass_iv_optimize);
  NEXT_PASS (pass_lim);
  NEXT_PASS (pass_tree_loop_done);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_cse_reciprocals);
   NEXT_PASS (pass_reassoc);
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_dominator);
+  NEXT_PASS (pass_strlen);
   NEXT_PASS (pass_vrp);
   /* The only const/copy propagation opportunities left after
 DOM and VRP should be due to degenerate PHI nodes.  So rather than
 run the full propagators, run a specialized pass which
 only examines PHIs to discover const/copy propagation
 opportunities.  */
   NEXT_PASS (pass_phi_only_cprop);
   NEXT_PASS (pass_cd_dce);
   NEXT_PASS (pass_tracer);
   NEXT_PASS (pass_dse);
Index: gcc/testsuite/g++.dg/tree-ssa/calloc.C
===
--- gcc/testsuite/g++.dg/tree-ssa/calloc.C  (revision 0)
+++ gcc/testsuite/g++.dg/tree-ssa/calloc.C  (working copy)
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef __SIZE_TYPE__ size_t;
+inline void* operator new(size_t, void* p) throw() { return p; }
+
+typedef void (*handler_t)(void);
+extern handler_t get_handle();
+
+inline void* operator new(size_t sz)
+{
+  void *p;
+
+  if (sz == 0)
+sz = 1;
+
+  while ((p = __builtin_malloc (sz)) == 0)
+{
+  handler_t handler = get_handle ();
+  if (! handler)
+throw 42;
+  handler ();
+}
+  return p;
+}
+
+struct vect {
+  int *start, *end;
+  vect(size_t n) {
+start = end = 0;
+if (n > (size_t)-1 / sizeof(int))
+  throw 33;
+if (n != 0)
+  start = static_cast<int*> (operator new (n * sizeof(int)));
+end = start + n;
+int *p = start;
+for (size_t l = n; l > 0; --l, ++p)
+  *p = 0;
+  }
+};
+
+void f (void *p, int n)
+{
+  new (p) vect(n);
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/strlenopt-9.c
===
--- gcc/testsuite/gcc.dg/strlenopt-9.c  (revision 211886)
+++ gcc/testsuite/gcc.dg/strlenopt-9.c  (working copy)
@@ -11,21 +11,21 @@ fn1 (int r)
  optimized away.  */
   return strchr (p, '\0');
 }
 
 __attribute__((noinline, noclone)) size_t
 fn2 (int r)
 {
   char *p, q[10];
   strcpy (q, "abc");
   p = r ? "a" : q;
-  /* String length for p varies, therefore strlen below isn't
+  /* String length is constant for both alternatives, and strlen is
  optimized away.  */
   return strlen (p);
 }
 
 __attribute__((noinline, noclone)) size_t
 fn3 (char *p, int n)
 {
   int 

Re: [PATCH] Fix 61565 -- cmpelim vs non-call exceptions

2014-06-23 Thread Ramana Radhakrishnan



On 23/06/14 15:01, Richard Henderson wrote:

On 06/23/2014 02:29 AM, Ramana Radhakrishnan wrote:



On 20/06/14 21:28, Richard Henderson wrote:

There aren't too many users of the cmpelim pass, and previously they were all
small embedded targets without an FPU.

I'm a bit surprised that Ramana decided to enable this pass for aarch64, as
that target is not so limited as the block comment for the pass describes.
Honestly, whatever is being deleted here ought to have been found earlier,
either via combine or cse.  We ought to find out why any changes are made
during this pass for aarch64.


Agreed - Going back and looking at my notes I remember seeing a difference in
code generation with the elimination of a number of compares that prompted me
to turn this on in a number of benchmarks. I don't remember double checking why
CSE hadn't removed that at that time. This also probably explains the
equivalent patch for ARM and Thumb2 hasn't shown demonstrable differences.

Investigating this pass for Thumb1 may be interesting.


Isn't it true that thumb1 has only "adds r,r,#i" not "add r,r,#i"?
Lack of an addition that doesn't clobber the flags is one of the two
reasons why you'd want to enable the cmpelim pass.



Agreed, this is why cmpelim looks interesting for Thumb1. (We may need 
another hook or something to disable it in configurations we don't need 
it in, but you know ... )




Although for thumb1, can't you achieve a no-clobber addition with an
IT block with an always condition?


Except that IT instructions aren't in Thumb1 or indeed v6-m :(.



Seems like a good use for the addptr
pattern that Andreas Krebbel recently added for s390.  That might let you
make thumb1 look more like the rest of the arm port and emit separate compare
and branch instructions from the start.




regards
Ramana



r~



Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-06-23 Thread James Greenhalgh
On Mon, Jun 23, 2014 at 01:53:28PM +0100, Kyrill Tkachov wrote:
 On 19/06/14 14:12, James Greenhalgh wrote:
  diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
  index 
  266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4
   100644
  --- a/gcc/config/aarch64/aarch64.md
  +++ b/gcc/config/aarch64/aarch64.md
  @@ -1068,16 +1068,17 @@ (define_expand addmode3

(define_insn *addsi3_aarch64
  [(set
  -(match_operand:SI 0 "register_operand" "=rk,rk,rk")
  +(match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
(plus:SI
  - (match_operand:SI 1 "register_operand" "%rk,rk,rk")
  - (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
  + (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
  + (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
  
  @
  add\\t%w0, %w1, %2
  add\\t%w0, %w1, %w2
  +  add\\t%0.2s, %1.2s, %2.2s
  sub\\t%w0, %w1, #%n2
  -  [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
  +  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
)

 Minor nit, you should set the simd attribute to yes for the added 
 alternative to make sure it doesn't get selected when !TARGET_SIMD

Hi Kyrill,

Thanks, you are of course correct. I wouldn't call this a minor nit though,
if we were unlucky, it could easily break a kernel build in some subtle and
annoying way!

Good catch!

I've committed the attached as obvious as revision 211899 after testing
on aarch64-none-elf.

Thanks,
James

---
gcc/

2014-06-23  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/aarch64.md (addsi3_aarch64): Set simd attr to
yes where needed.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5f5b4ff..8705ee9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1167,7 +1167,8 @@
   add\\t%w0, %w1, %w2
   add\\t%0.2s, %1.2s, %2.2s
   sub\\t%w0, %w1, #%n2
-  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
+  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")
+   (set_attr "simd" "*,*,yes,*")]
 )
 
 ;; zero_extend version of above

Re: calloc = malloc + memset

2014-06-23 Thread Richard Biener
On June 23, 2014 5:51:30 PM CEST, Marc Glisse marc.gli...@inria.fr wrote:
On Mon, 23 Jun 2014, Jakub Jelinek wrote:

 Ok for trunk, sorry for the delay.

Thanks. Richard has moved the passes a bit since then, but I still have

exactly one spot where the testsuite is ok :-) I need strlen to be
after 
dom (for calloc.C) and before vrp (for several strlenopt-*.c). I'll
commit 
it tomorrow if there aren't any comments on the pass placement.

But vrp does not run at -O1 - does strlenopt?

2014-06-24  Marc Glisse  marc.gli...@inria.fr

   PR tree-optimization/57742
gcc/
   * tree-ssa-strlen.c (get_string_length): Ignore malloc.
   (handle_builtin_malloc, handle_builtin_memset): New functions.
   (strlen_optimize_stmt): Call them.
   * passes.def: Move strlen after loop+dom but before vrp.
gcc/testsuite/
   * g++.dg/tree-ssa/calloc.C: New testcase.
   * gcc.dg/tree-ssa/calloc-1.c: Likewise.
   * gcc.dg/tree-ssa/calloc-2.c: Likewise.
   * gcc.dg/strlenopt-9.c: Adapt.




Re: [PATCH] Trust TREE_ADDRESSABLE

2014-06-23 Thread Jan Hubicka
 I don't like this very much.  It's fragile and it will be very hard to
 detect bugs caused by it.
 
 Please don't spread uses of the DECL_NONALIASED hack.
 
 If we are only concerned about LTO I'd rather have a in_lto_p check
 in may_be_aliased and trust TREE_ADDRESSABLE there.

I do not like it either, but I thought it was the outcome of the discussion to use
it.

I do not see how in_lto_p helps here, but we probably want to go for the
alternative where ipa-visibility sets TREE_ADDRESSABLE for all external
variables and then we trust it unconditionally?

Honza


Re: [PATCH] Trust TREE_ADDRESSABLE

2014-06-23 Thread Richard Biener
On June 23, 2014 6:15:10 PM CEST, Jan Hubicka hubi...@ucw.cz wrote:
 I don't like this very much.  It's fragile and it will be very hard
to
 detect bugs caused by it.
 
 Please don't spread uses of the DECL_NONALIASED hack.
 
 If we are only concerned about LTO I'd rather have a in_lto_p check
 in may_be_aliased and trust TREE_ADDRESSABLE there.

I do not like it either, but I thought it was the outcome of the discussion
to use
it.

I do not see how in_lto_p helps here, but we probably want to go for
the

If we are sure it is correctly set in lto1 it helps.

alternative where ipa-visibility sets TREE_ADDRESSABLE for all external
variables and then we trust it unconditionally?

That works for me, too.  But at least add a checking assert that may-be-aliased 
is not invoked before that.

Richard.



Honza




Re: calloc = malloc + memset

2014-06-23 Thread Marc Glisse

On Mon, 23 Jun 2014, Richard Biener wrote:


On June 23, 2014 5:51:30 PM CEST, Marc Glisse marc.gli...@inria.fr wrote:

On Mon, 23 Jun 2014, Jakub Jelinek wrote:


Ok for trunk, sorry for the delay.


Thanks. Richard has moved the passes a bit since then, but I still have
exactly one spot where the testsuite is ok :-) I need strlen to be after
dom (for calloc.C) and before vrp (for several strlenopt-*.c). I'll commit
it tomorrow if there aren't any comments on the pass placement.


But vrp does not run at -O1 - does strlenopt?


{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
{ OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },

So that's just a missed optimization at -Os, I guess.

--
Marc Glisse


[Patch] Not very subtle fix for pr61510

2014-06-23 Thread James Greenhalgh

Hi,

pr61510 is a case where cgraphunit.c::analyze_functions can end up
dereferencing a NULL pointer. This is, to me, the obvious way to avoid
dereferencing NULL.

However, I'm not very confident that this isn't just masking some
horrible and subtle bug, so I'd like some review feedback on the patch.

Tested on aarch64-none-elf, where I was seeing the issue.

OK?

Thanks,
James

---
2014-06-19  James Greenhalgh  james.greenha...@arm.com

PR regression/61510
* cgraphunit.c (analyze_functions): Check we have an origin
node before dereferencing it.
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1b7ab33..82e5a68 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1051,7 +1051,8 @@ analyze_functions (void)
 		{
 		  struct cgraph_node *origin_node
 		  = cgraph_get_node (DECL_ABSTRACT_ORIGIN (decl));
-		  origin_node->used_as_abstract_origin = true;
+		  if (origin_node)
+		    origin_node->used_as_abstract_origin = true;
 		}
 	}
 	  else

Re: [PATCH] Trust TREE_ADDRESSABLE

2014-06-23 Thread Jan Hubicka
 On June 23, 2014 6:15:10 PM CEST, Jan Hubicka hubi...@ucw.cz wrote:
  I don't like this very much.  It's fragile and it will be very hard
 to
  detect bugs caused by it.
  
  Please don't spread uses of the DECL_NONALIASED hack.
  
  If we are only concerned about LTO I'd rather have a in_lto_p check
  in may_be_aliased and trust TREE_ADDRESSABLE there.
 
 I do not like it either, but I thought it was the outcome of the discussion
 to use
 it.
 
 I do not see how in_lto_p helps here, but we probably want to go for
 the
 
 If we are sure it is correctly set in lto1 it helps.
 
 alternative where ipa-visibility sets TREE_ADDRESSABLE for all external
 variables and then we trust it unconditionally?
 
 That works for me, too.  But at least add a checking assert that 
 may-be-aliased is not invoked before that.

OK, I suppose I can check cgraph_state for that (after construction it will be
all set).  Will prepare a patch tonight.

Honza
 
 Richard.
 
 
 
 Honza
 


Re: [patch i386]: Combine memory and indirect jump

2014-06-23 Thread Jeff Law

On 06/23/14 08:32, Richard Biener wrote:

On Mon, Jun 23, 2014 at 4:13 PM, Richard Henderson r...@redhat.com wrote:

On 06/20/2014 02:59 PM, Kai Tietz wrote:

So I suggest following change of passes.def:

Index: passes.def
===
--- passes.def  (Revision 211850)
+++ passes.def  (Arbeitskopie)
@@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_rtl_dse2);
   NEXT_PASS (pass_stack_adjustments);
   NEXT_PASS (pass_jump2);
- NEXT_PASS (pass_peephole2);
   NEXT_PASS (pass_if_after_reload);
   NEXT_PASS (pass_regrename);
   NEXT_PASS (pass_cprop_hardreg);
@@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fast_rtl_dce);
   NEXT_PASS (pass_duplicate_computed_gotos);
   NEXT_PASS (pass_reorder_blocks);
+ NEXT_PASS (pass_peephole2);
   NEXT_PASS (pass_branch_target_load_optimize2);
   NEXT_PASS (pass_leaf_regs);
   NEXT_PASS (pass_split_before_sched2);


Looks good to me.  I guess just keep an eye out for bug reports for other ports.


Maybe put a comment here because it looks like a random placement to me
which would be obvious to revert.  peepholing before if-after-reload sounds
good anyway.

Definitely need a comment on the pass placement.


Btw, there is now no DCE after peephole2?  Is peephole2 expected to
cleanup after itself?
There were cases where we wanted to change the insns we would output to 
fit into the 4:1:1 issue model of the PPro, but to do so we needed to 
know what registers were live/dead so that we could rewrite the insns 
appropriately.  It didn't fit well into what we could do in the 
splitters and the old peephole ran too late.  Dead code wasn't ever 
really considered.  At least that's my recollection.  RTH might recall more.


I think it'd be worth an experiment here, but I think that can/should 
happen independently of Kai's patch.  Arguably the scheduler should have 
all the necessary dataflow information to quickly identify any dead code.


Jeff


Re: [PATCH] Fix forwporp pattern (T)(P + A) - (T)P - (T)A

2014-06-23 Thread Jeff Law

On 06/22/14 01:14, Bernd Edlinger wrote:

Hi,

I noticed that several testcases in the GMP-4.3.2 test suite are failing now 
which
did not happen with GCC 4.9.0.  I debugged the first one, mpz/convert, and found
the file mpn/generic/get_str.c was miscompiled.
It's interesting you stumbled on this.  I've seen this issue come and go 
as well in some of my integration testing work with the trunk.



Jeff


Re: [PATCH] Fix up -march=native handling under KVM (PR target/61570)

2014-06-23 Thread H.J. Lu
On Sun, Jun 22, 2014 at 11:23 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sat, Jun 21, 2014 at 8:07 PM, Jakub Jelinek ja...@redhat.com wrote:

  --- gcc/config/i386/driver-i386.c.jj2014-05-14 14:45:54.0 
  +0200
  +++ gcc/config/i386/driver-i386.c   2014-06-20 18:59:57.805006358 
  +0200
  @@ -745,6 +745,11 @@ const char *host_detect_local_cpu (int a
  /* Assume Core 2.  */
  cpu = core2;
  }
  + else if (has_longmode)
  +   /* Perhaps some emulator?  Assume x86-64, otherwise gcc
  +  -march=native would be unusable for 64-bit 
  compilations,
  +  as all the CPUs below are 32-bit only.  */
  +   cpu = "x86-64";
else if (has_sse3)
  /* It is Core Duo.  */
  cpu = pentium-m;
 
  Jakub

 host_detect_local_cpu guesses the cpu based on the real processors.
 It doesn't work with emulators due to some conflicts.  This isn't the
 only place which has the same issue.  I prefer something like
 this.

 I'm fine with your patch too.  Let's wait and see what Uros (or other i?86
 maintainers) picks up.

 This looks OK to me.

 Thanks,
 Uros.

This is what I checked in.

Thanks.

-- 
H.J.
---
Index: ChangeLog
===
--- ChangeLog (revision 211900)
+++ ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2014-06-23  H.J. Lu  hongjiu...@intel.com
+
+ PR target/61570
+ * config/i386/driver-i386.c (host_detect_local_cpu): Set arch
+ to x86-64 if a 32-bit processor supports SSE2 and 64-bit.
+
 2014-06-23  James Greenhalgh  james.greenha...@arm.com

  * config/aarch64/aarch64.md (addsi3_aarch64): Set simd attr to
Index: config/i386/driver-i386.c
===
--- config/i386/driver-i386.c (revision 211900)
+++ config/i386/driver-i386.c (working copy)
@@ -415,6 +415,7 @@ const char *host_detect_local_cpu (int a
   bool arch;

   unsigned int l2sizekb = 0;
+  unsigned int arch_64bit = 1;

   if (argc < 1)
 return NULL;
@@ -656,11 +657,14 @@ const char *host_detect_local_cpu (int a
 {
 case PROCESSOR_I386:
   /* Default.  */
+  arch_64bit = 0;
   break;
 case PROCESSOR_I486:
+  arch_64bit = 0;
   cpu = i486;
   break;
 case PROCESSOR_PENTIUM:
+  arch_64bit = 0;
   if (arch && has_mmx)
 	cpu = "pentium-mmx";
   else
@@ -745,21 +749,25 @@ const char *host_detect_local_cpu (int a
 	    /* Assume Core 2.  */
 	    cpu = "core2";
 	}
-  else if (has_sse3)
-	/* It is Core Duo.  */
-	cpu = "pentium-m";
-  else if (has_sse2)
-	/* It is Pentium M.  */
-	cpu = "pentium-m";
-  else if (has_sse)
-	/* It is Pentium III.  */
-	cpu = "pentium3";
-  else if (has_mmx)
-	/* It is Pentium II.  */
-	cpu = "pentium2";
   else
-	/* Default to Pentium Pro.  */
-	cpu = "pentiumpro";
+	{
+	  arch_64bit = 0;
+	  if (has_sse3)
+	    /* It is Core Duo.  */
+	    cpu = "pentium-m";
+	  else if (has_sse2)
+	    /* It is Pentium M.  */
+	    cpu = "pentium-m";
+	  else if (has_sse)
+	    /* It is Pentium III.  */
+	    cpu = "pentium3";
+	  else if (has_mmx)
+	    /* It is Pentium II.  */
+	    cpu = "pentium2";
+	  else
+	    /* Default to Pentium Pro.  */
+	    cpu = "pentiumpro";
+	}
 }
 }
   else
 /* For -mtune, we default to -mtune=generic.  */
@@ -773,21 +781,30 @@ const char *host_detect_local_cpu (int a
   if (has_longmode)
 	cpu = "nocona";
   else
-	cpu = "prescott";
+	{
+	  cpu = "prescott";
+	  arch_64bit = 0;
+	}
 	}
   else
-	cpu = "pentium4";
+	{
+	  cpu = "pentium4";
+	  arch_64bit = 0;
+	}
   break;
 case PROCESSOR_GEODE:
+  arch_64bit = 0;
   cpu = "geode";
   break;
 case PROCESSOR_K6:
+  arch_64bit = 0;
   if (arch && has_3dnow)
 	cpu = "k6-3";
   else
 	cpu = "k6";
   break;
 case PROCESSOR_ATHLON:
+  arch_64bit = 0;
   if (arch && has_sse)
 	cpu = "athlon-4";
   else
@@ -896,6 +913,10 @@ const char *host_detect_local_cpu (int a
   const char *xsavec = has_xsavec ? " -mxsavec" : " -mno-xsavec";
   const char *xsaves = has_xsaves ? " -mxsaves" : " -mno-xsaves";
 
+  /* Assume x86-64 if a 32-bit processor supports SSE2 and 64-bit.  */
+  if (arch_64bit == 0 && has_sse2 && has_longmode)
+	cpu = "x86-64";
+
   options = concat (options, mmx, mmx3dnow, sse, sse2, sse3, ssse3,
  sse4a, cx16, sahf, movbe, aes, sha, pclmul,
  popcnt, abm, lwp, fma, fma4, xop, bmi, bmi2,


Re: [PATCH] DWARFv5 Emit DW_TAG_atomic_type.

2014-06-23 Thread Tom Tromey
 Mark == Mark Wielaard m...@redhat.com writes:

Mark The following is just a prototype to try out a new qualifier type tag
Mark proposed for DWARFv5. There is not even a draft yet of DWARFv5, so this
Mark is just based on a proposal that might or might not be adopted and/or
Mark changed http://dwarfstd.org/ShowIssue.php?issue=131112.1

Mark gcc/ChangeLog

Mark   * dwarf2out.h (enum dw_mod_flag): Add dw_mod_atomic.
Mark   * dwarf2out.c (dw_mod_decl_flags): Handle TYPE_ATOMIC.
Mark   (dw_mod_type_flags): Likewise.
Mark   (dw_mods_to_quals): Likewise.
Mark   (dw_mod_qualified_type): Likewise.
Mark   (modified_type_die): Likewise.
Mark   * opts.c (common_handle_option): Accept -gdwarf-5.

The ChangeLog should mention PR debug/60782 -- I saw you did this in the
restrict patch so you probably just weren't aware of this one...

Tom


Re: [Patch] Not very subtle fix for pr61510

2014-06-23 Thread Richard Biener
On Mon, 23 Jun 2014, James Greenhalgh wrote:

 
 Hi,
 
 pr61510 is a case where cgraphunit.c::analyze_functions can end up
 dereferencing a NULL pointer. This is, to me, the obvious way to avoid
 dereferencing NULL.
 
 However, I'm not very confident that this isn't just masking some
 horrible and subtle bug, so I'd like some review feedback on the patch.
 
 Tested on aarch64-none-elf, where I was seeing the issue.
 
 OK?

Obvious in some sense to me, too, but I wonder why we don't have a cgraph 
node
for it and what happens if it is created later (and then doesn't
have the flag set)?

Honza?

Richard.

 Thanks,
 James
 
 ---
 2014-06-19  James Greenhalgh  james.greenha...@arm.com
 
   PR regression/61510
   * cgraphunit.c (analyze_functions): Check we have an origin
   node before dereferencing it.
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [GSoC][match-and-simplify] mark some more operators as commutative

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 4:38 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, Jun 23, 2014 at 4:23 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 23 Jun 2014, Richard Biener wrote:

 On Mon, Jun 23, 2014 at 3:32 PM, Prathamesh Kulkarni
 bilbotheelffri...@gmail.com wrote:

 * match.pd: Mark operators in some bitwise and plus-minus
 patterns to be commutative.


 /* A - (A +- B) -> -+ B */
 (match_and_simplify
 -  (minus @0 (plus @0 @1))
 +  (minus @0 (plus:c @0 @1))
   (negate @0))

 seems pointless


 Why? a-(a+b) and a-(b+a) are both wanted and don't appear elsewhere in the
 file, no? Should simplify to (negate @1) though.

 Ah, indeed.  So here commutation doesn't work because of correctness.

Or rather the pattern is broken from the start ... fixed.

Richard.

 Richard.

 --
 Marc Glisse


Re: [patch i386]: Combine memory and indirect jump

2014-06-23 Thread Richard Henderson
On 06/23/2014 09:22 AM, Jeff Law wrote:
 On 06/23/14 08:32, Richard Biener wrote:
 Btw, there is now no DCE after peephole2?  Is peephole2 expected to
 cleanup after itself?
 There were cases where we wanted to change the insns we would output to fit
 into the 4:1:1 issue model of the PPro, but to do so we needed to know what
 registers were live/dead so that we could rewrite the insns appropriately.  It
 didn't fit well into what we could do in the splitters and the old peephole 
 ran
 too late.  Dead code wasn't ever really considered.  At least that's my
 recollection.  RTH might recall more.

Yes, peep2 was about doing what the old peep1 rtl->text transformation did,
but as an rtl->rtl transformation so we can expose the result to the scheduler.

It's expected that all dead code be gone before sched2, so that the scheduler
sees exactly what needs to be scheduled, and can bundle insns appropriately.

I believe the peep2 pass to also want dead code to be gone, so that it gets an
accurate picture of what registers are live or dead at any point.  As far as I
know, there are no current transformations that create new garbage.



r~


Re: calloc = malloc + memset

2014-06-23 Thread Richard Biener
On Mon, Jun 23, 2014 at 6:19 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 23 Jun 2014, Richard Biener wrote:

 On June 23, 2014 5:51:30 PM CEST, Marc Glisse marc.gli...@inria.fr
 wrote:

 On Mon, 23 Jun 2014, Jakub Jelinek wrote:

 Ok for trunk, sorry for the delay.


  Thanks. Richard has moved the passes a bit since then, but I still have
  exactly one spot where the testsuite is ok :-) I need strlen to be after
  dom (for calloc.C) and before vrp (for several strlenopt-*.c). I'll commit
  it tomorrow if there aren't any comments on the pass placement.


 But vrp does not run at -O1 - does strlenopt?


 { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },

 So that's just a missed optimization at -Os, I guess.

Ok, that's fine (not sure why we restrict all of strlenopt instead of
just those transforms that are harmful for -Os).

Richard.

 --
 Marc Glisse


Re: Another AIX Bootstrap failure

2014-06-23 Thread Dominique Dhumieres
The tests gcc.dg/globalalias-2.c and gcc.dg/localalias-2.c fail on darwin with

/opt/gcc/work/gcc/testsuite/gcc.dg/globalalias-2.c:20:2: warning: alias 
definitions not supported in Mach-O; ignored

I think they should be protected by

/* { dg-require-alias  } */

Dominique


Re: [PATCH][RFC] Gate loop passes group on number-of-loops 1, add no-loops group

2014-06-23 Thread Richard Biener
On Wed, 18 Jun 2014, Jeff Law wrote:

 On 06/18/14 04:42, Richard Biener wrote:
  
  The following aims at reducing the number of pointless passes we run
  on functions containing no loops.  Those are at least two copyprop
  and one dce pass (two dce passes when vectorization is enabled,
  three dce passes and an additional copyprop pass when any graphite
  optimization is enabled).
  
  Simply gating pass_tree_loop on number_of_loops () > 1 would disable
  basic-block vectorization on loopless functions.  Moving
  basic-block vectorization out of pass_tree_loop works to the extent
  that you'd need to move IVOPTs as well as data-ref analysis cannot
  cope with TARGET_MEM_REFs.
  
  So the following introduces a pass_tree_no_loop pass group which
  is enabled whenever the pass_tree_loop group is disabled.
  As followup this would allow to skip cleanup work we do after the loop
  pipeline just to cleanup after it.
  
  Any comments?  Does such followup sound realistic or would it be
  better to take the opportunity to move IVOPTs a bit closer to
  RTL expansion and avoid that pass_tree_no_loop hack?
 Sounds good.  I've always believed that each pass should be bubbling back up
 some kind of status about what it did/found as well.
 
 It was more of an RTL issue, but we had a certain commercial testsuite which
 created large loopless tests (*) that consumed vast quantities of wall clock
 time.  I always wanted the RTL loop passes to signal back to toplev.c that no
 loops were found, which would in turn be used to say we really don't need
 cse-after-loop and friends.
 It's certainly more complex these days, but I'd still like to be able to do
 such things.
 
 Regardless, that's well outside the scope of what you're trying to accomplish.

Not really - it's just the start of a series of (possible) patches.  I
want to cut down the post-loop pass pipeline for functions without loops.

But indeed we may want to apply the same trick to RTL.

The following is what I ended up applying (ugh, we need a more 
elegant solution for those dump naming / cleanup issues).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-06-23  Richard Biener  rguent...@suse.de

* tree-ssa-loop.c (gate_loop): New function.
(pass_tree_loop::gate): Call it.
(pass_data_tree_no_loop, pass_tree_no_loop,
make_pass_tree_no_loop): New.
* tree-vectorizer.c: Include tree-scalar-evolution.c
(pass_slp_vectorize::execute): Initialize loops and SCEV if
required.
(pass_slp_vectorize::clone): New method.
* timevar.def (TV_TREE_NOLOOP): New.
* tree-pass.h (make_pass_tree_no_loop): Declare.
* passes.def (pass_tree_no_loop): New pass group with
SLP vectorizer.

* g++.dg/vect/slp-pr50413.cc: Scan and cleanup appropriate SLP dumps.
* g++.dg/vect/slp-pr50819.cc: Likewise.
* g++.dg/vect/slp-pr56812.cc: Likewise.
* gcc.dg/vect/bb-slp-1.c: Likewise.
* gcc.dg/vect/bb-slp-10.c: Likewise.
* gcc.dg/vect/bb-slp-11.c: Likewise.
* gcc.dg/vect/bb-slp-13.c: Likewise.
* gcc.dg/vect/bb-slp-14.c: Likewise.
* gcc.dg/vect/bb-slp-15.c: Likewise.
* gcc.dg/vect/bb-slp-16.c: Likewise.
* gcc.dg/vect/bb-slp-17.c: Likewise.
* gcc.dg/vect/bb-slp-18.c: Likewise.
* gcc.dg/vect/bb-slp-19.c: Likewise.
* gcc.dg/vect/bb-slp-2.c: Likewise.
* gcc.dg/vect/bb-slp-20.c: Likewise.
* gcc.dg/vect/bb-slp-21.c: Likewise.
* gcc.dg/vect/bb-slp-22.c: Likewise.
* gcc.dg/vect/bb-slp-23.c: Likewise.
* gcc.dg/vect/bb-slp-24.c: Likewise.
* gcc.dg/vect/bb-slp-25.c: Likewise.
* gcc.dg/vect/bb-slp-26.c: Likewise.
* gcc.dg/vect/bb-slp-27.c: Likewise.
* gcc.dg/vect/bb-slp-28.c: Likewise.
* gcc.dg/vect/bb-slp-29.c: Likewise.
* gcc.dg/vect/bb-slp-3.c: Likewise.
* gcc.dg/vect/bb-slp-30.c: Likewise.
* gcc.dg/vect/bb-slp-31.c: Likewise.
* gcc.dg/vect/bb-slp-32.c: Likewise.
* gcc.dg/vect/bb-slp-4.c: Likewise.
* gcc.dg/vect/bb-slp-5.c: Likewise.
* gcc.dg/vect/bb-slp-6.c: Likewise.
* gcc.dg/vect/bb-slp-7.c: Likewise.
* gcc.dg/vect/bb-slp-8.c: Likewise.
* gcc.dg/vect/bb-slp-8a.c: Likewise.
* gcc.dg/vect/bb-slp-8b.c: Likewise.
* gcc.dg/vect/bb-slp-9.c: Likewise.
* gcc.dg/vect/bb-slp-cond-1.c: Likewise.
* gcc.dg/vect/bb-slp-pattern-1.c: Likewise.
* gcc.dg/vect/bb-slp-pattern-2.c: Likewise.
* gcc.dg/vect/fast-math-bb-slp-call-1.c: Likewise.
* gcc.dg/vect/fast-math-bb-slp-call-2.c: Likewise.
* gcc.dg/vect/fast-math-bb-slp-call-3.c: Likewise.
* gcc.dg/vect/no-tree-reassoc-bb-slp-12.c: Likewise.
* gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Likewise.
* gcc.dg/vect/pr26359.c: Likewise.
* gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c: Likewise.

Index: 

Re: [PATCH] Don't segv on __atomic_store (PR c/61553)

2014-06-23 Thread Joseph S. Myers
On Mon, 23 Jun 2014, Marek Polacek wrote:

 We ICEd on the following testcase since the void type has a NULL
 TYPE_SIZE_UNIT.  I took Andrew's patch from gcc@ ML and added
 a testcase.
 
 Regtested/bootstrapped on x86_64-linux, ok for trunk?
 
 2014-06-23  Marek Polacek  pola...@redhat.com
   Andrew MacLeod  amacl...@redhat.com
 
   PR c/61553
   * c-common.c (get_atomic_generic_size): Don't segfault if the
   type doesn't have a size.
 
   * c-c++-common/pr61553.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 3/3] add hash_map class

2014-06-23 Thread Richard Biener
On Fri, Jun 20, 2014 at 12:52 PM,  tsaund...@mozilla.com wrote:
 From: Trevor Saunders tsaund...@mozilla.com

 Hi,

 This patch adds a hash_map class so we can consolidate the boiler plate around
 using hash_table as a map, it also allows us to get rid of pointer_map which I
 do in this patch by converting its users to hash_map.

 bootstrapped + regtested without regression on x86_64-unknown-linux-gnu, ok?

Ok.

Thanks,
Richard.

 Trev

 gcc/

 * alloc-pool.c (alloc_pool_hash): Use hash_map instead of hash_table.
 * dominance.c (iterate_fix_dominators): Use hash_map instead of
 pointer_map.
 * hash-map.h: New file.
 * ipa-comdats.c: Use hash_map instead of pointer_map.
 * lto-section-out.c: Adjust.
 * lto-streamer.h: Replace pointer_map with hash_map.
 * symtab.c (verify_symtab): Likewise.
 * tree-ssa-strlen.c (decl_to_stridxlist_htab): Likewise.
 * tree-ssa-uncprop.c (val_ssa_equiv): Likewise.
 * tree-streamer.h: Likewise.
 * tree-streamer.c: Adjust.
 * pointer-set.h: Remove pointer_map.

 lto/

 * lto.c (canonical_type_hash_cache): Use hash_map instead of
 pointer_map.

 diff --git a/gcc/alloc-pool.c b/gcc/alloc-pool.c
 index 49209ee..0d31835 100644
 --- a/gcc/alloc-pool.c
 +++ b/gcc/alloc-pool.c
 @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
   #include "system.h"
   #include "alloc-pool.h"
   #include "hash-table.h"
  +#include "hash-map.h"

   #define align_eight(x) (((x+7) >> 3) << 3)

 @@ -69,7 +70,6 @@ static ALLOC_POOL_ID_TYPE last_id;
 size for that pool.  */
  struct alloc_pool_descriptor
  {
 -  const char *name;
/* Number of pools allocated.  */
unsigned long created;
/* Gross allocated storage.  */
 @@ -82,48 +82,17 @@ struct alloc_pool_descriptor
int elt_size;
  };

 -/* Hashtable helpers.  */
  -struct alloc_pool_hasher : typed_noop_remove <alloc_pool_descriptor>
 -{
 -  typedef alloc_pool_descriptor value_type;
 -  typedef char compare_type;
 -  static inline hashval_t hash (const alloc_pool_descriptor *);
 -  static inline bool equal (const value_type *, const compare_type *);
 -};
 -
  -inline hashval_t
  -alloc_pool_hasher::hash (const value_type *d)
  -{
  -  return htab_hash_pointer (d->name);
  -}
  -
  -inline bool
  -alloc_pool_hasher::equal (const value_type *d,
  -			  const compare_type *p2)
  -{
  -  return d->name == p2;
  -}
 -
   /* Hashtable mapping alloc_pool names to descriptors.  */
  -static hash_table<alloc_pool_hasher> *alloc_pool_hash;
  +static hash_map<const char *, alloc_pool_descriptor> *alloc_pool_hash;

  /* For given name, return descriptor, create new if needed.  */
  static struct alloc_pool_descriptor *
  allocate_pool_descriptor (const char *name)
  {
  -  struct alloc_pool_descriptor **slot;
  -
     if (!alloc_pool_hash)
  -    alloc_pool_hash = new hash_table<alloc_pool_hasher> (10);
  -
  -  slot = alloc_pool_hash->find_slot_with_hash (name,
  -					       htab_hash_pointer (name),
  -					       INSERT);
  -  if (*slot)
  -    return *slot;
  -  *slot = XCNEW (struct alloc_pool_descriptor);
  -  (*slot)->name = name;
  -  return *slot;
  +    alloc_pool_hash = new hash_map<const char *, alloc_pool_descriptor> (10);
  +
  +  return &alloc_pool_hash->get_or_insert (name);
  }

  /* Create a pool of things of size SIZE, with NUM in each block we
 @@ -375,23 +344,22 @@ struct output_info
unsigned long total_allocated;
  };

  -/* Called via hash_table.traverse.  Output alloc_pool descriptor pointed out by
  +/* Called via hash_map.traverse.  Output alloc_pool descriptor pointed out by
     SLOT and update statistics.  */
  -int
  -print_alloc_pool_statistics (alloc_pool_descriptor **slot,
  +bool
  +print_alloc_pool_statistics (const char *const &name,
  +			     const alloc_pool_descriptor &d,
  			     struct output_info *i)
   {
  -  struct alloc_pool_descriptor *d = *slot;
  -
  -  if (d->allocated)
  +  if (d.allocated)
   {
     fprintf (stderr,
  	      "%-22s %6d %10lu %10lu(%10lu) %10lu(%10lu) %10lu(%10lu)\n",
  -	      d->name, d->elt_size, d->created, d->allocated,
  -	      d->allocated / d->elt_size, d->peak, d->peak / d->elt_size,
  -	      d->current, d->current / d->elt_size);
  -  i->total_allocated += d->allocated;
  -  i->total_created += d->created;
  +	      name, d.elt_size, d.created, d.allocated,
  +	      d.allocated / d.elt_size, d.peak, d.peak / d.elt_size,
  +	      d.current, d.current / d.elt_size);
  +  i->total_allocated += d.allocated;
  +  i->total_created += d.created;
  }
return 1;
  }
 diff --git a/gcc/dominance.c b/gcc/dominance.c
 index 7adec4f..be0a439 100644
 --- a/gcc/dominance.c
 +++ b/gcc/dominance.c
 @@ -43,6 +43,7 @@
   #include "diagnostic-core.h"
   #include "et-forest.h"
   #include "timevar.h"
  +#include "hash-map.h"
   #include "pointer-set.h"
  #include 

Re: [PATCH] Fix for invalid sanitization of trailing byte in __builtin_strlen

2014-06-23 Thread Maxim Ostapenko

Hi,

when I applied this patch (r211846), I made a little mistake in output 
test patterns. This patch fixes this.


Tested on x86_64-unknown-linux-gnu.

Ok to commit?

-Maxim
gcc/testsuite/ChangeLog:

2014-06-23  Max Ostapenko  m.ostape...@partner.samsung.com

	* c-c++-common/asan/strlen-overflow-1.c: Change match patterns.

diff --git a/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c b/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c
index bf6bf66..f58f554 100644
--- a/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c
+++ b/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c
@@ -26,4 +26,5 @@ int main () {
 }
 
 /* { dg-output "READ of size 1 at 0x\[0-9a-f\]+ thread T0.*(\n|\r\n|\r)" } */
-/* { dg-output "#0 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*strlen-overflow-1.c:24|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */
+/* { dg-output "#0 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*strlen-overflow-1.c:25|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*0x\[0-9a-f\]+ is located 1 bytes inside of global variable" } */


Re: [PATCH 2/N] allow storing values directly in hash tables

2014-06-23 Thread Richard Biener
On Fri, Jun 20, 2014 at 12:52 PM,  tsaund...@mozilla.com wrote:
 From: Trevor Saunders tsaund...@mozilla.com

 Hi,

 this patch allows you to define the type the hash table stores as elements
 instead of the type elements point at by having your hash descriptor define 
 the
 type store_values_directly.  It turns out trying to implement both cases with
 the same code is really confusing, so I ended up providing one partial
 specialization for each case.  It's a lot of copying, but I'm hoping the next
 patch will get rid of many direct users of hash_table, and the rest can all
 get
 converted to tell the hash table the type entries should have, at which point
 the duplication can be removed.

 bootstrapped + regtested without regression on x86_64-unknown-linux-gnu, ok?

Ok.  I hope that de-duplication works out.

Thanks,
Richard.

 Trev

 gcc/

 * hash-table.h: Add a template arg to choose between storing values
 and storing pointers to values, and then provide partial
 specializations for both.
 * tree-browser.c (tree_upper_hasher): Provide the type the hash table
 should store, not the type values should point to.
 * tree-into-ssa.c (var_info_hasher): Likewise.
 * tree-ssa-dom.c (expr_elt_hasher): Likewise.
 * tree-complex.c: Adjust.
 * tree-hasher.h (int_tree_hasher): store int_tree_map in the hash
 table instead of int_tree_map *.
 * tree-parloops.c: Adjust.
 * tree-ssa-reassoc.c (ocount_hasher): Don't lie to hash_map about what
 type is being stored.
 * tree-vectorizer.c: Adjust.

 diff --git a/gcc/hash-table.h b/gcc/hash-table.h
 index 41cc19e..22af12f 100644
 --- a/gcc/hash-table.h
 +++ b/gcc/hash-table.h
  @@ -272,19 +272,18 @@ typed_noop_remove <Type>::remove (Type *p ATTRIBUTE_UNUSED)
   template <typename Type>
   struct pointer_hash : typed_noop_remove <Type>
   {
  -  typedef Type value_type;
  -  typedef Type compare_type;
  +  typedef Type *value_type;
  +  typedef Type *compare_type;
  +  typedef int store_values_directly;
 
  -  static inline hashval_t
  -  hash (const value_type *);
  +  static inline hashval_t hash (const value_type &);
 
  -  static inline int
  -  equal (const value_type *existing, const compare_type *candidate);
  +  static inline bool equal (const value_type &existing, const compare_type &candidate);
   };
 
   template <typename Type>
   inline hashval_t
  -pointer_hash <Type>::hash (const value_type *candidate)
  +pointer_hash <Type>::hash (const value_type &candidate)
   {
     /* This is a really poor hash function, but it is what the current code uses,
        so I am reusing it to avoid an additional axis in testing.  */
  @@ -292,9 +291,9 @@ pointer_hash <Type>::hash (const value_type *candidate)
   }
 
   template <typename Type>
  -inline int
  -pointer_hash <Type>::equal (const value_type *existing,
  -			    const compare_type *candidate)
  +inline bool
  +pointer_hash <Type>::equal (const value_type &existing,
  +			    const compare_type &candidate)
   {
     return existing == candidate;
   }
  @@ -319,10 +318,147 @@ extern unsigned int hash_table_higher_prime_index (unsigned long n);
   extern hashval_t hash_table_mod1 (hashval_t hash, unsigned int index);
   extern hashval_t hash_table_mod2 (hashval_t hash, unsigned int index);
 
  +/* The below is some template meta programming to decide if we should use the
  +   hash table partial specialization that directly stores value_type instead of
  +   pointers to value_type.  If the Descriptor type defines the type
  +   Descriptor::store_values_directly then values are stored directly otherwise
  +   pointers to them are stored.  */
  +template<typename T> struct notype { typedef void type; };
  +
  +template<typename T, typename = void>
  +struct storage_tester
  +{
  +  static const bool value = false;
  +};
  +
  +template<typename T>
  +struct storage_tester<T, typename notype<typename
  +					 T::store_values_directly>::type>
  +{
  +  static const bool value = true;
  +};
  +
  + template<typename Traits>
  + struct has_is_deleted
  +{
  +  template<typename U, bool (*)(U &)> struct helper {};
  +  template<typename U> static char test (helper<U, U::is_deleted> *);
  +  template<typename U> static int test (...);
  +  static const bool value = sizeof (test<Traits> (0)) == sizeof (char);
  +};
  +
  +template<typename Type, typename Traits, bool = has_is_deleted<Traits>::value>
  +struct is_deleted_helper
  +{
  +  static inline bool
  +  call (Type &v)
  +  {
  +    return Traits::is_deleted (v);
  +  }
  +};
  +
  +template<typename Type, typename Traits>
  +struct is_deleted_helper<Type *, Traits, false>
  +{
  +  static inline bool
  +  call (Type *v)
  +  {
  +    return v == HTAB_DELETED_ENTRY;
  +  }
  +};
  +
  + template<typename Traits>
  + struct has_is_empty
  +{
  +  template<typename U, bool (*)(U &)> struct helper {};
  +  template<typename U> static char test (helper<U, U::is_empty> *);
  +  template<typename U> static int test (...);
  +  static const
