Re: [build] Remove crt0, mcrt0 support

2011-07-13 Thread Paolo Bonzini

On 07/12/2011 06:45 PM, Rainer Orth wrote:

+crt0.o: $(srcdir)/config/i386/netware-crt0.c
+   $(crt_commpile) $(CRTSTUFF_T_CFLAGS) -c $<


Typo here.  Otherwise looks good, thanks.

Paolo


Re: [build] Move i386/crtprec to toplevel libgcc

2011-07-13 Thread Paolo Bonzini

On 07/12/2011 06:37 PM, Rainer Orth wrote:

The next easy step in toplevel libgcc migration is moving
i386/crtprec.c.  I noticed that -mpc{32, 64, 80} wasn't supported on
Solaris/x86 yet and corrected that.  The only testcase using the switch
was adapted to also do so on Darwin/x86 (which already has the support,
but didn't exercise it).
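
For context, the crtprec objects are tiny startup files whose only job is to
rewrite the precision-control field of the x87 control word before main runs.
A minimal sketch of the idea (simplified illustration, not the verbatim
contents of crtprec.c):

/* Illustrative sketch of a crtprecNN.o startup object: clamp the x87
   precision-control (PC) field of the FPU control word in a constructor.
   See config/i386/crtprec.c for the real, parameterized version.  */
static void __attribute__ ((constructor))
set_precision (void)
{
  unsigned short int cwd;

  asm volatile ("fstcw\t%0" : "=m" (cwd));
  cwd &= ~0x300;   /* Clear the PC field...  */
  cwd |= 0x200;    /* ...and select 53 bits, as -mpc64 does
                      (0x0 = 24 bits, 0x300 = 64 bits).  */
  asm volatile ("fldcw\t%0" : : "m" (cwd));
}

The ENDFILE_SPEC change below simply links the matching object when one of
the -mpc switches is given.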

For the reasons already described, I'm not yet removing crtprec??.o from
gcc/config/i386/t-linux64 (EXTRA_MULTILIB_PARTS).

Bootstrapped without regressions on i386-pc-solaris2.11,
x86_64-unknown-linux-gnu.  Bootstrap on i386-apple-darwin9.8.0 is
currently running.

Ok for mainline?

Thanks.
Rainer


2011-07-10  Rainer Orth  r...@cebitec.uni-bielefeld.de

gcc:
* config/i386/crtprec.c: Move to ../libgcc/config/i386.
* config/i386/t-crtpc: Remove.
* config/t-darwin (EXTRA_MULTILIB_PARTS): Remove.
* config.gcc (i[34567]86-*-darwin*): Remove i386/t-crtpc from
tmake_file.
(x86_64-*-darwin*): Likewise.
(i[34567]86-*-linux*): Likewise.
(x86_64-*-linux*): Likewise.

* config/i386/sol2.h (ENDFILE_SPEC): Redefine.
Handle -mpc32, -mpc64, -mpc80.

libgcc:
* config/i386/crtprec.c: New file.
* config/i386/t-crtpc: Use $(srcdir) to refer to crtprec.c.
* config.host (i[34567]86-*-darwin*): Add i386/t-crtpc to tmake_file.
Add crtprec32.o, crtprec64.o, crtprec80.o to extra_parts.
(x86_64-*-darwin*): Likewise.
(i[34567]86-*-solaris2*): Likewise.

gcc/testsuite:
* gcc.c-torture/execute/990127-2.x: Use -mpc64 on i?86-*-darwin*,
i?86-*-solaris2*, x86_64-*-darwin*, x86_64-*-solaris2*.

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1208,12 +1208,12 @@ i[34567]86-*-darwin*)
need_64bit_isa=yes
# Baseline choice for a machine that allows m64 support.
with_cpu=${with_cpu:-core2}
-   tmake_file=${tmake_file} t-slibgcc-dummy i386/t-crtpc
+   tmake_file=${tmake_file} t-slibgcc-dummy
libgcc_tm_file=$libgcc_tm_file i386/darwin-lib.h
;;
  x86_64-*-darwin*)
with_cpu=${with_cpu:-core2}
-   tmake_file=${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc-dummy 
i386/t-crtpc
+   tmake_file=${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc-dummy
tm_file=${tm_file} ${cpu_type}/darwin64.h
libgcc_tm_file=$libgcc_tm_file i386/darwin-lib.h
;;
@@ -1311,7 +1311,7 @@ i[34567]86-*-linux* | i[34567]86-*-kfree
i[34567]86-*-kopensolaris*-gnu) tm_file=${tm_file} i386/gnu-user.h 
kopensolaris-gnu.h i386/kopensolaris-gnu.h ;;
i[34567]86-*-gnu*) tm_file=$tm_file i386/gnu-user.h gnu.h i386/gnu.h;;
esac
-   tmake_file=${tmake_file} i386/t-crtstuff i386/t-crtpc
+   tmake_file=${tmake_file} i386/t-crtstuff
;;
  x86_64-*-linux* | x86_64-*-kfreebsd*-gnu | x86_64-*-knetbsd*-gnu)
tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h gnu-user.h 
glibc-stdint.h \
@@ -1323,7 +1323,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu
x86_64-*-kfreebsd*-gnu) tm_file=${tm_file} kfreebsd-gnu.h 
i386/kfreebsd-gnu64.h ;;
x86_64-*-knetbsd*-gnu) tm_file=${tm_file} knetbsd-gnu.h ;;
esac
-   tmake_file=${tmake_file} i386/t-linux64 i386/t-crtstuff i386/t-crtpc
+   tmake_file=${tmake_file} i386/t-linux64 i386/t-crtstuff
x86_multilibs=${with_multilib_list}
if test $x86_multilibs = default; then
x86_multilibs=m64,m32
diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -70,6 +70,14 @@ along with GCC; see the file COPYING3.
  #undef ASM_SPEC
  #define ASM_SPEC ASM_SPEC_BASE

+#undef  ENDFILE_SPEC
+#define ENDFILE_SPEC \
+  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
+   %{mpc32:crtprec32.o%s} \
+   %{mpc64:crtprec64.o%s} \
+   %{mpc80:crtprec80.o%s} \
+   crtend.o%s crtn.o%s"
+
  #define SUBTARGET_CPU_EXTRA_SPECS \
{ cpp_subtarget, CPP_SUBTARGET_SPEC },  \
{ asm_cpu,   ASM_CPU_SPEC },\
diff --git a/gcc/config/i386/t-crtpc b/gcc/config/i386/t-crtpc
deleted file mode 100644
diff --git a/gcc/testsuite/gcc.c-torture/execute/990127-2.x 
b/gcc/testsuite/gcc.c-torture/execute/990127-2.x
--- a/gcc/testsuite/gcc.c-torture/execute/990127-2.x
+++ b/gcc/testsuite/gcc.c-torture/execute/990127-2.x
@@ -3,12 +3,16 @@
  # Use -mpc64 to force 80387 floating-point precision to 64 bits.  This option
  # has no effect on SSE, but it is needed in case of -m32 on x86_64 targets.

-if { [istarget i?86-*-linux*]
+if { [istarget i?86-*-darwin*]
+ || [istarget i?86-*-linux*]
   || [istarget i?86-*-kfreebsd*-gnu]
   || [istarget i?86-*-knetbsd*-gnu]
+ || [istarget i?86-*-solaris2*]
+ || [istarget x86_64-*-darwin*]
   || [istarget x86_64-*-linux*]
   || [istarget x86_64-*-kfreebsd*-gnu]
- || [istarget 

[patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr

2011-07-13 Thread Kai Tietz
Hello,

I split my old patch into 8 separate pieces for easier review.  These patches
are a prerequisite for enabling boolification of comparisons in the gimplifier
and the necessary type-cast preservation in gimple from/to boolean-type.

This patch adds support to fold_truth_not_expr for one-bit precision typed
bitwise-binary and bitwise-not expressions.
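
As a concrete illustration (example mine, not from the patch): after
boolification, comparison results have 1-bit precision, so inverting a
bitwise AND of two such values should fold by De Morgan just like the
TRUTH_AND_EXPR case:

/* Illustration only: for 1-bit x and y, the patch lets !(x & y)
   fold to (!x | !y), mirroring the existing TRUTH_AND_EXPR case.  */
int f (_Bool x, _Bool y)
{
  return !(x & y);   /* foldable to (x == 0) | (y == 0) */
}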

ChangeLog

2011-07-13  Kai Tietz  kti...@redhat.com

* fold-const.c (fold_truth_not_expr): Add
support for one-bit bitwise operations.

Bootstrapped and regression tested for x86_64-pc-linux-gnu.
OK to apply?

Regards,
Kai

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c   2011-07-13 07:48:29.0 +0200
+++ gcc/gcc/fold-const.c2011-07-13 08:59:36.865620200 +0200
@@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre
 case INTEGER_CST:
   return constant_boolean_node (integer_zerop (arg), type);

+case BIT_AND_EXPR:
+  if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
+return NULL_TREE;
+  if (integer_onep (TREE_OPERAND (arg, 1)))
+   return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0));
+  /* fall through */
 case TRUTH_AND_EXPR:
   loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
   loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc);
-  return build2_loc (loc, TRUTH_OR_EXPR, type,
+  return build2_loc (loc, (code == BIT_AND_EXPR ? BIT_IOR_EXPR
+   : TRUTH_OR_EXPR), type,
 invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
 invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

+case BIT_IOR_EXPR:
+  if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
+return NULL_TREE;
+  /* fall through.  */
 case TRUTH_OR_EXPR:
   loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
   loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc);
-  return build2_loc (loc, TRUTH_AND_EXPR, type,
+  return build2_loc (loc, (code == BIT_IOR_EXPR ? BIT_AND_EXPR
+   : TRUTH_AND_EXPR), type,
 invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
 invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

+case BIT_XOR_EXPR:
+  if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
+return NULL_TREE;
+  /* fall through.  */
 case TRUTH_XOR_EXPR:
   /* Here we can invert either operand.  We invert the first operand
 unless the second operand is a TRUTH_NOT_EXPR in which case our
@@ -3095,10 +3111,14 @@ fold_truth_not_expr (location_t loc, tre
 negation of the second operand.  */

   if (TREE_CODE (TREE_OPERAND (arg, 1)) == TRUTH_NOT_EXPR)
-   return build2_loc (loc, TRUTH_XOR_EXPR, type, TREE_OPERAND (arg, 0),
+   return build2_loc (loc, code, type, TREE_OPERAND (arg, 0),
+  TREE_OPERAND (TREE_OPERAND (arg, 1), 0));
+  else if (TREE_CODE (TREE_OPERAND (arg, 1)) == BIT_NOT_EXPR
+               && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 1))) == 1)
+   return build2_loc (loc, code, type, TREE_OPERAND (arg, 0),
   TREE_OPERAND (TREE_OPERAND (arg, 1), 0));
   else
-   return build2_loc (loc, TRUTH_XOR_EXPR, type,
+   return build2_loc (loc, code, type,
   invert_truthvalue_loc (loc, TREE_OPERAND (arg, 0)),
   TREE_OPERAND (arg, 1));

@@ -3116,6 +3136,10 @@ fold_truth_not_expr (location_t loc, tre
 invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
 invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

+case BIT_NOT_EXPR:
+  if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
+return NULL_TREE;
+  /* fall through */
 case TRUTH_NOT_EXPR:
   return TREE_OPERAND (arg, 0);

@@ -3158,11 +3182,6 @@ fold_truth_not_expr (location_t loc, tre
   return build1_loc (loc, TREE_CODE (arg), type,
 invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)));

-case BIT_AND_EXPR:
-  if (!integer_onep (TREE_OPERAND (arg, 1)))
-   return NULL_TREE;
-  return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0));
-
 case SAVE_EXPR:
   return build1_loc (loc, TRUTH_NOT_EXPR, type, arg);


[patch 2/8 tree-optimization]: Bitwise logic for fold_range_test and fold_truthop.

2011-07-13 Thread Kai Tietz
Hello,

This patch adds support to fold_range_test and to fold_truthop for
one-bit precision
typed bitwise-binary and bitwise-not expressions.
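
To make the intended folding concrete (example mine, not from the patch):
fold_truthop's existing shortcut for comparisons against zero now also fires
when the combining operator is the bitwise form:

/* Illustration only: with the patch, the 1-bit bitwise AND of two
   compares against zero folds the same way as TRUTH_AND_EXPR,
   i.e. (a == 0) & (b == 0) becomes (a | b) == 0.  */
_Bool f (int a, int b)
{
  return (a == 0) & (b == 0);
}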

ChangeLog

2011-07-13  Kai Tietz  kti...@redhat.com

* fold-const.c (fold_range_test): Add
support for one-bit bitwise operations.
(fold_truthop): Likewise.

Bootstrapped and regression tested with prior patches of this series
for x86_64-pc-linux-gnu.
OK to apply?

Regards,
Kai

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c   2011-07-13 08:07:59.0 +0200
+++ gcc/gcc/fold-const.c2011-07-13 08:59:26.117620200 +0200
@@ -4819,7 +4819,8 @@ fold_range_test (location_t loc, enum tr
 tree op0, tree op1)
 {
   int or_op = (code == TRUTH_ORIF_EXPR
-  || code == TRUTH_OR_EXPR);
+  || code == TRUTH_OR_EXPR
+  || code == BIT_IOR_EXPR);
   int in0_p, in1_p, in_p;
   tree low0, low1, low, high0, high1, high;
   bool strict_overflow_p = false;
@@ -4890,7 +4891,7 @@ fold_range_test (location_t loc, enum tr
}
 }

-  return 0;
+  return NULL_TREE;
 }
 
 /* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P
@@ -5118,8 +5119,9 @@ fold_truthop (location_t loc, enum tree_
}
 }

-  code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
- ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);
+  if (code != BIT_AND_EXPR && code != BIT_IOR_EXPR)
+code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
+   ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);

   /* If the RHS can be evaluated unconditionally and its operands are
  simple, it wins to evaluate the RHS unconditionally on machines
@@ -5134,7 +5136,7 @@ fold_truthop (location_t loc, enum tree_
       && simple_operand_p (rr_arg))
 {
   /* Convert (a != 0) || (b != 0) into (a | b) != 0.  */
-  if (code == TRUTH_OR_EXPR
+  if ((code == TRUTH_OR_EXPR || code == BIT_IOR_EXPR)
	  && lcode == NE_EXPR && integer_zerop (lr_arg)
	  && rcode == NE_EXPR && integer_zerop (rr_arg)
	  && TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg)
@@ -5145,7 +5147,7 @@ fold_truthop (location_t loc, enum tree_
   build_int_cst (TREE_TYPE (ll_arg), 0));

   /* Convert (a == 0) && (b == 0) into (a | b) == 0.  */
-  if (code == TRUTH_AND_EXPR
+  if ((code == TRUTH_AND_EXPR || code == BIT_AND_EXPR)
	  && lcode == EQ_EXPR && integer_zerop (lr_arg)
	  && rcode == EQ_EXPR && integer_zerop (rr_arg)
	  && TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg)
@@ -5209,7 +5211,8 @@ fold_truthop (location_t loc, enum tree_
  fail.  However, we can convert a one-bit comparison against zero into
  the opposite comparison against that bit being set in the field.  */

-  wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
+  wanted_code = ((code == TRUTH_AND_EXPR
+ || code == BIT_AND_EXPR) ? EQ_EXPR : NE_EXPR);
   if (lcode != wanted_code)
 {
   if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))


[patch 3/8 tree-optimization]: Bitwise logic for fold_truth_andor.

2011-07-13 Thread Kai Tietz
Hello,

This patch adds support to fold_truth_andor for one-bit precision
typed bitwise-binary and bitwise-not expressions.
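
As an illustration of the distribution this enables (example mine, not from
the patch):

/* Illustration only: fold_truth_andor's (A || B) && (A || C)
   -> A || (B && C) transformation now also applies to the 1-bit
   bitwise forms.  */
_Bool f (_Bool a, _Bool b, _Bool c)
{
  return (a | b) & (a | c);   /* foldable to a | (b & c) */
}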

ChangeLog

2011-07-13  Kai Tietz  kti...@redhat.com

* fold-const.c (fold_truth_andor): Add
support for one-bit bitwise operations.

Bootstrapped and regression tested with prior patches of this series
for x86_64-pc-linux-gnu.
OK to apply?

Regards,
Kai

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c   2011-07-13 08:19:22.0 +0200
+++ gcc/gcc/fold-const.c2011-07-13 08:59:14.261620200 +0200
@@ -8248,6 +8248,12 @@ fold_truth_andor (location_t loc, enum t
   if (!optimize)
 return NULL_TREE;

+  /* If code is BIT_AND_EXPR or BIT_IOR_EXPR, type precision has to be
+ one.  Otherwise return NULL_TREE.  */
+  if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR)
+      && (!INTEGRAL_TYPE_P (type) || TYPE_PRECISION (type) != 1))
+return NULL_TREE;
+
   /* Check for things like (A || B) && (A || C).  We can convert this
      to A || (B && C).  Note that either operator can be any of the four
      truth and/or operations and the transformation will still be
@@ -8258,7 +8264,9 @@ fold_truth_andor (location_t loc, enum t
(TREE_CODE (arg0) == TRUTH_ANDIF_EXPR
  || TREE_CODE (arg0) == TRUTH_ORIF_EXPR
  || TREE_CODE (arg0) == TRUTH_AND_EXPR
- || TREE_CODE (arg0) == TRUTH_OR_EXPR)
+ || TREE_CODE (arg0) == TRUTH_OR_EXPR
+ || TREE_CODE (arg0) == BIT_AND_EXPR
+ || TREE_CODE (arg0) == BIT_IOR_EXPR)
       && ! TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)))
 {
   tree a00 = TREE_OPERAND (arg0, 0);
@@ -8266,9 +8274,13 @@ fold_truth_andor (location_t loc, enum t
   tree a10 = TREE_OPERAND (arg1, 0);
   tree a11 = TREE_OPERAND (arg1, 1);
   int commutative = ((TREE_CODE (arg0) == TRUTH_OR_EXPR
- || TREE_CODE (arg0) == TRUTH_AND_EXPR)
+ || TREE_CODE (arg0) == TRUTH_AND_EXPR
+ || TREE_CODE (arg0) == BIT_IOR_EXPR
+ || TREE_CODE (arg0) == BIT_AND_EXPR)
			 && (code == TRUTH_AND_EXPR
-|| code == TRUTH_OR_EXPR));
+|| code == TRUTH_OR_EXPR
+|| code == BIT_AND_EXPR
+|| code == BIT_IOR_EXPR));

   if (operand_equal_p (a00, a10, 0))
return fold_build2_loc (loc, TREE_CODE (arg0), type, a00,
@@ -9484,21 +9496,29 @@ fold_binary_loc (location_t loc,

   if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR
	|| code == EQ_EXPR || code == NE_EXPR)
-      && ((truth_value_p (TREE_CODE (arg0))
-	   && (truth_value_p (TREE_CODE (arg1))
+      && ((truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0))
+	   && (truth_value_type_p (TREE_CODE (arg1), TREE_TYPE (arg1))
	       || (TREE_CODE (arg1) == BIT_AND_EXPR
		   && integer_onep (TREE_OPERAND (arg1, 1)))))
-	  || (truth_value_p (TREE_CODE (arg1))
-	      && (truth_value_p (TREE_CODE (arg0))
+	  || (truth_value_type_p (TREE_CODE (arg1), TREE_TYPE (arg1))
+	      && (truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0))
		  || (TREE_CODE (arg0) == BIT_AND_EXPR
		      && integer_onep (TREE_OPERAND (arg0, 1)))))))
 {
-  tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR
-: code == BIT_IOR_EXPR ? TRUTH_OR_EXPR
-: TRUTH_XOR_EXPR,
-boolean_type_node,
-fold_convert_loc (loc, boolean_type_node, arg0),
-fold_convert_loc (loc, boolean_type_node, arg1));
+  enum tree_code ncode;
+
+  /* Do we operate on a non-boolified tree?  */
+  if (!INTEGRAL_TYPE_P (type) || TYPE_PRECISION (type) != 1)
+ncode = code == BIT_AND_EXPR ? TRUTH_AND_EXPR
+: (code == BIT_IOR_EXPR
+   ? TRUTH_OR_EXPR : TRUTH_XOR_EXPR);
+  else
+ncode = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR) ? code
+  : BIT_XOR_EXPR;
+  tem = fold_build2_loc (loc, ncode,
+  boolean_type_node,
+  fold_convert_loc (loc, boolean_type_node, arg0),
+  fold_convert_loc (loc, boolean_type_node, arg1));

   if (code == EQ_EXPR)
tem = invert_truthvalue_loc (loc, tem);


[patch 6/8 tree-optimization]: Bitwise and logic for fold_binary_loc.

2011-07-13 Thread Kai Tietz
Hello,

This patch adds support to fold_binary_loc for one-bit precision
typed bitwise-and expression.
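
The identities involved are the 1-bit counterparts of the existing
TRUTH_AND_EXPR ones; for instance (examples mine, not from the patch):

/* Illustration only: with 1-bit operands these now fold in
   fold_binary_loc just like their TRUTH_AND_EXPR counterparts.  */
_Bool f (_Bool x)
{
  return x & !x;   /* always 0 */
}

_Bool g (_Bool x)
{
  return x & 1;    /* just x */
}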

ChangeLog

2011-07-13  Kai Tietz  kti...@redhat.com

* fold-const.c (fold_binary_loc): Add
support for one-bit bitwise-and optimization.

Bootstrapped and regression tested with prior patches of this series
for x86_64-pc-linux-gnu.
OK to apply?

Regards,
Kai

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c   2011-07-13 08:43:37.0 +0200
+++ gcc/gcc/fold-const.c2011-07-13 08:58:38.692620200 +0200
@@ -11062,6 +11062,48 @@ fold_binary_loc (location_t loc,
   if (operand_equal_p (arg0, arg1, 0))
return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0));

+  if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type))
+    {
+	 if (TREE_CODE (arg0) == INTEGER_CST && ! integer_zerop (arg0))
+	   return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1));
+	 if (TREE_CODE (arg1) == INTEGER_CST && ! integer_zerop (arg1))
+	   return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0));
+ /* Likewise for first arg.  */
+ if (integer_zerop (arg0))
+   return omit_one_operand_loc (loc, type, arg0, arg1);
+
+	 /* !X && X is always false.  ~X && X is always false.  */
+	 if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR
+	      || TREE_CODE (arg0) == BIT_NOT_EXPR)
+	     && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
+	   return omit_one_operand_loc (loc, type, integer_zero_node, arg1);
+	 /* X && !X is always false.  X && ~X is always false.  */
+	 if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR
+	      || TREE_CODE (arg1) == BIT_NOT_EXPR)
+	     && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
+	   return omit_one_operand_loc (loc, type, integer_zero_node, arg0);
+
+	 /* (A < X) && (A + 1 > Y) == (A < X) && (A >= Y).  Normally
+	    A + 1 > Y means (A >= Y) && (A != MAX), but in this case
+	    we know that A < X <= MAX.  */
+
+	 if (!TREE_SIDE_EFFECTS (arg0) && !TREE_SIDE_EFFECTS (arg1))
+	   {
+	     tem = fold_to_nonsharp_ineq_using_bound (loc, arg0, arg1);
+	     if (tem && !operand_equal_p (tem, arg0, 0))
+	       return fold_build2_loc (loc, code, type, tem, arg1);
+
+	     tem = fold_to_nonsharp_ineq_using_bound (loc, arg1, arg0);
+	     if (tem && !operand_equal_p (tem, arg1, 0))
+	       return fold_build2_loc (loc, code, type, arg0, tem);
+   }
+
+ tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1);
+ if (tem)
+   return tem;
+
+   }
+
   /* ~X & X, (X == 0) & X, and !X & X are always zero.  */
   if ((TREE_CODE (arg0) == BIT_NOT_EXPR
   || TREE_CODE (arg0) == TRUTH_NOT_EXPR


[patch 7/8 tree-optimization]: Bitwise not logic for fold_unary_loc.

2011-07-13 Thread Kai Tietz
Hello,

This patch adds support to fold_unary_loc for one-bit precision
typed bitwise-not expression.
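
In source terms (example mine, not from the patch), the case this covers is:

/* Illustration only: after boolification a logical negation appears
   as a 1-bit BIT_NOT_EXPR, which can now be folded through
   fold_truth_not_expr, e.g. by inverting the comparison.  */
_Bool f (int a, int b)
{
  _Bool t = (a == b);
  return !t;   /* foldable to a != b */
}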

ChangeLog

2011-07-13  Kai Tietz  kti...@redhat.com

* fold-const.c (fold_unary_loc): Add
support for one-bit bitwise-not optimization.

Bootstrapped and regression tested with prior patches of this series
for x86_64-pc-linux-gnu.
OK to apply?

Regards,
Kai

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c   2011-07-13 08:49:50.0 +0200
+++ gcc/gcc/fold-const.c2011-07-13 08:56:45.170171300 +0200
@@ -8094,6 +8094,12 @@ fold_unary_loc (location_t loc, enum tre
  if (i == count)
return build_vector (type, nreverse (list));
}
+  if (INTEGRAL_TYPE_P (type) && TYPE_PRECISION (type) == 1)
+{
+ tem = fold_truth_not_expr (loc, arg0);
+ if (tem)
+   return fold_convert_loc (loc, type, tem);
+   }

   return NULL_TREE;


[patch 8/8 tree-optimization]: Add truth_value_type_p function

2011-07-13 Thread Kai Tietz
Hello,

This patch adds the new truth_value_type_p function which, in contrast
to truth_value_p, can also detect bitwise operations with boolean
characteristics.  This patch has to be applied first in this series, but
it requires the other patches of this series as prerequisites.
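
A usage sketch (hypothetical caller, for illustration only):

/* Illustration only: callers that have the operand at hand pass its
   type, so 1-bit bitwise operations are recognized as truth values;
   the old truth_value_p (CODE) form passes NULL_TREE and keeps its
   previous meaning.  */
static bool
is_truth_value (tree arg)
{
  return truth_value_type_p (TREE_CODE (arg), TREE_TYPE (arg));
}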

2011-07-13  Kai Tietz  kti...@redhat.com

* fold-const.c (fold_ternary_loc): Use truth_value_type_p instead
of truth_value_p.
* gimple.c (canonicalize_cond_expr_cond): Likewise.
* gimplify.c (gimple_boolify): Likewise.
* tree-ssa-structalias.c (find_func_aliases): Likewise.
* tree-ssa-forwprop.c (truth_valued_ssa_name): Likewise.
* tree.h (truth_value_type_p): New function.
(truth_value_p): Implemented as macro via truth_value_type_p.

Bootstrapped and regression tested with prior patches of this series
for x86_64-pc-linux-gnu.
OK to apply?

Regards,
Kai

Index: gcc-head/gcc/fold-const.c
===
--- gcc-head.orig/gcc/fold-const.c
+++ gcc-head/gcc/fold-const.c
@@ -13416,7 +13581,7 @@ fold_ternary_loc (location_t loc, enum t

   /* If the second operand is simpler than the third, swap them
 since that produces better jump optimization results.  */
-  if (truth_value_p (TREE_CODE (arg0))
+  if (truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0))
      && tree_swap_operands_p (op1, op2, false))
{
  location_t loc0 = expr_location_or (arg0, loc);
@@ -13442,7 +13607,7 @@ fold_ternary_loc (location_t loc, enum t
 over COND_EXPR in cases such as floating point comparisons.  */
   if (integer_zerop (op1)
       && integer_onep (op2)
-      && truth_value_p (TREE_CODE (arg0)))
+      && truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0)))
return pedantic_non_lvalue_loc (loc,
fold_convert_loc (loc, type,
  invert_truthvalue_loc (loc,
Index: gcc-head/gcc/gimple.c
===
--- gcc-head.orig/gcc/gimple.c
+++ gcc-head/gcc/gimple.c
@@ -3160,7 +3160,8 @@ canonicalize_cond_expr_cond (tree t)
 {
   /* Strip conversions around boolean operations.  */
   if (CONVERT_EXPR_P (t)
-      && truth_value_p (TREE_CODE (TREE_OPERAND (t, 0))))
+      && truth_value_type_p (TREE_CODE (TREE_OPERAND (t, 0)),
+			     TREE_TYPE (TREE_OPERAND (t, 0))))
 t = TREE_OPERAND (t, 0);

   /* For !x use x == 0.  */
Index: gcc-head/gcc/gimplify.c
===
--- gcc-head.orig/gcc/gimplify.c
+++ gcc-head/gcc/gimplify.c
@@ -2837,7 +2837,7 @@ gimple_boolify (tree expr)
  if (TREE_CODE (arg) == NOP_EXPR
	     && TREE_TYPE (arg) == TREE_TYPE (call))
arg = TREE_OPERAND (arg, 0);
- if (truth_value_p (TREE_CODE (arg)))
+ if (truth_value_type_p (TREE_CODE (arg), TREE_TYPE (arg)))
{
  arg = gimple_boolify (arg);
  CALL_EXPR_ARG (call, 0)
Index: gcc-head/gcc/tree-ssa-structalias.c
===
--- gcc-head.orig/gcc/tree-ssa-structalias.c
+++ gcc-head/gcc/tree-ssa-structalias.c
@@ -4416,7 +4416,8 @@ find_func_aliases (gimple origt)
	       && !POINTER_TYPE_P (TREE_TYPE (rhsop))))
   || gimple_assign_single_p (t))
get_constraint_for_rhs (rhsop, rhsc);
- else if (truth_value_p (code))
+ else if (truth_value_type_p (code,
+  TREE_TYPE (lhsop)))
/* Truth value results are not pointer (parts).  Or at least
   very very unreasonable obfuscation of a part.  */
;
Index: gcc-head/gcc/tree.h
===
--- gcc-head.orig/gcc/tree.h
+++ gcc-head/gcc/tree.h
@@ -5307,13 +5307,22 @@ extern tree combine_comparisons (locatio
 extern void debug_fold_checksum (const_tree);

 /* Return nonzero if CODE is a tree code that represents a truth value.  */
+#define truth_value_p(CODE)  truth_value_type_p ((CODE), NULL_TREE)
+
+/* Return nonzero if CODE is a tree code that represents a truth value.
+   If TYPE is an integral type, unsigned, and has precision of one, then
+   additionally return nonzero for bitwise-binary and bitwise-invert codes.  */
 static inline bool
-truth_value_p (enum tree_code code)
+truth_value_type_p (enum tree_code code, tree type)
 {
   return (TREE_CODE_CLASS (code) == tcc_comparison
  || code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
  || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
- || code == TRUTH_XOR_EXPR || code == TRUTH_NOT_EXPR);
+ || code == TRUTH_XOR_EXPR || code == TRUTH_NOT_EXPR
+ || ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR
+  || code == BIT_XOR_EXPR || code == 

Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich
Hello William,

 However, it does not fix http://gcc.gnu.org/PR45671, which surprises me
 as it was marked as a duplicate of this one.  Any thoughts on why this
 isn't sufficient to reassociate the linear chain of adds?

 Test case:

 int myfunction (int a, int b, int c, int d, int e, int f, int g, int h)
 {
  int ret;

  ret = a + b + c + d + e + f + g + h;
  return ret;

 }




Reassociation does not work for signed integers because signed integers
are not wrap-around types in C.  You can change that by passing the
-fwrapv option, but it will disable other useful optimizations.
Reassociation of signed integers without this option is not trivial,
because in that case you may introduce overflows and therefore
undefined behavior.
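
A concrete instance (numbers mine):

/* Illustration only: the source order (a + c) + b never overflows
   for a = INT_MAX, b = 1, c = -2, but the reassociated (a + b) + c
   computes INT_MAX + 1 first and thus introduces a signed overflow,
   i.e. undefined behavior the original program did not have.  */
int f (int a, int b, int c)
{
  return a + c + b;   /* evaluated as (a + c) + b */
}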

BR
Ilya


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich
 Ilya, please mention PR middle-end/44382
 in ChangeLog.


Thanks for notice. Here is corrected ChangeLog:

gcc/

2011-07-12  Enkovich Ilya  ilya.enkov...@intel.com

   PR middle-end/44382
   * target.def (reassociation_width): New hook.

   * doc/tm.texi.in (reassociation_width): New hook documentation.

   * doc/tm.texi (reassociation_width): Likewise.

   * hooks.h (hook_int_const_gimple_1): New default hook.

   * hooks.c (hook_int_const_gimple_1): Likewise.

   * config/i386/i386.h (ix86_tune_indices): Add
   X86_TUNE_REASSOC_INT_TO_PARALLEL and
   X86_TUNE_REASSOC_FP_TO_PARALLEL.

   (TARGET_REASSOC_INT_TO_PARALLEL): New.
   (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

   * config/i386/i386.c (initial_ix86_tune_features): Add
   X86_TUNE_REASSOC_INT_TO_PARALLEL and
   X86_TUNE_REASSOC_FP_TO_PARALLEL.

   (ix86_reassociation_width): Implementation of
   the new hook for the i386 target.

   * common.opt (ftree-reassoc-width): New option added.

   * tree-ssa-reassoc.c (get_required_cycles): New function.
   (get_reassociation_width): Likewise.
   (rewrite_expr_tree_parallel): Likewise.

   (reassociate_bb): Now checks reassociation width to be used
   and calls rewrite_expr_tree_parallel instead of rewrite_expr_tree
   if needed.

   (pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-12  Enkovich Ilya  ilya.enkov...@intel.com

   * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
   -ftree-reassoc-width=1.

   * gcc.dg/tree-ssa/reassoc-24.c: New test.
   * gcc.dg/tree-ssa/reassoc-25.c: Likewise.


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Jakub Jelinek
On Wed, Jul 13, 2011 at 11:52:25AM +0400, Ilya Enkovich wrote:
  However, it does not fix http://gcc.gnu.org/PR45671, which surprises me
  as it was marked as a duplicate of this one.  Any thoughts on why this
  isn't sufficient to reassociate the linear chain of adds?
 
  Test case:
 
  int myfunction (int a, int b, int c, int d, int e, int f, int g, int h)
  {
   int ret;
 
   ret = a + b + c + d + e + f + g + h;
   return ret;
 
  }
 
 
 
 
 Reassociation does not work for signed integers because signed integers
 are not wrap-around types in C.  You can change that by passing the
 -fwrapv option, but it will disable other useful optimizations.
 Reassociation of signed integers without this option is not trivial,
 because in that case you may introduce overflows and therefore
 undefined behavior.

Well, if it is clearly a win to reassociate, you can always reassociate
them by doing arithmetics in corresponding unsigned type and afterwards
converting back to the signed type.
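
That is, something like the following (sketch mine; converting the
out-of-range result back to int relies on GCC's defined modulo semantics
for that conversion):

/* Illustration only: reassociate freely in the corresponding unsigned
   type, where wrap-around is well defined, then convert back.  */
int f (int a, int b, int c, int d)
{
  return (int) ((unsigned) a + (unsigned) b
		+ (unsigned) c + (unsigned) d);
}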

Jakub


Re: Build failure (Re: [PATCH] Make VRP optimize useless conversions)

2011-07-13 Thread Richard Guenther
On Tue, 12 Jul 2011, Ulrich Weigand wrote:

 Richard Guenther wrote:
 
  2011-07-11  Richard Guenther  rguent...@suse.de
  
  * tree-vrp.c (simplify_conversion_using_ranges): Manually
  translate the source value-range through the conversion chain.
 
 This causes a build failure in cachemgr.c on spu-elf.  A slightly
 modified simplified test case also fails on i386-linux:
 
 void *
 test (unsigned long long x, unsigned long long y)
 {
   return (void *) (unsigned int) (x / y);
 }
 
 compiled with -O2 results in:
 
 test.i: In function 'test':
 test.i:3:1: error: invalid types in nop conversion
 void *
 long long unsigned int
 D.1962_5 = (void *) D.1963_3;
 
 test.i:3:1: internal compiler error: verify_gimple failed
 
 Any thoughts?

Fix in testing.

Richard.

2011-07-13  Richard Guenther  rguent...@suse.de

* tree-vrp.c (simplify_conversion_using_ranges): Make sure
the final type is integral.

* gcc.dg/torture/20110713-1.c: New testcase.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 176224)
+++ gcc/tree-vrp.c  (working copy)
@@ -7353,6 +7353,8 @@ simplify_conversion_using_ranges (gimple
   double_int innermin, innermax, middlemin, middlemax;
 
   finaltype = TREE_TYPE (gimple_assign_lhs (stmt));
+  if (!INTEGRAL_TYPE_P (finaltype))
+return false;
   middleop = gimple_assign_rhs1 (stmt);
   def_stmt = SSA_NAME_DEF_STMT (middleop);
   if (!is_gimple_assign (def_stmt)
Index: gcc/testsuite/gcc.dg/torture/20110713-1.c
===
--- gcc/testsuite/gcc.dg/torture/20110713-1.c   (revision 0)
+++ gcc/testsuite/gcc.dg/torture/20110713-1.c   (revision 0)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ilp32 } */
+
+void *
+test (unsigned long long x, unsigned long long y)
+{
+return (void *) (unsigned int) (x / y);
+}


Re: [Patch, AVR]: Fix PR49687: Better widening mul 16=8*8

2011-07-13 Thread Richard Earnshaw
On 12/07/11 12:11, Bernd Schmidt wrote:
 On 07/12/11 13:04, Andrew Stubbs wrote:
 On 12/07/11 11:35, Georg-Johann Lay wrote:
 +(define_insn "*mulsu"
 +  [(set (match_operand:HI 0 "register_operand" "=r")
 +        (mult:HI (sign_extend:HI (match_operand:QI 1 "register_operand" "a"))
 +                 (zero_extend:HI (match_operand:QI 2 "register_operand" "a"))))]
 +  "AVR_HAVE_MUL"
 +  "mulsu %1,%2
 +	movw %0,r0
 +	clr __zero_reg__"
 +  [(set_attr "length" "3")
 +   (set_attr "cc" "clobber")])
 +
 +(define_insn "*mulus"
 +  [(set (match_operand:HI 0 "register_operand" "=r")
 +        (mult:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "a"))
 +                 (sign_extend:HI (match_operand:QI 2 "register_operand" "a"))))]
 +  "AVR_HAVE_MUL"
 +  "mulsu %2,%1
 +	movw %0,r0
 +	clr __zero_reg__"
 +  [(set_attr "length" "3")
 +   (set_attr "cc" "clobber")])

 1. You should name that usmulqihi3 (no star), so the optimizers can
 see it.

 2. There's no need to define both of these. For one thing, putting a '%'
 at the start of the constraint list  for operand 1 does precisely this,
 
 Unfortunately it doesn't. It won't swap the sign/zero-extend.
 
 
 Bernd
 

And what is more, zero-extending one operand and sign-extending another
is definitely not commutative, even if the outer multiply is.



Re: AVX generic mode tuning discussion.

2011-07-13 Thread Richard Guenther
On Tue, Jul 12, 2011 at 11:56 PM, Richard Henderson r...@redhat.com wrote:
 On 07/12/2011 02:22 PM, harsha.jaga...@amd.com wrote:
 We would like to propose changing AVX generic mode tuning to generate 128-bit
 AVX instead of 256-bit AVX.

 You indicate a 3% reduction on bulldozer with avx256.
 How does avx128 compare to -mno-avx -msse4.2?
 Will the next AMD generation have a useable avx256?

 I'm not keen on the idea of generic mode being tuned
 for a single processor revision that maybe shouldn't
 actually be using avx at all.

Btw, it looks like the data is massively skewed by
436.cactusADM.  What are the overall numbers if you
disregard cactus?  It's also for sure the case that the vectorizer
cost model has not been touched for avx256 vs. avx128 vs. sse,
so a more sensible approach would be to look at differentiating
things there to improve the cactus numbers.  Harsha, did you
investigate why avx256 is such a loss for cactus or why it is
so much of a win for SB?

I suppose generic tuning is of less importance for AVX as
people need to enable that manually anyway (and will possibly
do so only via means of -march=native).

Thanks,
Richard.


 r~



Re: Use of vector instructions in memmov/memset expanding

2011-07-13 Thread Uros Bizjak
Hello!

 Please don't use -m32/-m64 in testcases directly.
 You should use

 /* { dg-do compile { target { ! ia32 } } } */

 for 64bit insns and

 /* { dg-do compile { target { ia32 } } } */

 for 32bit insns.

Also, there is no need to add -mtune if -march is already specified.
-mtune will follow -march.
To scan for the %xmm register, you don't have to add -dp to the compile
flags.  -dp also dumps the pattern name to the file, so unless you are
looking for a specific pattern name, you should omit -dp.

Uros.


Re: PATCH: Remove -mfused-madd and add -mfma

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 3:00 AM, H.J. Lu hongjiu...@intel.com wrote:
 Hi,

 -mfused-madd is deprecated and -mfma is undocumented.  This patch
 removes -mfused-madd and documents -mfma.  OK for trunk?

Ok.

Thanks,
Richard.

 Thanks.

 H.J.
 ---
 2011-07-12  H.J. Lu  hongjiu...@intel.com

        * doc/invoke.texi (x86): Remove -mfused-madd and add -mfma.

 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
 index f146cc5..3429b31 100644
 --- a/gcc/doc/invoke.texi
 +++ b/gcc/doc/invoke.texi
 @@ -600,7 +600,7 @@ Objective-C and Objective-C++ Dialects}.
  -mincoming-stack-boundary=@var{num} @gol
  -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip -mvzeroupper @gol
  -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 --maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfused-madd @gol
 +-maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma @gol
  -msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlwp @gol
  -mthreads  -mno-align-stringops  -minline-all-stringops @gol
  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
 @@ -12587,6 +12587,8 @@ preferred alignment to 
 @option{-mpreferred-stack-boundary=2}.
  @itemx -mno-rdrnd
  @itemx -mf16c
  @itemx -mno-f16c
 +@itemx -mfma
 +@itemx -mno-fma
  @itemx -msse4a
  @itemx -mno-sse4a
  @itemx -mfma4
 @@ -12612,9 +12614,9 @@ preferred alignment to 
 @option{-mpreferred-stack-boundary=2}.
  @opindex mno-sse
  @opindex m3dnow
  @opindex mno-3dnow
 -These switches enable or disable the use of instructions in the MMX,
 -SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, FSGSBASE, RDRND,
 -F16C, SSE4A, FMA4, XOP, LWP, ABM, BMI, or 3DNow!@: extended instruction sets.
 +These switches enable or disable the use of instructions in the MMX, SSE,
 +SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA,
 +SSE4A, FMA4, XOP, LWP, ABM, BMI, or 3DNow!@: extended instruction sets.
  These extensions are also available as built-in functions: see
  @ref{X86 Built-in Functions}, for details of the functions enabled and
  disabled by these switches.
 @@ -12633,13 +12635,6 @@ supported architecture, using the appropriate flags. 
  In particular,
  the file containing the CPU detection code should be compiled without
  these options.

 -@item -mfused-madd
 -@itemx -mno-fused-madd
 -@opindex mfused-madd
 -@opindex mno-fused-madd
 -Do (don't) generate code that uses the fused multiply/add or 
 multiply/subtract
 -instructions.  The default is to use these instructions.
 -
  @item -mcld
  @opindex mcld
  This option instructs GCC to emit a @code{cld} instruction in the prologue



Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 I split my old patch into 8 separate pieces for easier review.  These patches
 are a prerequisite for enabling boolification of comparisons in the gimplifier and
 the necessary type-cast preservation in gimple from/to boolean-type.

 This patch adds support to fold_truth_not_expr for one-bit precision typed
 bitwise-binary and bitwise-not expressions.

It seems this is only necessary because we still have TRUTH_NOT_EXPR
in our IL and did not replace that with BIT_NOT_EXPR consistently yet.

So no, this is not ok.  fold-const.c is really mostly supposed to deal
with GENERIC where we distinguish TRUTH_* and BIT_* variants.

Please instead lower TRUTH_NOT_EXPR to BIT_NOT_EXPR for gimple.

Richard.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_truth_not_expr): Add
        support for one-bit bitwise operations.

 Bootstrapped and regression tested for x86_64-pc-linux-gnu.
 OK to apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 07:48:29.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:36.865620200 +0200
 @@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre
     case INTEGER_CST:
       return constant_boolean_node (integer_zerop (arg), type);

 +    case BIT_AND_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      if (integer_onep (TREE_OPERAND (arg, 1)))
 +       return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0));
 +      /* fall through */
     case TRUTH_AND_EXPR:
       loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
       loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc);
 -      return build2_loc (loc, TRUTH_OR_EXPR, type,
 +      return build2_loc (loc, (code == BIT_AND_EXPR ? BIT_IOR_EXPR
 +                                                   : TRUTH_OR_EXPR), type,
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
                         invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

 +    case BIT_IOR_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      /* fall through.  */
     case TRUTH_OR_EXPR:
       loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
       loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc);
 -      return build2_loc (loc, TRUTH_AND_EXPR, type,
 +      return build2_loc (loc, (code == BIT_IOR_EXPR ? BIT_AND_EXPR
 +                                                   : TRUTH_AND_EXPR), type,
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
                         invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

 +    case BIT_XOR_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      /* fall through.  */
     case TRUTH_XOR_EXPR:
       /* Here we can invert either operand.  We invert the first operand
         unless the second operand is a TRUTH_NOT_EXPR in which case our
 @@ -3095,10 +3111,14 @@ fold_truth_not_expr (location_t loc, tre
         negation of the second operand.  */

       if (TREE_CODE (TREE_OPERAND (arg, 1)) == TRUTH_NOT_EXPR)
 -       return build2_loc (loc, TRUTH_XOR_EXPR, type, TREE_OPERAND (arg, 0),
 +       return build2_loc (loc, code, type, TREE_OPERAND (arg, 0),
 +                          TREE_OPERAND (TREE_OPERAND (arg, 1), 0));
 +      else if (TREE_CODE (TREE_OPERAND (arg, 1)) == BIT_NOT_EXPR
 +               && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 1))) == 1)
 +       return build2_loc (loc, code, type, TREE_OPERAND (arg, 0),
                           TREE_OPERAND (TREE_OPERAND (arg, 1), 0));
       else
 -       return build2_loc (loc, TRUTH_XOR_EXPR, type,
 +       return build2_loc (loc, code, type,
                           invert_truthvalue_loc (loc, TREE_OPERAND (arg, 0)),
                           TREE_OPERAND (arg, 1));

 @@ -3116,6 +3136,10 @@ fold_truth_not_expr (location_t loc, tre
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
                         invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

 +    case BIT_NOT_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      /* fall through */
     case TRUTH_NOT_EXPR:
       return TREE_OPERAND (arg, 0);

 @@ -3158,11 +3182,6 @@ fold_truth_not_expr (location_t loc, tre
       return build1_loc (loc, TREE_CODE (arg), type,
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)));

 -    case BIT_AND_EXPR:
 -      if (!integer_onep (TREE_OPERAND (arg, 1)))
 -       return NULL_TREE;
 -      return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0));
 -
     case SAVE_EXPR:
       return build1_loc (loc, TRUTH_NOT_EXPR, type, arg);



[patch tree-optimization]: Normalize compares X ==/!= 1 -> X !=/== 0 for truth valued X

2011-07-13 Thread Kai Tietz
Hello,

this patch makes sure that, for replaced uses, we call fold_stmt_inplace.  Additionally,
it adds to fold_gimple_assign the canonicalization X !=/== 1 -> X ==/!= 0 for
X with one-bit precision type.
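
For instance (example mine; the new testcase below checks the same property
via the optimized dump):

/* Illustration only: for 1-bit X, a compare against 1 is
   canonicalized to the inverted compare against 0.  */
int f (_Bool x)
{
  return x != 1;   /* becomes x == 0 */
}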

ChangeLog gcc/

2011-07-13  Kai Tietz  kti...@redhat.com

* gimple-fold.c (fold_gimple_assign): Add normalization
for compares of 1-bit integer precision operands.
* tree-ssa-propagate.c (replace_uses_in): Call
fold_stmt_inplace on modified statement.

ChangeLog gcc/testsuite

2011-07-13  Kai Tietz  kti...@redhat.com

* gcc.dg/tree-ssa/fold-1.c: New test.

Bootstrapped and regression tested for x86_64-pc-linux-gnu.  OK to apply?

Regards,
Kai

Index: gcc/gcc/gimple-fold.c
===
--- gcc.orig/gcc/gimple-fold.c  2011-07-13 10:37:32.0 +0200
+++ gcc/gcc/gimple-fold.c   2011-07-13 10:39:05.100843400 +0200
@@ -815,6 +815,17 @@ fold_gimple_assign (gimple_stmt_iterator
 gimple_assign_rhs2 (stmt));
}

+  if (!result && (subcode == EQ_EXPR || subcode == NE_EXPR)
+      && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+      && TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt))) == 1
+      && integer_onep (gimple_assign_rhs2 (stmt)))
+   result = build2_loc (loc, (subcode == EQ_EXPR ? NE_EXPR : EQ_EXPR),
+TREE_TYPE (gimple_assign_lhs (stmt)),
+gimple_assign_rhs1 (stmt),
+fold_convert_loc (loc,
+   TREE_TYPE (gimple_assign_rhs1 (stmt)),
+   integer_zero_node));
+   
   if (!result)
 result = fold_binary_loc (loc, subcode,
   TREE_TYPE (gimple_assign_lhs (stmt)),
Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/fold-1.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ gcc/gcc/testsuite/gcc.dg/tree-ssa/fold-1.c  2011-07-13
10:50:38.294367800 +0200
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo (_Bool a, _Bool b)
+{
+  return a != ((b | !b));
+}
+/* { dg-final { scan-tree-dump-not " != 1" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/gcc/tree-ssa-propagate.c
===
--- gcc.orig/gcc/tree-ssa-propagate.c   2011-07-13 10:37:42.0 +0200
+++ gcc/gcc/tree-ssa-propagate.c2011-07-13 10:40:25.688576800 +0200
@@ -904,6 +904,8 @@ replace_uses_in (gimple stmt, ssa_prop_g

   propagate_value (use, val);

+  fold_stmt_inplace (stmt);
+
   replaced = true;
 }


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich

 Well, if it is clearly a win to reassociate, you can always reassociate
 them by doing arithmetics in corresponding unsigned type and afterwards
 converting back to the signed type.

        Jakub


You are right. But in this case we again make all operands have
wrap-around type and thus disable some other optimization. It would be
nice to have opportunity to reassociate and still have undefined
behavior on overflow for optimizations. One way to do it for add/sub
is to use wider type (long long instead of int).

Ilya


Re: AVX generic mode tuning discussion.

2011-07-13 Thread Jakub Jelinek
On Wed, Jul 13, 2011 at 10:42:41AM +0200, Richard Guenther wrote:
 I suppose generic tuning is of less importance for AVX as
 people need to enable that manually anyway (and will possibly
 do so only via means of -march=native).

Yeah, but if somebody does compile with -mavx -mtune=generic,
I'd expect the intent is that he wants fastest code not just on current
generation of CPUs, but on the next few following ones, and I'd say that
being able to use twice as big vectorization factor ought to be a win in
most cases if the cost model gets it right.  If not for the vectorization
factor doubling, what would be reasons why somebody would compile
code with -mavx -mtune=generic and rule out support for many recent chips?
Yeah, there are the  2 operand forms and such code can avoid penalty when
mixed with AVX256 code, but would that be strong reason enough to lose the
support of most of the recent CPUs?  When targeting just a particular CPU
and using -march= with CPU which already includes AVX, -mtune=generic probably
doesn't make much sense, you probably want -march=native and you are
optimizing for the CPU you have.

Jakub


Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr

2011-07-13 Thread Kai Tietz
Sorry, the TRUTH_NOT_EXPR isn't the point here at all.  The underlying
issue is that fold-const re-introduces TRUTH_AND/OR and co.  To avoid
that, it needs to learn to handle folding of those bitwise operations
on 1-bit integer types specially.
As gimple relies on this FE fold for now, it has to learn about that.
As soon as gimple_fold (and other passes) no longer rely on the FE's
fold-const, we can remove those parts again.  Until then, the
boolification of compares (and also the transition of TRUTH_NOT ->
BIT_NOT) simply doesn't work.

Regards,
Kai

2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 I split my old patch into 8 separate pieces for easier review.  These patches
 are a prerequisite for enabling boolification of comparisons in the gimplifier and
 the necessary type-cast preservation in gimple from/to boolean-type.

 This patch adds support to fold_truth_not_expr for one-bit precision typed
 bitwise-binary and bitwise-not expressions.

 It seems this is only necessary because we still have TRUTH_NOT_EXPR
 in our IL and did not replace that with BIT_NOT_EXPR consistently yet.

 So no, this is not ok.  fold-const.c is really mostly supposed to deal
 with GENERIC where we distinguish TRUTH_* and BIT_* variants.

 Please instead lower TRUTH_NOT_EXPR to BIT_NOT_EXPR for gimple.

 Richard.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_truth_not_expr): Add
        support for one-bit bitwise operations.

 Bootstrapped and regression tested for x86_64-pc-linux-gnu.
 OK to apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 07:48:29.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:36.865620200 +0200
 @@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre
     case INTEGER_CST:
       return constant_boolean_node (integer_zerop (arg), type);

 +    case BIT_AND_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      if (integer_onep (TREE_OPERAND (arg, 1)))
 +       return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0));
 +      /* fall through */
     case TRUTH_AND_EXPR:
       loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
       loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc);
 -      return build2_loc (loc, TRUTH_OR_EXPR, type,
 +      return build2_loc (loc, (code == BIT_AND_EXPR ? BIT_IOR_EXPR
 +                                                   : TRUTH_OR_EXPR), type,
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
                         invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

 +    case BIT_IOR_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      /* fall through.  */
     case TRUTH_OR_EXPR:
       loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
       loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc);
 -      return build2_loc (loc, TRUTH_AND_EXPR, type,
 +      return build2_loc (loc, (code == BIT_IOR_EXPR ? BIT_AND_EXPR
 +                                                   : TRUTH_AND_EXPR), type,
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
                         invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

 +    case BIT_XOR_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      /* fall through.  */
     case TRUTH_XOR_EXPR:
       /* Here we can invert either operand.  We invert the first operand
         unless the second operand is a TRUTH_NOT_EXPR in which case our
 @@ -3095,10 +3111,14 @@ fold_truth_not_expr (location_t loc, tre
         negation of the second operand.  */

       if (TREE_CODE (TREE_OPERAND (arg, 1)) == TRUTH_NOT_EXPR)
 -       return build2_loc (loc, TRUTH_XOR_EXPR, type, TREE_OPERAND (arg, 0),
 +       return build2_loc (loc, code, type, TREE_OPERAND (arg, 0),
 +                          TREE_OPERAND (TREE_OPERAND (arg, 1), 0));
 +      else if (TREE_CODE (TREE_OPERAND (arg, 1)) == BIT_NOT_EXPR
 +               && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 1))) == 1)
 +       return build2_loc (loc, code, type, TREE_OPERAND (arg, 0),
                           TREE_OPERAND (TREE_OPERAND (arg, 1), 0));
       else
 -       return build2_loc (loc, TRUTH_XOR_EXPR, type,
 +       return build2_loc (loc, code, type,
                           invert_truthvalue_loc (loc, TREE_OPERAND (arg, 0)),
                           TREE_OPERAND (arg, 1));

 @@ -3116,6 +3136,10 @@ fold_truth_not_expr (location_t loc, tre
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
                         invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

 +    case BIT_NOT_EXPR:
 +  

Re: [Patch 1/3] ARM 64 bit atomic operations

2011-07-13 Thread David Gilbert
On 12 July 2011 22:07, Ramana Radhakrishnan
ramana.radhakrish...@linaro.org wrote:
 Hi Dave,

Hi Ramana,
  Thanks for the review.

 Could you split this further into a patch that deals with the
 case for disabling MCR memory barriers for Thumb1 so that it
 maybe backported to the release branches ? I have commented inline
 as well.

Sure.

 Could you also provide a proper changelog entry for this that will
 also help with review of the patch ?

Yep, no problem.

 I've not yet managed to fully review all the bits in this patch but
 here's some initial comments that should be looked at.

 On 1 July 2011 16:54, Dr. David Alan Gilbert david.gilb...@linaro.org wrote:
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
snip

 +      if (is_di)
 +        {
 +          arm_output_asm_insn (emit, 0, operands, it\teq);

 This should be guarded with a if (TARGET_THUMB2) - there's no point in
 accounting for the length of this instruction in the compiler and then
 have the assembler fold it away in ARM state.

OK; the length accounting seems pretty broken anyway; I think it assumes
all instructions are 4 bytes.

 diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
 index c32ef1a..3fdd22f 100644
 --- a/gcc/config/arm/arm.h
 +++ b/gcc/config/arm/arm.h
 @@ -282,7 +282,8 @@ extern void 
 (*arm_lang_output_object_attributes_hook)(void);
 -#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB)
 +#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB \
 +                                 && ! TARGET_THUMB1)

 This hunk (TARGET_HAVE_DMB_MCR) should probably be backported to
 release branches because this is technically fixing an issue and
 hence should be a separate patch that can be looked at separately.

OK, will do.

  /* Nonzero if this chip implements a memory barrier instruction.  */
  #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
 @@ -290,8 +291,12 @@ extern void 
 (*arm_lang_output_object_attributes_hook)(void);

 sync.md changes -

 (define_mode_iterator NARROW [QI HI])
+(define_mode_iterator QHSD [QI HI SI DI])
+(define_mode_iterator SIDI [SI DI])
+
+(define_mode_attr sync_predtab [(SI "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER")
+                              (QI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
+                              (HI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
+                              (DI "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_MEMORY_BARRIER")])
+

 Can we move all the iterators to iterators.md and then arrange
 includes to work automatically ? Minor nit - could you align the entries
 for QI, HI and DI with the start of the SI ?

Yes, I can do that - the only odd thing is I guess the sync_predtab is very
sync.md specific; does it really make sense for that
to be in iterators.md?

+(define_mode_attr sync_atleastsi [(SI SI)
+                                (DI DI)
+                                (HI SI)
+                                (QI SI)])


 I couldn't spot where this was being used. Can this be removed if not
 necessary ?

Ah - yes I think that's dead; it's a relic from an attempt to merge some of the
other narrow cases into the same iterator but it got way too messy.

-(define_insn "arm_sync_new_nandsi"
+(define_insn "arm_sync_new_<sync_optab><mode>"
   [(set (match_operand:SI 0 "s_register_operand" "=&r")
-        (unspec_volatile:SI [(not:SI (and:SI
-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
-                               (match_operand:SI 2 "s_register_operand" "r")))
-                            ]
-                            VUNSPEC_SYNC_NEW_OP))
+        (unspec_volatile:SI [(syncop:SI
+                               (zero_extend:SI
+                                 (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
+                               (match_operand:SI 2 "s_register_operand" "r"))
+                            ]
+                            VUNSPEC_SYNC_NEW_OP))
    (set (match_dup 1)
-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
-                            VUNSPEC_SYNC_NEW_OP))
+        (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
+                                VUNSPEC_SYNC_NEW_OP))
    (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:SI 3 "=r"))]
-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"

 Can't this just use sync_predtab instead since the condition is identical
 for QImode and HImode from that mode attribute and in quite a few
 places below?

Hmm yes it can - I'd only been using predtab in the places where it was
varying on the mode; but as you say this can be converted as well.

@@ -461,19 +359,19 @@
         (unspec_volatile:SI
         [(not:SI
            (and:SI
-               (zero_extend:SI
-               (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
-               (match_operand:SI 2 "s_register_operand" "r")))
+             (zero_extend:SI
+               

Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Jakub Jelinek
On Wed, Jul 13, 2011 at 01:01:59PM +0400, Ilya Enkovich wrote:
  Well, if it is clearly a win to reassociate, you can always reassociate
  them by doing arithmetics in corresponding unsigned type and afterwards
  converting back to the signed type.
 
 You are right. But in this case we again make all operands have
 wrap-around type and thus disable some other optimization. It would be
 nice to have opportunity to reassociate and still have undefined
 behavior on overflow for optimizations. One way to do it for add/sub
 is to use wider type (long long instead of int).

I disagree.  Widening would result in worse code in most cases, as you need
to sign extend all the operands.  On the other side, I doubt you can
actually usefully use the undefinedness of signed overflow for a series of
3 or more operands of the associative operation.

Jakub


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Ilya Enkovich
2011/7/13 Jakub Jelinek ja...@redhat.com:

 I disagree.  Widening would result in worse code in most cases, as you need
 to sign extend all the operands.  On the other side, I doubt you can
 actually usefully use the undefinedness of signed overflow for a series of
 3 or more operands of the associative operation.

        Jakub


Sounds reasonable. Type casting to unsigned should be a better solution here.

Ilya


Re: CFT: [build] Move soft-fp support to toplevel libgcc

2011-07-13 Thread Thomas Schwinge
Hallo Rainer!

On Tue, 12 Jul 2011 19:22:51 +0200, Rainer Orth r...@cebitec.uni-bielefeld.de 
wrote:
 2011-07-09  Rainer Orth  r...@cebitec.uni-bielefeld.de
 
   gcc: [...]
   * config.gcc ([...]
   (i[34567]86-*-darwin*): Remove i386/t-fprules-softfp,
   soft-fp/t-softfp from tmake_file.
   (i[34567]86-*-linux*): Likewise.
   [...]

   i[34567]86-*-linux* | x86_64-*-linux* | \
 i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \
 i[34567]86-*-gnu*)
 - tmake_file=${tmake_file} i386/t-fprules-softfp 
 soft-fp/t-softfp i386/t-linux
   ;;

This also removes i386/t-linux from tmake_file, which might not be what
you intended?


Regards,
 Thomas




Re: [build] Remove crt0, mcrt0 support

2011-07-13 Thread Rainer Orth
Paolo Bonzini bonz...@gnu.org writes:

 On 07/12/2011 06:45 PM, Rainer Orth wrote:
 +crt0.o: $(srcdir)/config/i386/netware-crt0.c
 +$(crt_commpile) $(CRTSTUFF_T_CFLAGS) -c $

 Typo here.  Otherwise looks good, thanks.

Fixed and installed.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: CFT: [build] Move soft-fp support to toplevel libgcc

2011-07-13 Thread Rainer Orth
Hi Thomas,

  i[34567]86-*-linux* | x86_64-*-linux* | \
i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \
i[34567]86-*-gnu*)
 -tmake_file=${tmake_file} i386/t-fprules-softfp 
 soft-fp/t-softfp i386/t-linux
  ;;

 This also removes i386/t-linux from tmake_file, which might not be what
 you intended?

indeed not.  Will fix in my local copy.

Thanks for noticing.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: The TI C6X port

2011-07-13 Thread Bernd Schmidt
On 05/25/11 02:29, Vladimir Makarov wrote:
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00750.html
 Ok.  But changelog entry for sched_change_pattern is absent.

I've committed this with a slight change in sched_change_pattern;
another patch I'm working on showed a need to also clear the cached cost
for resolved dependencies.


Bernd
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 176225)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,11 @@
+2011-07-13  Bernd Schmidt  ber...@codesourcery.com
+
+   * sched-int.h (struct _dep): Add member cost.
+   (DEP_COST, UNKNOWN_DEP_COST): New macros.
+   * sched-deps.c (init_dep_1): Initialize DEP_COST.
+   * haifa-sched.c (dep_cost_1): Use and set DEP_COST.
+   (sched_change_pattern): Reset it for dependent insns.
+
 2011-07-13  Rainer Orth  r...@cebitec.uni-bielefeld.de
 
* Makefile.in (CRT0STUFF_T_CFLAGS): Remove.
Index: gcc/haifa-sched.c
===
--- gcc/haifa-sched.c   (revision 176171)
+++ gcc/haifa-sched.c   (working copy)
@@ -854,6 +854,9 @@ dep_cost_1 (dep_t link, dw_t dw)
   rtx used = DEP_CON (link);
   int cost;
 
+  if (DEP_COST (link) != UNKNOWN_DEP_COST)
+return DEP_COST (link);
+
   /* A USE insn should never require the value used to be computed.
  This allows the computation of a function's result and parameter
  values to overlap the return and call.  We don't care about the
@@ -911,6 +914,7 @@ dep_cost_1 (dep_t link, dw_t dw)
cost = 0;
 }
 
+  DEP_COST (link) = cost;
   return cost;
 }
 
@@ -4864,11 +4868,21 @@ fix_recovery_deps (basic_block rec)
 void
 sched_change_pattern (rtx insn, rtx new_pat)
 {
+  sd_iterator_def sd_it;
+  dep_t dep;
   int t;
 
   t = validate_change (insn, PATTERN (insn), new_pat, 0);
   gcc_assert (t);
   dfa_clear_single_insn_cache (insn);
+
+  for (sd_it = sd_iterator_start (insn, (SD_LIST_FORW | SD_LIST_BACK
+| SD_LIST_RES_BACK));
+   sd_iterator_cond (sd_it, dep);)
+{
+  DEP_COST (dep) = UNKNOWN_DEP_COST;
+  sd_iterator_next (sd_it);
+}
 }
 
 /* Change pattern of INSN to NEW_PAT.  Invalidate cached haifa
Index: gcc/sched-deps.c
===
--- gcc/sched-deps.c(revision 176171)
+++ gcc/sched-deps.c(working copy)
@@ -107,6 +107,7 @@ init_dep_1 (dep_t dep, rtx pro, rtx con,
   DEP_CON (dep) = con;
   DEP_TYPE (dep) = type;
   DEP_STATUS (dep) = ds;
+  DEP_COST (dep) = UNKNOWN_DEP_COST;
 }
 
 /* Init DEP with the arguments.
Index: gcc/sched-int.h
===
--- gcc/sched-int.h (revision 176171)
+++ gcc/sched-int.h (working copy)
@@ -215,6 +215,9 @@ struct _dep
   /* Dependency status.  This field holds all dependency types and additional
  information for speculative dependencies.  */
   ds_t status;
+
+  /* Cached cost of the dependency.  */
+  int cost;
 };
 
 typedef struct _dep dep_def;
@@ -224,6 +227,9 @@ typedef dep_def *dep_t;
 #define DEP_CON(D) ((D)->con)
 #define DEP_TYPE(D) ((D)->type)
 #define DEP_STATUS(D) ((D)->status)
+#define DEP_COST(D) ((D)->cost)
+
+#define UNKNOWN_DEP_COST INT_MIN
 
 /* Functions to work with dep.  */
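
To see the caching idiom in isolation: INT_MIN acts as a "not yet
computed" sentinel, so even a legitimate cost of zero is cacheable.
A stand-alone sketch with invented names, not the GCC code itself:

    #include <limits.h>

    #define UNKNOWN_COST INT_MIN

    struct dep { int cost; };

    /* Stand-in for the expensive cost computation.  */
    static int compute_cost (struct dep *d) { (void) d; return 1; }

    int dep_cost (struct dep *d)
    {
      if (d->cost != UNKNOWN_COST)
        return d->cost;            /* cache hit */
      d->cost = compute_cost (d);  /* compute once and remember */
      return d->cost;
    }

When a pattern change can alter costs, resetting the field to the
sentinel (as sched_change_pattern does above) invalidates the cache.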
 


Re: [patch 2/8 tree-optimization]: Bitwise logic for fold_range_test and fold_truthop.

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 This patch adds support to fold_range_test and to fold_truthop for
 one-bit precision
 typed bitwise-binary and bitwise-not expressions.

This looks reasonable but I'd like to see testcases excercising the
foldings (by scanning the .original dump).

Richard.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_range_test): Add
        support for one-bit bitwise operations.
        (fold_truthop): Likewise.

 Bootstrapped and regression tested with prior patches of this series
 for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 08:07:59.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:26.117620200 +0200
 @@ -4819,7 +4819,8 @@ fold_range_test (location_t loc, enum tr
                 tree op0, tree op1)
  {
   int or_op = (code == TRUTH_ORIF_EXPR
 -              || code == TRUTH_OR_EXPR);
 +              || code == TRUTH_OR_EXPR
 +              || code == BIT_IOR_EXPR);
   int in0_p, in1_p, in_p;
   tree low0, low1, low, high0, high1, high;
   bool strict_overflow_p = false;
 @@ -4890,7 +4891,7 @@ fold_range_test (location_t loc, enum tr
        }
     }

 -  return 0;
 +  return NULL_TREE;
  }

  /* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P
 @@ -5118,8 +5119,9 @@ fold_truthop (location_t loc, enum tree_
        }
     }

 -  code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
 -         ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);
 +  if (code != BIT_AND_EXPR && code != BIT_IOR_EXPR)
 +    code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR)
 +           ? TRUTH_AND_EXPR : TRUTH_OR_EXPR);

   /* If the RHS can be evaluated unconditionally and its operands are
      simple, it wins to evaluate the RHS unconditionally on machines
 @@ -5134,7 +5136,7 @@ fold_truthop (location_t loc, enum tree_
        && simple_operand_p (rr_arg))
     {
       /* Convert (a != 0) || (b != 0) into (a | b) != 0.  */
 -      if (code == TRUTH_OR_EXPR
 +      if ((code == TRUTH_OR_EXPR || code == BIT_IOR_EXPR)
           && lcode == NE_EXPR && integer_zerop (lr_arg)
           && rcode == NE_EXPR && integer_zerop (rr_arg)
           && TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg)
 @@ -5145,7 +5147,7 @@ fold_truthop (location_t loc, enum tree_
                           build_int_cst (TREE_TYPE (ll_arg), 0));

       /* Convert (a == 0) && (b == 0) into (a | b) == 0.  */
 -      if (code == TRUTH_AND_EXPR
 +      if ((code == TRUTH_AND_EXPR || code == BIT_AND_EXPR)
           && lcode == EQ_EXPR && integer_zerop (lr_arg)
           && rcode == EQ_EXPR && integer_zerop (rr_arg)
           && TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg)
 @@ -5209,7 +5211,8 @@ fold_truthop (location_t loc, enum tree_
      fail.  However, we can convert a one-bit comparison against zero into
      the opposite comparison against that bit being set in the field.  */

 -  wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
 +  wanted_code = ((code == TRUTH_AND_EXPR
 +                 || code == BIT_AND_EXPR) ? EQ_EXPR : NE_EXPR);
   if (lcode != wanted_code)
     {
       if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))



Re: [patch 3/8 tree-optimization]: Bitwise logic for fold_truth_andor.

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 9:33 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 This patch adds support to fold_truth_andor for one-bit precision
 typed bitwise-binary and bitwise-not expressions.

Quickly checking some testcases shows we already perform all
the foldings in other places.  So please _always_ check for
all transformations you add if there is a testcase that fails before
and passes after your patch.

(A|B)&(A|C) is already folded to (B&C)|A.
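
A minimal case exercising that existing fold (an illustrative function,
not from the patch; this is the sort of thing a .original-dump scan
would check):

    /* With truth-valued operands, fold already rewrites
       (a | b) & (a | c) as a | (b & c).  */
    _Bool g (_Bool a, _Bool b, _Bool c)
    {
      return (a | b) & (a | c);
    }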

Richard.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_truth_andor): Add
        support for one-bit bitwise operations.

 Bootstrapped and regression tested with prior patches of this series
 for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 08:19:22.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:14.261620200 +0200
 @@ -8248,6 +8248,12 @@ fold_truth_andor (location_t loc, enum t
   if (!optimize)
     return NULL_TREE;

 +  /* If code is BIT_AND_EXPR or BIT_IOR_EXPR, type precision has to be
 +     one.  Otherwise return NULL_TREE.  */
 +  if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR)
 +      && (!INTEGRAL_TYPE_P (type) || TYPE_PRECISION (type) != 1))
 +    return NULL_TREE;
 +
   /* Check for things like (A || B) && (A || C).  We can convert this
      to A || (B && C).  Note that either operator can be any of the four
      truth and/or operations and the transformation will still be
 @@ -8258,7 +8264,9 @@ fold_truth_andor (location_t loc, enum t
       && (TREE_CODE (arg0) == TRUTH_ANDIF_EXPR
          || TREE_CODE (arg0) == TRUTH_ORIF_EXPR
          || TREE_CODE (arg0) == TRUTH_AND_EXPR
 -         || TREE_CODE (arg0) == TRUTH_OR_EXPR)
 +         || TREE_CODE (arg0) == TRUTH_OR_EXPR
 +         || TREE_CODE (arg0) == BIT_AND_EXPR
 +         || TREE_CODE (arg0) == BIT_IOR_EXPR)
       && ! TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)))
     {
       tree a00 = TREE_OPERAND (arg0, 0);
 @@ -8266,9 +8274,13 @@ fold_truth_andor (location_t loc, enum t
       tree a10 = TREE_OPERAND (arg1, 0);
       tree a11 = TREE_OPERAND (arg1, 1);
       int commutative = ((TREE_CODE (arg0) == TRUTH_OR_EXPR
 -                         || TREE_CODE (arg0) == TRUTH_AND_EXPR)
 +                         || TREE_CODE (arg0) == TRUTH_AND_EXPR
 +                         || TREE_CODE (arg0) == BIT_IOR_EXPR
 +                         || TREE_CODE (arg0) == BIT_AND_EXPR)
                          && (code == TRUTH_AND_EXPR
 -                            || code == TRUTH_OR_EXPR));
 +                            || code == TRUTH_OR_EXPR
 +                            || code == BIT_AND_EXPR
 +                            || code == BIT_IOR_EXPR));

       if (operand_equal_p (a00, a10, 0))
        return fold_build2_loc (loc, TREE_CODE (arg0), type, a00,
 @@ -9484,21 +9496,29 @@ fold_binary_loc (location_t loc,

   if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR
        || code == EQ_EXPR || code == NE_EXPR)
 -      && ((truth_value_p (TREE_CODE (arg0))
 -           && (truth_value_p (TREE_CODE (arg1))
 +      && ((truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0))
 +           && (truth_value_type_p (TREE_CODE (arg1), TREE_TYPE (arg1))
               || (TREE_CODE (arg1) == BIT_AND_EXPR
                   && integer_onep (TREE_OPERAND (arg1, 1)))))
 -         || (truth_value_p (TREE_CODE (arg1))
 -             && (truth_value_p (TREE_CODE (arg0))
 +         || (truth_value_type_p (TREE_CODE (arg1), TREE_TYPE (arg1))
 +             && (truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0))
                  || (TREE_CODE (arg0) == BIT_AND_EXPR
                      && integer_onep (TREE_OPERAND (arg0, 1)))))))
     {
 -      tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR
 -                        : code == BIT_IOR_EXPR ? TRUTH_OR_EXPR
 -                        : TRUTH_XOR_EXPR,
 -                        boolean_type_node,
 -                        fold_convert_loc (loc, boolean_type_node, arg0),
 -                        fold_convert_loc (loc, boolean_type_node, arg1));
 +      enum tree_code ncode;
 +
 +      /* Do we operate on a non-boolified tree?  */
 +      if (!INTEGRAL_TYPE_P (type) || TYPE_PRECISION (type) != 1)
 +        ncode = code == BIT_AND_EXPR ? TRUTH_AND_EXPR
 +                                    : (code == BIT_IOR_EXPR
 +                                       ? TRUTH_OR_EXPR : TRUTH_XOR_EXPR);
 +      else
 +        ncode = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR) ? code
 +                                                              : BIT_XOR_EXPR;
 +      tem = fold_build2_loc (loc, ncode,
 +                          boolean_type_node,
 +                          fold_convert_loc (loc, boolean_type_node, arg0),
 +                          fold_convert_loc (loc, boolean_type_node, arg1));

       if (code == EQ_EXPR)
        tem = invert_truthvalue_loc 

Re: [patch 4/8 tree-optimization]: Bitwise or logic for fold_binary_loc.

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 9:33 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 This patch adds support to fold_binary_loc for one-bit precision
 typed bitwise-or expression.

Seems to be a fallout of the missing TRUTH_NOT conversion as well.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_binary_loc): Add
        support for one-bit bitwise-or optimization.

 Bootstrapped and regression tested with prior patches of this series
 for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 08:23:29.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:04.011620200 +0200
 @@ -10688,6 +10688,52 @@ fold_binary_loc (location_t loc,
          return omit_one_operand_loc (loc, type, t1, arg0);
        }

 +      if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type))
 +        {
 +         /* If arg0 is constant zero, drop it.  */
 +         if (TREE_CODE (arg0) == INTEGER_CST && integer_zerop (arg0))
 +           return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1));
 +         if (TREE_CODE (arg0) == INTEGER_CST && ! integer_zerop (arg0))
 +           return omit_one_operand_loc (loc, type, arg0, arg1);
 +
 +         /* !X | X is always true. ~X | X is always true.  */
 +         if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR
 +              || TREE_CODE (arg0) == BIT_NOT_EXPR)
 +             && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
 +           return omit_one_operand_loc (loc, type, integer_one_node, arg1);
 +         /* X | !X is always true. X | ~X is always true.  */
 +         if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR
 +             || TREE_CODE (arg1) == BIT_NOT_EXPR)
 +             && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
 +           return omit_one_operand_loc (loc, type, integer_one_node, arg0);
 +
 +         /* (X && !Y) | (!X && Y) is X ^ Y */
 +         if (TREE_CODE (arg0) == BIT_AND_EXPR
 +             && TREE_CODE (arg1) == BIT_AND_EXPR)
 +           {
 +             tree a0, a1, l0, l1, n0, n1;
 +
 +             a0 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));
 +             a1 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
 +
 +             l0 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
 +             l1 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
 +
 +             n0 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l0);
 +             n1 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l1);
 +
 +             if ((operand_equal_p (n0, a0, 0)
 +                  && operand_equal_p (n1, a1, 0))
 +                 || (operand_equal_p (n0, a1, 0)
 +                     && operand_equal_p (n1, a0, 0)))
 +               return fold_build2_loc (loc, BIT_XOR_EXPR, type, l0, n1);
 +           }
 +
 +         tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1);
 +         if (tem)
 +           return tem;
 +        }
 +
       /* Canonicalize (X & C1) | C2.  */
       if (TREE_CODE (arg0) == BIT_AND_EXPR
           && TREE_CODE (arg1) == INTEGER_CST
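
The identity behind the (X && !Y) | (!X && Y) hunk only holds at one bit
of precision.  A quick illustration (made-up function, not part of the
patch):

    /* For 1-bit x and y:  (x & !y) | (!x & y)  ==  x ^ y.  */
    _Bool xor1 (_Bool x, _Bool y)
    {
      return (x & !y) | (!x & y);
    }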



Re: [patch 5/8 tree-optimization]: Bitwise xor logic for fold_binary_loc.

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 9:34 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 This patch adds support to fold_binary_loc for one-bit precision
 typed bitwise-xor expression.

Similar - we don't want to build a TRUTH_NOT_EXPR from a BIT_XOR_EXPR.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_binary_loc): Add
        support for one-bit bitwise-xor optimization.

 Bootstrapped and regression tested with prior patches of this series
 for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 08:38:06.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:58:52.686620200 +0200
 @@ -10872,11 +10872,35 @@ fold_binary_loc (location_t loc,
     case BIT_XOR_EXPR:
       if (integer_zerop (arg1))
        return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0));
 -      if (integer_all_onesp (arg1))
 -       return fold_build1_loc (loc, BIT_NOT_EXPR, type, op0);
       if (operand_equal_p (arg0, arg1, 0))
        return omit_one_operand_loc (loc, type, integer_zero_node, arg0);

 +      if (TYPE_PRECISION (type) == 1  INTEGRAL_TYPE_P (type))
 +        {
 +         /* If the second arg is constant true, this is a logical
 +            inversion.  */
 +         if (integer_onep (arg1))
 +           {
 +             tem = invert_truthvalue_loc (loc, arg0);
 +             return non_lvalue_loc (loc, fold_convert_loc (loc, type, tem));
 +           }
 +       }
 +      else if (integer_all_onesp (arg1))
 +       return fold_build1_loc (loc, BIT_NOT_EXPR, type, op0);
 +
 +      if (TYPE_PRECISION (type) == 1  INTEGRAL_TYPE_P (type))
 +        {
 +         /* !X ^ X is always true.  ~X ^ X is always true.  */
 +         if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR
 +              || TREE_CODE (arg0) == BIT_NOT_EXPR)
 +             && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
 +           return omit_one_operand_loc (loc, type, integer_one_node, arg1);
 +         /* X ^ !X is always true.  X ^ ~X is always true.  */
 +         if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR
 +              || TREE_CODE (arg1) == BIT_NOT_EXPR)
 +             && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
 +           return omit_one_operand_loc (loc, type, integer_one_node, arg0);
 +       }
 +
       /* ~X ^ X is -1.  */
       if (TREE_CODE (arg0) == BIT_NOT_EXPR
           && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
 @@ -10911,7 +10935,7 @@ fold_binary_loc (location_t loc,
          goto bit_ior;
        }

 -      /* (X | Y) ^ X -> Y & ~ X*/
 +      /* (X | Y) ^ X -> Y & ~X.  */
       if (TREE_CODE (arg0) == BIT_IOR_EXPR
           && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
         {
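
The distinction the hunk draws, in plain C terms (illustrative only):

    /* At one bit of precision, xor with 1 is a logical inversion...  */
    _Bool not1 (_Bool x) { return x ^ 1; }

    /* ...while for wider types, xor with all-ones is bitwise not.  */
    int notn (int x) { return x ^ -1; }   /* same as ~x */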



Re: [patch 6/8 tree-optimization]: Bitwise and logic for fold_binary_loc.

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 9:34 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 This patch adds support to fold_binary_loc for one-bit precision
 typed bitwise-and expression.

Similar ... your patch descriptions are useless btw.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_binary_loc): Add
        support for one-bit bitwise-and optimization.

 Bootstrapped and regression tested with prior patches of this series
 for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 08:43:37.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:58:38.692620200 +0200
 @@ -11062,6 +11062,48 @@ fold_binary_loc (location_t loc,
       if (operand_equal_p (arg0, arg1, 0))
        return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0));

 +      if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type))
 +        {
 +         if (TREE_CODE (arg0) == INTEGER_CST && ! integer_zerop (arg0))
 +           return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1));
 +         if (TREE_CODE (arg1) == INTEGER_CST && ! integer_zerop (arg1))
 +           return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0));
 +         /* Likewise for first arg.  */
 +         if (integer_zerop (arg0))
 +           return omit_one_operand_loc (loc, type, arg0, arg1);
 +
 +         /* !X & X is always false.  ~X & X is always false.  */
 +         if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR
 +              || TREE_CODE (arg0) == BIT_NOT_EXPR)
 +             && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
 +           return omit_one_operand_loc (loc, type, integer_zero_node, arg1);
 +         /* X & !X is always false.  X & ~X is always false.  */
 +         if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR
 +              || TREE_CODE (arg1) == BIT_NOT_EXPR)
 +             && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
 +           return omit_one_operand_loc (loc, type, integer_zero_node, arg0);
 +
 +         /* (A < X) && (A + 1 > Y) == (A < X) && (A >= Y).  Normally
 +            A + 1 > Y means (A >= Y) && (A != MAX), but in this case
 +            we know that A < X <= MAX.  */
 +
 +         if (!TREE_SIDE_EFFECTS (arg0) && !TREE_SIDE_EFFECTS (arg1))
 +           {
 +             tem = fold_to_nonsharp_ineq_using_bound (loc, arg0, arg1);
 +             if (tem && !operand_equal_p (tem, arg0, 0))
 +               return fold_build2_loc (loc, code, type, tem, arg1);
 +
 +             tem = fold_to_nonsharp_ineq_using_bound (loc, arg1, arg0);
 +             if (tem && !operand_equal_p (tem, arg1, 0))
 +               return fold_build2_loc (loc, code, type, arg0, tem);
 +           }
 +
 +         tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1);
 +         if (tem)
 +           return tem;
 +
 +       }
 +
       /* ~X & X, (X == 0) & X, and !X & X are always zero.  */
       if ((TREE_CODE (arg0) == BIT_NOT_EXPR
           || TREE_CODE (arg0) == TRUTH_NOT_EXPR
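
The nonsharp-inequality fold referenced in that comment, as source code
(hypothetical function, just to make the transformation concrete):

    /* a < x rules out a == INT_MAX, so a + 1 cannot overflow, and
       a + 1 > y may be rewritten as a >= y.  */
    _Bool h (int a, int x, int y)
    {
      return a < x && a + 1 > y;   /* becomes: a < x && a >= y */
    }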



Re: [patch 7/8 tree-optimization]: Bitwise not logic for fold_unary_loc.

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 9:36 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 This patch adds support to fold_unary_loc for one-bit precision
 typed bitwise-not expression.

Similar ...

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_unary_loc): Add
        support for one-bit bitwise-not optimization.

 Bootstrapped and regression tested with prior patches of this series
 for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 08:49:50.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:56:45.170171300 +0200
 @@ -8094,6 +8094,12 @@ fold_unary_loc (location_t loc, enum tre
          if (i == count)
            return build_vector (type, nreverse (list));
        }
 +      if (INTEGRAL_TYPE_P (type) && TYPE_PRECISION (type) == 1)
 +        {
 +         tem = fold_truth_not_expr (loc, arg0);
 +         if (tem)
 +           return fold_convert_loc (loc, type, tem);
 +       }

       return NULL_TREE;



Re: [patch 4/8 tree-optimization]: Bitwise or logic for fold_binary_loc.

2011-07-13 Thread Kai Tietz
2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 9:33 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 This patch adds support to fold_binary_loc for one-bit precision
 typed bitwise-or expression.

 Seems to be a fallout of the missing TRUTH_NOT conversion as well.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_binary_loc): Add
        support for one-bit bitwise-or optimization.

 Bootstrapped and regression tested with prior patches of this series
 for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 08:23:29.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:04.011620200 +0200
 @@ -10688,6 +10688,52 @@ fold_binary_loc (location_t loc,
          return omit_one_operand_loc (loc, type, t1, arg0);
        }

 +      if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type))
 +        {
 +         /* If arg0 is constant zero, drop it.  */
 +         if (TREE_CODE (arg0) == INTEGER_CST && integer_zerop (arg0))
 +           return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1));
 +         if (TREE_CODE (arg0) == INTEGER_CST && ! integer_zerop (arg0))
 +           return omit_one_operand_loc (loc, type, arg0, arg1);
 +
 +         /* !X | X is always true. ~X | X is always true.  */
 +         if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR
 +              || TREE_CODE (arg0) == BIT_NOT_EXPR)
 +             && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
 +           return omit_one_operand_loc (loc, type, integer_one_node, arg1);
 +         /* X | !X is always true. X | ~X is always true.  */
 +         if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR
 +             || TREE_CODE (arg1) == BIT_NOT_EXPR)
 +             && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
 +           return omit_one_operand_loc (loc, type, integer_one_node, arg0);
 +
 +         /* (X && !Y) | (!X && Y) is X ^ Y */
 +         if (TREE_CODE (arg0) == BIT_AND_EXPR
 +             && TREE_CODE (arg1) == BIT_AND_EXPR)
 +           {
 +             tree a0, a1, l0, l1, n0, n1;
 +
 +             a0 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));
 +             a1 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
 +
 +             l0 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
 +             l1 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
 +
 +             n0 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l0);
 +             n1 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l1);
 +
 +             if ((operand_equal_p (n0, a0, 0)
 +                  && operand_equal_p (n1, a1, 0))
 +                 || (operand_equal_p (n0, a1, 0)
 +                     && operand_equal_p (n1, a0, 0)))
 +               return fold_build2_loc (loc, BIT_XOR_EXPR, type, l0, n1);
 +           }
 +
 +         tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1);
 +         if (tem)
 +           return tem;
 +        }
 +
       /* Canonicalize (X & C1) | C2.  */
       if (TREE_CODE (arg0) == BIT_AND_EXPR
           && TREE_CODE (arg1) == INTEGER_CST

Well, I wouldn't call it fallout.  With this we are able to handle
things like ~(X >= B) and see that it can be converted to X < B.  The
point here is that we avoid fold re-introducing the TRUTH
variants for the bitwise ones (for sure some parts are redundant and
might be something to be factored out, as we did for the truth_andor
function).  We also catch patterns like ~X op ~Y and convert them
to ~(X op Y), which is only valid for one-bit precision typed X
and Y.
In general !x is not the same as ~x, unless x has a one-bit precision
integral type.

I will adjust the patches so that for one-bit precision types we always
use BIT_NOT_EXPR instead of TRUTH_NOT_EXPR.  This is
reasonable.
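
The C-level reason that distinction matters, as a small illustration
(not from the patch): ~ promotes its operand, so on anything wider than
one bit it is not a logical not.

    _Bool x = 1;
    int a = !x;   /* 0 */
    int b = ~x;   /* x promotes to int 1, so ~x is -2: still true */

Only for a genuine one-bit type, as GIMPLE uses internally, do ~ and !
agree.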


Re: [patch tree-optimization]: Normalize compares X ==/!= 1 -> X !=/== 0 for truth valued X

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 11:00 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 this patch fixes that for replaced uses, we call fold_stmt_inplace. 
 Additionally
 it adds to fold_gimple_assign the canonical form for X !=/== 1 -> X ==/!= 0 
 for
 X with one-bit precision type.

 ChangeLog gcc/

 2011-07-13  Kai Tietz  kti...@redhat.com

        * gimple-fold.c (fold_gimple_assign): Add normalization
        for compares of 1-bit integer precision operands.
        * tree-ssa-propagate.c (replace_uses_in): Call
        fold_stmt_inplace on modified statement.

err - sure not.  The caller already does that.

Breakpoint 5, substitute_and_fold (get_value_fn=0xc269ae <get_value>,
fold_fn=0, do_dce=1 '\001')
at /space/rguenther/src/svn/trunk/gcc/tree-ssa-propagate.c:1134
1134  if (get_value_fn)
D.2696_8 = a_1(D) != D.2704_10;

(gdb) n
1135did_replace |= replace_uses_in (stmt, get_value_fn);
(gdb)
1138  if (did_replace)
(gdb) call debug_gimple_stmt (stmt)
D.2696_8 = a_1(D) != 1;

(gdb) p did_replace
$1 = 1 '\001'
(gdb) n
1139fold_stmt (oldi);

so figure out why fold_stmt does not do its work instead.  Which I
quickly checked in gdb and it dispatches to fold_binary with
boolean-typed arguments as a_1 != 1 where you can see the
canonical form for this is !(int) a_1 because of a bug I think.

  /* bool_var != 1 becomes !bool_var. */
  if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_onep (arg1)
      && code == NE_EXPR)
return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
fold_convert_loc (loc, type, arg0));

at least I don't see why we need to convert arg0 to the type of the
comparison.
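
For reference, the canonicalization under discussion in C terms
(illustrative only; the GIMPLE-level boolean type is what matters):

    /* For boolean x, x != 1 is just x == 0, i.e. !x.  */
    int f (_Bool x)
    {
      return x != 1;   /* should fold to x == 0, not !(int) x */
    }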

You need to improve your debugging skills and see why existing
transformations are not working before adding new ones.

Richard.

 ChangeLog gcc/testsuite

 2011-07-13  Kai Tietz  kti...@redhat.com

        * gcc.dg/tree-ssa/fold-1.c: New test.

 Bootstrapped and regression tested for x86_64-pc-linux-gnu.  Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/gimple-fold.c
 ===
 --- gcc.orig/gcc/gimple-fold.c  2011-07-13 10:37:32.0 +0200
 +++ gcc/gcc/gimple-fold.c       2011-07-13 10:39:05.100843400 +0200
 @@ -815,6 +815,17 @@ fold_gimple_assign (gimple_stmt_iterator
                                             gimple_assign_rhs2 (stmt));
        }

 +      if (!result && (subcode == EQ_EXPR || subcode == NE_EXPR)
 +          && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt)))
 +          && TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt))) == 1
 +          && integer_onep (gimple_assign_rhs2 (stmt)))
 +       result = build2_loc (loc, (subcode == EQ_EXPR ? NE_EXPR : EQ_EXPR),
 +                            TREE_TYPE (gimple_assign_lhs (stmt)),
 +                            gimple_assign_rhs1 (stmt),
 +                            fold_convert_loc (loc,
 +                               TREE_TYPE (gimple_assign_rhs1 (stmt)),
 +                               integer_zero_node));
 +
       if (!result)
         result = fold_binary_loc (loc, subcode,
                               TREE_TYPE (gimple_assign_lhs (stmt)),
 Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/fold-1.c
 ===
 --- /dev/null   1970-01-01 00:00:00.0 +
 +++ gcc/gcc/testsuite/gcc.dg/tree-ssa/fold-1.c  2011-07-13
 10:50:38.294367800 +0200
 @@ -0,0 +1,9 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-optimized" } */
 +
 +int foo (_Bool a, _Bool b)
 +{
 +  return a != ((b | !b));
 +}
 +/* { dg-final { scan-tree-dump-not " != 1" "optimized" } } */
 +/* { dg-final { cleanup-tree-dump "optimized" } } */
 Index: gcc/gcc/tree-ssa-propagate.c
 ===
 --- gcc.orig/gcc/tree-ssa-propagate.c   2011-07-13 10:37:42.0 +0200
 +++ gcc/gcc/tree-ssa-propagate.c        2011-07-13 10:40:25.688576800 +0200
 @@ -904,6 +904,8 @@ replace_uses_in (gimple stmt, ssa_prop_g

       propagate_value (use, val);

 +      fold_stmt_inplace (stmt);
 +
       replaced = true;
     }



Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Sorry, the TRUTH_NOT_EXPR isn't the point here at all. The underlying
 issue is that fold-const re-introduces TRUTH_AND/OR and co.

I'm very sure it doesn't re-construct TRUTH_ variants out of thin air
when you present it with BIT_ variants as input.

  To avoid
 it, fold-const needs to learn to handle 1-bit precision folding for those
 bitwise operations on 1-bit integer types specially.
 As gimple relies on this FE fold for now, it has to learn about
 that. As soon as gimple_fold (and other passes) don't rely anymore on
 the FE's fold-const, we can remove those parts again.  Otherwise this
 boolification of compares (and also the transition of TRUTH_NOT ->
 BIT_NOT) simply doesn't work until then.

I do not believe that.


 Regards,
 Kai

 2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 I split my old patch into 8 separate pieces for easier review.  These 
 patches
 are a prerequisite for enabling boolification of comparisons in the gimplifier and
 the necessary type-cast preserving in gimple from/to boolean-type.

 This patch adds support to fold_truth_not_expr for one-bit precision typed
 bitwise-binary and bitwise-not expressions.

 It seems this is only necessary because we still have TRUTH_NOT_EXPR
 in our IL and did not replace that with BIT_NOT_EXPR consistently yet.

 So no, this is not ok.  fold-const.c is really mostly supposed to deal
 with GENERIC where we distinguish TRUTH_* and BIT_* variants.

 Please instead lower TRUTH_NOT_EXPR to BIT_NOT_EXPR for gimple.

 Richard.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_truth_not_expr): Add
        support for one-bit bitwise operations.

 Bootstrapped and regression tested for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 07:48:29.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:36.865620200 +0200
 @@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre
     case INTEGER_CST:
       return constant_boolean_node (integer_zerop (arg), type);

 +    case BIT_AND_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      if (integer_onep (TREE_OPERAND (arg, 1)))
 +       return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0));
 +      /* fall through */
     case TRUTH_AND_EXPR:
       loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
       loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc);
 -      return build2_loc (loc, TRUTH_OR_EXPR, type,
 +      return build2_loc (loc, (code == BIT_AND_EXPR ? BIT_IOR_EXPR
 +                                                   : TRUTH_OR_EXPR), type,
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
                         invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

 +    case BIT_IOR_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      /* fall through.  */
     case TRUTH_OR_EXPR:
       loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
       loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc);
 -      return build2_loc (loc, TRUTH_AND_EXPR, type,
 +      return build2_loc (loc, (code == BIT_IOR_EXPR ? BIT_AND_EXPR
 +                                                   : TRUTH_AND_EXPR), type,
                         invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)),
                         invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1)));

 +    case BIT_XOR_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      /* fall through.  */
     case TRUTH_XOR_EXPR:
       /* Here we can invert either operand.  We invert the first operand
         unless the second operand is a TRUTH_NOT_EXPR in which case our
 @@ -3095,10 +3111,14 @@ fold_truth_not_expr (location_t loc, tre
         negation of the second operand.  */

       if (TREE_CODE (TREE_OPERAND (arg, 1)) == TRUTH_NOT_EXPR)
 -       return build2_loc (loc, TRUTH_XOR_EXPR, type, TREE_OPERAND (arg, 0),
 +       return build2_loc (loc, code, type, TREE_OPERAND (arg, 0),
 +                          TREE_OPERAND (TREE_OPERAND (arg, 1), 0));
 +      else if (TREE_CODE (TREE_OPERAND (arg, 1)) == BIT_NOT_EXPR
 +               TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 1))) == 1)
 +       return build2_loc (loc, code, type, TREE_OPERAND (arg, 0),
                           TREE_OPERAND (TREE_OPERAND (arg, 1), 0));
       else
 -       return build2_loc (loc, TRUTH_XOR_EXPR, type,
 +       return build2_loc (loc, code, type,
                           invert_truthvalue_loc (loc, TREE_OPERAND (arg, 0)),
                           TREE_OPERAND (arg, 1));

 @@ 

Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 11:18 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/7/13 Jakub Jelinek ja...@redhat.com:

 I disagree.  Widening would result in worse code in most cases, as you need
 to sign extend all the operands.  On the other side, I doubt you can
 actually usefully use the undefinedness of signed overflow for a series of
 3 or more operands of the associative operation.

        Jakub


 Sounds reasonable. Type casting to unsigned should be a better solution here.

Well, the solution of course lies in the no-undefined-overflow branch
where we have separate tree codes for arithmetic with/without
undefined overflow.

Richard.

 Ilya



Re: [patch tree-optimization]: Normalize compares X ==/!= 1 -> X !=/== 0 for truth valued X

2011-07-13 Thread Kai Tietz
2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 11:00 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 this patch fixes that for replaced uses, we call fold_stmt_inplace. 
 Additionally
 it adds to fold_gimple_assign the canonical form for X !=/== 1 -> X ==/!= 0 
 for
 X with one-bit precision type.

 ChangeLog gcc/

 2011-07-13  Kai Tietz  kti...@redhat.com

        * gimple-fold.c (fold_gimple_assign): Add normalization
        for compares of 1-bit integer precision operands.
        * tree-ssa-propagate.c (replace_uses_in): Call
        fold_stmt_inplace on modified statement.

 err - sure not.  The caller already does that.

 Breakpoint 5, substitute_and_fold (get_value_fn=0xc269ae <get_value>,
    fold_fn=0, do_dce=1 '\001')
    at /space/rguenther/src/svn/trunk/gcc/tree-ssa-propagate.c:1134
 1134              if (get_value_fn)
 D.2696_8 = a_1(D) != D.2704_10;

 (gdb) n
 1135                did_replace |= replace_uses_in (stmt, get_value_fn);
 (gdb)
 1138              if (did_replace)
 (gdb) call debug_gimple_stmt (stmt)
 D.2696_8 = a_1(D) != 1;

 (gdb) p did_replace
 $1 = 1 '\001'
 (gdb) n
 1139                fold_stmt (oldi);

 so figure out why fold_stmt does not do its work instead.  Which I
 quickly checked in gdb and it dispatches to fold_binary with
 boolean-typed arguments as a_1 != 1 where you can see the
 canonical form for this is !(int) a_1 because of a bug I think.

      /* bool_var != 1 becomes !bool_var. */
      if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_onep (arg1)
          && code == NE_EXPR)
        return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
                            fold_convert_loc (loc, type, arg0));

 at least I don't see why we need to convert arg0 to the type of the
 comparison.

Well, this type-cast is required by the C specification - integer
autopromotion - AFAIR.  So I don't think the FE maintainers would be happy
about this change.
Nevertheless I saw this pattern before, and was wondering why we check
here for boolean_type at all. This might even be a latent bug in the
Ada case due to type precision, and it prevents detection of the signed
case too.  IMHO this check should look like this:

  /* bool_var != 1 becomes !bool_var. */
  if (INTEGRAL_TYPE_P (TREE_TYPE (arg0))
      && TYPE_PRECISION (TREE_TYPE (arg0)) == 1
      && integer_onep (arg1) && code == NE_EXPR)
return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
fold_convert_loc (loc, type, arg0));

For the BIT_NOT_EXPR variant, the cast of arg0 would of course be
wrong, as ~(bool) is of course different in result from ~(int).

 You need to improve your debugging skills and see why existing
 transformations are not working before adding new ones.

I work on that.

Kai


RFA: Tighten vector aliasing check

2011-07-13 Thread Richard Sandiford
tree-vect-loop-manip.c assumes there is an alias if:

 ((store_ptr_0 + store_segment_length_0) < load_ptr_0)
 || (load_ptr_0 + load_segment_length_0) < store_ptr_0))

which means that contiguous arrays are unnecessarily considered to alias.

This patch changes the < to <=.  Tested on x86_64-linux-gnu
(all languages).  OK to install?

Richard


gcc/
* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Tighten
overlap check.

Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  2011-06-22 16:46:34.0 +0100
+++ gcc/tree-vect-loop-manip.c  2011-07-13 11:12:06.0 +0100
@@ -2409,13 +2409,13 @@ vect_create_cond_for_alias_checks (loop_
   tree part_cond_expr, length_factor;
 
   /* Create expression
- ((store_ptr_0 + store_segment_length_0) < load_ptr_0)
- || (load_ptr_0 + load_segment_length_0) < store_ptr_0))
+ ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
+ || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
  &&
  ...
  &&
- ((store_ptr_n + store_segment_length_n) < load_ptr_n)
- || (load_ptr_n + load_segment_length_n) < store_ptr_n))  */
+ ((store_ptr_n + store_segment_length_n) <= load_ptr_n)
+ || (load_ptr_n + load_segment_length_n) <= store_ptr_n))  */
 
   if (VEC_empty (ddr_p, may_alias_ddrs))
 return;
@@ -2484,8 +2484,8 @@ vect_create_cond_for_alias_checks (loop_
 
   part_cond_expr =
fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
- fold_build2 (LT_EXPR, boolean_type_node, seg_a_max, seg_b_min),
- fold_build2 (LT_EXPR, boolean_type_node, seg_b_max, seg_a_min));
+ fold_build2 (LE_EXPR, boolean_type_node, seg_a_max, seg_b_min),
+ fold_build2 (LE_EXPR, boolean_type_node, seg_b_max, seg_a_min));
 
   if (*cond_expr)
*cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
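
Why < was too strict, in a concrete made-up setting (assuming a 4-byte
int):

    int a[8];
    char *store = (char *) &a[0];   /* segment [store, store + 16) */
    char *load  = (char *) &a[4];   /* segment [load,  load  + 16) */

    /* The segments touch but do not overlap.  With <, the runtime
       check store + 16 < load is false, so an alias was assumed; with
       <=, store + 16 <= load holds and the vectorized loop is taken.  */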


Re: [patch tree-optimization]: [3 of 3]: Boolify compares more

2011-07-13 Thread Richard Guenther
On Tue, Jul 12, 2011 at 6:55 PM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 As discussed on IRC, I reuse here the do_dce flag to choose folding
 direction within BB.

 Bootstrapped and regression tested for all standard-languages (plus
 Ada and Obj-C++) on host x86_64-pc-linux-gnu.

 Ok for apply?

The tree-ssa-propagate.c change is ok on its own.

For the tree-vrp.c changes you didn't follow the advise of removing
the TRUTH_ op support and instead generalizing the BIT_ op support
properly.  There should be no 1-bit type thing left.

Richard.

 Regards,
 Kai

 ChangeLog gcc/

 2011-07-12  Kai Tietz  kti...@redhat.com

        * tree-ssa-propagate.c (substitute_and_fold):
        Only use last-to-first scanning direction if do_dce
        is true.
        * tree-vrp.c (extract_range_from_binary_expr): Add
        handling for BIT_IOR_EXPR, BIT_AND_EXPR, and BIT_NOT_EXPR.
        (register_edge_assert_for_1): Add handling for 1-bit
        BIT_IOR_EXPR and BIT_NOT_EXPR.
        (register_edge_assert_for): Add handling for 1-bit
        BIT_IOR_EXPR.
        (ssa_name_get_inner_ssa_name_p): New helper function.
        (ssa_name_get_cast_to_p): New helper function.
        (simplify_truth_ops_using_ranges): Handle prefixed
        cast instruction for result, and add support for one
        bit precision BIT_IOR_EXPR, BIT_AND_EXPR, BIT_XOR_EXPR,
        and BIT_NOT_EXPR.
        (simplify_stmt_using_ranges): Add handling for one bit
        precision BIT_IOR_EXPR, BIT_AND_EXPR, BIT_XOR_EXPR,
        and BIT_NOT_EXPR.

 ChangeLog gcc/testsuite

 2011-07-08  Kai Tietz  kti...@redhat.com

        * gcc.dg/tree-ssa/vrp47.c: Remove dom-output
        and adjust testcase for vrp output analysis.

 Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c
 ===
 --- gcc.orig/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c      2011-07-12 15:21:23.793440400 +0200
 +++ gcc/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c   2011-07-12 15:27:11.892259100 +0200
 @@ -4,7 +4,7 @@
    jumps when evaluating an && condition.  VRP is not able to optimize
    this.  */
  /* { dg-do compile { target { ! "mips*-*-* s390*-*-* avr-*-* mn10300-*-*" } } } */
 -/* { dg-options "-O2 -fdump-tree-vrp -fdump-tree-dom" } */
 +/* { dg-options "-O2 -fdump-tree-vrp" } */
  /* { dg-options "-O2 -fdump-tree-vrp -fdump-tree-dom -march=i586" { target { i?86-*-* && ilp32 } } } */

  int h(int x, int y)
 @@ -36,13 +36,10 @@ int f(int x)
    0 or 1.  */
  /* { dg-final { scan-tree-dump-times \[xy\]\[^ \]* != 0 vrp1 } } */

 -/* This one needs more copy propagation that only happens in dom1.  */
 -/* { dg-final { scan-tree-dump-times "x\[^ \]* & y" 1 "dom1" } } */
 -/* { dg-final { scan-tree-dump-times "x\[^ \]* & y" 1 "vrp1" { xfail *-*-* } } } */
 +/* { dg-final { scan-tree-dump-times "x\[^ \]* & y" 1 "vrp1" } } */

  /* These two are fully simplified by VRP.  */
  /* { dg-final { scan-tree-dump-times "x\[^ \]* \[|\] y" 1 "vrp1" } } */
  /* { dg-final { scan-tree-dump-times "x\[^ \]* \\^ 1" 1 "vrp1" } } */

  /* { dg-final { cleanup-tree-dump "vrp\[0-9\]" } } */
 -/* { dg-final { cleanup-tree-dump "dom\[0-9\]" } } */
 Index: gcc/gcc/tree-ssa-propagate.c
 ===
 --- gcc.orig/gcc/tree-ssa-propagate.c   2011-07-12 15:21:23.804440400 +0200
 +++ gcc/gcc/tree-ssa-propagate.c        2011-07-12 15:28:22.83100 +0200
 @@ -979,6 +979,9 @@ replace_phi_args_in (gimple phi, ssa_pro

    DO_DCE is true if trivially dead stmts can be removed.

 +   If DO_DCE is true, the statements within a BB are walked from
 +   last to first element.  Otherwise we scan from first to last element.
 +
    Return TRUE when something changed.  */

  bool
 @@ -1059,9 +1062,10 @@ substitute_and_fold (ssa_prop_get_value_
        for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (i))
          replace_phi_args_in (gsi_stmt (i), get_value_fn);

 -      /* Propagate known values into stmts.  Do a backward walk to expose
 -        more trivially deletable stmts.  */
 -      for (i = gsi_last_bb (bb); !gsi_end_p (i);)
 +      /* Propagate known values into stmts.  Do a backward walk if
 +         do_dce is true.  In some cases a backward walk exposes
 +        more trivially deletable stmts.  */
 +      for (i = (do_dce ? gsi_last_bb (bb) : gsi_start_bb (bb));
 !gsi_end_p (i);)
        {
           bool did_replace;
          gimple stmt = gsi_stmt (i);
 @@ -1070,7 +1074,10 @@ substitute_and_fold (ssa_prop_get_value_
          gimple_stmt_iterator oldi;

          oldi = i;
 -         gsi_prev (i);
 +         if (do_dce)
 +           gsi_prev (i);
 +         else
 +           gsi_next (i);

          /* Ignore ASSERT_EXPRs.  They are used by VRP to generate
             range information for names and they are discarded
 Index: gcc/gcc/tree-vrp.c
 ===
 --- gcc.orig/gcc/tree-vrp.c     2011-07-12 15:21:23.838440400 +0200
 +++ gcc/gcc/tree-vrp.c  

Re: [patch 4/8 tree-optimization]: Bitwise or logic for fold_binary_loc.

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 12:39 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 9:33 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 This patch adds support to fold_binary_loc for one-bit precision
 typed bitwise-or expression.

 Seems to be a fallout of the missing TRUTH_NOT conversion as well.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_binary_loc): Add
        support for one-bit bitwise-or optimization.

 Bootstrapped and regression tested with prior patches of this series
 for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 08:23:29.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:04.011620200 +0200
 @@ -10688,6 +10688,52 @@ fold_binary_loc (location_t loc,
          return omit_one_operand_loc (loc, type, t1, arg0);
        }

 +      if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type))
 +        {
 +         /* If arg0 is constant zero, drop it.  */
 +         if (TREE_CODE (arg0) == INTEGER_CST && integer_zerop (arg0))
 +           return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1));
 +         if (TREE_CODE (arg0) == INTEGER_CST && ! integer_zerop (arg0))
 +           return omit_one_operand_loc (loc, type, arg0, arg1);
 +
 +         /* !X | X is always true. ~X | X is always true.  */
 +         if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR
 +              || TREE_CODE (arg0) == BIT_NOT_EXPR)
 +             && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0))
 +           return omit_one_operand_loc (loc, type, integer_one_node, arg1);
 +         /* X | !X is always true. X | ~X is always true.  */
 +         if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR
 +             || TREE_CODE (arg1) == BIT_NOT_EXPR)
 +             && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
 +           return omit_one_operand_loc (loc, type, integer_one_node, arg0);
 +
 +         /* (X && !Y) | (!X && Y) is X ^ Y */
 +         if (TREE_CODE (arg0) == BIT_AND_EXPR
 +             && TREE_CODE (arg1) == BIT_AND_EXPR)
 +           {
 +             tree a0, a1, l0, l1, n0, n1;
 +
 +             a0 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));
 +             a1 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
 +
 +             l0 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
 +             l1 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
 +
 +             n0 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l0);
 +             n1 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l1);
 +
 +             if ((operand_equal_p (n0, a0, 0)
 +                  && operand_equal_p (n1, a1, 0))
 +                 || (operand_equal_p (n0, a1, 0)
 +                     && operand_equal_p (n1, a0, 0)))
 +               return fold_build2_loc (loc, BIT_XOR_EXPR, type, l0, n1);
 +           }
 +
 +         tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1);
 +         if (tem)
 +           return tem;
 +        }
 +
       /* Canonicalize (X & C1) | C2.  */
       if (TREE_CODE (arg0) == BIT_AND_EXPR
           && TREE_CODE (arg1) == INTEGER_CST

 Well, I wouldn't call it fallout.  With this we are able to handle
 things like ~(X >= B) and see that it can be converted to X < B.  The
 point here is that we avoid fold re-introducing the TRUTH
 variants for the bitwise ones (for sure some parts are redundant and
 might be something to be factored out, as we did for the truth_andor
 function).  We also catch patterns like ~X op ~Y and convert them
 to ~(X op Y), which is only valid for one-bit precision typed X
 and Y.
 In general !x is not the same as ~x, unless x has a one-bit precision
 integral type.

  I will adjust the patches so that for one-bit precision types we always
 use BIT_NOT_EXPR instead of TRUTH_NOT_EXPR.  This is
 reasonable.

Sorry, but no.

fold-const.c should not look at 1-bitness at all.  fold-const.c should
special-case BOOLEAN_TYPEs - and it does that already.

This patch series makes me think that it is premature given that
on gimple we still mix TRUTH_NOT_EXPR and BIT_*_EXPRs.

Richard.


Re: [patch tree-optimization]: Normalize compares X ==/!= 1 -> X !=/== 0 for truth valued X

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 12:56 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 11:00 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 this patch fixes that for replaced uses, we call fold_stmt_inplace. 
 Additionally
 it adds to fold_gimple_assign the canonical form for X !=/== 1 -> X ==/!= 0 
 for
 X with one-bit precision type.

 ChangeLog gcc/

 2011-07-13  Kai Tietz  kti...@redhat.com

        * gimple-fold.c (fold_gimple_assign): Add normalization
        for compares of 1-bit integer precision operands.
        * tree-ssa-propagate.c (replace_uses_in): Call
        fold_stmt_inplace on modified statement.

 err - sure not.  The caller already does that.

 Breakpoint 5, substitute_and_fold (get_value_fn=0xc269ae <get_value>,
    fold_fn=0, do_dce=1 '\001')
    at /space/rguenther/src/svn/trunk/gcc/tree-ssa-propagate.c:1134
 1134              if (get_value_fn)
 D.2696_8 = a_1(D) != D.2704_10;

 (gdb) n
 1135                did_replace |= replace_uses_in (stmt, get_value_fn);
 (gdb)
 1138              if (did_replace)
 (gdb) call debug_gimple_stmt (stmt)
 D.2696_8 = a_1(D) != 1;

 (gdb) p did_replace
 $1 = 1 '\001'
 (gdb) n
 1139                fold_stmt (oldi);

 so figure out why fold_stmt does not do its work instead.  Which I
 quickly checked in gdb and it dispatches to fold_binary with
 boolean-typed arguments as a_1 != 1 where you can see the
 canonical form for this is !(int) a_1 because of a bug I think.

      /* bool_var != 1 becomes !bool_var. */
      if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_onep (arg1)
          && code == NE_EXPR)
        return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
                            fold_convert_loc (loc, type, arg0));

 at least I don't see why we need to convert arg0 to the type of the
 comparison.

 Well, this type-cast is required by the C specification - integer
 autopromotion - AFAIR.  So I don't think the FE maintainers would be happy
 about this change.

?  fold-const.c isn't supposed to perform integer promotion.  Its input
will have integer promotions if the frontend requires them.  If they are
semantically not needed fold-const.c strips them away anyway.

 Nevertheless I saw this pattern before, and was wondering why we check
 here for boolean_type at all.  This might even be a latent bug in the
 Ada case due to type precision, and it prevents detection of the signed
 case too.  IMHO this check should look like this:

      /* bool_var != 1 becomes !bool_var. */
       if (INTEGRAL_TYPE_P (TREE_TYPE (arg0))
           && TYPE_PRECISION (TREE_TYPE (arg0)) == 1
           && integer_onep (arg1) && code == NE_EXPR)
        return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
                            fold_convert_loc (loc, type, arg0));

No it should not.  The BOOLEAN_TYPE check is exactly correct.

  For the BIT_NOT_EXPR variant, the cast of arg0 would of course be
  wrong, as ~(bool) is of course different in result from ~(int).

 You need to improve your debugging skills and see why existing
 transformations are not working before adding new ones.

 I work on that.

 Kai



Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr

2011-07-13 Thread Kai Tietz
2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote:
  Sorry, the TRUTH_NOT_EXPR isn't the point here at all. The underlying
  issue is that fold-const re-introduces TRUTH_AND/OR and co.

  I'm very sure it doesn't re-construct TRUTH_ variants out of thin air
 when you present it with BIT_ variants as input.

Well, look into fold-const's fold_binary_loc function and see

  /* ARG0 is the first operand of EXPR, and ARG1 is the second operand.

 First check for cases where an arithmetic operation is applied to a
 compound, conditional, or comparison operation.  Push the arithmetic
 operation inside the compound or conditional to see if any folding
 can then be done.  Convert comparison to conditional for this purpose.
  This also optimizes non-constant cases that used to be done in
 expand_expr.

 Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR,
 one of the operands is a comparison and the other is a comparison, a
 BIT_AND_EXPR with the constant 1, or a truth value.  In that case, the
 code below would make the expression more complex.  Change it to a
 TRUTH_{AND,OR}_EXPR.  Likewise, convert a similar NE_EXPR to
 TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR.  */

   if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR
        || code == EQ_EXPR || code == NE_EXPR)
       && ((truth_value_p (TREE_CODE (arg0))
            && (truth_value_p (TREE_CODE (arg1))
                || (TREE_CODE (arg1) == BIT_AND_EXPR
                    && integer_onep (TREE_OPERAND (arg1, 1)))))
           || (truth_value_p (TREE_CODE (arg1))
               && (truth_value_p (TREE_CODE (arg0))
                   || (TREE_CODE (arg0) == BIT_AND_EXPR
                       && integer_onep (TREE_OPERAND (arg0, 1)))))))
     {
       tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR
                              : code == BIT_IOR_EXPR ? TRUTH_OR_EXPR
                              : TRUTH_XOR_EXPR,
                              boolean_type_node,
                              fold_convert_loc (loc, boolean_type_node, arg0),
                              fold_convert_loc (loc, boolean_type_node, arg1));

       if (code == EQ_EXPR)
         tem = invert_truthvalue_loc (loc, tem);

       return fold_convert_loc (loc, type, tem);
     }

Here TRUTH_AND/TRUTH_OR gets introduced unconditionally if the operands
are truth-valued.  This is, by the way, why you see those cases being
handled.  But as soon as this part is turned off for BIT_IOR/AND, we
need to do the folding for the 1-bit precision case explicitly.
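
A small input that trips exactly this path (hypothetical, for
illustration):

    /* Both operands are comparisons, so fold_binary_loc rebuilds the
       BIT_AND below as a TRUTH_AND_EXPR in boolean_type_node.  */
    _Bool t (int a, int b)
    {
      return (a != 0) & (b != 0);
    }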



   To avoid
  it, fold-const needs to learn to handle 1-bit precision folding for those
  bitwise operations on 1-bit integer types specially.
  As gimple relies on this FE fold for now, it has to learn about
  that. As soon as gimple_fold (and other passes) don't rely anymore on
  the FE's fold-const, we can remove those parts again.  Otherwise this
  boolification of compares (and also the transition of TRUTH_NOT ->
  BIT_NOT) simply doesn't work until then.

 I do not believe that.


 Regards,
 Kai

 2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

  I split my old patch into 8 separate pieces for easier review.  These 
  patches
  are a prerequisite for enabling boolification of comparisons in the gimplifier 
  and
  the necessary type-cast preserving in gimple from/to boolean-type.

 This patch adds support to fold_truth_not_expr for one-bit precision typed
 bitwise-binary and bitwise-not expressions.

 It seems this is only necessary because we still have TRUTH_NOT_EXPR
 in our IL and did not replace that with BIT_NOT_EXPR consistently yet.

 So no, this is not ok.  fold-const.c is really mostly supposed to deal
 with GENERIC where we distinguish TRUTH_* and BIT_* variants.

 Please instead lower TRUTH_NOT_EXPR to BIT_NOT_EXPR for gimple.

 Richard.

 ChangeLog

 2011-07-13  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_truth_not_expr): Add
        support for one-bit bitwise operations.

 Bootstrapped and regression tested for x86_64-pc-linux-gnu.
 Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c   2011-07-13 07:48:29.0 +0200
 +++ gcc/gcc/fold-const.c        2011-07-13 08:59:36.865620200 +0200
 @@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre
     case INTEGER_CST:
       return constant_boolean_node (integer_zerop (arg), type);

 +    case BIT_AND_EXPR:
 +      if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1)
 +        return NULL_TREE;
 +      if (integer_onep (TREE_OPERAND (arg, 1)))
 +       return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0));
 +      /* fall through */
     case TRUTH_AND_EXPR:
       loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
       

Re: RFA: Tighten vector aliasing check

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 1:00 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 tree-vect-loop-manip.c assumes there is an alias if:

     ((store_ptr_0 + store_segment_length_0) < load_ptr_0)
     || (load_ptr_0 + load_segment_length_0) < store_ptr_0))

 which means that contiguous arrays are unnecessarily considered to alias.

 This patch changes the < to <=.  Tested on x86_64-linux-gnu
 (all languages).  OK to install?
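
A worked example (illustrative numbers, not from the original mail)
with two exactly adjacent 16-byte segments:

  store_ptr_0 = 0x1000, store_segment_length_0 = 16   (segment end 0x1010)
  load_ptr_0  = 0x1010, load_segment_length_0  = 16   (segment end 0x1020)

With '<', 0x1010 < 0x1010 and 0x1020 < 0x1000 are both false, so the
runtime check assumes an alias although the segments do not overlap.
With '<=', 0x1010 <= 0x1010 is true and no alias is assumed.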

Ok.

Thanks,
Richard.

 Richard


 gcc/
        * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Tighten
        overlap check.

 Index: gcc/tree-vect-loop-manip.c
 ===
 --- gcc/tree-vect-loop-manip.c  2011-06-22 16:46:34.0 +0100
 +++ gcc/tree-vect-loop-manip.c  2011-07-13 11:12:06.0 +0100
 @@ -2409,13 +2409,13 @@ vect_create_cond_for_alias_checks (loop_
   tree part_cond_expr, length_factor;

   /* Create expression
 -     ((store_ptr_0 + store_segment_length_0) < load_ptr_0)
 -     || (load_ptr_0 + load_segment_length_0) < store_ptr_0))
 +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
 +     || (load_ptr_0 + load_segment_length_0) <= store_ptr_0))
      &&
      ...
      &&
 -     ((store_ptr_n + store_segment_length_n) < load_ptr_n)
 -     || (load_ptr_n + load_segment_length_n) < store_ptr_n))  */
 +     ((store_ptr_n + store_segment_length_n) <= load_ptr_n)
 +     || (load_ptr_n + load_segment_length_n) <= store_ptr_n))  */

   if (VEC_empty (ddr_p, may_alias_ddrs))
     return;
 @@ -2484,8 +2484,8 @@ vect_create_cond_for_alias_checks (loop_

       part_cond_expr =
        fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
 -         fold_build2 (LT_EXPR, boolean_type_node, seg_a_max, seg_b_min),
 -         fold_build2 (LT_EXPR, boolean_type_node, seg_b_max, seg_a_min));
 +         fold_build2 (LE_EXPR, boolean_type_node, seg_a_max, seg_b_min),
 +         fold_build2 (LE_EXPR, boolean_type_node, seg_b_max, seg_a_min));

       if (*cond_expr)
        *cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,



Re: RFA: Tighten vector aliasing check

2011-07-13 Thread Richard Sandiford
Richard Sandiford rdsandif...@googlemail.com writes:
 tree-vect-loop-manip.c assumes there is an alias if:

I meant _unless_.

  ((store_ptr_0 + store_segment_length_0) < load_ptr_0)
  || (load_ptr_0 + load_segment_length_0) < store_ptr_0))

 which means that contiguous arrays are unnecessarily considered to alias.

Richard


Re: [google] Backport patch r175881 from gcc-4_6-branch to google/gcc-4_6 (issue4695051)

2011-07-13 Thread Diego Novillo
On Wed, Jul 13, 2011 at 03:12, Guozhi Wei car...@google.com wrote:
 Hi

 This patch fixes a testing error on arm backend. It has been tested on both
 x86 and arm target with following commands.

 make check-g++ RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a dg.exp=anon-ns1.C"
 make check-g++ RUNTESTFLAGS="dg.exp=anon-ns1.C"

Carrot, did you backport this patch with svnmerge.py?


Thanks.  Diego.


Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 1:08 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Sorry, the TRUTH_NOT_EXPR isn't the point here at all.  The underlying
 issue is that fold-const re-introduces TRUTH_AND/OR and co.

 I'm very sure it doesn't re-construct TRUTH_ variants out of thin air
 when you present it with BIT_ variants as input.

 Well, look into fold-const's fold_binary_loc function and see

  /* ARG0 is the first operand of EXPR, and ARG1 is the second operand.

     First check for cases where an arithmetic operation is applied to a
     compound, conditional, or comparison operation.  Push the arithmetic
     operation inside the compound or conditional to see if any folding
     can then be done.  Convert comparison to conditional for this purpose.
     This also optimizes non-constant cases that used to be done in
     expand_expr.

     Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR,
     one of the operands is a comparison and the other is a comparison, a
     BIT_AND_EXPR with the constant 1, or a truth value.  In that case, the
     code below would make the expression more complex.  Change it to a
     TRUTH_{AND,OR}_EXPR.  Likewise, convert a similar NE_EXPR to
     TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR.  */

  if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR
       || code == EQ_EXPR || code == NE_EXPR)
      && ((truth_value_p (TREE_CODE (arg0))
           && (truth_value_p (TREE_CODE (arg1))
               || (TREE_CODE (arg1) == BIT_AND_EXPR
                   && integer_onep (TREE_OPERAND (arg1, 1)))))
          || (truth_value_p (TREE_CODE (arg1))
              && (truth_value_p (TREE_CODE (arg0))
                  || (TREE_CODE (arg0) == BIT_AND_EXPR
                      && integer_onep (TREE_OPERAND (arg0, 1)))))))
    {
      tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR
                         : code == BIT_IOR_EXPR ? TRUTH_OR_EXPR
                         : TRUTH_XOR_EXPR,
                         boolean_type_node,
                         fold_convert_loc (loc, boolean_type_node, arg0),
                         fold_convert_loc (loc, boolean_type_node, arg1));

      if (code == EQ_EXPR)
        tem = invert_truthvalue_loc (loc, tem);

      return fold_convert_loc (loc, type, tem);
    }

 Here TRUTH_AND/TRUTH_OR gets introduced unconditionally if the operands
 are truth values.  This is, btw, the point why you see those cases
 handled.  But as soon as this part is turned off for BIT_IOR/AND, we
 need to do the folding for the 1-bit precision case explicitly.

First of all, this checks for a quite complex pattern - where do we pass
such a complex pattern from the gimple level to fold?  For the EQ/NE_EXPR
case forwprop might be able to feed it that, but then how does
it go wrong?  The above could also simply be guarded by !in_gimple_form.

Richard.


Re: Ping: C-family stack check for threads

2011-07-13 Thread Hans-Peter Nilsson
On Sun, 3 Jul 2011, Thomas Klein wrote:
 Ye Joey wrote:
   Thomas,
 
   I think you are working on a very useful feature.  I have ARM MCU
   applications running out of stack space and silently producing
   strange behaviors.  I'd like to try your patch and probably give
   further comments.

I also think this will be a very useful feature (not just for
threads), and I hope you'll persevere through the review
process. ;)  Not your first patch and you have copyright
assignments in place, so that's covered.

The first thing I see is that you need to fix the issues
regarding the GCC coding standards,
http://gcc.gnu.org/codingconventions.html as that is a hurdle
for reviewers, and you don't want that.  Be very careful.  I
haven't run contrib/check_GNU_style.sh myself but maybe it'll be
helpful.

The second issue I see is that documentation for the new
patterns is missing, that should go in gcc/doc/md.texi,
somewhere under @node Standard Names.  I can imagine there'll be
a thing or two to tweak regarding them and that best reviewed
through the documentation.

Generally, as much as possible should be general and not
ARM-specific.  If you need helper functions, add them to libgcc.



 Regards
   Thomas Klein

 gcc/ChangeLog
 2011-07-03  Thomas Klein  th.r.kl...@web.de

 * opts.c (common_handle_option): introduce additional stack checking
 parameters direct and indirect
 * flag-types.h (enum stack_check_type): Likewise
 * explow.c (allocate_dynamic_stack_space):
 - suppress stack probing if parameter direct, indirect or if a
 stack-limit is given
 - do additional read of limit value if parameter indirect and a
 stack-limit symbol is given
 - emit a call to a stack_failure function [as an alternative to a trap
 call]

No bullet list in the changelog, please.  Individual sentences.
Follow the existing format; full sentences with capitalization
and all that.

brgds, H-P


Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr

2011-07-13 Thread Kai Tietz
2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 1:08 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote:
 Sorry, the TRUTH_NOT_EXPR isn't the point here at all.  The underlying
 issue is that fold-const re-introduces TRUTH_AND/OR and co.

 I'm very sure it doesn't re-construct TRUTH_ variants out of thin air
 when you present it with BIT_ variants as input.

 Well, look into fold-const's fold_binary_loc function and see

  /* ARG0 is the first operand of EXPR, and ARG1 is the second operand.

     First check for cases where an arithmetic operation is applied to a
     compound, conditional, or comparison operation.  Push the arithmetic
     operation inside the compound or conditional to see if any folding
     can then be done.  Convert comparison to conditional for this purpose.
     This also optimizes non-constant cases that used to be done in
     expand_expr.

     Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR,
     one of the operands is a comparison and the other is a comparison, a
     BIT_AND_EXPR with the constant 1, or a truth value.  In that case, the
     code below would make the expression more complex.  Change it to a
     TRUTH_{AND,OR}_EXPR.  Likewise, convert a similar NE_EXPR to
     TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR.  */

  if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR
       || code == EQ_EXPR || code == NE_EXPR)
      && ((truth_value_p (TREE_CODE (arg0))
           && (truth_value_p (TREE_CODE (arg1))
               || (TREE_CODE (arg1) == BIT_AND_EXPR
                   && integer_onep (TREE_OPERAND (arg1, 1)))))
          || (truth_value_p (TREE_CODE (arg1))
              && (truth_value_p (TREE_CODE (arg0))
                  || (TREE_CODE (arg0) == BIT_AND_EXPR
                      && integer_onep (TREE_OPERAND (arg0, 1)))))))
    {
      tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR
                         : code == BIT_IOR_EXPR ? TRUTH_OR_EXPR
                         : TRUTH_XOR_EXPR,
                         boolean_type_node,
                         fold_convert_loc (loc, boolean_type_node, arg0),
                         fold_convert_loc (loc, boolean_type_node, arg1));

      if (code == EQ_EXPR)
        tem = invert_truthvalue_loc (loc, tem);

      return fold_convert_loc (loc, type, tem);
    }

 Here TRUTH_AND/TRUTH_OR gets introduced unconditionally if the operands
 are truth values.  This is, btw, the point why you see those cases
 handled.  But as soon as this part is turned off for BIT_IOR/AND, we
 need to do the folding for the 1-bit precision case explicitly.

 First of all, this checks for a quite complex pattern - where do we pass
 such a complex pattern from the gimple level to fold?  For the EQ/NE_EXPR
 case forwprop might be able to feed it that, but then how does
 it go wrong?  The above could also simply be guarded by !in_gimple_form.

 Richard.

See the reassoc pass as an example, and these hacky
maybe_fold_and_comparisons / maybe_fold_or_comparisons functions.  We
indeed still want to be able to do comparison foldings without getting
back a TRUTH op.  Additionally we have a lot of passes - like the
vectorizer - which happily try to build new conditions at tree level.
This is another place where I saw issues and tree-cfg failures.  And
last but not least, those truth ops might be reintroduced in
gimple_fold as soon as we treat bitwise ops on one-bit precision
integral types as truth values.

Kai


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread William J. Schmidt
On Tue, 2011-07-12 at 11:50 -0500, William J. Schmidt wrote:
 Ilya, thanks for posting this!  This patch is useful also on powerpc64.
 Applying it solved a performance degradation with bwaves due to loss of
 reassociation somewhere between 4.5 and 4.6 (still tracking it down).
 When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code
 generation returns.
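
For illustration (hypothetical example, not from the original mail),
given flags that permit floating-point reassociation such as
-ffast-math, -ftree-reassoc-width=2 lets the reassociation pass split
a long summation into two independent chains:

/* With a reassociation width of 2 this may be computed as
     t1 = (a + b) + c;  t2 = (d + e) + f;  return t1 + t2;
   instead of one serial chain of five dependent additions.  */
double
sum6 (double a, double b, double c, double d, double e, double f)
{
  return a + b + c + d + e + f;
}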

On further investigation, this is improving the code generation but not
reverting all of the performance loss.  We'll open a bug on this one
once we have it narrowed down a little further.

Bill
 
 Bill
 




Re: Build failure (Re: [PATCH] Make VRP optimize useless conversions)

2011-07-13 Thread H.J. Lu
On Wed, Jul 13, 2011 at 1:28 AM, Richard Guenther rguent...@suse.de wrote:
 On Tue, 12 Jul 2011, Ulrich Weigand wrote:

 Richard Guenther wrote:

  2011-07-11  Richard Guenther  rguent...@suse.de
 
      * tree-vrp.c (simplify_conversion_using_ranges): Manually
      translate the source value-range through the conversion chain.

 This causes a build failure in cachemgr.c on spu-elf.  A slightly
 modified simplified test case also fails on i386-linux:

 void *
 test (unsigned long long x, unsigned long long y)
 {
   return (void *) (unsigned int) (x / y);
 }

 compiled with -O2 results in:

 test.i: In function 'test':
 test.i:3:1: error: invalid types in nop conversion
 void *
 long long unsigned int
 D.1962_5 = (void *) D.1963_3;

 test.i:3:1: internal compiler error: verify_gimple failed

 Any thoughts?

 Fix in testing.

 Richard.

 2011-07-13  Richard Guenther  rguent...@suse.de

        * tree-vrp.c (simplify_conversion_using_ranges): Make sure
        the final type is integral.

        * gcc.dg/torture/20110713-1.c: New testcase.

 Index: gcc/tree-vrp.c
 ===
 --- gcc/tree-vrp.c      (revision 176224)
 +++ gcc/tree-vrp.c      (working copy)
 @@ -7353,6 +7353,8 @@ simplify_conversion_using_ranges (gimple
   double_int innermin, innermax, middlemin, middlemax;

   finaltype = TREE_TYPE (gimple_assign_lhs (stmt));
 +  if (!INTEGRAL_TYPE_P (finaltype))
 +    return false;
   middleop = gimple_assign_rhs1 (stmt);
   def_stmt = SSA_NAME_DEF_STMT (middleop);
   if (!is_gimple_assign (def_stmt)
 Index: gcc/testsuite/gcc.dg/torture/20110713-1.c
 ===
 --- gcc/testsuite/gcc.dg/torture/20110713-1.c   (revision 0)
 +++ gcc/testsuite/gcc.dg/torture/20110713-1.c   (revision 0)
 @@ -0,0 +1,8 @@
 +/* { dg-do compile } */
 +/* { dg-require-effective-target ilp32 } */
 +
 +void *
 +test (unsigned long long x, unsigned long long y)
 +{
 +    return (void *) (unsigned int) (x / y);
 +}


This also fixed:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49731


-- 
H.J.


[Patch, AVR]: Fix PR49487 (ICE for wrong rotate scratch)

2011-07-13 Thread Georg-Johann Lay
This is a patch to fix PR49487.

As Denis will be off-line for some time, it'd be great if
a global reviewer would review it.  It appears that he is
the only AVR maintainer who approves patches.

The reason for the ICE is as explained in the PR:

Rotate pattern use X as constraint for an operand which is
used as scratch.  However, the operand is a match_operand
and not a match_scratch.

Because the scratch is not needed in all situations, I chose
to use match_scratch instead of match_operand and not to fix
the constraints.  Fixing the constraints would lead to superfluous
allocation of a register when no scratch is needed.

Tested with 2 FAILs less: gcc.c-torture/compile/pr46883.c
ICEs without this patch and passes with it.

The test case in the PR passes, too.  That test case
also passes with the current unpatched 4.7, but it's obvious that
the constraint/operand combination is a bug.

Ok to commit and back-port to 4.6?

Johann

PR target/49487
* config/avr/avr.md (rotl<mode>3): Generate SCRATCH instead
of REG.
(*rotw<mode>): Use const_int_operand for operand 2.
Use match_scratch for operand 3.
(*rotb<mode>): Ditto.
* config/avr/avr.c (avr_rotate_bytes): Treat SCRATCH.


[PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-13 Thread Andreas Krebbel
Hi,

the widening_mul pass might increase the number of multiplications in
the code by transforming

a = b * c
d = a + 2
e = a + 3

into:

d = b * c + 2
e = b * c + 3

under the assumption that an FMA instruction is not more expensive
than a simple add.  This certainly isn't always true.  While e.g. on
s390 an fma is indeed not slower than an add execution-wise, it has
disadvantages regarding instruction grouping.  It doesn't group with
any other instruction, which has a major impact on the instruction
dispatch bandwidth.
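
As a C-level illustration (hypothetical, added for clarity; compile
with -ffp-contract=fast):

double d, e;

void
f (double b, double c)
{
  double a = b * c;
  d = a + 2.0;   /* may become an FMA of b, c and 2.0 */
  e = a + 3.0;   /* may become an FMA of b, c and 3.0 */
}

Fusing the multiply into both additions replaces one mult and two adds
with two FMAs, which only pays off if 2 * fma_cost <= mul_cost +
2 * add_cost; this is the kind of check the patch below introduces.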

The following patch tries to figure out the costs for adds, mults and
fmas by building an RTX and asking the backend's cost function in order
to estimate whether it is worthwhile doing the transformation.

With that patch the 436.cactus hotloop contains 28 less
multiplications than before increasing performance slightly (~2%).

Bootstrapped and regtested on x86_64 and s390x.

Bye,

-Andreas-

2011-07-13  Andreas Krebbel  andreas.kreb...@de.ibm.com

* tree-ssa-math-opts.c (compute_costs): New function.
(convert_mult_to_fma): Take costs into account when propagating
multiplications into several additions.
* config/s390/s390.c (z196_costs): Adjust costs for madbr and
maebr.


Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c.orig
--- gcc/tree-ssa-math-opts.c
*** convert_plusminus_to_widen (gimple_stmt_
*** 2185,2190 
--- 2185,2252 
return true;
  }
  
+ /* Computing the costs for calculating RTX with CODE in MODE.  */
+ 
+ static unsigned
+ compute_costs (enum machine_mode mode, enum rtx_code code, bool speed)
+ {
+   rtx seq;
+   rtx set;
+   unsigned cost = 0;
+ 
+   start_sequence ();
+ 
+   switch (GET_RTX_LENGTH (code))
+ {
+ case 2:
+   force_operand (gen_rtx_fmt_ee (code, mode,
+  gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
+  gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)),
+NULL_RTX);
+   break;
+ case 3:
+   /* FMA expressions are not handled by force_operand.  */
+   expand_ternary_op (mode, fma_optab,
+gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
+gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2),
+gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3),
+NULL_RTX, false);
+   break;
+ default:
+   gcc_unreachable ();
+ }
+ 
+   seq = get_insns ();
+   end_sequence ();
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "Calculating costs of %s in %s mode.  Sequence is:\n",
+  GET_RTX_NAME (code), GET_MODE_NAME (mode));
+   print_rtl (dump_file, seq);
+ }
+ 
+   for (; seq; seq = NEXT_INSN (seq))
+ {
+   set = single_set (seq);
+   if (set)
+   cost += rtx_cost (set, SET, speed);
+   else
+   cost++;
+ }
+ 
+   /* If the backend returns a cost of zero it is most certainly lying.
+  Set this to one in order to notice that we already calculated it
+  once.  */
+   cost = cost ? cost : 1;
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "%s in %s costs %d\n\n",
+  GET_RTX_NAME (code), GET_MODE_NAME (mode), cost);
+ 
+   return cost;
+ }
+ 
  /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2
 with uses in additions and subtractions to form fused multiply-add
 operations.  Returns true if successful and MUL_STMT should be removed.  */
*** convert_mult_to_fma (gimple mul_stmt, tr
*** 2197,2202 
--- 2259,2270 
gimple use_stmt, neguse_stmt, fma_stmt;
use_operand_p use_p;
imm_use_iterator imm_iter;
+   enum machine_mode mode;
+   int uses = 0;
+   bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt));
+   static unsigned mul_cost[NUM_MACHINE_MODES];
+   static unsigned add_cost[NUM_MACHINE_MODES];
+   static unsigned fma_cost[NUM_MACHINE_MODES];
  
if (FLOAT_TYPE_P (type)
  && flag_fp_contract_mode == FP_CONTRACT_OFF)
*** convert_mult_to_fma (gimple mul_stmt, tr
*** 2213, 
if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
  return false;
  
/* Make sure that the multiplication statement becomes dead after
!  the transformation, thus that all uses are transformed to FMAs.
!  This means we assume that an FMA operation has the same cost
!  as an addition.  */
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result)
  {
enum tree_code use_code;
--- 2281,2297 
if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
  return false;
  
+   mode = TYPE_MODE (type);
+ 
+   if (!fma_cost[mode])
+ {
+   fma_cost[mode] = compute_costs (mode, FMA, speed);
+   add_cost[mode] = compute_costs (mode, PLUS, speed);
+   mul_cost[mode] = compute_costs (mode, MULT, speed);

Re: PATCH [3/n] X32: Promote pointers to Pmode

2011-07-13 Thread H.J. Lu
PING.

On Sun, Jul 10, 2011 at 12:43 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sun, Jul 10, 2011 at 7:32 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sat, Jul 9, 2011 at 11:28 PM, H.J. Lu hongjiu...@intel.com wrote:

 X32 psABI requires promoting pointers to Pmode when passing/returning
 in registers.  OK for trunk?

 Thanks.

 H.J.
 --
 2011-07-09  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_promote_function_mode): New.
        (TARGET_PROMOTE_FUNCTION_MODE): Likewise.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 04cb07d..c852719 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -7052,6 +7061,23 @@ ix86_function_value (const_tree valtype, const_tree 
 fntype_or_decl,
   return ix86_function_value_1 (valtype, fntype_or_decl, orig_mode, mode);
  }

 +/* Pointer function arguments and return values are promoted to
 +   Pmode.  */
 +
 +static enum machine_mode
 +ix86_promote_function_mode (const_tree type, enum machine_mode mode,
 +                           int *punsignedp, const_tree fntype,
 +                           int for_return)
 +{
 +  if (for_return != 1 && type != NULL_TREE && POINTER_TYPE_P (type))
 +    {
 +      *punsignedp = POINTERS_EXTEND_UNSIGNED;
 +      return Pmode;
 +    }
 +  return default_promote_function_mode (type, mode, punsignedp, fntype,
 +                                       for_return);
 +}

 Please rewrite the condition to:

 if (for_return == 1)
  /* Do not promote function return values.  */
  ;
 else if (type != NULL_TREE && ...)

 Also, please add some comments.

 Your comment also says that pointer return arguments are promoted to
 Pmode. The documentation says that:

     FOR_RETURN allows to distinguish the promotion of arguments and
     return values.  If it is `1', a return value is being promoted and
     `TARGET_FUNCTION_VALUE' must perform the same promotions done here.
     If it is `2', the returned mode should be that of the register in
     which an incoming parameter is copied, or the outgoing result is
     computed; then the hook should return the same mode as
     `promote_mode', though the signedness may be different.

 You bypass promotions when FOR_RETURN is 1.

 Uros.


 Here is the updated patch. OK for trunk?

 Thanks.

 --
 H.J.
 --
 2011-07-10  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_promote_function_mode): New.
        (TARGET_PROMOTE_FUNCTION_MODE): Likewise.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 04cb07d..1b02312 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -7052,6 +7061,27 @@ ix86_function_value (const_tree valtype,
 const_tree fntype_or_decl,
   return ix86_function_value_1 (valtype, fntype_or_decl, orig_mode, mode);
  }

 +/* Pointer function arguments and return values are promoted to Pmode.
 +   If FOR_RETURN is 1, this function must behave in the same way with
 +   regard to function returns as TARGET_FUNCTION_VALUE.  */
 +
 +static enum machine_mode
 +ix86_promote_function_mode (const_tree type, enum machine_mode mode,
 +                           int *punsignedp, const_tree fntype,
 +                           int for_return)
 +{
 +  if (for_return == 1)
 +    /* Do not promote function return values.  */
 +    ;
 +  else if (type != NULL_TREE && POINTER_TYPE_P (type))
 +    {
 +      *punsignedp = POINTERS_EXTEND_UNSIGNED;
 +      return Pmode;
 +    }
 +  return default_promote_function_mode (type, mode, punsignedp, fntype,
 +                                       for_return);
 +}
 +
  rtx
  ix86_libcall_value (enum machine_mode mode)
  {
 @@ -34810,6 +35157,9 @@ ix86_autovectorize_vector_sizes (void)
  #undef TARGET_FUNCTION_VALUE_REGNO_P
  #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p

 +#undef TARGET_PROMOTE_FUNCTION_MODE
 +#define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
 +
  #undef TARGET_SECONDARY_RELOAD
  #define TARGET_SECONDARY_RELOAD ix86_secondary_reload




-- 
H.J.


Re: PATCH [3/n] X32: Promote pointers to Pmode

2011-07-13 Thread Uros Bizjak
On Wed, Jul 13, 2011 at 3:17 PM, H.J. Lu hjl.to...@gmail.com wrote:
 PING.

 2011-07-10  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_promote_function_mode): New.
        (TARGET_PROMOTE_FUNCTION_MODE): Likewise.

You have discussed this with rth, the final approval should be from him.

Uros.


Re: Build failure (Re: [PATCH] Make VRP optimize useless conversions)

2011-07-13 Thread Ulrich Weigand
Richard Guenther wrote:

 2011-07-13  Richard Guenther  rguent...@suse.de
 
   * tree-vrp.c (simplify_conversion_using_ranges): Make sure
   the final type is integral.

This fixes the spu-elf build failure.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [build] Move libgcov support to toplevel libgcc

2011-07-13 Thread Rainer Orth
Jan,

 I would also prefer libgcov to go into its own toplevel directory,
 especially because there are plans to add non-stdlib i/o into it, e.g.
 for kernel profiling.  That way it would be handy to have libgcov
 as a toplevel library with its own configure that allows it to be built
 independently of the rest of GCC.

I'm probably not going to try that.  There's so much cleanup possible in
the toplevel libgcc move as is that it will keep me busy for some time
(provided that I can get testing and approval for the parts I can't easily
test myself ;-).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)

2011-07-13 Thread Jason Merrill

On 07/12/2011 07:46 PM, Jakub Jelinek wrote:

The aim is to include just (or primarily) code generation affecting options
explicitly passed on the command line.  So that the merging actually works,
options or arguments which include filenames or paths shouldn't be added,
on Roland's request -D*/-U* options aren't added either (that should be
covered by .debug_macinfo)


...but only with -g3.


Ideally we'd just include explicitly passed options from command line that
haven't been overridden by other command line options, and would sort them,
so that there are higher chances of DW_AT_producer strings being merged
(e.g. -O2 -ffast-math vs. -ffast-math -O2 are now different strings, and
similarly -O2 vs. -O3 -O2 vs. -O0 -O1 -Ofast -O2), but I'm not sure if it is
easily possible using the current option handling framework.


Why not?  Sorting sounds pretty straightforward to me, though you might 
want to copy the array first.


On the other hand, it probably isn't worthwhile; presumably most 
relocatables being linked together will share the same CFLAGS, so you'll 
get a high degree of merging without any sorting.



--- gcc/testsuite/lib/dg-pch.exp.jj 2011-01-03 18:58:03.0 +0100
+++ gcc/testsuite/lib/dg-pch.exp2011-07-12 23:13:50.943670171 +0200
-   dg-test -keep-output ./$bname$suffix $otherflags $flags
+   dg-test -keep-output ./$bname$suffix -gno-record-gcc-switches $otherflags $flags


Why is this necessary?

Jason


Re: PATCH [3/n] X32: Promote pointers to Pmode

2011-07-13 Thread H.J. Lu
Hi Richard,

Is my patch OK?

Thanks.


H.J.

On Sun, Jul 10, 2011 at 6:14 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sun, Jul 10, 2011 at 5:48 PM, Richard Henderson r...@redhat.com wrote:
 On 07/10/2011 03:01 PM, H.J. Lu wrote:
 We only want to promote function parameters passed/returned in a register.
 But I can't tell if a value will be passed in a register or in memory inside
 TARGET_FUNCTION_VALUE.  So when FOR_RETURN is 1, we don't
 promote.  Even if we don't promote it explicitly, hardware still zero-extends
 it for us, so it isn't a real issue.

 The hardware *usually* zero-extends.  It all depends on where the data is
 coming from.  Without certainty, i.e. actually representing it properly,
 you're designing a broken ABI.

 What you wrote above re T_F_V not being able to tell register or
 memory doesn't make sense.  Did you really mean inside 
 TARGET_PROMOTE_FUNCTION_MODE?
 And why exactly wouldn't you be able to tell there?  Can you not find out
 via a call to ix86_return_in_memory?


 TARGET_PROMOTE_FUNCTION_MODE is for passing/returning
 value in a register and the documentation says that:

    FOR_RETURN allows to distinguish the promotion of arguments and
    return values.  If it is `1', a return value is being promoted and
    `TARGET_FUNCTION_VALUE' must perform the same promotions done here.
    If it is `2', the returned mode should be that of the register in
    which an incoming parameter is copied, or the outgoing result is
    computed; then the hook should return the same mode as
    `promote_mode', though the signedness may be different.


 But for TARGET_FUNCTION_VALUE, there is no difference for
 register and memory.  That is why I don't promote when
 FOR_RETURN is 1.  mmix/mmix.c and rx/rx.c have similar
 treatment.



Re: [ARM] Fix PR49641

2011-07-13 Thread Richard Earnshaw
On 07/07/11 21:02, Bernd Schmidt wrote:
 This corrects an error in store_multiple_operation. We're only
 generating the writeback version of the instruction on Thumb-1, so
 that's where we must make sure the base register isn't also stored.
 
 The ARMv7 manual is unfortunately not totally clear that this does in
 fact produce unpredictable results; it seems to suggest that this is the
 case only for the T2 encoding. Older documentation makes it clear.
 
 Tested on arm-eabi{,mthumb}.
 

I agree that the wording here is unclear, but the pseudo code for the
decode makes the situation clearer, and does reflect what I really
believe to be the case.  Put explicitly:

For LDM:

- Encoding A1: Unpredictable if writeback and base in list (I believe
this is true for all architecture versions, despite what it says in the
current ARM ARM -- at least, my v5 copy certainly says unpredictable)

- Encoding T1: Not unpredictable, but deprecated (for base in list, the
loaded value used and writeback ignored).  Note, however, that in UAL
the ! operator on the base register must not be used if the base
register appears in the list.

- Encoding T2: Unpredictable if writeback and base in list



For STM:

- Encoding T2: Unpredictable if writeback and base in list regardless of
the position.

- Encodings T1 and A1: Unpredictable if writeback and base in list and
not lowest numbered register (note that encoding T1 always has
writeback).  In the case where the base is the first register in the
list, then the original value of base will be stored; deprecated.

This is all quite complicated, I hope I've expressed it correctly... :-)

R.

 
 Bernd
 
 
 pr49641.diff
 
 
   * config/arm/arm.c (store_multiple_sequence): Avoid cases where
   the base reg is stored iff compiling for Thumb1.
 
   * gcc.target/arm/pr49641.c: New test.
 
 Index: gcc/config/arm/arm.c
 ===
 --- gcc/config/arm/arm.c  (revision 175906)
 +++ gcc/config/arm/arm.c  (working copy)
 @@ -9950,7 +9950,10 @@ store_multiple_sequence (rtx *operands,
 /* If it isn't an integer register, then we can't do this.  */
 if (unsorted_regs[i] < 0
 || (TARGET_THUMB1 && unsorted_regs[i] > LAST_LO_REGNUM)
 -   || (TARGET_THUMB2 && unsorted_regs[i] == base_reg)
 +   /* For Thumb1, we'll generate an instruction with update,
 +  and the effects are unpredictable if the base reg is
 +  stored.  */
 +   || (TARGET_THUMB1 && unsorted_regs[i] == base_reg)
 || (TARGET_THUMB2 && unsorted_regs[i] == SP_REGNUM)
 || unsorted_regs[i] > 14)
   return 0;
 Index: gcc/testsuite/gcc.target/arm/pr49641.c
 ===
 --- gcc/testsuite/gcc.target/arm/pr49641.c(revision 0)
 +++ gcc/testsuite/gcc.target/arm/pr49641.c(revision 0)
 @@ -0,0 +1,18 @@
 +/* { dg-do compile } */
 +/* { dg-options "-mthumb -O2" } */
 +/* { dg-require-effective-target arm_thumb1_ok } */
 +/* { dg-final { scan-assembler-not "stmia\[\\t \]*r3!\[^\\n]*r3" } } */
 +typedef struct {
 +  void *t1, *t2, *t3;
 +} z;
 +extern volatile int y;
 +static inline void foo(z *x) {
 +  x->t1 = x->t2;
 +  x->t2 = ((void *)0);
 +  x->t3 = x->t1;
 +}
 +extern z v;
 +void bar (void) {
 +   y = 0;
 +   foo(&v);
 +}




Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)

2011-07-13 Thread Jakub Jelinek
On Wed, Jul 13, 2011 at 09:56:58AM -0400, Jason Merrill wrote:
 On 07/12/2011 07:46 PM, Jakub Jelinek wrote:
 The aim is to include just (or primarily) code generation affecting options
 explicitly passed on the command line.  So that the merging actually works,
 options or arguments which include filenames or paths shouldn't be added,
 on Roland's request -D*/-U* options aren't added either (that should be
 covered by .debug_macinfo)
 
 ...but only with -g3.

Sure.  But if we put -D*/-U* into DW_AT_producer, -D_FORTIFY_SOURCE=2
on the command line acts the same as #define _FORTIFY_SOURCE 2
before including the first header and the latter wouldn't be recorded.
I'm working on smaller .debug_macinfo right now.

 Ideally we'd just include explicitly passed options from command line that
 haven't been overridden by other command line options, and would sort them,
 so that there are higher chances of DW_AT_producer strings being merged
 (e.g. -O2 -ffast-math vs. -ffast-math -O2 are now different strings, and
 similarly -O2 vs. -O3 -O2 vs. -O0 -O1 -Ofast -O2), but I'm not sure if it is
 easily possible using the current option handling framework.
 
 Why not?  Sorting sounds pretty straightforward to me, though you
 might want to copy the array first.

If the command line options contain options that override each other, then
sorting would drop the important information of what comes last and thus
overrides other options.  If we had only options which weren't overridden,
we could sort.  Otherwise -O2 -O0 would be sorted as -O0 -O2 and we'd think
the code was optimized when it wasn't.

 On the other hand, it probably isn't worthwhile; presumably most
 relocatables being linked together will share the same CFLAGS, so
 you'll get a high degree of merging without any sorting.
 
 --- gcc/testsuite/lib/dg-pch.exp.jj  2011-01-03 18:58:03.0 +0100
 +++ gcc/testsuite/lib/dg-pch.exp 2011-07-12 23:13:50.943670171 +0200
 -dg-test -keep-output ./$bname$suffix $otherflags $flags
 +dg-test -keep-output ./$bname$suffix -gno-record-gcc-switches $otherflags $flags
 
 Why is this necessary?

It is only necessary if somebody wants to make -grecord-gcc-switches
the default (for bootstrap/regtest I've tweaked common.opt to do that
to test it better).  PCH is a big mess and screws debuginfo in many ways,
in this case it was just small differences in DW_AT_producer, but
we have e.g. ICEs with PCH and -feliminate-dwarf-dups etc.

Jakub


Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)

2011-07-13 Thread Jason Merrill

On 07/13/2011 10:06 AM, Jakub Jelinek wrote:

--- gcc/testsuite/lib/dg-pch.exp.jj 2011-01-03 18:58:03.0 +0100
+++ gcc/testsuite/lib/dg-pch.exp2011-07-12 23:13:50.943670171 +0200
-   dg-test -keep-output ./$bname$suffix $otherflags $flags
+   dg-test -keep-output ./$bname$suffix -gno-record-gcc-switches $otherflags $flags



It is only necessary if somebody wants to make -grecord-gcc-switches
the default (for bootstrap/regtest I've tweaked common.opt to do that
to test it better).  PCH is a big mess and screws debuginfo in many ways,
in this case it was just small differences in DW_AT_producer, but
we have e.g. ICEs with PCH and -feliminate-dwarf-dups etc.


Why would PCH change DW_AT_producer?  Because we're restoring 
single_comp_unit_die from the PCH?  Then perhaps we should set 
DW_AT_producer in output_comp_unit rather than gen_compile_unit_die.


Jason


Re: [build] Remove crt0, mcrt0 support

2011-07-13 Thread Rainer Orth
Jan,

 Rainer Orth r...@cebitec.uni-bielefeld.de 07/12/11 6:46 PM 
On the other hand, maybe it's time to obsolete or even immediately
remove the netware port: there is no listed maintainer, no testsuite
results at least back to 2007 (if any were ever posted), and the only
netware-related change that hasn't been part of general cleanup is
almost two years ago.

 That would be fine with me.

which variant would you prefer: obsoletion now and removal in 4.8 or
immediate removal?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-13 Thread Richard Guenther
On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel
kreb...@linux.vnet.ibm.com wrote:
 Hi,

 the widening_mul pass might increase the number of multiplications in
 the code by transforming

 a = b * c
 d = a + 2
 e = a + 3

 into:

 d = b * c + 2
 e = b * c + 3

 under the assumption that an FMA instruction is not more expensive
 than a simple add.  This certainly isn't always true.  While e.g. on
 s390 an fma is indeed not slower than an add execution-wise, it has
 disadvantages regarding instruction grouping.  It doesn't group with
 any other instruction, which has a major impact on the instruction
 dispatch bandwidth.

 The following patch tries to figure out the costs for adds, mults and
 fmas by building an RTX and asking the backend's cost function in order
 to estimate whether it is worthwhile doing the transformation.

 With that patch the 436.cactus hotloop contains 28 less
 multiplications than before increasing performance slightly (~2%).

 Bootstrapped and regtested on x86_64 and s390x.

Ick ;)

Maybe this is finally the time to introduce target hook(s) to
get us back costs for trees?  For this case we'd need two
actually, or just one - dependent on what finegrained information
we pass.  Choices:

  tree_code_cost (enum tree_code)
  tree_code_cost (enum tree_code, enum machine_mode mode)
  unary_cost (enum tree_code, tree actual_arg0) // args will be mostly
SSA names or constants, but at least they are typed - works for
mixed-typed operations
  binary_cost (...)
  ...
  unary_cost (enum tree_code, enum tree_code arg0_kind) // constant
vs. non-constant arg, but lacks type/mode
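
For illustration only, a sketch of the mode-taking variant (no such
hook exists today; the name is hypothetical, and a real version would
be declared through target.def):

/* Hypothetical target hook: relative cost of a tree operation with
   code CODE in mode MODE, in COSTS_N_INSNS units.  */
static unsigned
example_tree_code_cost (enum tree_code code, enum machine_mode mode)
{
  switch (code)
    {
    case PLUS_EXPR:
    case MINUS_EXPR:
      return COSTS_N_INSNS (1);
    case MULT_EXPR:
      /* Assume multiplies are pricier, integer more so than FP.  */
      return (GET_MODE_CLASS (mode) == MODE_FLOAT
              ? COSTS_N_INSNS (2) : COSTS_N_INSNS (4));
    case FMA_EXPR:
      return COSTS_N_INSNS (2);
    default:
      return COSTS_N_INSNS (1);
    }
}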

Richard.

 Bye,

 -Andreas-

 2011-07-13  Andreas Krebbel  andreas.kreb...@de.ibm.com

        * tree-ssa-math-opts.c (compute_costs): New function.
        (convert_mult_to_fma): Take costs into account when propagating
        multiplications into several additions.
        * config/s390/s390.c (z196_costs): Adjust costs for madbr and
        maebr.


 Index: gcc/tree-ssa-math-opts.c
 ===
 *** gcc/tree-ssa-math-opts.c.orig
 --- gcc/tree-ssa-math-opts.c
 *** convert_plusminus_to_widen (gimple_stmt_
 *** 2185,2190 
 --- 2185,2252 
    return true;
  }

 + /* Computing the costs for calculating RTX with CODE in MODE.  */
 +
 + static unsigned
 + compute_costs (enum machine_mode mode, enum rtx_code code, bool speed)
 + {
 +   rtx seq;
 +   rtx set;
 +   unsigned cost = 0;
 +
 +   start_sequence ();
 +
 +   switch (GET_RTX_LENGTH (code))
 +     {
 +     case 2:
 +       force_operand (gen_rtx_fmt_ee (code, mode,
 +                      gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
 +                      gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)),
 +                    NULL_RTX);
 +       break;
 +     case 3:
 +       /* FMA expressions are not handled by force_operand.  */
 +       expand_ternary_op (mode, fma_optab,
 +                        gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
 +                        gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2),
 +                        gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3),
 +                        NULL_RTX, false);
 +       break;
 +     default:
 +       gcc_unreachable ();
 +     }
 +
 +   seq = get_insns ();
 +   end_sequence ();
 +
 +   if (dump_file && (dump_flags & TDF_DETAILS))
 +     {
 +       fprintf (dump_file, "Calculating costs of %s in %s mode.  Sequence is:\n",
 +              GET_RTX_NAME (code), GET_MODE_NAME (mode));
 +       print_rtl (dump_file, seq);
 +     }
 +
 +   for (; seq; seq = NEXT_INSN (seq))
 +     {
 +       set = single_set (seq);
 +       if (set)
 +       cost += rtx_cost (set, SET, speed);
 +       else
 +       cost++;
 +     }
 +
 +   /* If the backend returns a cost of zero it is most certainly lying.
 +      Set this to one in order to notice that we already calculated it
 +      once.  */
 +   cost = cost ? cost : 1;
 +
 +   if (dump_file && (dump_flags & TDF_DETAILS))
 +     fprintf (dump_file, "%s in %s costs %d\n\n",
 +              GET_RTX_NAME (code), GET_MODE_NAME (mode), cost);
 +
 +   return cost;
 + }
 +
  /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2
     with uses in additions and subtractions to form fused multiply-add
     operations.  Returns true if successful and MUL_STMT should be removed.  
 */
 *** convert_mult_to_fma (gimple mul_stmt, tr
 *** 2197,2202 
 --- 2259,2270 
    gimple use_stmt, neguse_stmt, fma_stmt;
    use_operand_p use_p;
    imm_use_iterator imm_iter;
 +   enum machine_mode mode;
 +   int uses = 0;
 +   bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt));
 +   static unsigned mul_cost[NUM_MACHINE_MODES];
 +   static unsigned add_cost[NUM_MACHINE_MODES];
 +   static unsigned fma_cost[NUM_MACHINE_MODES];

    if (FLOAT_TYPE_P (type)
         && flag_fp_contract_mode == FP_CONTRACT_OFF)
 *** convert_mult_to_fma (gimple mul_stmt, tr
 *** 2213, 
 

PING: PATCH [4/n] X32: Use ptr_mode for vtable adjustment

2011-07-13 Thread H.J. Lu
Hi Richard, Uros,

Is this patch OK?

Thanks.

H.J.
---
On Sun, Jul 10, 2011 at 6:47 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, Jul 9, 2011 at 3:58 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, Jul 9, 2011 at 3:43 PM, Richard Henderson r...@redhat.com wrote:
 On 07/09/2011 02:36 PM, H.J. Lu wrote:

 Hi,

 Thunk is in ptr_mode, not Pmode.  OK for trunk?

 Thanks.

 H.J.
 ---
 2011-07-09  H.J. Lu  hongjiu...@intel.com

       * config/i386/i386.c (x86_output_mi_thunk): Use ptr_mode instead
       of Pmode for vtable adjustment.

 Not ok.  This is incoherent in its treatment of Pmode vs ptr_mode.
 You're creating an addition

        (plus:P (reg:ptr tmp) (reg:P tmp2))

 It is because thunk is stored in ptr_mode, not Pmode.


 I have a queued patch that replaces all of this with rtl.  I will
 post it later today.


 I will update it for x32 after your change is checked in.


 I am testing this updated patch.  OK for trunk if it works?

 Thanks.


 --
 H.J.
 ---
 2011-07-10  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (x86_output_mi_thunk): Support ptr_mode
        != Pmode.

        * config/i386/i386.md (*addsi_1_zext): Renamed to ...
        (addsi_1_zext): This.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index a46101b..d6744be 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -29346,7 +29673,7 @@ x86_output_mi_thunk (FILE *file,
   /* Adjust the this parameter by a value stored in the vtable.  */
   if (vcall_offset)
     {
 -      rtx vcall_addr, vcall_mem;
 +      rtx vcall_addr, vcall_mem, this_mem;
       unsigned int tmp_regno;

       if (TARGET_64BIT)
 @@ -29361,7 +29688,10 @@ x86_output_mi_thunk (FILE *file,
        }
       tmp = gen_rtx_REG (Pmode, tmp_regno);

 -      emit_move_insn (tmp, gen_rtx_MEM (ptr_mode, this_reg));
 +      this_mem = gen_rtx_MEM (ptr_mode, this_reg);
 +      if (Pmode == DImode && ptr_mode == SImode)
 +       this_mem = gen_rtx_ZERO_EXTEND (DImode, this_mem);
 +      emit_move_insn (tmp, this_mem);

       /* Adjust the this parameter.  */
       vcall_addr = plus_constant (tmp, vcall_offset);
 @@ -29373,8 +29703,13 @@ x86_output_mi_thunk (FILE *file,
          vcall_addr = gen_rtx_PLUS (Pmode, tmp, tmp2);
        }

 -      vcall_mem = gen_rtx_MEM (Pmode, vcall_addr);
 -      emit_insn (ix86_gen_add3 (this_reg, this_reg, vcall_mem));
 +      vcall_mem = gen_rtx_MEM (ptr_mode, vcall_addr);
 +      if (Pmode == DImode && ptr_mode == SImode)
 +       emit_insn (gen_addsi_1_zext (this_reg,
 +                                    gen_rtx_REG (SImode, REGNO (this_reg)),
 +                                    vcall_mem));
 +      else
 +       emit_insn (ix86_gen_add3 (this_reg, this_reg, vcall_mem));
     }

   /* If necessary, drop THIS back to its stack slot.  */
 diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
 index a52941b..3136fd0 100644
 --- a/gcc/config/i386/i386.md
 +++ b/gcc/config/i386/i386.md
 @@ -5508,11 +5574,11 @@
  ;; operands so proper swapping will be done in reload.  This allow
  ;; patterns constructed from addsi_1 to match.

 -(define_insn "*addsi_1_zext"
 +(define_insn "addsi_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r,r,r")
        (zero_extend:DI
          (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r")
                   (match_operand:SI 2 "general_operand" "g,0,li"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
  {



Re: More mudflap fixes for Solaris 11

2011-07-13 Thread Frank Ch. Eigler
Hi, Rainer -

 When testing libmudflap on Solaris 8, 9, and 10 with GNU ld, I found a
 couple of testsuite failures:
 [...]
 Ok for mainline?

Yes, thank you!


- FChE


Re: Define [CD]TORS_SECTION_ASM_OP on Solaris/x86 with Sun ld

2011-07-13 Thread Frank Ch. Eigler
Hi, Rainer -

On Mon, Jul 11, 2011 at 06:34:27PM +0200, Rainer Orth wrote:
 [...]
 On the other hand, there's the question why tree-mudflap.c tries to
 create a constructor with a non-default priority on a platform with
 SUPPORTS_INIT_PRIORITY == 0 or at all [...]

For the "at all" part, I believe the intent was to make it more likely
that mudflap-tracked literals be tracked early enough so that other
constructors would find them already available for checking.

- FChE


Re: [build] Remove crt0, mcrt0 support

2011-07-13 Thread Rainer Orth
Jan Beulich jbeul...@novell.com writes:

 Rainer Orth r...@cebitec.uni-bielefeld.de 07/13/11 4:34 PM 
which variant would you prefer: obsoletion now and removal in 4.8 or
immediate removal?

 Both are fine with me, so unless someone else objects, immediate removal
 would seem better given it has been pretty much unmaintained.

Right: it would be a one-time effort to remove the support, but
subsequent cleanups wouldn't have to deal with the effectively dead
code.

I had a quick look and it doesn't seem hard: apart from removing the
netware-specific files in gcc and libgcc (and corresponding gcc/config.gcc
and libgcc/config.host changes), there's only a small list (apart from
netware-related target triplets in the testsuite):

config/elf.m4
configure.ac
contrib/config-list.mk
gcc/config/i386/i386.c
gcc/config/i386/i386.h
gcc/doc/extend.texi
libstdc++-v3/crossconfig.m4

configure.ac may have to stay if binutils/src wants to retain the
report, but that's about it.

Let's see what the release managers/global reviewers think.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-13 Thread Georg-Johann Lay
Richard Guenther wrote:
 On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel
 kreb...@linux.vnet.ibm.com wrote:
 Hi,

 the widening_mul pass might increase the number of multiplications in
 the code by transforming

 a = b * c
 d = a + 2
 e = a + 3

 into:

 d = b * c + 2
 e = b * c + 3

 under the assumption that an FMA instruction is not more expensive
 than a simple add.  This certainly isn't always true.  While e.g. on
 s390 an fma is indeed not slower than an add execution-wise, it has
 disadvantages regarding instruction grouping.  It doesn't group with
 any other instruction, which has a major impact on the instruction
 dispatch bandwidth.

 The following patch tries to figure out the costs for adds, mults and
 fmas by building an RTX and asking the backend's cost function in order
 to estimate whether it is worthwhile doing the transformation.

 With that patch the 436.cactus hotloop contains 28 less
 multiplications than before increasing performance slightly (~2%).

 Bootstrapped and regtested on x86_64 and s390x.
 
 Ick ;)
 
 Maybe this is finally the time to introduce target hook(s) to
 get us back costs for trees?  For this case we'd need two
 actually, or just one - dependent on what finegrained information
 we pass.  Choices:
 
   tree_code_cost (enum tree_code)
   tree_code_cost (enum tree_code, enum machine_mode mode)
   unary_cost (enum tree_code, tree actual_arg0) // args will be mostly
 SSA names or constants, but at least they are typed - works for
 mixed-typed operations
   binary_cost (...)
   ...
   unary_cost (enum tree_code, enum tree_code arg0_kind) // constant
 vs. non-constant arg, but lacks type/mode
 
 Richard.

What's bad with rtx_costs?

Yet another cost function might duplicate cost computation in a backend --
once on trees and once on RTXs.

BTW: For a port I read rtx_costs from insn attributes, which helped me to
clean up the code in rtx_costs to a great extent.  In particular for a target
with complex instructions which are synthesized by insn combine, rtx_costs
is mostly mechanical, brain-dead retyping of the bulk of code that is
already present almost identically in insn-recog.c.

Johann


Re: Define [CD]TORS_SECTION_ASM_OP on Solaris/x86 with Sun ld

2011-07-13 Thread Rainer Orth
Hi Frank,

 On Mon, Jul 11, 2011 at 06:34:27PM +0200, Rainer Orth wrote:
 [...]
 On the other hand, there's the question why tree-mudflap.c tries to
 create a constructor with a non-default priority on a platform with
 SUPPORTS_INIT_PRIORITY == 0 or at all [...]

 For the "at all" part, I believe the intent was to make it more likely
 that mudflap-tracked literals be tracked early enough so that other
 constructors would find them already available for checking.

I see.  I'm still undecided whose responsibility it is to deal with the
!SUPPORTS_INIT_PRIORITY case.  On one hand one might argue that only the
callers can decide if a non-default priority is strictly required or
just an improvement, OTOH silently ignoring the priority and causing
constructors not to be run at all doesn't seem a winning proposition
either ;-)
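
For illustration (a sketch, not a proposed patch; SUPPORTS_INIT_PRIORITY,
MAX_RESERVED_INIT_PRIORITY and DEFAULT_INIT_PRIORITY are the existing
GCC macros), a caller-side fallback could look like:

#if SUPPORTS_INIT_PRIORITY
  priority = MAX_RESERVED_INIT_PRIORITY - 1;
#else
  /* The target cannot order constructors; run the constructor at the
     default priority rather than dropping it silently.  */
  priority = DEFAULT_INIT_PRIORITY;
#endif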

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: PATCH [3/n] X32: Promote pointers to Pmode

2011-07-13 Thread Richard Henderson
On 07/13/2011 07:02 AM, H.J. Lu wrote:
 Hi Richard,
 
 Is my patch OK?

No, I don't think it is.


r~


Re: PATCH [3/n] X32: Promote pointers to Pmode

2011-07-13 Thread H.J. Lu
On Wed, Jul 13, 2011 at 8:27 AM, Richard Henderson r...@redhat.com wrote:
 On 07/13/2011 07:02 AM, H.J. Lu wrote:
 Hi Richard,

 Is my patch OK?

 No, I don't think it is.


What is your suggestion?


-- 
H.J.


Re: PING: PATCH [4/n] X32: Use ptr_mode for vtable adjustment

2011-07-13 Thread Richard Henderson
On 07/13/2011 07:39 AM, H.J. Lu wrote:
* config/i386/i386.c (x86_output_mi_thunk): Support ptr_mode
!= Pmode.

* config/i386/i386.md (*addsi_1_zext): Renamed to ...
(addsi_1_zext): This.

Ok, except, 

 +  if (Pmode == DImode && ptr_mode == SImode)

 if (Pmode != ptr_mode)

in two locations.

 +   this_mem = gen_rtx_ZERO_EXTEND (DImode, this_mem);

Pmode

  +gen_rtx_REG (SImode, REGNO (this_reg)),

ptr_mode.


r~


Re: PATCH [3/n] X32: Promote pointers to Pmode

2011-07-13 Thread Richard Henderson
On 07/13/2011 08:35 AM, H.J. Lu wrote:
 On Wed, Jul 13, 2011 at 8:27 AM, Richard Henderson r...@redhat.com wrote:
 On 07/13/2011 07:02 AM, H.J. Lu wrote:
 Hi Richard,

 Is my patch OK?

 No, I don't think it is.

 
 What is your suggestion?

Promote the return value.  If that means it doesn't match function_value,
then I suggest that function_value is wrong.


r~


[PATCH] bash vs. dash: Avoid unportable shell feature in gcc/configure.ac

2011-07-13 Thread Thomas Schwinge
Hallo!

Diffing the make log of a build of GCC with SHELL not explicitly set
(thus /bin/sh, which is bash) and one with SHELL=/bin/dash, I found the
following unexpected difference:

-checking assembler for eh_frame optimization... yes
+checking assembler for eh_frame optimization... buggy

This is from gcc/configure; which invokes
acinclude.m4:gcc_GAS_CHECK_FEATURE for the ``eh_frame optimization''
check.

Latter case, gcc/config.log:

configure:22282: checking assembler for eh_frame optimization
configure:22327: /usr/bin/as --32  -o conftest.o conftest.s 5
conftest.s: Assembler messages:
conftest.s: Warning: end of file in string; '"' inserted
conftest.s:13: Warning: unterminated string; newline inserted

There, the following happens:

$ sh # This is bash.
sh-4.1$ echo '.ascii "z\0"'
.ascii "z\0"

This is what GCC expects.  However, with dash:

$ dash
$ echo '.ascii "z\0"'
.ascii "z

The backslash escape and everything after is cut off.
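
For reference (an added illustration, not part of the original mail),
a portable way to produce the literal from any POSIX shell is printf,
which does not interpret backslash escapes in %s arguments:

$ dash
$ printf '%s\n' '.ascii "z\0"'
.ascii "z\0"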

The test in gcc/configure.ac:

gcc_GAS_CHECK_FEATURE(eh_frame optimization, gcc_cv_as_eh_frame,
  [elf,2,12,0],,
[   .text
[...]
.byte   0x1
.ascii "z\0"
.byte   0x1
[...]

As quickly determined in #gcc with Ian's and Ismail's help, this is
unportable usage of the echo builtin (and also at least questionable for
/bin/echo), so I'm suggesting the following simple fix:

gcc/
* configure.ac (eh_frame optimization): Avoid unportable shell feature.

diff --git a/gcc/configure.ac b/gcc/configure.ac
index c2163bf..73f0209 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2538,7 +2538,7 @@ __FRAME_BEGIN__:
 .LSCIE1:
.4byte  0x0
.byte   0x1
-   .ascii "z\0"
+   .asciz "z"
.byte   0x1
.byte   0x78
.byte   0x1a

Alternatively, gcc_GAS_CHECK_FEATURE could be changed to emit the
temporary file by using a shell here-doc, which is what AC_TRY_COMPILE is
doing, for example.


Grüße,
 Thomas




Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-13 Thread Paolo Bonzini

On 07/11/2011 05:54 PM, H.J. Lu wrote:

The key is the

 XEXP (x, 1) == convert_memory_address_addr_space
(to_mode, XEXP (x, 1), as)

  test.  It ensures basically that the constant has 31-bit precision, because
  otherwise the constant would change from e.g. (const_int -0x7ffffffc) to
  (const_int 0x80000004) when zero-extending it from SImode to DImode.

  But I'm not sure it's safe.  You have,

(zero_extend:DI (plus:SI FOO:SI (const_int Y)))

  and you want to convert it to

(plus:DI FOO:DI (zero_extend:DI (const_int Y)))

  (where the zero_extend is folded).  Ignore that FOO is a SYMBOL_REF (this
  piece of code does not assume anything about its shape); if FOO ==
  0xfffffffc and Y = 8, the result will be respectively 0x4 (valid) and
  0x100000004 (invalid).

This example contradicts what you said above: "It ensures basically that the
constant has 31-bit precision".


Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the 
same representation in SImode and DImode, and the test above on XEXP (x, 
1) succeeds.



  What happens if you just return NULL instead of the assertion (good idea
  adding it!)?

  Of course then you need to:

  1) check the return values of convert_memory_address_addr_space_1, and
  propagate NULL up to simplify_unary_operation;

  2) check in simplify-rtx.c whether the return value of
  convert_memory_address_1 is NULL, and only return if the return value is not
  NULL.  This is not yet necessary (convert_memory_address is the last
  transformation for both SIGN_EXTEND and ZERO_EXTEND) but it is better to
  keep code clean.

I will give it a try.


Thanks, did you get any result?  There's no "I think" in this code.  So 
even if I cannot approve it, I'd really like to see a version that I 
understand and that is clearly conservative, if it works.


Paolo


[Patch,AVR]: Cleanup readonly_data_section et al.

2011-07-13 Thread Georg-Johann Lay
This patch removes some special treatment from avr/elf.h
which is actually not needed.  The only target supported
by avr is ELF and the defaults for READONLY_DATA_SECTION_ASM_OP,
TARGET_HAVE_SWITCHABLE_BSS_SECTIONS, and TARGET_ASM_SELECT_SECTION
are fine.

Using default for TARGET_ASM_SELECT_SECTION brings the additional
benefit that constant merging is enabled.

AVR is special because it is a Harvard architecture, so all
constants have to be in .data, i.e. .rodata is part of .data.
This is accomplished by the default linker scripts, so there is no
need to set
   readonly_data_section = data_section
in avr_asm_init_sections.

Changes in testsuite run are:

* gcc.dg/debug/dwarf2/dwarf-merge.c: UNSUPPORTED - PASS
* gcc.dg/array-quals-1.c: XFAIL - PASS
* g++.dg/opt/const4.C: FAIL - PASS

There's no AVR maintainer available to approve at the moment,
so approval by a global reviewer would be much appreciated.

Ok to commit?

Johann


gcc/
* config/avr/elf.h (TARGET_ASM_SELECT_SECTION): Remove,
i.e. use default_elf_select_section.
(TARGET_HAVE_SWITCHABLE_BSS_SECTIONS): Remove.
(READONLY_DATA_SECTION_ASM_OP): Remove.
(TARGET_ASM_NAMED_SECTION): Move from here...
* config/avr/avr.c: ...to here.
(avr_asm_init_sections): Set unnamed callback of
readonly_data_section.
(avr_asm_named_section): Make static.

testsuite/
* gcc.dg/array-quals-1.c: Don't xfail on AVR.


Index: config/avr/elf.h
===
--- config/avr/elf.h	(revision 176136)
+++ config/avr/elf.h	(working copy)
@@ -26,24 +26,12 @@
 #undef PREFERRED_DEBUGGING_TYPE
 #define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
 
-#undef TARGET_ASM_NAMED_SECTION
-#define TARGET_ASM_NAMED_SECTION avr_asm_named_section
-
-/* Use lame default: no string merging, ...  */
-#undef TARGET_ASM_SELECT_SECTION
-#define TARGET_ASM_SELECT_SECTION default_select_section
-
 #undef MAX_OFILE_ALIGNMENT
 #define MAX_OFILE_ALIGNMENT (32768 * 8)
 
-#undef TARGET_HAVE_SWITCHABLE_BSS_SECTIONS
-
 #undef STRING_LIMIT
 #define STRING_LIMIT ((unsigned) 64)
 
-/* Setup `readonly_data_section' in `avr_asm_init_sections'.  */
-#undef READONLY_DATA_SECTION_ASM_OP
-
 /* Take care of `signal' and `interrupt' attributes.  */
 #undef ASM_DECLARE_FUNCTION_NAME
 #define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL) \
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 176141)
+++ config/avr/avr.c	(working copy)
@@ -194,8 +194,8 @@ static const struct attribute_spec avr_a
 #undef TARGET_SECTION_TYPE_FLAGS
 #define TARGET_SECTION_TYPE_FLAGS avr_section_type_flags
 
-/* `TARGET_ASM_NAMED_SECTION' must be defined in avr.h.  */
-
+#undef TARGET_ASM_NAMED_SECTION
+#define TARGET_ASM_NAMED_SECTION avr_asm_named_section
 #undef TARGET_ASM_INIT_SECTIONS
 #define TARGET_ASM_INIT_SECTIONS avr_asm_init_sections
 #undef TARGET_ENCODE_SECTION_INFO
@@ -5091,8 +5091,11 @@ avr_asm_init_sections (void)
   progmem_section = get_unnamed_section (AVR_HAVE_JMP_CALL ? 0 : SECTION_CODE,
 	 avr_output_progmem_section_asm_op,
 	 NULL);
-  readonly_data_section = data_section;
 
+  /* Override section callbacks to keep track of `avr_need_clear_bss_p'
+ resp. `avr_need_copy_data_p'.  */
+  
+  readonly_data_section->unnamed.callback = avr_output_data_section_asm_op;
   data_section->unnamed.callback = avr_output_data_section_asm_op;
   bss_section->unnamed.callback = avr_output_bss_section_asm_op;
 }
@@ -5101,7 +5104,7 @@ avr_asm_init_sections (void)
 /* Implement `TARGET_ASM_NAMED_SECTION'.  */
 /* Track need of __do_clear_bss, __do_copy_data for named sections.  */
 
-void
+static void
 avr_asm_named_section (const char *name, unsigned int flags, tree decl)
 {
   if (!avr_need_copy_data_p)
Index: testsuite/gcc.dg/array-quals-1.c
===
--- testsuite/gcc.dg/array-quals-1.c	(revision 176136)
+++ testsuite/gcc.dg/array-quals-1.c	(working copy)
@@ -4,7 +4,7 @@
 /* Origin: Joseph Myers j...@polyomino.org.uk */
 /* { dg-do compile } */
 /* The MMIX port always switches to the .data section at the end of a file.  */
-/* { dg-final { scan-assembler-not "\\.data(?!\\.rel\\.ro)" { xfail powerpc*-*-aix* mmix-*-* x86_64-*-mingw* picochip-*-* avr-*-* } } } */
+/* { dg-final { scan-assembler-not "\\.data(?!\\.rel\\.ro)" { xfail powerpc*-*-aix* mmix-*-* x86_64-*-mingw* picochip-*-* } } } */
 static const int a[2] = { 1, 2 };
 const int a1[2] = { 1, 2 };
 typedef const int ci;


Re: [PATCH] bash vs. dash: Avoid unportable shell feature in gcc/configure.ac

2011-07-13 Thread Paolo Bonzini

On 07/13/2011 06:13 PM, Thomas Schwinge wrote:

Alternatively, gcc_GAS_CHECK_FEATURE could be changed to emit the
temporary file by using a shell here-doc, which is what AC_TRY_COMPILE is
doing, for example.


Change instead "echo ifelse(...) > conftest.s" to

  AS_ECHO([m4_if(...)]) > conftest.s

in gcc_GAS_CHECK_FEATURE.

Paolo


[PATCH, testsuite]: Use istarget everywhere

2011-07-13 Thread Uros Bizjak
Hello!

The attached patch converts several places that use string match or regexp
on $target_triplet to use istarget instead.  The patch also removes quotes
around the target strings.

2011-07-13  Uros Bizjak  ubiz...@gmail.com

* lib/g++.exp (g++_init):  Use istarget.  Remove target_triplet global.
* lib/obj-c++.exp (obj-c++_init): Ditto.
* lib/file-format.exp (gcc_target_object_format): Ditto.
* lib/target-supports-dg.exp (dg-require-dll): Ditto.
* lib/target-supports.exp (check_weak_available): Ditto.
(check_visibility_available): Ditto.
(check_effective_target_tls_native): Ditto.
(check_effective_target_tls_emulated): Ditto.
(check_effective_target_function_sections): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
Index: lib/g++.exp
===
--- lib/g++.exp (revision 176236)
+++ lib/g++.exp (working copy)
@@ -188,7 +188,6 @@
 global TOOL_EXECUTABLE TOOL_OPTIONS
 global GXX_UNDER_TEST
 global TESTING_IN_BUILD_TREE
-global target_triplet
 global gcc_warning_prefix
 global gcc_error_prefix
 
@@ -263,7 +262,7 @@
 set gcc_warning_prefix warning:
 set gcc_error_prefix error:
 
-if { [string match "*-*-darwin*" $target_triplet] } {
+if { [istarget *-*-darwin*] } {
	lappend ALWAYS_CXXFLAGS "ldflags=-multiply_defined suppress"
}
 
Index: lib/obj-c++.exp
===
--- lib/obj-c++.exp (revision 176236)
+++ lib/obj-c++.exp (working copy)
@@ -210,7 +210,6 @@
 global TOOL_EXECUTABLE TOOL_OPTIONS
 global OBJCXX_UNDER_TEST
 global TESTING_IN_BUILD_TREE
-global target_triplet
 global gcc_warning_prefix
 global gcc_error_prefix
 
@@ -270,7 +269,7 @@
 set gcc_warning_prefix warning:
 set gcc_error_prefix error:
 
-if { [string match "*-*-darwin*" $target_triplet] } {
+if { [istarget *-*-darwin*] } {
	lappend ALWAYS_OBJCXXFLAGS "ldflags=-multiply_defined suppress"
 }
 
@@ -299,7 +298,7 @@
 # we need to add the include path for the gnu runtime if that is in
 # use.
 # First, set the default...
-if { [istarget "*-*-darwin*"] } {
+if { [istarget *-*-darwin*] } {
set nextruntime 1
 } else {
set nextruntime 0
Index: lib/scanasm.exp
===
--- lib/scanasm.exp (revision 176236)
+++ lib/scanasm.exp (working copy)
@@ -461,10 +461,10 @@
}
 }
 
-if { [istarget "hppa*-*-*"] } {
+if { [istarget hppa*-*-*] } {
set pattern [format {\t;[^:]+:%d\n(\t[^\t]+\n)+%s:\n\t.PROC} \
  $line $symbol]
-} elseif { [istarget "mips-sgi-irix*"] } {
+} elseif { [istarget mips-sgi-irix*] } {
set pattern [format {\t\.loc [0-9]+ %d 0( 
[^\n]*)?\n\t\.set\t(no)?mips16\n\t\.ent\t%s\n\t\.type\t%s, @function\n%s:\n} \
 $line $symbol $symbol $symbol]
 } else {
Index: lib/file-format.exp
===
--- lib/file-format.exp (revision 176236)
+++ lib/file-format.exp (working copy)
@@ -24,17 +24,16 @@
 
 proc gcc_target_object_format { } { 
 global gcc_target_object_format_saved
-global target_triplet
 global tool
 
 if [info exists gcc_target_object_format_saved] {
verbose "gcc_target_object_format returning saved $gcc_target_object_format_saved" 2
-} elseif { [string match "*-*-darwin*" $target_triplet] } {
+} elseif { [istarget *-*-darwin*] } {
# Darwin doesn't necessarily have objdump, so hand-code it.
set gcc_target_object_format_saved mach-o
-} elseif { [string match "hppa*-*-hpux*" $target_triplet] } {
+} elseif { [istarget hppa*-*-hpux*] } {
# HP-UX doesn't necessarily have objdump, so hand-code it.
-   if { [string match "hppa*64*-*-hpux*" $target_triplet] } {
+   if { [istarget hppa*64*-*-hpux*] } {
  set gcc_target_object_format_saved elf
} else {
  set gcc_target_object_format_saved som
Index: lib/target-libpath.exp
===
--- lib/target-libpath.exp  (revision 176236)
+++ lib/target-libpath.exp  (working copy)
@@ -272,11 +272,11 @@
 proc get_shlib_extension { } {
 global shlib_ext
 
-if { [ istarget *-*-darwin* ] } {
+if { [istarget *-*-darwin*] } {
set shlib_ext dylib
-} elseif { [ istarget *-*-cygwin* ] || [ istarget *-*-mingw* ] } {
+} elseif { [istarget *-*-cygwin*] || [istarget *-*-mingw*] } {
set shlib_ext dll
-} elseif { [ istarget hppa*-*-hpux* ] } {
+} elseif { [istarget hppa*-*-hpux*] } {
set shlib_ext sl
 } else {
set shlib_ext so
Index: lib/go-torture.exp
===
--- lib/go-torture.exp  (revision 176236)
+++ 

Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-13 Thread H.J. Lu
On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini bonz...@gnu.org wrote:
 On 07/11/2011 05:54 PM, H.J. Lu wrote:

 The key is the
 
      XEXP (x, 1) == convert_memory_address_addr_space
                     (to_mode, XEXP (x, 1), as)
 
   test.  It ensures basically that the constant has 31-bit precision,
  because
   otherwise the constant would change from e.g. (const_int -0x7ffffffc)
  to
   (const_int 0x80000004) when zero-extending it from SImode to DImode.
 
   But I'm not sure it's safe.  You have,
 
     (zero_extend:DI (plus:SI FOO:SI (const_int Y)))
 
   and you want to convert it to
 
     (plus:DI FOO:DI (zero_extend:DI (const_int Y)))
 
   (where the zero_extend is folded).  Ignore that FOO is a SYMBOL_REF
  (this
   piece of code does not assume anything about its shape); if FOO ==
   0xfffffffc and Y = 8, the result will be respectively 0x4 (valid) and
   0x100000004 (invalid).

 This example contradicts what you said above "It ensures basically that
 the constant has 31-bit precision".

 Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the same
 representation in SImode and DImode, and the test above on XEXP (x, 1)
 succeeds.

And then we permute conversion and addition, which leads to the issue you
raised above.  In other words, the current code permutes conversion and
addition, which leads to different values in the case of symbol
(0xfffffffc) + 8.  Basically the current test for 31-bit (or less)
precision is bogus.  The real question is, for an address computation
A + B, whether address wrap-around is supported in
convert_memory_address_addr_space.

   What happens if you just return NULL instead of the assertion (good
  idea
   adding it!)?
 
   Of course then you need to:
 
   1) check the return values of convert_memory_address_addr_space_1, and
   propagate NULL up to simplify_unary_operation;
 
   2) check in simplify-rtx.c whether the return value of
   convert_memory_address_1 is NULL, and only return if the return value
  is not
   NULL.  This is not yet necessary (convert_memory_address is the last
   transformation for both SIGN_EXTEND and ZERO_EXTEND) but it is better
  to
   keep code clean.

 I will give it a try.

 Thanks, did you get any result?  There's no I think in this code.  So even
 if I cannot approve it, I'd really like to see a version that I understand
 and that is clearly conservative, if it works.


I opened a new bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721

My current code looks like:

   case CONST:
  temp = convert_memory_address_addr_space_1 (to_mode, XEXP (x, 0),
  as, no_emit,
  ignore_address_wrap_around);
  return temp ? gen_rtx_CONST (to_mode, temp) : temp;
  break;

case PLUS:
case MULT:
  /* For addition we can safely permute the conversion and addition
 operation if one operand is a constant, address wrap-around
 is ignored and we are using a ptr_extend instruction or
 zero-extending (POINTERS_EXTEND_UNSIGNED != 0).  We can always
 safely permute them if we are making the address narrower.  */
  if (GET_MODE_SIZE (to_mode) < GET_MODE_SIZE (from_mode)
      || (GET_CODE (x) == PLUS
	  && CONST_INT_P (XEXP (x, 1))
	  && (POINTERS_EXTEND_UNSIGNED != 0
	      && ignore_address_wrap_around)))
return gen_rtx_fmt_ee (GET_CODE (x), to_mode,
   convert_memory_address_addr_space_1
 (to_mode, XEXP (x, 0), as, no_emit,
  ignore_address_wrap_around),
   XEXP (x, 1));
  break;

-- 
H.J.


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-13 Thread Paolo Bonzini
On Wed, Jul 13, 2011 at 18:39, H.J. Lu hjl.to...@gmail.com wrote:

 Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the same
 representation in SImode and DImode, and the test above on XEXP (x, 1)
 succeeds.

 And then we permute conversion and addition, which leads to the issue you
 raised above.  In other words, the current code permutes conversion
 and addition.

No, only if we have ptr_extend.  It may be buggy as well, but let's
make sure first that x32 is done right, then perhaps whoever cares can
fix ptr_extend if it has to be fixed.  I don't know the semantics of
ia64 addp4 so I cannot tell.

 I opened a new bug:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721

Good, thanks.

 My current code looks like:

   case CONST:
      temp = convert_memory_address_addr_space_1 (to_mode, XEXP (x, 0),
                                                  as, no_emit,
                                                  ignore_address_wrap_around);

Here I stopped reading.  It's not what I asked for, so at least you
should say clearly _why_.

Paolo


Re: RFA: Avoid unnecessary clearing in union initialisers

2011-07-13 Thread H.J. Lu
On Tue, Jul 12, 2011 at 9:34 AM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 PR 48183 is caused by the fact that we don't really support integers
 (or at least integer constants) wider than 2*HOST_BITS_PER_WIDE_INT:

   http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01220.html

 However, such constants shouldn't be needed in normal use.
 They came from an unnecessary zero-initialisation of a union such as:

   union { a f1; b f2; } u = { init_f1 };

 where f1 and f2 are the full width of the union.  The zero-initialisation
 gets optimised away for real insns, but persists in debug insns:

   http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01585.html

 This patch takes up Richard's idea here:

   http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01987.html

 categorize_ctor_elements currently tries to work out how many scalars a
 constructor initialises (IE) and how many of those scalars are zero (ZE).
 Callers can then call count_type_elements to find out how many scalars (TE)
 ought to be initialised if the constructor is complete (i.e. if it
 explicitly initialises every meaningful byte, rather than relying on
 default zero-initialisation).  The constructor is complete if TE == IE,
 except as noted in [A] below.

 However, count_type_elements can't return the required TE for unions,
 because it would need to know which of the union's fields was initialised
 by the constructor (if any).  This choice of field is reflected in IE and
 ZE, so would need to be reflected in TE as well.

 count_type_elements therefore punts on unions.  However, the caller
 can't easily tell whether it punts because of that, because of overflow,
 or because of variable-sized types.

 [A] One particular case of interest is when a union constructor initialises
 a field that is shorter than the union.  In this case, the rest of the
 union must be zeroed in order to ensure that the other fields have
 predictable values.  categorize_ctor_elements has a special out-parameter
 to record this situation.
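
 A concrete instance of case [A] (my own example, not taken from the
 patch itself):

 #include <stdio.h>

 union wide
 {
   char c;        /* 1 byte, explicitly initialized below */
   long long ll;  /* 8 bytes, overlaps the tail of the union */
 };

 int
 main (void)
 {
   /* Only the first byte is covered by the initializer; GCC zeroes the
      remaining bytes so that u.ll has a predictable value, which is
      exactly the situation the out-parameter records.  */
   union wide u = { 'x' };
   printf ("%llx\n", (unsigned long long) u.ll);  /* 0x78 on little-endian */
   return 0;
 }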

 This leads to quite a complicated interface.  The patch tries to
 simplify it by making categorize_ctor_elements keep track of whether
 a constructor is complete.  This also has the minor advantage of
 avoiding double recursion: first through the constructor,
 then through its type tree.

 After this change, ZE and IE are only needed when deciding how best to
 implement complete initialisers (such as whether to do a bulk zero
 initialisation anyway, and just write the nonzero elements individually).
 For cases where a leaf constructor element is itself an aggregate with
 a union, we can therefore estimate the number of scalars in the union,
 and hopefully make the heuristic a bit more accurate than the current 1:

            HOST_WIDE_INT tc = count_type_elements (TREE_TYPE (value), true);
            if (tc < 1)
              tc = 1;

 cp/typeck2.c also wants to check whether the variable parts of a
 constructor are complete.  The patch uses the same completeness approach
 there.  This should make it a bit more general than the current code,
 which only deals with non-nested constructors.

 Tested on x86_64-linux-gnu (all languages, including Ada), and on
 arm-linux-gnueabi.  OK to install?

 Richard


 gcc/
        * tree.h (categorize_ctor_elements): Remove comment.  Fix long line.
        (count_type_elements): Delete.
        (complete_ctor_at_level_p): Declare.
        * expr.c (flexible_array_member_p): New function, split out from...
        (count_type_elements): ...here.  Make static.  Replace allow_flexarr
        parameter with for_ctor_p.  When for_ctor_p is true, return the
        number of elements that should appear in the top-level constructor,
        otherwise return an estimate of the number of scalars.
        (categorize_ctor_elements): Replace p_must_clear with p_complete.
        (categorize_ctor_elements_1): Likewise.  Use complete_ctor_at_level_p.
        (complete_ctor_at_level_p): New function, borrowing union logic
        from old categorize_ctor_elements_1.
        (mostly_zeros_p): Return true if the constructor is not complete.
        (all_zeros_p): Update call to categorize_ctor_elements.
        * gimplify.c (gimplify_init_constructor): Update call to
        categorize_ctor_elements.  Don't call count_type_elements.
        Unconditionally prevent clearing for variable-sized types,
        otherwise rely on categorize_ctor_elements to detect
        incomplete initializers.

 gcc/cp/
        * typeck2.c (split_nonconstant_init_1): Pass the initializer directly,
        rather than a pointer to it.  Return true if the whole of the value
        was initialized by the generated statements.  Use
        complete_ctor_at_level_p instead of count_type_elements.

 gcc/testsuite/
 2011-07-12  Chung-Lin Tang  clt...@codesourcery.com

        * gcc.target/arm/pr48183.c: New test.


This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49736

-- 
H.J.


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-13 Thread Paolo Bonzini
 Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the same
 representation in SImode and DImode, and the test above on XEXP (x, 1)
 succeeds.

 And then we permute conversion and addition, which leads to the issue you
 raised above.  In other words, the current code permutes conversion
 and addition.

 No, only if we have ptr_extend.

Oops, hit send too early, I understand now what you mean.  But even
more so, let's make sure x32 is done right so that perhaps we can
remove the bogus test on XEXP (x, 1) for other Pmode != ptr_mode
targets, non-ptr_extend.  Then we can perhaps worry about
POINTERS_EXTEND_UNSIGNED < 0.

Paolo


Re: [build] Move crtfastmath to toplevel libgcc

2011-07-13 Thread Rainer Orth
Richard Henderson r...@redhat.com writes:

 On 07/11/2011 10:26 AM, Rainer Orth wrote:
 There's one other question here: alpha/t-crtfm uses
 -frandom-seed=gcc-crtfastmath with this comment:
 
 # FIXME drow/20061228 - I have preserved this -frandom-seed option
 # while migrating this rule from the GCC directory, but I do not
 # know why it is necessary if no other crt file uses it.
 
 Is there any particular reason to either keep this or not to use it in
 the generic file?  This way, only i386 needs to stay separate with its
 use of -msse -minline-all-stringops.

 This random-seed thing is there for the mangled name we build
 for the constructor on Tru64.

 It's not needed for any target for which a .ctors section is
 supported.  It also doesn't hurt, so you could move it to any
 generic build rule.

This is what I've done.  Here's the revised patch, currently
bootstrapping on alpha-dec-osf5.1b and well into building the target
libraries.

After committing the Darwin crt[23].o patch and before continuing with
the i386/crtprec??.o one, I noticed that this would leave Darwin/x86 in
a broken state: gcc/config/i386/t-crtfm still has 

EXTRA_PARTS += crtfastmath.o

which is missing in libgcc/config.host, thus the extra_parts comparison
will fail and break bootstrap ;-(

Do you think the revised crtfastmath patch is safe enough to commit
together to avoid this mess?

Thanks.
Rainer


2011-07-10  Rainer Orth  r...@cebitec.uni-bielefeld.de

gcc:
* config/alpha/crtfastmath.c: Move to ../libgcc/config/alpha.
* config/alpha/t-crtfm: Remove.
* config/i386/crtfastmath.c: Move to ../libgcc/config/i386.
* config/i386/t-crtfm: Remove.
* config/ia64/crtfastmath.c: Move to ../libgcc/config/ia64.
* config/mips/crtfastmath.c: Move to ../libgcc/config/mips.
* config/sparc/crtfastmath.c: Move to ../libgcc/config/sparc.
* config/sparc/t-crtfm: Remove.

* config.gcc (alpha*-*-linux*): Remove alpha/t-crtfm from tmake_file.
(alpha*-*-freebsd*): Likewise.
(i[34567]86-*-darwin*): Remove i386/t-crtfm from tmake_file.
(x86_64-*-darwin*): Likewise.
(i[34567]86-*-linux*): Likewise.
(x86_64-*-linux*): Likewise.
(x86_64-*-mingw*): Likewise.
(ia64*-*-elf*): Remove crtfastmath.o from extra_parts.
(ia64*-*-freebsd*): Likewise.
(ia64*-*-linux*): Likewise.
(mips64*-*-linux*): Likewise.
(mips*-*-linux*): Likewise.
(sparc-*-linux*): Remove sparc/t-crtfm from tmake_file.
(sparc64-*-linux*): Likewise.
(sparc64-*-freebsd*): Likewise.

libgcc:
* config/alpha/crtfastmath.c: New file.
* config/i386/crtfastmath.c: New file.
* config/ia64/crtfastmath.c: New file.
* config/mips/crtfastmath.c: New file.
* config/sparc/crtfastmath.c: New file.

* config/t-crtfm (crtfastmath.o): Use $(srcdir) to refer to
crtfastmath.c.
Add -frandom-seed=gcc-crtfastmath.
* config/alpha/t-crtfm: Remove.
* config/i386/t-crtfm: Use $(srcdir) to refer to crtfastmath.c.
* config/ia64/t-ia64 (crtfastmath.o): Remove.

* config.host (alpha*-*-linux*): Replace alpha/t-crtfm by t-crtfm.
(alpha*-dec-osf5.1*): Likewise.
(alpha*-*-freebsd*): Add t-crtfm to tmake_file.
Add crtfastmath.o to extra_parts.
(i[34567]86-*-darwin*): Add i386/t-crtfm to tmake_file.
Add crtfastmath.o to extra_parts.
(x86_64-*-darwin*): Likewise.
(x86_64-*-mingw*): Likewise.
(ia64*-*-elf*): Add t-crtfm to tmake_file.
(ia64*-*-freebsd*): Likewise.
(ia64*-*-linux*): Likewise.
(sparc64-*-freebsd*): Add t-crtfm to tmake_file.
Add crtfastmath.o to extra_parts.

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -756,13 +756,13 @@ alpha*-*-linux*)
tm_file=${tm_file} alpha/elf.h alpha/linux.h alpha/linux-elf.h 
glibc-stdint.h
extra_options=${extra_options} alpha/elf.opt
target_cpu_default=MASK_GAS
-   tmake_file=${tmake_file} alpha/t-crtfm alpha/t-alpha alpha/t-ieee 
alpha/t-linux
+   tmake_file=${tmake_file} alpha/t-alpha alpha/t-ieee alpha/t-linux
;;
 alpha*-*-freebsd*)
tm_file=${tm_file} ${fbsd_tm_file} alpha/elf.h alpha/freebsd.h
extra_options=${extra_options} alpha/elf.opt
target_cpu_default=MASK_GAS
-   tmake_file=${tmake_file} alpha/t-crtfm alpha/t-alpha alpha/t-ieee
+   tmake_file=${tmake_file} alpha/t-alpha alpha/t-ieee
extra_parts=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
;;
 alpha*-*-netbsd*)
@@ -1208,12 +1208,12 @@ i[34567]86-*-darwin*)
need_64bit_isa=yes
# Baseline choice for a machine that allows m64 support.
with_cpu=${with_cpu:-core2}
-   tmake_file=${tmake_file} t-slibgcc-dummy i386/t-crtpc i386/t-crtfm
+   tmake_file=${tmake_file} 

Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-13 Thread Richard Henderson
On 07/13/2011 06:13 AM, Andreas Krebbel wrote:
 +   force_operand (gen_rtx_fmt_ee (code, mode,
 +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
 +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)),
 +  NULL_RTX);
 +   break;
 + case 3:
 +   /* FMA expressions are not handled by force_operand.  */
 +   expand_ternary_op (mode, fma_optab,
 +  gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
 +  gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2),
 +  gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3),
 +  NULL_RTX, false);

Why the force_operand?  You've got register inputs.  Either the target
is going to support the operation or it isn't.

Seems to me you can check the availability of the operation in the 
optab and pass that gen_rtx_fmt_ee result to rtx_cost directly.

 +   bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt));
 +   static unsigned mul_cost[NUM_MACHINE_MODES];
 +   static unsigned add_cost[NUM_MACHINE_MODES];
 +   static unsigned fma_cost[NUM_MACHINE_MODES];
...
 +   if (!fma_cost[mode])
 + {
 +   fma_cost[mode] = compute_costs (mode, FMA, speed);
 +   add_cost[mode] = compute_costs (mode, PLUS, speed);
 +   mul_cost[mode] = compute_costs (mode, MULT, speed);
 + }

Saving cost data dependent on speed, which is non-constant.
You probably need to make this a two dimensional array.
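
In other words, a cache indexed by mode alone silently mixes size-tuned and
speed-tuned costs once `speed' varies from one basic block to the next.  A
toy model of the suggested fix (stand-in names, not the GCC code):

#include <stdio.h>

enum { NUM_MODES = 4 };

/* Keyed by (speed, mode) instead of mode alone.  */
static unsigned cost_cache[2][NUM_MODES];

static unsigned
compute_costs (int mode, int speed)   /* stub for the patch's helper */
{
  return speed ? 10u + mode : 2u + mode;
}

static unsigned
get_cost (int mode, int speed)
{
  if (!cost_cache[speed][mode])
    cost_cache[speed][mode] = compute_costs (mode, speed);
  return cost_cache[speed][mode];
}

int
main (void)
{
  /* With a one-dimensional cache the second call would wrongly return
     the size-tuned value cached by the first.  */
  printf ("%u %u\n", get_cost (1, 0), get_cost (1, 1));   /* 3 11 */
  return 0;
}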


r~


[RFC] More compact (100x) -g3 .debug_macinfo

2011-07-13 Thread Jakub Jelinek
Hi!

Currently .debug_macinfo is prohibitively large, because it doesn't
allow for any kind of merging of duplicate debug information.

This patch is an RFC for extensions that allow it to bring it down
to manageable levels.  The ideas for the first shrinking come from Jason
and/or Roland I think from last year and is similar to the introduction of
DW_FORM_strp to replace DW_FORM_string in some cases.
In particular, if the string in DW_MACINFO_define or DW_MACINFO_undef is
larger than 4 bytes including terminating '\0' and there is a chance the
string might occur more than once, instead an offset into .debug_str
is used.  The usual .debug_str string merging then kicks in and removes
duplicates.

The second savings come from merging of identical sequences of
DW_MACINFO_define/undef ops.  Usually, when you include some header,
the macros it defines/undefines are the same.  Unfortunately it is hard
to merge whole headers, because:
1) DW_MACINFO_start_file uses .debug_line references, which prevent merging
   - different CUs have different .debug_line content
2) multiple inclusion of headers with single inclusion guards is quite
   common and results in such merging being less than satisfactory: if
   some header includes stdio.h and you include that header
   in one source file without prior inclusion of stdio.h and in a different
   one after #include <stdio.h>, suddenly the .debug_macinfo sequence
   for that header is different if it transitively includes already-included headers

Unfortunately, as defined in DWARF{2,3,4}, .debug_macinfo is not really
allowing extensions.  DW_MACINFO_vendor_ext doesn't count, because its
argument is a string, which certainly can't include embedded zeros needed
for the offsets into other sections or other portions of the same section.

The following approach just grabs a range of .debug_macinfo opcodes for
vendor use, if the DWARF commitee would give such an approach a green light.
.debug_macinfo has 256 possible opcodes and just defines 5 (plus 1 for
termination), the remaining 250 are unused.
An alternative would be to come up with a .debug_gnu_macinfo section or
similar and define a new DW_AT_GNU_macro_info attribute that would be
used instead of DW_AT_macro_info, but I'd prefer to stay with
.debug_macinfo.

The newly added opcodes:
DW_MACINFO_GNU_define_indirect4 0xe0
This opcode has two arguments, one is uleb128 lineno and the
other is 4 byte offset into .debug_str.  Except for the
encoding of the string it is similar to DW_MACINFO_define.
DW_MACINFO_GNU_undef_indirect4  0xe1
This opcode has two arguments, one is uleb128 lineno and the
other is 4 byte offset into .debug_str.  Except for the
encoding of the string it is similar to DW_MACINFO_undef.
DW_MACINFO_GNU_transparent_include4 0xe2
This opcode has a single argument, a 4 byte offset into
.debug_macinfo.  It instructs the debug info consumer that
this opcode during reading should be replaced with the sequence
of .debug_macinfo opcodes from the mentioned offset, up to
a terminating 0 opcode (not including that 0); a toy decoder sketch
follows this list.
DW_MACINFO_GNU_define_opcode    0xe3
This is an opcode for future extensibility through which
a debugger could skip unknown opcodes.  It has 3 arguments:
1 byte opcode number, uleb128 count of arguments and
a count bytes long array, with a DW_FORM_* code how the
argument is encoded.
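
To make the splicing semantics of DW_MACINFO_GNU_transparent_include4
concrete, here is a toy decoder (illustrative only: real define/undef
opcodes carry a lineno and a string operand, both elided here):

#include <stdint.h>
#include <stdio.h>

static void
decode (const uint8_t *sec, uint32_t off)
{
  const uint8_t *p = sec + off;
  while (*p != 0)
    {
      if (*p == 0xe2)   /* DW_MACINFO_GNU_transparent_include4 */
	{
	  uint32_t inc = p[1] | p[2] << 8 | p[3] << 16
			 | (uint32_t) p[4] << 24;
	  decode (sec, inc);   /* splice in the shared opcode chain */
	  p += 5;
	}
      else
	printf ("opcode %#x\n", *p++);   /* toy: operands elided */
    }
}

int
main (void)
{
  /* A CU chain at offset 0 that includes the shared chain at offset 6.  */
  static const uint8_t sec[] =
    { 0xe2, 6, 0, 0, 0,   /* transparent_include4 -> offset 6 */
      0,                  /* terminator of the including chain */
      0xe0, 0xe1, 0 };    /* shared chain: define, undef, terminator */
  decode (sec, 0);
  return 0;
}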
The debug info producers have to ensure that opcodes in
DW_MACINFO_GNU_transparent_include4 chains reference the right sections
for any .debug_macinfo that includes them (which essentially means
that DW_MACINFO_start_file can't be used in the transparent_include4
chain).  Perhaps cleaner would be not to define all offset sizes in the
opcode values/names and instead have DW_MACINFO_GNU_define_indirect
and DW_MACINFO_GNU_undef_indirect whose arguments would be
DW_FORM_udata and DW_FORM_strp (i.e. offset size) - the producers
would need to ensure that .debug_macinfo chains with different
assumed offset size aren't merged together, which could be done
e.g. by using wm4.[filename.]lineno.md5 and wm8.* comdat
groups instead of the current wm.*.  DW_MACINFO_GNU_transparent_include4
then would have DW_FORM_sec_offset single argument and
DW_MACINFO_GNU_define_opcode would have DW_FORM_data1 and DW_FORM_block
arguments and the implicit opcode definition assumed at the start
of every .debug_macinfo would be:
DW_MACINFO_GNU_define_opcode 0, 0 []
DW_MACINFO_GNU_define_opcode DW_MACINFO_define, 2 [DW_FORM_udata, 
DW_FORM_string]
DW_MACINFO_GNU_define_opcode DW_MACINFO_undef, 2 [DW_FORM_udata, 
DW_FORM_string]
DW_MACINFO_GNU_define_opcode DW_MACINFO_start_file, 2 [DW_FORM_udata, 
DW_FORM_sec_offset]
DW_MACINFO_GNU_define_opcode DW_MACINFO_end_file, 1 [DW_FORM_udata]
DW_MACINFO_GNU_define_opcode DW_MACINFO_GNU_define_indirect, 2 [DW_FORM_udata, 
DW_FORM_strp]
DW_MACINFO_GNU_define_opcode 

Re: [build] Move crtfastmath to toplevel libgcc

2011-07-13 Thread Richard Henderson
On 07/13/2011 09:57 AM, Rainer Orth wrote:
 Do you think the revised crtfastmath patch is safe enough to commit
 together to avoid this mess?

Probably.

 +# -frandom-seed is necessary to keep the mangled name of the constructor on
 +# Tru64 Unix stable, but harmless otherwise.

Instead of implying permanent stability, I'd mention bootstrap comparison
failures specifically.


r~


Re: [build] Move crtfastmath to toplevel libgcc

2011-07-13 Thread Rainer Orth
Richard Henderson r...@redhat.com writes:

 On 07/13/2011 09:57 AM, Rainer Orth wrote:
 Do you think the revised crtfastmath patch is safe enough to commit
 together to avoid this mess?

 Probably.

Ok.  I'll take it on me to get us out of this mess.  It has
survived i386-pc-solaris2.11, sparc-sun-solaris2.11,
x86_64-unknown-linux-gnu, and i386-apple-darwin9.8.0 bootstraps, so the
risk seems acceptable.

 +# -frandom-seed is necessary to keep the mangled name of the constructor on
 +# Tru64 Unix stable, but harmless otherwise.

 Instead of implying permanent stability, I'd mention bootstrap comparison
 failures specifically.

Ok, will do.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] bash vs. dash: Avoid unportable shell feature in gcc/configure.ac

2011-07-13 Thread Thomas Schwinge
Hallo!

On Wed, 13 Jul 2011 18:23:50 +0200, Paolo Bonzini bonz...@gnu.org wrote:
 On 07/13/2011 06:13 PM, Thomas Schwinge wrote:
  Alternatively, gcc_GAS_CHECK_FEATURE could be changed to emit the
  temporary file by using a shell here-doc, which is what AC_TRY_COMPILE is
  doing, for example.
 
 Change instead "echo ifelse(...) > conftest.s" to
 
   AS_ECHO([m4_if(...)]) > conftest.s
 
 in gcc_GAS_CHECK_FEATURE.

Ah, even better.

gcc/
* acinclude.m4 (gcc_GAS_CHECK_FEATURE): Use AS_ECHO instead of echo.
* configure: Regenerate.

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index ff38682..f092925 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -583,7 +583,7 @@ AC_CACHE_CHECK([assembler for $1], [$2],
   if test $in_tree_gas = yes; then
 gcc_GAS_VERSION_GTE_IFELSE($3, [[$2]=yes])
   el])if test x$gcc_cv_as != x; then
-    echo ifelse(m4_substr([$5],0,1),[$], [$5], '[$5]') > conftest.s
+    AS_ECHO([ifelse(m4_substr([$5],0,1),[$], [$5], '[$5]')]) > conftest.s
     if AC_TRY_COMMAND([$gcc_cv_as $gcc_cv_as_flags $4 -o conftest.o conftest.s >&AS_MESSAGE_LOG_FD])
 then
ifelse([$6],, [$2]=yes, [$6])

The configure differences are strictly s%echo%$as_echo%.


Grüße,
 Thomas




Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-13 Thread Michael Meissner
I just ran a spec 2006 run on the powerpc (32-bit) last night setting the
reassociation to 2.  I do see a win in bwaves, but unfortunately it is not
enough of a win, and it is still a regression compared to GCC 4.5.  However, I see some
regressions in 3 other benchmarks (I tend to omit differences of less than 2%):

401.bzip2  97.99%
410.bwaves    113.88%
436.cactusADM  93.96%
444.namd   93.74%

The profile differences are as follows.  Unfortunately, I'm not sure I can post
sample counts under Spec rules:

Bzip2:

GCC 4.7    GCC 4.7 with patches    Function
=======    ====================    ========
28.96%  28.39%  mainSort
15.94%  15.49%  BZ2_decompress
12.56%  12.35%  mainGtU.part.0
11.59%  11.54%  generateMTFValues
8.89%   9.04%   fallbackSort
6.60%   8.28%   BZ2_compressBlock
7.48%   7.21%   handle_compress.isra.2
6.24%   5.95%   BZ2_bzDecompress
0.55%   0.58%   add_pair_to_block
0.54%   0.54%   BZ2_hbMakeCodeLengths

Bwaves:

GCC 4.7    GCC 4.7 with patches    Function
=======    ====================    ========
78.70%  74.73%  mat_times_vec_
11.68%  13.21%  bi_cgstab_block_
 6.72%   8.47%  shell_
 2.11%   2.62%  jacobian_
 0.79%   0.96%  flux_

CactusADM:

GCC 4.7    GCC 4.7 with patches    Function
=======    ====================    ========
99.67%  99.69%  bench_staggeredleapfrog2_

Namd:

GCC 4.7    GCC 4.7 with patches    Function
=======    ====================    ========
15.43%  14.71%  
_ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded.part.39
11.94%  11.80%  
_ZN20ComputeNonbondedUtil19calc_pair_fullelectEP9nonbonded.part.40
10.18%  11.52%  
_ZN20ComputeNonbondedUtil32calc_pair_energy_merge_fullelectEP9nonbonded.part.37
9.87%   9.02%   
_ZN20ComputeNonbondedUtil16calc_pair_energyEP9nonbonded.part.41
9.55%   8.85%   
_ZN20ComputeNonbondedUtil9calc_pairEP9nonbonded.part.42
9.52%   9.05%   
_ZN20ComputeNonbondedUtil25calc_pair_merge_fullelectEP9nonbonded.part.38
7.24%   8.72%   
_ZN20ComputeNonbondedUtil26calc_self_energy_fullelectEP9nonbonded.part.31
6.28%   6.42%   
_ZN20ComputeNonbondedUtil19calc_self_fullelectEP9nonbonded.part.32
5.23%   6.18%   
_ZN20ComputeNonbondedUtil32calc_self_energy_merge_fullelectEP9nonbonded.part.29
5.13%   4.66%   
_ZN20ComputeNonbondedUtil16calc_self_energyEP9nonbonded.part.33
4.72%   4.43%   
_ZN20ComputeNonbondedUtil25calc_self_merge_fullelectEP9nonbonded.part.30
4.60%   4.37%   
_ZN20ComputeNonbondedUtil9calc_selfEP9nonbonded.part.34

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-13 Thread H.J. Lu
On Wed, Jul 13, 2011 at 9:54 AM, Paolo Bonzini bonz...@gnu.org wrote:
 Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the same
 representation in SImode and DImode, and the test above on XEXP (x, 1)
 succeeds.

 And then we permute conversion and addition, which leads to the issue you
  raised above.  In other words, the current code permutes conversion
 and addition.

 No, only if we have ptr_extend.

 Oops, hit send too early, I understand now what you mean.  But even
 more so, let's make sure x32 is done right so that perhaps we can
 remove the bogus test on XEXP (x, 1) for other Pmode != ptr_mode
  targets, non-ptr_extend.  Then we can perhaps worry about
  POINTERS_EXTEND_UNSIGNED < 0.


Here is the patch.  OK for trunk?

Thanks.


-- 
H.J.

2011-07-12  H.J. Lu  hongjiu...@intel.com

PR middle-end/49721
* explow.c (convert_memory_address_addr_space_1): New.
(convert_memory_address_addr_space): Use it.

* expr.c (convert_modes_1): New.
(convert_modes): Use it.

* expr.h (convert_modes_1): New.

* rtl.h (convert_memory_address_addr_space_1): New.
(convert_memory_address_1): Likewise.

* simplify-rtx.c (simplify_unary_operation_1): Call
convert_memory_address_1 instead of convert_memory_address.

diff --git a/gcc/explow.c b/gcc/explow.c
index 3c692f4..8551fe8 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -320,8 +320,10 @@ break_out_memory_refs (rtx x)
arithmetic insns can be used.  */
 
 rtx
-convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED,
-  rtx x, addr_space_t as ATTRIBUTE_UNUSED)
+convert_memory_address_addr_space_1 (enum machine_mode to_mode ATTRIBUTE_UNUSED,
+				     rtx x, addr_space_t as ATTRIBUTE_UNUSED,
+				     bool no_emit ATTRIBUTE_UNUSED,
+				     bool ignore_address_wrap_around ATTRIBUTE_UNUSED)
 {
 #ifndef POINTERS_EXTEND_UNSIGNED
   gcc_assert (GET_MODE (x) == to_mode || GET_MODE (x) == VOIDmode);
@@ -377,28 +379,28 @@ convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED,
   break;
 
 case CONST:
-  return gen_rtx_CONST (to_mode,
-   convert_memory_address_addr_space
- (to_mode, XEXP (x, 0), as));
+  temp = convert_memory_address_addr_space_1 (to_mode, XEXP (x, 0),
+ as, no_emit,
+ ignore_address_wrap_around);
+  return temp ? gen_rtx_CONST (to_mode, temp) : temp;
   break;
 
 case PLUS:
 case MULT:
-  /* For addition we can safely permute the conversion and addition
-operation if one operand is a constant and converting the constant
-does not change it or if one operand is a constant and we are
-using a ptr_extend instruction  (POINTERS_EXTEND_UNSIGNED < 0).
-We can always safely permute them if we are making the address
-narrower.  */
+  /* For addition, we can safely permute the conversion and addition
+operation if one operand is a constant and we are using a
+ptr_extend instruction (POINTERS_EXTEND_UNSIGNED < 0) or address
+wrap-around is ignored.  We can always safely permute them if
+we are making the address narrower.  */
       if (GET_MODE_SIZE (to_mode) < GET_MODE_SIZE (from_mode)
 	  || (GET_CODE (x) == PLUS
 	      && CONST_INT_P (XEXP (x, 1))
-	      && (XEXP (x, 1) == convert_memory_address_addr_space
-				   (to_mode, XEXP (x, 1), as)
-		  || POINTERS_EXTEND_UNSIGNED < 0)))
+	      && (POINTERS_EXTEND_UNSIGNED < 0
+		  || ignore_address_wrap_around)))
return gen_rtx_fmt_ee (GET_CODE (x), to_mode,
-  convert_memory_address_addr_space
-(to_mode, XEXP (x, 0), as),
+  convert_memory_address_addr_space_1
+(to_mode, XEXP (x, 0), as, no_emit,
+ ignore_address_wrap_around),
   XEXP (x, 1));
   break;
 
@@ -406,10 +408,17 @@ convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED,
   break;
 }
 
-  return convert_modes (to_mode, from_mode,
- 

Avoid overriding LIB_THREAD_LDFLAGS_SPEC on Solaris 8 (PR target/49541)

2011-07-13 Thread Rainer Orth
As reported in the PR, LIB_THREAD_LDFLAGS_SPEC (effectively
-L/usr/lib/lwp(/64)? -R/usr/lib/lwp(/64)? to make use of the alternate
thread library on Solaris 8, which also provides the only implementation
of __tls_get_addr) could be overridden by the regular -L flags from the
%D spec for 64-bit compilations to find /lib/sparcv9/libthread.so
instead, which lacks that function, causing link failures.

This patch fixes this by moving the -L/-R flags from LIB_SPEC to
LINK_SPEC which is before %D.

Bootstrapped without regressions on sparc-sun-solaris2.8 and
i386-pc-solaris2.8 by myself (though in a branded zone which doesn't
show the problem directly) and by Eric on bare-metal Solaris 8.

Installed on mainline, will backport to the 4.6 branch after testing.

Rainer


2011-07-08  Rainer Orth  r...@cebitec.uni-bielefeld.de

PR target/49541
* config/sol2.h (LIB_SPEC): Simplify.
Move LIB_THREAD_LDFLAGS_SPEC ...
(LINK_SPEC): ... here.

diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -109,10 +109,8 @@ along with GCC; see the file COPYING3.  
 #undef LIB_SPEC
 #define LIB_SPEC \
   "%{!symbolic:\
-     %{pthreads|pthread:" \
-      LIB_THREAD_LDFLAGS_SPEC " -lpthread " LIB_TLS_SPEC "} \
-     %{fprofile-generate*:" \
-      LIB_THREAD_LDFLAGS_SPEC " " LIB_TLS_SPEC "} \
+     %{pthreads|pthread:-lpthread} \
+     %{pthreads|pthread|fprofile-generate*:" LIB_TLS_SPEC "} \
      %{p|pg:-ldl} -lc}"
 
 #ifndef CROSS_DIRECTORY_STRUCTURE
@@ -175,6 +173,7 @@ along with GCC; see the file COPYING3.  
%{static:-dn -Bstatic} \
%{shared:-G -dy %{!mimpure-text:-z text}} \
%{symbolic:-Bsymbolic -G -dy -z text} \
+   %{pthreads|pthread|fprofile-generate*:" LIB_THREAD_LDFLAGS_SPEC "} \
%(link_arch) \
%{Qy:} %{!Qn:-Qy}
 
-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] bash vs. dash: Avoid unportable shell feature in gcc/configure.ac

2011-07-13 Thread Paolo Bonzini
Ok.

Paolo

On Wed, Jul 13, 2011 at 19:17, Thomas Schwinge tho...@schwinge.name wrote:
 Hallo!

 On Wed, 13 Jul 2011 18:23:50 +0200, Paolo Bonzini bonz...@gnu.org wrote:
 On 07/13/2011 06:13 PM, Thomas Schwinge wrote:
  Alternatively, gcc_GAS_CHECK_FEATURE could be changed to emit the
  temporary file by using a shell here-doc, which is what AC_TRY_COMPILE is
  doing, for example.

  Change instead "echo ifelse(...) > conftest.s" to

     AS_ECHO([m4_if(...)]) > conftest.s

 in gcc_GAS_CHECK_FEATURE.

 Ah, even better.

        gcc/
        * acinclude.m4 (gcc_GAS_CHECK_FEATURE): Use AS_ECHO instead of echo.
        * configure: Regenerate.

 diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
 index ff38682..f092925 100644
 --- a/gcc/acinclude.m4
 +++ b/gcc/acinclude.m4
 @@ -583,7 +583,7 @@ AC_CACHE_CHECK([assembler for $1], [$2],
   if test $in_tree_gas = yes; then
     gcc_GAS_VERSION_GTE_IFELSE($3, [[$2]=yes])
   el])if test x$gcc_cv_as != x; then
 -    echo ifelse(m4_substr([$5],0,1),[$], [$5], '[$5]') > conftest.s
 +    AS_ECHO([ifelse(m4_substr([$5],0,1),[$], [$5], '[$5]')]) > conftest.s
      if AC_TRY_COMMAND([$gcc_cv_as $gcc_cv_as_flags $4 -o conftest.o 
  conftest.s >&AS_MESSAGE_LOG_FD])
     then
        ifelse([$6],, [$2]=yes, [$6])

 The configure differences are strictly s%echo%$as_echo%.


 Grüße,
  Thomas



Re: [build] Move crtfastmath to toplevel libgcc

2011-07-13 Thread H.J. Lu
On Wed, Jul 13, 2011 at 10:12 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Richard Henderson r...@redhat.com writes:

 On 07/13/2011 09:57 AM, Rainer Orth wrote:
 Do you think the revised crtfastmath patch is safe enough to commit
 together to avoid this mess?

 Probably.

 Ok.  I'll will take this on me to get us out of this mess.  It has
 survived i386-pc-solaris2.11, sparc-sun-solaris2.11,
 x86_64-unknown-linux-gnu, and i386-apple-darwin9.8.0 bootstraps, so the
 risk seems acceptable.

 +# -frandom-seed is necessary to keep the mangled name of the constructor on
 +# Tru64 Unix stable, but harmless otherwise.

 Instead of implying permanent stability, I'd mention bootstrap comparison
 failures specifically.

 Ok, will do.

I think your patch caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49739

-- 
H.J.


[commit, spu] Fix regression (Re: [PR debug/47590] rework md option overriding to delay var-tracking)

2011-07-13 Thread Ulrich Weigand
Alexandre Oliva wrote:

   * config/spu/spu.c (spu_flag_var_tracking): Drop.
   (TARGET_DELAY_VARTRACK): Define.
   (spu_var_tracking): New.
   (spu_machine_dependent_reorg): Call it.
   (asm_file_start): Don't save and override flag_var_tracking.

This change caused crashes under certain circumstances.  The problem
is that spu_var_tracking calls df_analyze, which assumes the df
framework has been keeping up to date on instructions.  In particular,
it assumes that df_scan_insn was called for each insn that was
generated in the meantime.  Normally, this is not a problem, because
the emit_insn family itself calls df_scan_insn.

However, this works only as long as the assignment of insns to basic
blocks is valid.  In machine-dependent reorg, this is no longer true.
To fix this, the current place in spu_machine_dependent_reorg that
used to call df_analyze made sure to re-install the basic-block
mappings by calling compute_bb_for_insn before.

The new location where your patch has added a call to df_analyze,
however, is out of the scope of that existing compute_bb_for_insn
call, causing those problems.

Fixed by adding another compute_bb_for_insn/free_bb_for_insn pair
to cover the new call site as well.

Also, asm_file_start now no longer does anything interesting,
and can just be removed in favor of the default implementation.

Tested on spu-elf, committed to mainline.

Bye,
Ulrich


ChangeLog:

* config/spu/spu.c (TARGET_ASM_FILE_START): Do not define.
(asm_file_start): Remove.
(spu_machine_dependent_reorg): Call compute_bb_for_insn and
free_bb_for_insn around code that modifies insns before
restarting df analysis.

Index: gcc/config/spu/spu.c
===
*** gcc/config/spu/spu.c(revision 176209)
--- gcc/config/spu/spu.c(working copy)
*** static enum machine_mode spu_addr_space_
*** 224,230 
  static bool spu_addr_space_subset_p (addr_space_t, addr_space_t);
  static rtx spu_addr_space_convert (rtx, tree, tree);
  static int spu_sms_res_mii (struct ddg *g);
- static void asm_file_start (void);
  static unsigned int spu_section_type_flags (tree, const char *, int);
  static section *spu_select_section (tree, int, unsigned HOST_WIDE_INT);
  static void spu_unique_section (tree, int);
--- 224,229 
*** static void spu_setup_incoming_varargs (
*** 462,470 
  #undef TARGET_SCHED_SMS_RES_MII
  #define TARGET_SCHED_SMS_RES_MII spu_sms_res_mii
  
- #undef TARGET_ASM_FILE_START
- #define TARGET_ASM_FILE_START asm_file_start
- 
  #undef TARGET_SECTION_TYPE_FLAGS
  #define TARGET_SECTION_TYPE_FLAGS spu_section_type_flags
  
--- 461,466 
*** spu_machine_dependent_reorg (void)
*** 2703,2711 
--- 2699,2709 
  {
/* We still do it for unoptimized code because an external
   function might have hinted a call or return. */
+   compute_bb_for_insn ();
insert_hbrp ();
pad_bb ();
spu_var_tracking ();
+   free_bb_for_insn ();
return;
  }
  
*** spu_libgcc_shift_count_mode (void)
*** 7039,7052 
return SImode;
  }
  
- /* An early place to adjust some flags after GCC has finished processing
-  * them. */
- static void
- asm_file_start (void)
- {
-   default_file_start ();
- }
- 
  /* Implement targetm.section_type_flags.  */
  static unsigned int
  spu_section_type_flags (tree decl, const char *name, int reloc)
--- 7037,7042 


-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


[commit, spu] Support clrsb

2011-07-13 Thread Ulrich Weigand
Hello,

several builtin-bitops-1.c tests have been failing recently on SPU
since the new clrsb builtin is not supported.

This patch fixes this by:

- installing the libgcc __clrsbdi2 routine into optabs
  (which doesn't happen automatically on SPU since word_mode is TImode)

- providing an in-line expander for SImode clrsb (a C model of its
  computation follows).
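
A C model of what the SImode expander below computes (a sketch: SPU's
(gt ...) yields an all-ones mask for true, and its hardware clz returns 32
for a zero input, which the C builtin leaves undefined, so both are
modelled explicitly):

#include <assert.h>
#include <stdint.h>

static int
clrsb32 (int32_t x)
{
  uint32_t mask = (x > -1) ? 0xffffffffu : 0u;   /* (gt x, -1) as a mask */
  uint32_t t = ~((uint32_t) x ^ mask);           /* ~(x ^ mask) */
  int clz = t ? __builtin_clz (t) : 32;          /* hw clz: clz (0) == 32 */
  return clz - 1;                                /* plus operands[5] == -1 */
}

int
main (void)
{
  assert (clrsb32 (0) == 31);
  assert (clrsb32 (-1) == 31);
  assert (clrsb32 (1) == 30);
  assert (clrsb32 (-2) == 30);
  return 0;
}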

Tested on spu-elf, committed to mainline.

Bye,
Ulrich


ChangeLog:

* config/spu/spu.c (spu_init_libfuncs): Install __clrsbdi2.
* config/spu/spu.md (clrsbmode2): New expander.

Index: gcc/config/spu/spu.c
===
*** gcc/config/spu/spu.c(revision 176247)
--- gcc/config/spu/spu.c(working copy)
*** spu_init_libfuncs (void)
*** 5630,5635 
--- 5630,5636 
	set_optab_libfunc (ffs_optab, DImode, "__ffsdi2");
	set_optab_libfunc (clz_optab, DImode, "__clzdi2");
	set_optab_libfunc (ctz_optab, DImode, "__ctzdi2");
+   set_optab_libfunc (clrsb_optab, DImode, "__clrsbdi2");
	set_optab_libfunc (popcount_optab, DImode, "__popcountdi2");
	set_optab_libfunc (parity_optab, DImode, "__paritydi2");
  
Index: gcc/config/spu/spu.md
===
*** gcc/config/spu/spu.md   (revision 176209)
--- gcc/config/spu/spu.md   (working copy)
***
*** 2232,2237 
--- 2232,2252 
   operands[5] = spu_const (<MODE>mode, 31);
 })
  
+ (define_expand "clrsb<mode>2"
+   [(set (match_dup 2)
+ 	(gt:VSI (match_operand:VSI 1 "spu_reg_operand" "") (match_dup 5)))
+    (set (match_dup 3) (not:VSI (xor:VSI (match_dup 1) (match_dup 2))))
+    (set (match_dup 4) (clz:VSI (match_dup 3)))
+    (set (match_operand:VSI 0 "spu_reg_operand")
+ 	(plus:VSI (match_dup 4) (match_dup 5)))]
+   ""
+   {
+      operands[2] = gen_reg_rtx (<MODE>mode);
+      operands[3] = gen_reg_rtx (<MODE>mode);
+      operands[4] = gen_reg_rtx (<MODE>mode);
+      operands[5] = spu_const (<MODE>mode, -1);
+   })
+ 
  (define_expand "ffs<mode>2"
    [(set (match_dup 2)
 	(neg:VSI (match_operand:VSI 1 "spu_reg_operand" "")))
-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [commit, spu] Support clrsb

2011-07-13 Thread Bernd Schmidt
On 07/13/11 21:22, Ulrich Weigand wrote:
 Hello,
 
 several builtin-bitops-1.c tests have been failing recently on SPU
 since the new clrsb builtin is not supported.

That's odd, it should just have picked the libgcc function rather than
causing test failures. Why didn't that happen?


Bernd



Re: [RFC] More compact (100x) -g3 .debug_macinfo

2011-07-13 Thread Tom Tromey
 Jakub == Jakub Jelinek ja...@redhat.com writes:

Jakub Currently .debug_macinfo is prohibitively large, because it doesn't
Jakub allow for any kind of merging of duplicate debug information.

Jakub This patch is an RFC for extensions that allow it to bring it down
Jakub to manageable levels.

I wrote a gdb patch for this.  I've appended it in case you want to try
it out; it is against git master.  I tried it a little on an executable
Jakub sent me and it seems to work fine.

It is no trouble to change this patch if you change the format.  It
wasn't hard to write in the first place; it is just bigger than it needs
to be because I moved a bunch of code into a new function.

I don't think I really understood DW_MACINFO_GNU_define_opcode, so the
implementation here is probably wrong.

Tom

2011-07-13  Tom Tromey  tro...@redhat.com

* dwarf2read.c (read_indirect_string_at_offset): New function.
(read_indirect_string): Use it.
(dwarf_decode_macro_bytes): New function, taken from
dwarf_decode_macros.  Handle DW_MACINFO_GNU_*.
(dwarf_decode_macros): Use it.  Handle DW_MACINFO_GNU_*.

diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index fde5b6a..af35f16 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -10182,32 +10182,32 @@ read_direct_string (bfd *abfd, gdb_byte *buf, 
unsigned int *bytes_read_ptr)
 }
 
 static char *
-read_indirect_string (bfd *abfd, gdb_byte *buf,
- const struct comp_unit_head *cu_header,
- unsigned int *bytes_read_ptr)
+read_indirect_string_at_offset (bfd *abfd, LONGEST str_offset)
 {
-  LONGEST str_offset = read_offset (abfd, buf, cu_header, bytes_read_ptr);
-
   dwarf2_read_section (dwarf2_per_objfile->objfile, &dwarf2_per_objfile->str);
   if (dwarf2_per_objfile->str.buffer == NULL)
-    {
-      error (_("DW_FORM_strp used without .debug_str section [in module %s]"),
-	     bfd_get_filename (abfd));
-      return NULL;
-    }
+    error (_("DW_FORM_strp used without .debug_str section [in module %s]"),
+	   bfd_get_filename (abfd));
   if (str_offset >= dwarf2_per_objfile->str.size)
-    {
-      error (_("DW_FORM_strp pointing outside of "
-	       ".debug_str section [in module %s]"),
-	     bfd_get_filename (abfd));
-      return NULL;
-    }
+    error (_("DW_FORM_strp pointing outside of "
+	     ".debug_str section [in module %s]"),
+	   bfd_get_filename (abfd));
   gdb_assert (HOST_CHAR_BIT == 8);
   if (dwarf2_per_objfile->str.buffer[str_offset] == '\0')
     return NULL;
   return (char *) (dwarf2_per_objfile->str.buffer + str_offset);
 }
 
+static char *
+read_indirect_string (bfd *abfd, gdb_byte *buf,
+ const struct comp_unit_head *cu_header,
+ unsigned int *bytes_read_ptr)
+{
+  LONGEST str_offset = read_offset (abfd, buf, cu_header, bytes_read_ptr);
+
+  return read_indirect_string_at_offset (abfd, str_offset);
+}
+
 static unsigned long
 read_unsigned_leb128 (bfd *abfd, gdb_byte *buf, unsigned int *bytes_read_ptr)
 {
@@ -14576,116 +14576,14 @@ parse_macro_definition (struct macro_source_file 
*file, int line,
 
 
 static void
-dwarf_decode_macros (struct line_header *lh, unsigned int offset,
- char *comp_dir, bfd *abfd,
- struct dwarf2_cu *cu)
+dwarf_decode_macro_bytes (bfd *abfd, gdb_byte *mac_ptr, gdb_byte *mac_end,
+ struct macro_source_file *current_file,
+ struct line_header *lh, char *comp_dir,
+ struct dwarf2_cu *cu)
 {
-  gdb_byte *mac_ptr, *mac_end;
-  struct macro_source_file *current_file = 0;
   enum dwarf_macinfo_record_type macinfo_type;
   int at_commandline;
 
-  dwarf2_read_section (dwarf2_per_objfile->objfile,
-		       &dwarf2_per_objfile->macinfo);
-  if (dwarf2_per_objfile->macinfo.buffer == NULL)
-    {
-      complaint (&symfile_complaints, _("missing .debug_macinfo section"));
-      return;
-    }
-
-  /* First pass: Find the name of the base filename.
- This filename is needed in order to process all macros whose definition
- (or undefinition) comes from the command line.  These macros are defined
- before the first DW_MACINFO_start_file entry, and yet still need to be
- associated to the base file.
-
- To determine the base file name, we scan the macro definitions until we
- reach the first DW_MACINFO_start_file entry.  We then initialize
- CURRENT_FILE accordingly so that any macro definition found before the
- first DW_MACINFO_start_file can still be associated to the base file.  */
-
-  mac_ptr = dwarf2_per_objfile->macinfo.buffer + offset;
-  mac_end = dwarf2_per_objfile->macinfo.buffer
-    + dwarf2_per_objfile->macinfo.size;
-
-  do
-{
-  /* Do we at least have room for a macinfo type byte?  */
-  if (mac_ptr >= mac_end)
-{
- /* Complaint is printed during the second pass as GDB will probably
-stop the first pass earlier 

Re: [commit, spu] Support clrsb

2011-07-13 Thread Ulrich Weigand
Bernd Schmidt wrote:
 On 07/13/11 21:22, Ulrich Weigand wrote:
  several builtin-bitops-1.c tests have been failing recently on SPU
  since the new clrsb builtin is not supported.
 
 That's odd, it should just have picked the libgcc function rather than
 causing test failures. Why didn't that happen?

That's the usual word_mode == TImode problem on SPU.  By default, only
libgcc functions for word_mode and up are installed into the optabs
libfunc table.  This means that on SPU, the default behaviour of GCC
is to call __clrsbti2, which of course does not exist in libgcc ...

This means that on SPU, all SImode/DImode libgcc routines that should
be called need to be installed into optabs specifically by the back-end.
That's what my patch does for __clrsbdi2.  (For __clrsbsi2, I'm just
providing an in-line expander instead, no need to call a libfunc.)
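
A toy model of the underlying optab issue (illustrative data structures
only, not GCC's real ones):

#include <stdio.h>

enum mode { SImode, DImode, TImode, NUM_MODES };

static const char *clrsb_libfunc[NUM_MODES];
static const char *const names[NUM_MODES] =
  { "__clrsbsi2", "__clrsbdi2", "__clrsbti2" };

static void
generic_init (enum mode word_mode)
{
  /* The generic code only registers word_mode and wider.  */
  for (int m = word_mode; m < NUM_MODES; m++)
    clrsb_libfunc[m] = names[m];
}

int
main (void)
{
  generic_init (TImode);             /* SPU: word_mode is TImode */
  /* Only __clrsbti2 got registered, but libgcc doesn't provide it;
     the back-end fixup from the patch installs the DImode routine.  */
  clrsb_libfunc[DImode] = "__clrsbdi2";
  printf ("DImode clrsb -> %s\n", clrsb_libfunc[DImode]);
  return 0;
}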

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com

