Re: [build] Remove crt0, mcrt0 support
On 07/12/2011 06:45 PM, Rainer Orth wrote: +crt0.o: $(srcdir)/config/i386/netware-crt0.c + $(crt_commpile) $(CRTSTUFF_T_CFLAGS) -c $ Typo here. Otherwise looks good, thanks. Paolo
Re: [build] Move i386/crtprec to toplevel libgcc
On 07/12/2011 06:37 PM, Rainer Orth wrote: The next easy step in toplevel libgcc migration is moving i386/crtprec.c. I noticed that -mpc{32, 64, 80} wasn't supported on Solaris/x86 yet and corrected that. The only testcase using the switch was adapted to also do so on Darwin/x86 (which already has the support, but didn't exercise it). For the reasons already described, I'm not yet removing crtprec??.o from gcc/config/i386/t-linux64 (EXTRA_MULTILIB_PARTS). Bootstrapped without regressions on i386-pc-solaris2.11, x86_64-unknown-linux-gnu. Bootstrap on i386-apple-darwin9.8.0 is currently running. Ok for mainline? Thanks. Rainer 2011-07-10 Rainer Orthr...@cebitec.uni-bielefeld.de gcc: * config/i386/crtprec.c: Move to ../libgcc/config/i386. * config/i386/t-crtpc: Remove. * config/t-darwin (EXTRA_MULTILIB_PARTS): Remove. * config.gcc (i[34567]86-*-darwin*): Remove i386/t-crtpc from tmake_file. (x86_64-*-darwin*): Likewise. (i[34567]86-*-linux*): Likewise. (x86_64-*-linux*): Likewise. * config/i386/sol2.h (ENDFILE_SPEC): Redefine. Handle -mpc32, -mpc64, -mpc80. libgcc: * config/i386/crtprec.c: New file. * config/i386/t-crtpc: Use $(srcdir) to refer to crtprec.c. * config.host (i[34567]86-*-darwin*): Add i386/t-crtpc to tmake_file. Add crtprec32.o, crtprec64.o, crtprec80.o to extra_parts. (x86_64-*-darwin*): Likewise. (i[34567]86-*-solaris2*: Likewise. gcc/testsuite: * gcc.c-torture/execute/990127-2.x: Use -mpc64 on i?86-*-darwin*, i?86-*-solaris2*, x86_64-*-darwin*, x86_64-*-solaris2*. diff --git a/gcc/config.gcc b/gcc/config.gcc --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -1208,12 +1208,12 @@ i[34567]86-*-darwin*) need_64bit_isa=yes # Baseline choice for a machine that allows m64 support. 
with_cpu=${with_cpu:-core2} - tmake_file=${tmake_file} t-slibgcc-dummy i386/t-crtpc + tmake_file=${tmake_file} t-slibgcc-dummy libgcc_tm_file=$libgcc_tm_file i386/darwin-lib.h ;; x86_64-*-darwin*) with_cpu=${with_cpu:-core2} - tmake_file=${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc-dummy i386/t-crtpc + tmake_file=${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc-dummy tm_file=${tm_file} ${cpu_type}/darwin64.h libgcc_tm_file=$libgcc_tm_file i386/darwin-lib.h ;; @@ -1311,7 +1311,7 @@ i[34567]86-*-linux* | i[34567]86-*-kfree i[34567]86-*-kopensolaris*-gnu) tm_file=${tm_file} i386/gnu-user.h kopensolaris-gnu.h i386/kopensolaris-gnu.h ;; i[34567]86-*-gnu*) tm_file=$tm_file i386/gnu-user.h gnu.h i386/gnu.h;; esac - tmake_file=${tmake_file} i386/t-crtstuff i386/t-crtpc + tmake_file=${tmake_file} i386/t-crtstuff ;; x86_64-*-linux* | x86_64-*-kfreebsd*-gnu | x86_64-*-knetbsd*-gnu) tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h gnu-user.h glibc-stdint.h \ @@ -1323,7 +1323,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu x86_64-*-kfreebsd*-gnu) tm_file=${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h ;; x86_64-*-knetbsd*-gnu) tm_file=${tm_file} knetbsd-gnu.h ;; esac - tmake_file=${tmake_file} i386/t-linux64 i386/t-crtstuff i386/t-crtpc + tmake_file=${tmake_file} i386/t-linux64 i386/t-crtstuff x86_multilibs=${with_multilib_list} if test $x86_multilibs = default; then x86_multilibs=m64,m32 diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h --- a/gcc/config/i386/sol2.h +++ b/gcc/config/i386/sol2.h @@ -70,6 +70,14 @@ along with GCC; see the file COPYING3. 
#undef ASM_SPEC #define ASM_SPEC ASM_SPEC_BASE +#undef ENDFILE_SPEC +#define ENDFILE_SPEC \ + %{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \ + %{mpc32:crtprec32.o%s} \ + %{mpc64:crtprec64.o%s} \ + %{mpc80:crtprec80.o%s} \ + crtend.o%s crtn.o%s + #define SUBTARGET_CPU_EXTRA_SPECS \ { cpp_subtarget, CPP_SUBTARGET_SPEC }, \ { asm_cpu, ASM_CPU_SPEC },\ diff --git a/gcc/config/i386/t-crtpc b/gcc/config/i386/t-crtpc deleted file mode 100644 diff --git a/gcc/testsuite/gcc.c-torture/execute/990127-2.x b/gcc/testsuite/gcc.c-torture/execute/990127-2.x --- a/gcc/testsuite/gcc.c-torture/execute/990127-2.x +++ b/gcc/testsuite/gcc.c-torture/execute/990127-2.x @@ -3,12 +3,16 @@ # Use -mpc64 to force 80387 floating-point precision to 64 bits. This option # has no effect on SSE, but it is needed in case of -m32 on x86_64 targets. -if { [istarget i?86-*-linux*] +if { [istarget i?86-*-darwin*] + || [istarget i?86-*-linux*] || [istarget i?86-*-kfreebsd*-gnu] || [istarget i?86-*-knetbsd*-gnu] + || [istarget i?86-*-solaris2*] + || [istarget x86_64-*-darwin*] || [istarget x86_64-*-linux*] || [istarget x86_64-*-kfreebsd*-gnu] - || [istarget
[patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
Hello, I split my old patch into 8 speparate pieces for easier review. These patches are a prerequist for enabling boolification of comparisons in gimplifier and the necessary type-cast preserving in gimple from/to boolean-type. This patch adds support to fold_truth_not_expr for one-bit precision typed bitwise-binary and bitwise-not expressions. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_not_expr): Add support for one-bit bitwise operations. Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 07:48:29.0 +0200 +++ gcc/gcc/fold-const.c2011-07-13 08:59:36.865620200 +0200 @@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre case INTEGER_CST: return constant_boolean_node (integer_zerop (arg), type); +case BIT_AND_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) +return NULL_TREE; + if (integer_onep (TREE_OPERAND (arg, 1))) + return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0)); + /* fall through */ case TRUTH_AND_EXPR: loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc); loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc); - return build2_loc (loc, TRUTH_OR_EXPR, type, + return build2_loc (loc, (code == BIT_AND_EXPR ? BIT_IOR_EXPR + : TRUTH_OR_EXPR), type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)), invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1))); +case BIT_IOR_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) +return NULL_TREE; + /* fall through. */ case TRUTH_OR_EXPR: loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc); loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc); - return build2_loc (loc, TRUTH_AND_EXPR, type, + return build2_loc (loc, (code == BIT_IOR_EXPR ? 
BIT_AND_EXPR + : TRUTH_AND_EXPR), type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)), invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1))); +case BIT_XOR_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) +return NULL_TREE; + /* fall through. */ case TRUTH_XOR_EXPR: /* Here we can invert either operand. We invert the first operand unless the second operand is a TRUTH_NOT_EXPR in which case our @@ -3095,10 +3111,14 @@ fold_truth_not_expr (location_t loc, tre negation of the second operand. */ if (TREE_CODE (TREE_OPERAND (arg, 1)) == TRUTH_NOT_EXPR) - return build2_loc (loc, TRUTH_XOR_EXPR, type, TREE_OPERAND (arg, 0), + return build2_loc (loc, code, type, TREE_OPERAND (arg, 0), + TREE_OPERAND (TREE_OPERAND (arg, 1), 0)); + else if (TREE_CODE (TREE_OPERAND (arg, 1)) == BIT_NOT_EXPR + TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 1))) == 1) + return build2_loc (loc, code, type, TREE_OPERAND (arg, 0), TREE_OPERAND (TREE_OPERAND (arg, 1), 0)); else - return build2_loc (loc, TRUTH_XOR_EXPR, type, + return build2_loc (loc, code, type, invert_truthvalue_loc (loc, TREE_OPERAND (arg, 0)), TREE_OPERAND (arg, 1)); @@ -3116,6 +3136,10 @@ fold_truth_not_expr (location_t loc, tre invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)), invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1))); +case BIT_NOT_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) +return NULL_TREE; + /* fall through */ case TRUTH_NOT_EXPR: return TREE_OPERAND (arg, 0); @@ -3158,11 +3182,6 @@ fold_truth_not_expr (location_t loc, tre return build1_loc (loc, TREE_CODE (arg), type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0))); -case BIT_AND_EXPR: - if (!integer_onep (TREE_OPERAND (arg, 1))) - return NULL_TREE; - return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0)); - case SAVE_EXPR: return build1_loc (loc, TRUTH_NOT_EXPR, type, arg);
[patch 2/8 tree-optimization]: Bitwise logic for fold_range_test and fold_truthop.
Hello, This patch adds support to fold_range_test and to fold_truthop for one-bit precision typed bitwise-binary and bitwise-not expressions. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_range_test): Add support for one-bit bitwise operations. (fold_truthop): Likewise. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:07:59.0 +0200 +++ gcc/gcc/fold-const.c2011-07-13 08:59:26.117620200 +0200 @@ -4819,7 +4819,8 @@ fold_range_test (location_t loc, enum tr tree op0, tree op1) { int or_op = (code == TRUTH_ORIF_EXPR - || code == TRUTH_OR_EXPR); + || code == TRUTH_OR_EXPR + || code == BIT_IOR_EXPR); int in0_p, in1_p, in_p; tree low0, low1, low, high0, high1, high; bool strict_overflow_p = false; @@ -4890,7 +4891,7 @@ fold_range_test (location_t loc, enum tr } } - return 0; + return NULL_TREE; } /* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P @@ -5118,8 +5119,9 @@ fold_truthop (location_t loc, enum tree_ } } - code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR) - ? TRUTH_AND_EXPR : TRUTH_OR_EXPR); + if (code != BIT_AND_EXPR code != BIT_IOR_EXPR) +code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR) + ? TRUTH_AND_EXPR : TRUTH_OR_EXPR); /* If the RHS can be evaluated unconditionally and its operands are simple, it wins to evaluate the RHS unconditionally on machines @@ -5134,7 +5136,7 @@ fold_truthop (location_t loc, enum tree_ simple_operand_p (rr_arg)) { /* Convert (a != 0) || (b != 0) into (a | b) != 0. */ - if (code == TRUTH_OR_EXPR + if ((code == TRUTH_OR_EXPR || code == BIT_IOR_EXPR) lcode == NE_EXPR integer_zerop (lr_arg) rcode == NE_EXPR integer_zerop (rr_arg) TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg) @@ -5145,7 +5147,7 @@ fold_truthop (location_t loc, enum tree_ build_int_cst (TREE_TYPE (ll_arg), 0)); /* Convert (a == 0) (b == 0) into (a | b) == 0. 
*/ - if (code == TRUTH_AND_EXPR + if ((code == TRUTH_AND_EXPR || code == BIT_AND_EXPR) lcode == EQ_EXPR integer_zerop (lr_arg) rcode == EQ_EXPR integer_zerop (rr_arg) TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg) @@ -5209,7 +5211,8 @@ fold_truthop (location_t loc, enum tree_ fail. However, we can convert a one-bit comparison against zero into the opposite comparison against that bit being set in the field. */ - wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR); + wanted_code = ((code == TRUTH_AND_EXPR + || code == BIT_AND_EXPR) ? EQ_EXPR : NE_EXPR); if (lcode != wanted_code) { if (l_const integer_zerop (l_const) integer_pow2p (ll_mask))
[patch 3/8 tree-optimization]: Bitwise logic for fold_truth_andor.
Hello, This patch adds support to fold_truth_andor for one-bit precision typed bitwise-binary and bitwise-not expressions. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Add support for one-bit bitwise operations. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:19:22.0 +0200 +++ gcc/gcc/fold-const.c2011-07-13 08:59:14.261620200 +0200 @@ -8248,6 +8248,12 @@ fold_truth_andor (location_t loc, enum t if (!optimize) return NULL_TREE; + /* If code is BIT_AND_EXPR or BIT_IOR_EXPR, type precision has to be + one. Otherwise return NULL_TREE. */ + if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR) + (!INTEGRAL_TYPE_P (type) || TYPE_PRECISION (type) != 1)) +return NULL_TREE; + /* Check for things like (A || B) (A || C). We can convert this to A || (B C). Note that either operator can be any of the four truth and/or operations and the transformation will still be @@ -8258,7 +8264,9 @@ fold_truth_andor (location_t loc, enum t (TREE_CODE (arg0) == TRUTH_ANDIF_EXPR || TREE_CODE (arg0) == TRUTH_ORIF_EXPR || TREE_CODE (arg0) == TRUTH_AND_EXPR - || TREE_CODE (arg0) == TRUTH_OR_EXPR) + || TREE_CODE (arg0) == TRUTH_OR_EXPR + || TREE_CODE (arg0) == BIT_AND_EXPR + || TREE_CODE (arg0) == BIT_IOR_EXPR) ! 
TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))) { tree a00 = TREE_OPERAND (arg0, 0); @@ -8266,9 +8274,13 @@ fold_truth_andor (location_t loc, enum t tree a10 = TREE_OPERAND (arg1, 0); tree a11 = TREE_OPERAND (arg1, 1); int commutative = ((TREE_CODE (arg0) == TRUTH_OR_EXPR - || TREE_CODE (arg0) == TRUTH_AND_EXPR) + || TREE_CODE (arg0) == TRUTH_AND_EXPR + || TREE_CODE (arg0) == BIT_IOR_EXPR + || TREE_CODE (arg0) == BIT_AND_EXPR) (code == TRUTH_AND_EXPR -|| code == TRUTH_OR_EXPR)); +|| code == TRUTH_OR_EXPR +|| code == BIT_AND_EXPR +|| code == BIT_IOR_EXPR)); if (operand_equal_p (a00, a10, 0)) return fold_build2_loc (loc, TREE_CODE (arg0), type, a00, @@ -9484,21 +9496,29 @@ fold_binary_loc (location_t loc, if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code == EQ_EXPR || code == NE_EXPR) - ((truth_value_p (TREE_CODE (arg0)) - (truth_value_p (TREE_CODE (arg1)) + ((truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0)) + (truth_value_type_p (TREE_CODE (arg1), TREE_TYPE (arg1)) || (TREE_CODE (arg1) == BIT_AND_EXPR integer_onep (TREE_OPERAND (arg1, 1) - || (truth_value_p (TREE_CODE (arg1)) - (truth_value_p (TREE_CODE (arg0)) + || (truth_value_type_p (TREE_CODE (arg1), TREE_TYPE (arg1)) + (truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0)) || (TREE_CODE (arg0) == BIT_AND_EXPR integer_onep (TREE_OPERAND (arg0, 1))) { - tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR -: code == BIT_IOR_EXPR ? TRUTH_OR_EXPR -: TRUTH_XOR_EXPR, -boolean_type_node, -fold_convert_loc (loc, boolean_type_node, arg0), -fold_convert_loc (loc, boolean_type_node, arg1)); + enum tree_code ncode; + + /* Do we operate on a non-boolified tree? */ + if (!INTEGRAL_TYPE_P (type) || TYPE_PRECISION (type) != 1) +ncode = code == BIT_AND_EXPR ? TRUTH_AND_EXPR +: (code == BIT_IOR_EXPR + ? TRUTH_OR_EXPR : TRUTH_XOR_EXPR); + else +ncode = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR) ? 
code + : BIT_XOR_EXPR; + tem = fold_build2_loc (loc, ncode, + boolean_type_node, + fold_convert_loc (loc, boolean_type_node, arg0), + fold_convert_loc (loc, boolean_type_node, arg1)); if (code == EQ_EXPR) tem = invert_truthvalue_loc (loc, tem);
[patch 6/8 tree-optimization]: Bitwise and logic for fold_binary_loc.
Hello, This patch adds support to fold_binary_loc for one-bit precision typed bitwise-and expressions. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_binary_loc): Add support for one-bit bitwise-and optimization. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:43:37.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:58:38.692620200 +0200 @@ -11062,6 +11062,48 @@ fold_binary_loc (location_t loc, if (operand_equal_p (arg0, arg1, 0)) return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0)); + if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type)) + { + if (TREE_CODE (arg0) == INTEGER_CST && !integer_zerop (arg0)) + return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1)); + if (TREE_CODE (arg1) == INTEGER_CST && !integer_zerop (arg1)) + return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0)); + /* Likewise for first arg. */ + if (integer_zerop (arg0)) + return omit_one_operand_loc (loc, type, arg0, arg1); + + /* !X & X is always false. ~X & X is always false. */ + if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR + || TREE_CODE (arg0) == BIT_NOT_EXPR) + && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)) + return omit_one_operand_loc (loc, type, integer_zero_node, arg1); + /* X & !X is always false. X & ~X is always false. */ + if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR + || TREE_CODE (arg1) == BIT_NOT_EXPR) + && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0)) + return omit_one_operand_loc (loc, type, integer_zero_node, arg0); + + /* (A < X) & (A + 1 > Y) ==> (A < X) & (A >= Y). Normally A + 1 > Y means (A >= Y) & (A != MAX), but in this case we know that A < X <= MAX. */ + if (!TREE_SIDE_EFFECTS (arg0) && !TREE_SIDE_EFFECTS (arg1)) + { + tem = fold_to_nonsharp_ineq_using_bound (loc, arg0, arg1); + if (tem && !operand_equal_p (tem, arg0, 0)) + return fold_build2_loc (loc, code, type, tem, arg1); + + tem = fold_to_nonsharp_ineq_using_bound (loc, arg1, arg0); + if (tem && !operand_equal_p (tem, arg1, 0)) + return fold_build2_loc (loc, code, type, arg0, tem); + } + + tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1); + if (tem) + return tem; + } + /* ~X & X, (X == 0) & X, and !X & X are always zero. */ if ((TREE_CODE (arg0) == BIT_NOT_EXPR || TREE_CODE (arg0) == TRUTH_NOT_EXPR
[patch 7/8 tree-optimization]: Bitwise not logic for fold_unary_loc.
Hello, This patch adds support to fold_unary_loc for one-bit precision typed bitwise-not expressions. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_unary_loc): Add support for one-bit bitwise-not optimization. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:49:50.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:56:45.170171300 +0200 @@ -8094,6 +8094,12 @@ fold_unary_loc (location_t loc, enum tre if (i == count) return build_vector (type, nreverse (list)); } + if (INTEGRAL_TYPE_P (type) && TYPE_PRECISION (type) == 1) + { + tem = fold_truth_not_expr (loc, arg0); + if (tem) + return fold_convert_loc (loc, type, tem); + } return NULL_TREE;
[patch 8/8 tree-optimization]: Add truth_value_type_p function
Hello, This patch adds new truth_value_type_p function, which has in contrast to truth_value_p the ability to detect also bitwise-operation with boolean characteristics. This patch has to be applied first for this series, but it requires the other patches as prerequist of this series. 2011-07-13 Kai Tietz kti...@redhat.com (fold_ternary_loc): Use truth_value_type_p instead of truth_value_p. * gimple.c (canonicalize_cond_expr_cond): Likewise. * gimplify.c (gimple_boolify): Likewise. * tree-ssa-structalias.c (find_func_aliases): Likewise. * tree-ssa-forwprop.c (truth_valued_ssa_name): Likewise. * tree.h (truth_value_type_p): New function. (truth_value_p): Implemented as macro via truth_value_type_p. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc-head/gcc/fold-const.c === --- gcc-head.orig/gcc/fold-const.c +++ gcc-head/gcc/fold-const.c @@ -13416,7 +13581,7 @@ fold_ternary_loc (location_t loc, enum t /* If the second operand is simpler than the third, swap them since that produces better jump optimization results. */ - if (truth_value_p (TREE_CODE (arg0)) + if (truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0)) tree_swap_operands_p (op1, op2, false)) { location_t loc0 = expr_location_or (arg0, loc); @@ -13442,7 +13607,7 @@ fold_ternary_loc (location_t loc, enum t over COND_EXPR in cases such as floating point comparisons. */ if (integer_zerop (op1) integer_onep (op2) - truth_value_p (TREE_CODE (arg0))) + truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0))) return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, invert_truthvalue_loc (loc, Index: gcc-head/gcc/gimple.c === --- gcc-head.orig/gcc/gimple.c +++ gcc-head/gcc/gimple.c @@ -3160,7 +3160,8 @@ canonicalize_cond_expr_cond (tree t) { /* Strip conversions around boolean operations. 
*/ if (CONVERT_EXPR_P (t) - truth_value_p (TREE_CODE (TREE_OPERAND (t, 0 + truth_value_type_p (TREE_CODE (TREE_OPERAND (t, 0)), +TREE_TYPE (TREE_OPERAND (t, 0 t = TREE_OPERAND (t, 0); /* For !x use x == 0. */ Index: gcc-head/gcc/gimplify.c === --- gcc-head.orig/gcc/gimplify.c +++ gcc-head/gcc/gimplify.c @@ -2837,7 +2837,7 @@ gimple_boolify (tree expr) if (TREE_CODE (arg) == NOP_EXPR TREE_TYPE (arg) == TREE_TYPE (call)) arg = TREE_OPERAND (arg, 0); - if (truth_value_p (TREE_CODE (arg))) + if (truth_value_type_p (TREE_CODE (arg), TREE_TYPE (arg))) { arg = gimple_boolify (arg); CALL_EXPR_ARG (call, 0) Index: gcc-head/gcc/tree-ssa-structalias.c === --- gcc-head.orig/gcc/tree-ssa-structalias.c +++ gcc-head/gcc/tree-ssa-structalias.c @@ -4416,7 +4416,8 @@ find_func_aliases (gimple origt) !POINTER_TYPE_P (TREE_TYPE (rhsop || gimple_assign_single_p (t)) get_constraint_for_rhs (rhsop, rhsc); - else if (truth_value_p (code)) + else if (truth_value_type_p (code, + TREE_TYPE (lhsop))) /* Truth value results are not pointer (parts). Or at least very very unreasonable obfuscation of a part. */ ; Index: gcc-head/gcc/tree.h === --- gcc-head.orig/gcc/tree.h +++ gcc-head/gcc/tree.h @@ -5307,13 +5307,22 @@ extern tree combine_comparisons (locatio extern void debug_fold_checksum (const_tree); /* Return nonzero if CODE is a tree code that represents a truth value. */ +#define truth_value_p(CODE) truth_value_type_p ((CODE), NULL_TREE) + +/* Return nonzero if CODE is a tree code that represents a truth value. + If TYPE is an integral type, unsigned, and has precision of one, then + additionally return for bitwise-binary and bitwise-invert nonzero. 
*/ static inline bool -truth_value_p (enum tree_code code) +truth_value_type_p (enum tree_code code, tree type) { return (TREE_CODE_CLASS (code) == tcc_comparison || code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR - || code == TRUTH_XOR_EXPR || code == TRUTH_NOT_EXPR); + || code == TRUTH_XOR_EXPR || code == TRUTH_NOT_EXPR + || ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR + || code == BIT_XOR_EXPR || code ==
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Hello William, However, it does not fix http://gcc.gnu.org/PR45671, which surprises me as it was marked as a duplicate of this one. Any thoughts on why this isn't sufficient to reassociate the linear chain of adds? Test case: int myfunction (int a, int b, int c, int d, int e, int f, int g, int h) { int ret; ret = a + b + c + d + e + f + g + h; return ret; } Reassociation does not work for signed integers because signed integers are not wrap-around types in C. You can change that by passing the -fwrapv option, but it will disable other useful optimizations. Reassociating signed integers without this option is not a trivial question, because in that case you may introduce overflows and therefore undefined behavior. BR Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Ilya, please mention PR middle-end/44382 in ChangeLog. Thanks for notice. Here is corrected ChangeLog: gcc/ 2011-07-12 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): New hook documentation. * doc/tm.texi (reassociation_width): Likewise. * hooks.h (hook_int_const_gimple_1): New default hook. * hooks.c (hook_int_const_gimple_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * common.opt (ftree-reassoc-width): New option added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (rewrite_expr_tree_parallel): Likewise. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-12 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option -ftree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Wed, Jul 13, 2011 at 11:52:25AM +0400, Ilya Enkovich wrote: However, it does not fix http://gcc.gnu.org/PR45671, which surprises me as it was marked as a duplicate of this one. Any thoughts on why this isn't sufficient to reassociate the linear chain of adds? Test case: int myfunction (int a, int b, int c, int d, int e, int f, int g, int h) { int ret; ret = a + b + c + d + e + f + g + h; return ret; } Reassociation does not work for signed integers because signed integers are not wrap-around types in C. You can change that by passing the -fwrapv option, but it will disable other useful optimizations. Reassociating signed integers without this option is not a trivial question, because in that case you may introduce overflows and therefore undefined behavior. Well, if it is clearly a win to reassociate, you can always reassociate them by doing the arithmetic in the corresponding unsigned type and afterwards converting back to the signed type. Jakub
Re: Build failure (Re: [PATCH] Make VRP optimize useless conversions)
On Tue, 12 Jul 2011, Ulrich Weigand wrote: Richard Guenther wrote: 2011-07-11 Richard Guenther rguent...@suse.de * tree-vrp.c (simplify_conversion_using_ranges): Manually translate the source value-range through the conversion chain. This causes a build failure in cachemgr.c on spu-elf. A slightly modified simplified test case also fails on i386-linux: void * test (unsigned long long x, unsigned long long y) { return (void *) (unsigned int) (x / y); } compiled with -O2 results in: test.i: In function 'test': test.i:3:1: error: invalid types in nop conversion void * long long unsigned int D.1962_5 = (void *) D.1963_3; test.i:3:1: internal compiler error: verify_gimple failed Any thoughts? Fix in testing. Richard. 2011-07-13 Richard Guenther rguent...@suse.de * tree-vrp.c (simplify_conversion_using_ranges): Make sure the final type is integral. * gcc.dg/torture/20110713-1.c: New testcase. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 176224) +++ gcc/tree-vrp.c (working copy) @@ -7353,6 +7353,8 @@ simplify_conversion_using_ranges (gimple double_int innermin, innermax, middlemin, middlemax; finaltype = TREE_TYPE (gimple_assign_lhs (stmt)); + if (!INTEGRAL_TYPE_P (finaltype)) +return false; middleop = gimple_assign_rhs1 (stmt); def_stmt = SSA_NAME_DEF_STMT (middleop); if (!is_gimple_assign (def_stmt) Index: gcc/testsuite/gcc.dg/torture/20110713-1.c === --- gcc/testsuite/gcc.dg/torture/20110713-1.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/20110713-1.c (revision 0) @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ilp32 } */ + +void * +test (unsigned long long x, unsigned long long y) +{ +return (void *) (unsigned int) (x / y); +}
Re: [Patch, AVR]: Fix PR49687: Better widening mul 16=8*8
On 12/07/11 12:11, Bernd Schmidt wrote: On 07/12/11 13:04, Andrew Stubbs wrote: On 12/07/11 11:35, Georg-Johann Lay wrote: +(define_insn *mulsu + [(set (match_operand:HI 0 register_operand =r) +(mult:HI (sign_extend:HI (match_operand:QI 1 register_operand a)) + (zero_extend:HI (match_operand:QI 2 register_operand a] + AVR_HAVE_MUL + mulsu %1,%2 +movw %0,r0 +clr __zero_reg__ + [(set_attr length 3) + (set_attr cc clobber)]) + +(define_insn *mulus + [(set (match_operand:HI 0 register_operand =r) +(mult:HI (zero_extend:HI (match_operand:QI 1 register_operand a)) + (sign_extend:HI (match_operand:QI 2 register_operand a] + AVR_HAVE_MUL + mulsu %2,%1 +movw %0,r0 +clr __zero_reg__ + [(set_attr length 3) + (set_attr cc clobber)]) 1. You should name that usmulqihi3 (no star), so the optimizers can see it. 2. There's no need to define both of these. For one thing, putting a '%' at the start of the constraint list for operand 1 does precisely this, Unfortunately it doesn't. It won't swap the sign/zero-extend. Bernd And what is more, zero-extending one operand and sign-extending another is definitely not commutative, even if the outer multiply is.
Re: AVX generic mode tuning discussion.
On Tue, Jul 12, 2011 at 11:56 PM, Richard Henderson r...@redhat.com wrote: On 07/12/2011 02:22 PM, harsha.jaga...@amd.com wrote: We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX. You indicate a 3% reduction on bulldozer with avx256. How does avx128 compare to -mno-avx -msse4.2? Will the next AMD generation have a usable avx256? I'm not keen on the idea of generic mode being tuned for a single processor revision that maybe shouldn't actually be using avx at all. Btw, it looks like the data is massively skewed by 436.cactusADM. What are the overall numbers if you disregard cactus? It's also for sure the case that the vectorizer cost model has not been touched for avx256 vs. avx128 vs. sse, so a more sensible approach would be to look at differentiating things there to improve the cactus numbers. Harsha, did you investigate why avx256 is such a loss for cactus or why it is so much of a win for SB? I suppose generic tuning is of less importance for AVX, as people need to enable that manually anyway (and will possibly do so only by means of -march=native). Thanks, Richard. r~
Re: Use of vector instructions in memmov/memset expanding
Hello! Please don't use -m32/-m64 in testcases directly. You should use /* { dg-do compile { target { ! ia32 } } } */ for 64-bit insns and /* { dg-do compile { target { ia32 } } } */ for 32-bit insns. Also, there is no need to add -mtune if -march is already specified; -mtune will follow -march. To scan for the %xmm register, you don't have to add -dp to the compile flags. -dp also dumps the pattern name to the file, so unless you are looking for a specific pattern name, you should omit -dp. Uros.
Re: PATCH: Remove -mfused-madd and add -mfma
On Wed, Jul 13, 2011 at 3:00 AM, H.J. Lu hongjiu...@intel.com wrote: Hi, -mfused-madd is deprecated and -mfma is undocumented. This patch removes -mfused-madd and documents -mfma. OK for trunk? Ok. Thanks, Richard. Thanks. H.J. --- 2011-07-12 H.J. Lu hongjiu...@intel.com * doc/invoke.texi (x86): Remove -mfused-madd and add -mfma. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index f146cc5..3429b31 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -600,7 +600,7 @@ Objective-C and Objective-C++ Dialects}. -mincoming-stack-boundary=@var{num} @gol -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip -mvzeroupper @gol -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol --maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfused-madd @gol +-maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma @gol -msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlwp @gol -mthreads -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol @@ -12587,6 +12587,8 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @itemx -mno-rdrnd @itemx -mf16c @itemx -mno-f16c +@itemx -mfma +@itemx -mno-fma @itemx -msse4a @itemx -mno-sse4a @itemx -mfma4 @@ -12612,9 +12614,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @opindex mno-sse @opindex m3dnow @opindex mno-3dnow -These switches enable or disable the use of instructions in the MMX, -SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, FSGSBASE, RDRND, -F16C, SSE4A, FMA4, XOP, LWP, ABM, BMI, or 3DNow!@: extended instruction sets. +These switches enable or disable the use of instructions in the MMX, SSE, +SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, +SSE4A, FMA4, XOP, LWP, ABM, BMI, or 3DNow!@: extended instruction sets. These extensions are also available as built-in functions: see @ref{X86 Built-in Functions}, for details of the functions enabled and disabled by these switches. 
@@ -12633,13 +12635,6 @@ supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options. -@item -mfused-madd -@itemx -mno-fused-madd -@opindex mfused-madd -@opindex mno-fused-madd -Do (don't) generate code that uses the fused multiply/add or multiply/subtract -instructions. The default is to use these instructions. - @item -mcld @opindex mcld This option instructs GCC to emit a @code{cld} instruction in the prologue
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, I split my old patch into 8 separate pieces for easier review. These patches are a prerequisite for enabling boolification of comparisons in the gimplifier and the necessary type-cast preserving in gimple from/to boolean-type. This patch adds support to fold_truth_not_expr for one-bit precision typed bitwise-binary and bitwise-not expressions. It seems this is only necessary because we still have TRUTH_NOT_EXPR in our IL and did not replace that with BIT_NOT_EXPR consistently yet. So no, this is not ok. fold-const.c is really mostly supposed to deal with GENERIC where we distinguish TRUTH_* and BIT_* variants. Please instead lower TRUTH_NOT_EXPR to BIT_NOT_EXPR for gimple. Richard. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_not_expr): Add support for one-bit bitwise operations. Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 07:48:29.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:59:36.865620200 +0200 @@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre case INTEGER_CST: return constant_boolean_node (integer_zerop (arg), type); + case BIT_AND_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) + return NULL_TREE; + if (integer_onep (TREE_OPERAND (arg, 1))) + return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0)); + /* fall through */ case TRUTH_AND_EXPR: loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc); loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc); - return build2_loc (loc, TRUTH_OR_EXPR, type, + return build2_loc (loc, (code == BIT_AND_EXPR ?
BIT_IOR_EXPR + : TRUTH_OR_EXPR), type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)), invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1))); + case BIT_IOR_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) + return NULL_TREE; + /* fall through. */ case TRUTH_OR_EXPR: loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc); loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc); - return build2_loc (loc, TRUTH_AND_EXPR, type, + return build2_loc (loc, (code == BIT_IOR_EXPR ? BIT_AND_EXPR + : TRUTH_AND_EXPR), type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)), invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1))); + case BIT_XOR_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) + return NULL_TREE; + /* fall through. */ case TRUTH_XOR_EXPR: /* Here we can invert either operand. We invert the first operand unless the second operand is a TRUTH_NOT_EXPR in which case our @@ -3095,10 +3111,14 @@ fold_truth_not_expr (location_t loc, tre negation of the second operand. 
*/ if (TREE_CODE (TREE_OPERAND (arg, 1)) == TRUTH_NOT_EXPR) - return build2_loc (loc, TRUTH_XOR_EXPR, type, TREE_OPERAND (arg, 0), + return build2_loc (loc, code, type, TREE_OPERAND (arg, 0), + TREE_OPERAND (TREE_OPERAND (arg, 1), 0)); + else if (TREE_CODE (TREE_OPERAND (arg, 1)) == BIT_NOT_EXPR + TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 1))) == 1) + return build2_loc (loc, code, type, TREE_OPERAND (arg, 0), TREE_OPERAND (TREE_OPERAND (arg, 1), 0)); else - return build2_loc (loc, TRUTH_XOR_EXPR, type, + return build2_loc (loc, code, type, invert_truthvalue_loc (loc, TREE_OPERAND (arg, 0)), TREE_OPERAND (arg, 1)); @@ -3116,6 +3136,10 @@ fold_truth_not_expr (location_t loc, tre invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)), invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1))); + case BIT_NOT_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) + return NULL_TREE; + /* fall through */ case TRUTH_NOT_EXPR: return TREE_OPERAND (arg, 0); @@ -3158,11 +3182,6 @@ fold_truth_not_expr (location_t loc, tre return build1_loc (loc, TREE_CODE (arg), type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0))); - case BIT_AND_EXPR: - if (!integer_onep (TREE_OPERAND (arg, 1))) - return NULL_TREE; - return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0)); - case SAVE_EXPR: return build1_loc (loc, TRUTH_NOT_EXPR, type, arg);
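The rewrite above extends fold_truth_not_expr so that inverting a one-bit BIT_AND_EXPR/BIT_IOR_EXPR applies De Morgan's laws with the bitwise codes, and inversion of a one-bit value is x ^ 1. A minimal self-contained sketch of the identities the patch relies on (the helper names here are illustrative, not GCC's):

```c
/* Sketch of the 1-bit identities fold_truth_not_expr exploits:
   ~(a & b) == ~a | ~b and ~(a | b) == ~a & ~b on one bit, where
   one-bit inversion is x ^ 1.  Helper names are hypothetical.  */

typedef unsigned char bit; /* stands in for a 1-bit precision type */

static bit bit_not (bit x) { return x ^ 1; }

/* Inverting a 1-bit AND: dual OR with inverted operands.  */
static bit invert_and (bit a, bit b) { return bit_not (a) | bit_not (b); }

/* Inverting a 1-bit OR: dual AND with inverted operands.  */
static bit invert_or (bit a, bit b) { return bit_not (a) & bit_not (b); }

/* Exhaustively compare against the direct inversions.  */
int check_demorgan (void)
{
  for (bit a = 0; a <= 1; a++)
    for (bit b = 0; b <= 1; b++)
      {
        if (invert_and (a, b) != (bit) (1 & ~(a & b))) return 0;
        if (invert_or (a, b) != (bit) (1 & ~(a | b))) return 0;
      }
  return 1;
}
```

Since the domain is a single bit, the exhaustive check over all four operand pairs covers every case.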
[patch tree-optimization]: Normalize compares X ==/!= 1 -> X !=/== 0 for truth valued X
Hello, this patch makes sure that we call fold_stmt_inplace for replaced uses. Additionally it adds to fold_gimple_assign the canonical form X ==/!= 1 -> X !=/== 0 for X with one-bit precision type. ChangeLog gcc/ 2011-07-13 Kai Tietz kti...@redhat.com * gimple-fold.c (fold_gimple_assign): Add normalization for compares of 1-bit integer precision operands. * tree-ssa-propagate.c (replace_uses_in): Call fold_stmt_inplace on modified statement. ChangeLog gcc/testsuite 2011-07-13 Kai Tietz kti...@redhat.com * gcc.dg/tree-ssa/fold-1.c: New test. Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/gimple-fold.c === --- gcc.orig/gcc/gimple-fold.c 2011-07-13 10:37:32.0 +0200 +++ gcc/gcc/gimple-fold.c 2011-07-13 10:39:05.100843400 +0200 @@ -815,6 +815,17 @@ fold_gimple_assign (gimple_stmt_iterator gimple_assign_rhs2 (stmt)); } + if (!result && (subcode == EQ_EXPR || subcode == NE_EXPR) + && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))) + && TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt))) == 1 + && integer_onep (gimple_assign_rhs2 (stmt))) + result = build2_loc (loc, (subcode == EQ_EXPR ? NE_EXPR : EQ_EXPR), + TREE_TYPE (gimple_assign_lhs (stmt)), + gimple_assign_rhs1 (stmt), + fold_convert_loc (loc, + TREE_TYPE (gimple_assign_rhs1 (stmt)), + integer_zero_node)); + if (!result) result = fold_binary_loc (loc, subcode, TREE_TYPE (gimple_assign_lhs (stmt)), Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/fold-1.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ gcc/gcc/testsuite/gcc.dg/tree-ssa/fold-1.c 2011-07-13 10:50:38.294367800 +0200 @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int foo (_Bool a, _Bool b) +{ + return a != ((b | !b)); +} +/* { dg-final { scan-tree-dump-not "!= 1" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/gcc/tree-ssa-propagate.c === --- gcc.orig/gcc/tree-ssa-propagate.c 2011-07-13 10:37:42.0 +0200 +++ gcc/gcc/tree-ssa-propagate.c 2011-07-13 10:40:25.688576800 +0200 @@ -904,6 +904,8 @@ replace_uses_in (gimple stmt, ssa_prop_g propagate_value (use, val); + fold_stmt_inplace (stmt); + replaced = true; }
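The normalization being installed — for a one-bit value X, rewrite X == 1 as X != 0 and X != 1 as X == 0 — is easy to check exhaustively, and the same sketch covers the testcase's expression a != (b | !b), which reduces to !a (names below are illustrative, not GCC code):

```c
#include <stdbool.h>

/* For one-bit X: (X == 1) <=> (X != 0) and (X != 1) <=> (X == 0),
   the canonical form fold_gimple_assign now produces.  */
static int norm_holds (void)
{
  for (unsigned char x = 0; x <= 1; x++)
    {
      if ((x == 1) != (x != 0)) return 0;
      if ((x != 1) != (x == 0)) return 0;
    }
  return 1;
}

/* The testcase's expression: (b | !b) is always 1, so a != (b | !b)
   is a != 1, i.e. !a after normalization.  */
static int testcase_equiv (bool a, bool b)
{
  return (a != (b | !b)) == !a;
}
```

This is why the dump test can assert that no "!= 1" compare survives into the optimized dump.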
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Well, if it is clearly a win to reassociate, you can always reassociate them by doing arithmetics in corresponding unsigned type and afterwards converting back to the signed type. Jakub You are right. But in this case we again make all operands have wrap-around type and thus disable some other optimizations. It would be nice to have the opportunity to reassociate and still have undefined behavior on overflow for optimizations. One way to do it for add/sub is to use a wider type (long long instead of int). Ilya
Re: AVX generic mode tuning discussion.
On Wed, Jul 13, 2011 at 10:42:41AM +0200, Richard Guenther wrote: I suppose generic tuning is of less importance for AVX as people need to enable that manually anyway (and will possibly do so only via means of -march=native). Yeah, but if somebody does compile with -mavx -mtune=generic, I'd expect the intent is that he wants fastest code not just on current generation of CPUs, but on the next few following ones, and I'd say that being able to use twice as big vectorization factor ought to be a win in most cases if the cost model gets it right. If not for the vectorization factor doubling, what would be reasons why somebody would compile code with -mavx -mtune=generic and rule out support for many recent chips? Yeah, there are the 2 operand forms and such code can avoid penalty when mixed with AVX256 code, but would that be strong reason enough to lose the support of most of the recent CPUs? When targeting just a particular CPU and using -march= with CPU which already includes AVX, -mtune=generic probably doesn't make much sense, you probably want -march=native and you are optimizing for the CPU you have. Jakub
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
Sorry, the TRUTH_NOT_EXPR isn't the point here at all. The underlying issue is that fold-const re-introduces TRUTH_AND/OR and co. To avoid that, it needs to learn to handle 1-bit precision folding for those bitwise operations on 1-bit integer types specially. As gimple relies on this FE fold for now, it has to learn about that. As soon as gimple_fold (and the other passes) no longer rely on the FE's fold-const, we can remove those parts again. Otherwise this boolification of compares (and also the transition of TRUTH_NOT -> BIT_NOT) simply doesn't work. Regards, Kai 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, I split my old patch into 8 separate pieces for easier review. These patches are a prerequisite for enabling boolification of comparisons in the gimplifier and the necessary type-cast preserving in gimple from/to boolean-type. This patch adds support to fold_truth_not_expr for one-bit precision typed bitwise-binary and bitwise-not expressions. It seems this is only necessary because we still have TRUTH_NOT_EXPR in our IL and did not replace that with BIT_NOT_EXPR consistently yet. So no, this is not ok. fold-const.c is really mostly supposed to deal with GENERIC where we distinguish TRUTH_* and BIT_* variants. Please instead lower TRUTH_NOT_EXPR to BIT_NOT_EXPR for gimple. Richard.
Re: [Patch 1/3] ARM 64 bit atomic operations
On 12 July 2011 22:07, Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote: Hi Dave, Hi Ramana, Thanks for the review. Could you split this further into a patch that deals with the case for disabling MCR memory barriers for Thumb1 so that it maybe backported to the release branches ? I have commented inline as well. Sure. Could you also provide a proper changelog entry for this that will also help with review of the patch ? Yep, no problem. I've not yet managed to fully review all the bits in this patch but here's some initial comments that should be looked at. On 1 July 2011 16:54, Dr. David Alan Gilbert david.gilb...@linaro.org wrote: diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c snip + if (is_di) + { + arm_output_asm_insn (emit, 0, operands, it\teq); This should be guarded with a if (TARGET_THUMB2) - there's no point in accounting for the length of this instruction in the compiler and then have the assembler fold it away in ARM state. OK; the length accounting seems pretty broken anyway; I think it assumes all instructions are 4 bytes. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index c32ef1a..3fdd22f 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -282,7 +282,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void); -#define TARGET_HAVE_DMB_MCR (arm_arch6k ! TARGET_HAVE_DMB) +#define TARGET_HAVE_DMB_MCR (arm_arch6k ! TARGET_HAVE_DMB \ + ! TARGET_THUMB1) This hunk (TARGET_HAVE_DMB_MCR) should probably be backported to release branches because this is technically fixing an issue and hence should be a separate patch that can be looked at separately. OK, will do. /* Nonzero if this chip implements a memory barrier instruction. 
*/ #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR) @@ -290,8 +291,12 @@ extern void (*arm_lang_output_object_attributes_hook)(void); sync.md changes - (define_mode_iterator NARROW [QI HI]) +(define_mode_iterator QHSD [QI HI SI DI]) +(define_mode_iterator SIDI [SI DI]) + +(define_mode_attr sync_predtab [(SI TARGET_HAVE_LDREX TARGET_HAVE_MEMORY_BARRIER) + (QI TARGET_HAVE_LDREXBH TARGET_HAVE_MEMORY_BARRIER) + (HI TARGET_HAVE_LDREXBH TARGET_HAVE_MEMORY_BARRIER) + (DI TARGET_HAVE_LDREXD ARM_DOUBLEWORD_ALIGN TARGET_HAVE_MEMORY_BARRIER)]) + Can we move all the iterators to iterators.md and then arrange includes to work automatically ? Minor nit - could you align the entries for QI, HI and DI with the start of the SI ? Yes I can do - the only odd thing is I guess the sync_predtab is very sync.md specific, does it really make sense for that to be in iterators.md ? +(define_mode_attr sync_atleastsi [(SI SI) + (DI DI) + (HI SI) + (QI SI)]) I couldn't spot where this was being used. Can this be removed if not necessary ? Ah - yes I think that's dead; it's a relic from an attempt to merge some of the other narrow cases into the same iterator but it got way too messy. 
-(define_insn arm_sync_new_nandsi +(define_insn arm_sync_new_sync_optabmode [(set (match_operand:SI 0 s_register_operand =r) - (unspec_volatile:SI [(not:SI (and:SI - (match_operand:SI 1 arm_sync_memory_operand +Q) - (match_operand:SI 2 s_register_operand r))) - ] - VUNSPEC_SYNC_NEW_OP)) + (unspec_volatile:SI [(syncop:SI + (zero_extend:SI + (match_operand:NARROW 1 arm_sync_memory_operand +Q)) + (match_operand:SI 2 s_register_operand r)) + ] + VUNSPEC_SYNC_NEW_OP)) (set (match_dup 1) - (unspec_volatile:SI [(match_dup 1) (match_dup 2)] - VUNSPEC_SYNC_NEW_OP)) + (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)] + VUNSPEC_SYNC_NEW_OP)) (clobber (reg:CC CC_REGNUM)) (clobber (match_scratch:SI 3 =r))] - TARGET_HAVE_LDREX TARGET_HAVE_MEMORY_BARRIER + TARGET_HAVE_LDREXBH TARGET_HAVE_MEMORY_BARRIER Can't this just use sync_predtab instead since the condition is identical for QImode and HImode from that mode attribute and in quite a few places below. ? Hmm yes it can - I'd only been using predtab in the places where it was varying on the mode; but as you say this can be converted as well. @@ -461,19 +359,19 @@ (unspec_volatile:SI [(not:SI (and:SI - (zero_extend:SI - (match_operand:NARROW 1 arm_sync_memory_operand +Q)) - (match_operand:SI 2 s_register_operand r))) + (zero_extend:SI +
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Wed, Jul 13, 2011 at 01:01:59PM +0400, Ilya Enkovich wrote: Well, if it is clearly a win to reassociate, you can always reassociate them by doing arithmetics in corresponding unsigned type and afterwards converting back to the signed type. You are right. But in this case we again make all operands have wrap-around type and thus disable some other optimization. It would be nice to have opportunity to reassociate and still have undefined behavior on overflow for optimizations. One way to do it for add/sub is to use wider type (long long instead of int). I disagree. Widening would result in worse code in most cases, as you need to sign extend all the operands. On the other side, I doubt you can actually usefully use the undefinedness of signed overflow for a series of 3 or more operands of the associative operation. Jakub
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/13 Jakub Jelinek ja...@redhat.com: I disagree. Widening would result in worse code in most cases, as you need to sign extend all the operands. On the other side, I doubt you can actually usefully use the undefinedness of signed overflow for a series of 3 or more operands of the associative operation. Jakub Sounds reasonable. Type casting to unsigned should be a better solution here. Ilya
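The workaround the thread settles on — perform the reassociated arithmetic in the corresponding unsigned type, where overflow wraps, and convert back — can be sketched as below. Note the final conversion of an out-of-range unsigned value back to int is implementation-defined in C, though it is two's-complement wrap-around on the targets discussed here.

```c
/* Reassociating a chain of signed additions directly may introduce
   intermediate signed overflow, which is undefined behavior.  Doing
   the arithmetic in the corresponding unsigned type is always
   well-defined (wrap-around), and converting the result back yields
   the expected value on two's-complement targets.  */
static int reassoc_sum (int a, int b, int c, int d)
{
  unsigned ua = (unsigned) a, ub = (unsigned) b;
  unsigned uc = (unsigned) c, ud = (unsigned) d;

  /* Reassociated as (a + c) + (b + d) without UB.  */
  return (int) ((ua + uc) + (ub + ud));
}
```

This is Jakub's suggestion in miniature; the cost he points out is that all the operands now carry wrap-around semantics, so later passes can no longer assume the signed sum does not overflow.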
Re: CFT: [build] Move soft-fp support to toplevel libgcc
Hello Rainer! On Tue, 12 Jul 2011 19:22:51 +0200, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: 2011-07-09 Rainer Orth r...@cebitec.uni-bielefeld.de gcc: [...] * config.gcc ([...] (i[34567]86-*-darwin*): Remove i386/t-fprules-softfp, soft-fp/t-softfp from tmake_file. (i[34567]86-*-linux*): Likewise. [...] i[34567]86-*-linux* | x86_64-*-linux* | \ i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \ i[34567]86-*-gnu*) - tmake_file=${tmake_file} i386/t-fprules-softfp soft-fp/t-softfp i386/t-linux ;; This also removes i386/t-linux from tmake_file, which might not be what you intended? Regards, Thomas
Re: [build] Remove crt0, mcrt0 support
Paolo Bonzini bonz...@gnu.org writes: On 07/12/2011 06:45 PM, Rainer Orth wrote: +crt0.o: $(srcdir)/config/i386/netware-crt0.c +$(crt_commpile) $(CRTSTUFF_T_CFLAGS) -c $ Typo here. Otherwise looks good, thanks. Fixed and installed. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: CFT: [build] Move soft-fp support to toplevel libgcc
Hi Thomas, i[34567]86-*-linux* | x86_64-*-linux* | \ i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \ i[34567]86-*-gnu*) -tmake_file=${tmake_file} i386/t-fprules-softfp soft-fp/t-softfp i386/t-linux ;; This also removes i386/t-linux from tmake_file, which might not be what you intended? indeed not. Will fix in my local copy. Thanks for noticing. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: The TI C6X port
On 05/25/11 02:29, Vladimir Makarov wrote: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00750.html Ok. But changelog entry for sched_change_pattern is absent. I've committed this with a slight change in sched_change_pattern; another patch I'm working on showed a need to also clear the cached cost for resolved dependencies. Bernd Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 176225) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,11 @@ +2011-07-13 Bernd Schmidt ber...@codesourcery.com + + * sched-int.h (struct _dep): Add member cost. + (DEP_COST, UNKNOWN_DEP_COST): New macros. + * sched-deps.c (init_dep_1): Initialize DEP_COST. + * haifa-sched.c (dep_cost_1): Use and set DEP_COST. + (sched_change_pattern): Reset it for dependent insns. + 2011-07-13 Rainer Orth r...@cebitec.uni-bielefeld.de * Makefile.in (CRT0STUFF_T_CFLAGS): Remove. Index: gcc/haifa-sched.c === --- gcc/haifa-sched.c (revision 176171) +++ gcc/haifa-sched.c (working copy) @@ -854,6 +854,9 @@ dep_cost_1 (dep_t link, dw_t dw) rtx used = DEP_CON (link); int cost; + if (DEP_COST (link) != UNKNOWN_DEP_COST) +return DEP_COST (link); + /* A USE insn should never require the value used to be computed. This allows the computation of a function's result and parameter values to overlap the return and call. We don't care about the @@ -911,6 +914,7 @@ dep_cost_1 (dep_t link, dw_t dw) cost = 0; } + DEP_COST (link) = cost; return cost; } @@ -4864,11 +4868,21 @@ fix_recovery_deps (basic_block rec) void sched_change_pattern (rtx insn, rtx new_pat) { + sd_iterator_def sd_it; + dep_t dep; int t; t = validate_change (insn, PATTERN (insn), new_pat, 0); gcc_assert (t); dfa_clear_single_insn_cache (insn); + + for (sd_it = sd_iterator_start (insn, (SD_LIST_FORW | SD_LIST_BACK +| SD_LIST_RES_BACK)); + sd_iterator_cond (sd_it, dep);) +{ + DEP_COST (dep) = UNKNOWN_DEP_COST; + sd_iterator_next (sd_it); +} } /* Change pattern of INSN to NEW_PAT. 
Invalidate cached haifa Index: gcc/sched-deps.c === --- gcc/sched-deps.c(revision 176171) +++ gcc/sched-deps.c(working copy) @@ -107,6 +107,7 @@ init_dep_1 (dep_t dep, rtx pro, rtx con, DEP_CON (dep) = con; DEP_TYPE (dep) = type; DEP_STATUS (dep) = ds; + DEP_COST (dep) = UNKNOWN_DEP_COST; } /* Init DEP with the arguments. Index: gcc/sched-int.h === --- gcc/sched-int.h (revision 176171) +++ gcc/sched-int.h (working copy) @@ -215,6 +215,9 @@ struct _dep /* Dependency status. This field holds all dependency types and additional information for speculative dependencies. */ ds_t status; + + /* Cached cost of the dependency. */ + int cost; }; typedef struct _dep dep_def; @@ -224,6 +227,9 @@ typedef dep_def *dep_t; #define DEP_CON(D) ((D)-con) #define DEP_TYPE(D) ((D)-type) #define DEP_STATUS(D) ((D)-status) +#define DEP_COST(D) ((D)-cost) + +#define UNKNOWN_DEP_COST INT_MIN /* Functions to work with dep. */
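Bernd's patch caches a computed dependency cost in the dep record, using INT_MIN as the "not yet computed" sentinel, and resets the cache whenever the insn pattern changes. Stripped of the scheduler machinery, the memoization pattern looks like this (structures simplified to the essentials, names illustrative):

```c
#include <limits.h>

/* Memoization pattern from the haifa-sched DEP_COST patch: cache a
   computed cost in the dependency record, with INT_MIN meaning "not
   yet computed", and invalidate when the cost may have changed.  */

#define UNKNOWN_DEP_COST INT_MIN

struct dep
{
  int cost; /* cached cost, UNKNOWN_DEP_COST until computed */
};

static int compute_calls; /* counts real cost computations */

/* Stand-in for the expensive dep_cost_1 computation.  */
static int compute_cost_uncached (void) { compute_calls++; return 3; }

static int dep_cost (struct dep *d)
{
  if (d->cost != UNKNOWN_DEP_COST)
    return d->cost;               /* fast path: reuse cached value */
  d->cost = compute_cost_uncached ();
  return d->cost;
}

/* Analogue of sched_change_pattern walking the dependency lists and
   resetting each DEP_COST to the sentinel.  */
static void invalidate (struct dep *d) { d->cost = UNKNOWN_DEP_COST; }
```

The key design point, visible in the real patch, is that the sentinel must be a value no legitimate cost can take, and that every code path that can change the cost (here, replacing the insn pattern) must restore the sentinel on all forward, backward, and resolved-backward dependencies.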
Re: [patch 2/8 tree-optimization]: Bitwise logic for fold_range_test and fold_truthop.
On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, This patch adds support to fold_range_test and to fold_truthop for one-bit precision typed bitwise-binary and bitwise-not expressions. This looks reasonable but I'd like to see testcases excercising the foldings (by scanning the .original dump). Richard. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_range_test): Add support for one-bit bitwise operations. (fold_truthop): Likewise. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:07:59.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:59:26.117620200 +0200 @@ -4819,7 +4819,8 @@ fold_range_test (location_t loc, enum tr tree op0, tree op1) { int or_op = (code == TRUTH_ORIF_EXPR - || code == TRUTH_OR_EXPR); + || code == TRUTH_OR_EXPR + || code == BIT_IOR_EXPR); int in0_p, in1_p, in_p; tree low0, low1, low, high0, high1, high; bool strict_overflow_p = false; @@ -4890,7 +4891,7 @@ fold_range_test (location_t loc, enum tr } } - return 0; + return NULL_TREE; } /* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P @@ -5118,8 +5119,9 @@ fold_truthop (location_t loc, enum tree_ } } - code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR) - ? TRUTH_AND_EXPR : TRUTH_OR_EXPR); + if (code != BIT_AND_EXPR code != BIT_IOR_EXPR) + code = ((code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR) + ? TRUTH_AND_EXPR : TRUTH_OR_EXPR); /* If the RHS can be evaluated unconditionally and its operands are simple, it wins to evaluate the RHS unconditionally on machines @@ -5134,7 +5136,7 @@ fold_truthop (location_t loc, enum tree_ simple_operand_p (rr_arg)) { /* Convert (a != 0) || (b != 0) into (a | b) != 0. 
*/ - if (code == TRUTH_OR_EXPR + if ((code == TRUTH_OR_EXPR || code == BIT_IOR_EXPR) lcode == NE_EXPR integer_zerop (lr_arg) rcode == NE_EXPR integer_zerop (rr_arg) TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg) @@ -5145,7 +5147,7 @@ fold_truthop (location_t loc, enum tree_ build_int_cst (TREE_TYPE (ll_arg), 0)); /* Convert (a == 0) (b == 0) into (a | b) == 0. */ - if (code == TRUTH_AND_EXPR + if ((code == TRUTH_AND_EXPR || code == BIT_AND_EXPR) lcode == EQ_EXPR integer_zerop (lr_arg) rcode == EQ_EXPR integer_zerop (rr_arg) TREE_TYPE (ll_arg) == TREE_TYPE (rl_arg) @@ -5209,7 +5211,8 @@ fold_truthop (location_t loc, enum tree_ fail. However, we can convert a one-bit comparison against zero into the opposite comparison against that bit being set in the field. */ - wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR); + wanted_code = ((code == TRUTH_AND_EXPR + || code == BIT_AND_EXPR) ? EQ_EXPR : NE_EXPR); if (lcode != wanted_code) { if (l_const integer_zerop (l_const) integer_pow2p (ll_mask))
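The two conversions being extended to the bitwise codes are (a != 0) || (b != 0) ==> (a | b) != 0 and (a == 0) && (b == 0) ==> (a | b) == 0. A quick sketch (helper names are mine, not GCC's) confirms the equivalence for arbitrary integers:

```c
/* The fold_truthop conversions, written out:
   (a != 0) || (b != 0)  ==>  (a | b) != 0
   (a == 0) && (b == 0)  ==>  (a | b) == 0
   Valid because a | b is zero exactly when both a and b are zero.  */
static int either_nonzero_folded (int a, int b) { return (a | b) != 0; }
static int both_zero_folded (int a, int b) { return (a | b) == 0; }

static int folds_hold (int a, int b)
{
  return either_nonzero_folded (a, b) == ((a != 0) || (b != 0))
         && both_zero_folded (a, b) == ((a == 0) && (b == 0));
}
```

The win is replacing a short-circuit branch with a single OR and one compare, which is why fold_truthop only does it when both operands are simple and side-effect free.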
Re: [patch 3/8 tree-optimization]: Bitwise logic for fold_truth_andor.
On Wed, Jul 13, 2011 at 9:33 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, This patch adds support to fold_truth_andor for one-bit precision typed bitwise-binary and bitwise-not expressions. Quickly checking some testcases shows we already perform all the foldings in other places. So please _always_ check for all transformations you add if there is a testcase that fails before and passes after your patch. (A|B)(A|C) is already folded to (BC)|A. Richard. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Add support for one-bit bitwise operations. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:19:22.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:59:14.261620200 +0200 @@ -8248,6 +8248,12 @@ fold_truth_andor (location_t loc, enum t if (!optimize) return NULL_TREE; + /* If code is BIT_AND_EXPR or BIT_IOR_EXPR, type precision has to be + one. Otherwise return NULL_TREE. */ + if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR) + (!INTEGRAL_TYPE_P (type) || TYPE_PRECISION (type) != 1)) + return NULL_TREE; + /* Check for things like (A || B) (A || C). We can convert this to A || (B C). Note that either operator can be any of the four truth and/or operations and the transformation will still be @@ -8258,7 +8264,9 @@ fold_truth_andor (location_t loc, enum t (TREE_CODE (arg0) == TRUTH_ANDIF_EXPR || TREE_CODE (arg0) == TRUTH_ORIF_EXPR || TREE_CODE (arg0) == TRUTH_AND_EXPR - || TREE_CODE (arg0) == TRUTH_OR_EXPR) + || TREE_CODE (arg0) == TRUTH_OR_EXPR + || TREE_CODE (arg0) == BIT_AND_EXPR + || TREE_CODE (arg0) == BIT_IOR_EXPR) ! 
TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))) { tree a00 = TREE_OPERAND (arg0, 0); @@ -8266,9 +8274,13 @@ fold_truth_andor (location_t loc, enum t tree a10 = TREE_OPERAND (arg1, 0); tree a11 = TREE_OPERAND (arg1, 1); int commutative = ((TREE_CODE (arg0) == TRUTH_OR_EXPR - || TREE_CODE (arg0) == TRUTH_AND_EXPR) + || TREE_CODE (arg0) == TRUTH_AND_EXPR + || TREE_CODE (arg0) == BIT_IOR_EXPR + || TREE_CODE (arg0) == BIT_AND_EXPR) (code == TRUTH_AND_EXPR - || code == TRUTH_OR_EXPR)); + || code == TRUTH_OR_EXPR + || code == BIT_AND_EXPR + || code == BIT_IOR_EXPR)); if (operand_equal_p (a00, a10, 0)) return fold_build2_loc (loc, TREE_CODE (arg0), type, a00, @@ -9484,21 +9496,29 @@ fold_binary_loc (location_t loc, if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code == EQ_EXPR || code == NE_EXPR) - ((truth_value_p (TREE_CODE (arg0)) - (truth_value_p (TREE_CODE (arg1)) + ((truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0)) + (truth_value_type_p (TREE_CODE (arg1), TREE_TYPE (arg1)) || (TREE_CODE (arg1) == BIT_AND_EXPR integer_onep (TREE_OPERAND (arg1, 1) - || (truth_value_p (TREE_CODE (arg1)) - (truth_value_p (TREE_CODE (arg0)) + || (truth_value_type_p (TREE_CODE (arg1), TREE_TYPE (arg1)) + (truth_value_type_p (TREE_CODE (arg0), TREE_TYPE (arg0)) || (TREE_CODE (arg0) == BIT_AND_EXPR integer_onep (TREE_OPERAND (arg0, 1))) { - tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR - : code == BIT_IOR_EXPR ? TRUTH_OR_EXPR - : TRUTH_XOR_EXPR, - boolean_type_node, - fold_convert_loc (loc, boolean_type_node, arg0), - fold_convert_loc (loc, boolean_type_node, arg1)); + enum tree_code ncode; + + /* Do we operate on a non-boolified tree? */ + if (!INTEGRAL_TYPE_P (type) || TYPE_PRECISION (type) != 1) + ncode = code == BIT_AND_EXPR ? TRUTH_AND_EXPR + : (code == BIT_IOR_EXPR + ? TRUTH_OR_EXPR : TRUTH_XOR_EXPR); + else + ncode = (code == BIT_AND_EXPR || code == BIT_IOR_EXPR) ? 
code + : BIT_XOR_EXPR; + tem = fold_build2_loc (loc, ncode, + boolean_type_node, + fold_convert_loc (loc, boolean_type_node, arg0), + fold_convert_loc (loc, boolean_type_node, arg1)); if (code == EQ_EXPR) tem = invert_truthvalue_loc
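The transformation fold_truth_andor targets — and which Richard notes is already performed elsewhere for the bitwise form — is the distributive rewrite (A | B) & (A | C) ==> A | (B & C) on truth values. For one-bit operands the two sides can be compared exhaustively (this sketch is mine, for illustration only):

```c
typedef unsigned char bit1; /* stands in for a 1-bit precision type */

/* (A | B) & (A | C) distributes to A | (B & C) on truth values,
   saving one operation and exposing A as a common factor.  */
static int distrib_holds (void)
{
  for (bit1 a = 0; a <= 1; a++)
    for (bit1 b = 0; b <= 1; b++)
      for (bit1 c = 0; c <= 1; c++)
        if (((a | b) & (a | c)) != (a | (b & c)))
          return 0;
  return 1;
}
```

The identity in fact holds bitwise for full-width integers too, which is why the existing bitwise folding Richard mentions already covers these cases.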
Re: [patch 4/8 tree-optimization]: Bitwise or logic for fold_binary_loc.
On Wed, Jul 13, 2011 at 9:33 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, This patch adds support to fold_binary_loc for one-bit precision typed bitwise-or expression. Seems to be a fallout of the missing TRUTH_NOT conversion as well. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_binary_loc): Add support for one-bit bitwise-or optimizeation. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:23:29.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:59:04.011620200 +0200 @@ -10688,6 +10688,52 @@ fold_binary_loc (location_t loc, return omit_one_operand_loc (loc, type, t1, arg0); } + if (TYPE_PRECISION (type) == 1 INTEGRAL_TYPE_P (type)) + { + /* If arg0 is constant zero, drop it. */ + if (TREE_CODE (arg0) == INTEGER_CST integer_zerop (arg0)) + return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1)); + if (TREE_CODE (arg0) == INTEGER_CST ! integer_zerop (arg0)) + return omit_one_operand_loc (loc, type, arg0, arg1); + + /* !X | X is always true. ~X | X is always true. */ + if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR + || TREE_CODE (arg0) == BIT_NOT_EXPR) + operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)) + return omit_one_operand_loc (loc, type, integer_one_node, arg1); + /* X | !X is always true. X | ~X is always true. 
*/ + if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR + || TREE_CODE (arg1) == BIT_NOT_EXPR) + operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0)) + return omit_one_operand_loc (loc, type, integer_one_node, arg0); + + /* (X !Y) | (!X Y) is X ^ Y */ + if (TREE_CODE (arg0) == BIT_AND_EXPR + TREE_CODE (arg1) == BIT_AND_EXPR) + { + tree a0, a1, l0, l1, n0, n1; + + a0 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0)); + a1 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1)); + + l0 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0)); + l1 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1)); + + n0 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l0); + n1 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l1); + + if ((operand_equal_p (n0, a0, 0) + operand_equal_p (n1, a1, 0)) + || (operand_equal_p (n0, a1, 0) + operand_equal_p (n1, a0, 0))) + return fold_build2_loc (loc, BIT_XOR_EXPR, type, l0, n1); + } + + tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1); + if (tem) + return tem; + } + /* Canonicalize (X C1) | C2. */ if (TREE_CODE (arg0) == BIT_AND_EXPR TREE_CODE (arg1) == INTEGER_CST
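The new hunk recognizes (X & !Y) | (!X & Y) as X ^ Y for one-bit operands. Since the domain is a single bit, the identity can be verified exhaustively (sketch with illustrative names, not GCC code):

```c
typedef unsigned char bit1; /* stands in for a 1-bit precision type */

static bit1 bnot (bit1 x) { return x ^ 1; } /* 1-bit inversion */

/* (x & !y) | (!x & y) == x ^ y for one-bit x, y -- the definition
   of exclusive or as "exactly one of the two is set".  */
static int xor_fold_holds (void)
{
  for (bit1 x = 0; x <= 1; x++)
    for (bit1 y = 0; y <= 1; y++)
      if (((x & bnot (y)) | (bnot (x) & y)) != (x ^ y))
        return 0;
  return 1;
}
```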
Re: [patch 5/8 tree-optimization]: Bitwise xor logic for fold_binary_loc.
On Wed, Jul 13, 2011 at 9:34 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, This patch adds support to fold_binary_loc for one-bit precision typed bitwise-xor expression. Similar - we don't want to build a TRUTH_NOT_EXPR from a BIT_XOR_EXPR. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_binary_loc): Add support for one-bit bitwise-xor optimization. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:38:06.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:58:52.686620200 +0200 @@ -10872,11 +10872,35 @@ fold_binary_loc (location_t loc, case BIT_XOR_EXPR: if (integer_zerop (arg1)) return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0)); - if (integer_all_onesp (arg1)) - return fold_build1_loc (loc, BIT_NOT_EXPR, type, op0); if (operand_equal_p (arg0, arg1, 0)) return omit_one_operand_loc (loc, type, integer_zero_node, arg0); + if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type)) + { + /* If the second arg is constant true, this is a logical inversion. */ + if (integer_onep (arg1)) + { + tem = invert_truthvalue_loc (loc, arg0); + return non_lvalue_loc (loc, fold_convert_loc (loc, type, tem)); + } + } + else if (integer_all_onesp (arg1)) + return fold_build1_loc (loc, BIT_NOT_EXPR, type, op0); + + if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type)) + { + /* !X ^ X is always true. ~X ^ X is always true. */ + if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR + || TREE_CODE (arg0) == BIT_NOT_EXPR) + && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)) + return omit_one_operand_loc (loc, type, integer_one_node, arg1); + /* X ^ !X is always true. X ^ ~X is always true. */ + if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR + || TREE_CODE (arg1) == BIT_NOT_EXPR) + && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0)) + return omit_one_operand_loc (loc, type, integer_one_node, arg0); + } + /* ~X ^ X is -1. */ if (TREE_CODE (arg0) == BIT_NOT_EXPR && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)) @@ -10911,7 +10935,7 @@ fold_binary_loc (location_t loc, goto bit_ior; } - /* (X | Y) ^ X -> Y & ~ X*/ + /* (X | Y) ^ X -> Y & ~ X. */ if (TREE_CODE (arg0) == BIT_IOR_EXPR && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)) {
Re: [patch 6/8 tree-optimization]: Bitwise and logic for fold_binary_loc.
On Wed, Jul 13, 2011 at 9:34 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, This patch adds support to fold_binary_loc for one-bit precision typed bitwise-and expression. Similar ... your patch descriptions are useless btw. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_binary_loc): Add support for one-bit bitwise-and optimization. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:43:37.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:58:38.692620200 +0200 @@ -11062,6 +11062,48 @@ fold_binary_loc (location_t loc, if (operand_equal_p (arg0, arg1, 0)) return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0)); + if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type)) + { + if (TREE_CODE (arg0) == INTEGER_CST && ! integer_zerop (arg0)) + return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1)); + if (TREE_CODE (arg1) == INTEGER_CST && ! integer_zerop (arg1)) + return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0)); + /* Likewise for first arg. */ + if (integer_zerop (arg0)) + return omit_one_operand_loc (loc, type, arg0, arg1); + + /* !X & X is always false. ~X & X is always false. */ + if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR + || TREE_CODE (arg0) == BIT_NOT_EXPR) + && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)) + return omit_one_operand_loc (loc, type, integer_zero_node, arg1); + /* X & !X is always false. X & ~X is always false. */ + if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR + || TREE_CODE (arg1) == BIT_NOT_EXPR) + && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0)) + return omit_one_operand_loc (loc, type, integer_zero_node, arg0); + + /* (A < X) && (A + 1 > Y) ==> (A < X) && (A >= Y). Normally + A + 1 > Y means (A >= Y) && (A != MAX), but in this case + we know that A < X <= MAX. */ + + if (!TREE_SIDE_EFFECTS (arg0) && !TREE_SIDE_EFFECTS (arg1)) + { + tem = fold_to_nonsharp_ineq_using_bound (loc, arg0, arg1); + if (tem && !operand_equal_p (tem, arg0, 0)) + return fold_build2_loc (loc, code, type, tem, arg1); + + tem = fold_to_nonsharp_ineq_using_bound (loc, arg1, arg0); + if (tem && !operand_equal_p (tem, arg1, 0)) + return fold_build2_loc (loc, code, type, arg0, tem); + } + + tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1); + if (tem) + return tem; + + } + /* ~X & X, (X == 0) & X, and !X & X are always zero. */ if ((TREE_CODE (arg0) == BIT_NOT_EXPR || TREE_CODE (arg0) == TRUTH_NOT_EXPR
Re: [patch 7/8 tree-optimization]: Bitwise not logic for fold_unary_loc.
On Wed, Jul 13, 2011 at 9:36 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, This patch adds support to fold_unary_loc for one-bit precision typed bitwise-not expression. Similar ... ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_unary_loc): Add support for one-bit bitwise-not optimization. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:49:50.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:56:45.170171300 +0200 @@ -8094,6 +8094,12 @@ fold_unary_loc (location_t loc, enum tre if (i == count) return build_vector (type, nreverse (list)); } + if (INTEGRAL_TYPE_P (type) && TYPE_PRECISION (type) == 1) + { + tem = fold_truth_not_expr (loc, arg0); + if (tem) + return fold_convert_loc (loc, type, tem); + } return NULL_TREE;
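The fold above rests on the observation that for a type with exactly one value bit, bitwise-not and logical-not compute the same thing: flipping the single bit is the same as negating the truth value. A minimal sketch of that equivalence, modeling the 1-bit type with a bitfield (the struct and helper names are illustrative, not GCC internals):

```c
#include <assert.h>

/* Model a 1-bit unsigned integer: assignments keep only the low bit,
   just as a 1-bit precision type would. */
struct bit1 { unsigned v : 1; };

/* ~x reduced to 1-bit precision. */
unsigned bit1_not (unsigned x)
{
  struct bit1 t;
  t.v = ~x;   /* truncated modulo 2, so only the low bit survives */
  return t.v;
}

/* !x, the logical negation fold_truth_not_expr would produce. */
unsigned truth_not (unsigned x)
{
  return !x;
}
```

Over the value set {0, 1} the two helpers agree, which is exactly why a 1-bit BIT_NOT_EXPR can be dispatched to fold_truth_not_expr.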
Re: [patch 4/8 tree-optimization]: Bitwise or logic for fold_binary_loc.
2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 9:33 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, This patch adds support to fold_binary_loc for one-bit precision typed bitwise-or expression. Seems to be a fallout of the missing TRUTH_NOT conversion as well. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_binary_loc): Add support for one-bit bitwise-or optimization. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:23:29.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:59:04.011620200 +0200 @@ -10688,6 +10688,52 @@ fold_binary_loc (location_t loc, return omit_one_operand_loc (loc, type, t1, arg0); } + if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type)) + { + /* If arg0 is constant zero, drop it. */ + if (TREE_CODE (arg0) == INTEGER_CST && integer_zerop (arg0)) + return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1)); + if (TREE_CODE (arg0) == INTEGER_CST && ! integer_zerop (arg0)) + return omit_one_operand_loc (loc, type, arg0, arg1); + + /* !X | X is always true. ~X | X is always true. */ + if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR + || TREE_CODE (arg0) == BIT_NOT_EXPR) + && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)) + return omit_one_operand_loc (loc, type, integer_one_node, arg1); + /* X | !X is always true. X | ~X is always true. */ + if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR + || TREE_CODE (arg1) == BIT_NOT_EXPR) + && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0)) + return omit_one_operand_loc (loc, type, integer_one_node, arg0); + + /* (X & !Y) | (!X & Y) is X ^ Y */ + if (TREE_CODE (arg0) == BIT_AND_EXPR + && TREE_CODE (arg1) == BIT_AND_EXPR) + { + tree a0, a1, l0, l1, n0, n1; + + a0 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0)); + a1 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1)); + + l0 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0)); + l1 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1)); + + n0 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l0); + n1 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l1); + + if ((operand_equal_p (n0, a0, 0) + && operand_equal_p (n1, a1, 0)) + || (operand_equal_p (n0, a1, 0) + && operand_equal_p (n1, a0, 0))) + return fold_build2_loc (loc, BIT_XOR_EXPR, type, l0, n1); + } + + tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1); + if (tem) + return tem; + } + /* Canonicalize (X & C1) | C2. */ if (TREE_CODE (arg0) == BIT_AND_EXPR && TREE_CODE (arg1) == INTEGER_CST Well, I wouldn't call it fallout. As by this we are able to handle things like ~(X >= B) and see that it can be converted to X < B. The point here is that we avoid that fold re-introduces here the TRUTH variants for the bitwise ones (for sure some parts are redundant and might be something to be factored out, like we did for the truth_andor function). Also we catch by this patterns like ~X op ~Y and convert them to ~(X op Y), which is just valid for one-bit precision typed X and Y. As in general !x is not the same as ~x, unless x has a one-bit precision integral type. I will adjust the patches so that for one-bit precision types we always use BIT_NOT_EXPR here (instead of TRUTH_NOT). This is reasonable.
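The point that !x and ~x only coincide when the operand carries a single bit of precision is easy to see at the source level. For a plain int the two negations disagree for almost every value (illustrative helpers):

```c
#include <assert.h>

int log_not (int x) { return !x; }  /* TRUTH_NOT: result is always 0 or 1 */
int bit_not (int x) { return ~x; }  /* BIT_NOT: flips every bit of x */
```

`log_not (2)` is 0 while `bit_not (2)` is -3 on a two's-complement target; only when x is confined to one bit (values 0 and 1, stored in a 1-bit type) do the two operations agree as truth values, which is why the patches restrict the BIT_NOT_EXPR handling to one-bit precision types.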
Re: [patch tree-optimization]: Normalize compares X ==/!= 1 -> X !=/== 0 for truth valued X
On Wed, Jul 13, 2011 at 11:00 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, this patch fixes that for replaced uses, we call fold_stmt_inplace. Additionally it adds to fold_gimple_assign the canonical form for X !=/== 1 -> X ==/!= 0 for X with one-bit precision type. ChangeLog gcc/ 2011-07-13 Kai Tietz kti...@redhat.com * gimple-fold.c (fold_gimple_assign): Add normalization for compares of 1-bit integer precision operands. * tree-ssa-propagate.c (replace_uses_in): Call fold_stmt_inplace on modified statement. err - sure not. The caller already does that. Breakpoint 5, substitute_and_fold (get_value_fn=0xc269ae <get_value>, fold_fn=0, do_dce=1 '\001') at /space/rguenther/src/svn/trunk/gcc/tree-ssa-propagate.c:1134 1134 if (get_value_fn) D.2696_8 = a_1(D) != D.2704_10; (gdb) n 1135 did_replace |= replace_uses_in (stmt, get_value_fn); (gdb) 1138 if (did_replace) (gdb) call debug_gimple_stmt (stmt) D.2696_8 = a_1(D) != 1; (gdb) p did_replace $1 = 1 '\001' (gdb) n 1139 fold_stmt (&oldi); so figure out why fold_stmt does not do its work instead. Which I quickly checked in gdb and it dispatches to fold_binary with boolean-typed arguments as a_1 != 1 where you can see the canonical form for this is !(int) a_1 because of a bug I think. /* bool_var != 1 becomes !bool_var. */ if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_onep (arg1) && code == NE_EXPR) return fold_build1_loc (loc, TRUTH_NOT_EXPR, type, fold_convert_loc (loc, type, arg0)); at least I don't see why we need to convert arg0 to the type of the comparison. You need to improve your debugging skills and see why existing transformations are not working before adding new ones. Richard. ChangeLog gcc/testsuite 2011-07-13 Kai Tietz kti...@redhat.com * gcc.dg/tree-ssa/fold-1.c: New test. Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply?
Regards, Kai Index: gcc/gcc/gimple-fold.c === --- gcc.orig/gcc/gimple-fold.c 2011-07-13 10:37:32.0 +0200 +++ gcc/gcc/gimple-fold.c 2011-07-13 10:39:05.100843400 +0200 @@ -815,6 +815,17 @@ fold_gimple_assign (gimple_stmt_iterator gimple_assign_rhs2 (stmt)); } + if (!result && (subcode == EQ_EXPR || subcode == NE_EXPR) + && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))) + && TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt))) == 1 + && integer_onep (gimple_assign_rhs2 (stmt))) + result = build2_loc (loc, (subcode == EQ_EXPR ? NE_EXPR : EQ_EXPR), + TREE_TYPE (gimple_assign_lhs (stmt)), + gimple_assign_rhs1 (stmt), + fold_convert_loc (loc, + TREE_TYPE (gimple_assign_rhs1 (stmt)), + integer_zero_node)); + if (!result) result = fold_binary_loc (loc, subcode, TREE_TYPE (gimple_assign_lhs (stmt)), Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/fold-1.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ gcc/gcc/testsuite/gcc.dg/tree-ssa/fold-1.c 2011-07-13 10:50:38.294367800 +0200 @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int foo (_Bool a, _Bool b) +{ + return a != ((b | !b)); +} +/* { dg-final { scan-tree-dump-not "!= 1" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/gcc/tree-ssa-propagate.c === --- gcc.orig/gcc/tree-ssa-propagate.c 2011-07-13 10:37:42.0 +0200 +++ gcc/gcc/tree-ssa-propagate.c 2011-07-13 10:40:25.688576800 +0200 @@ -904,6 +904,8 @@ replace_uses_in (gimple stmt, ssa_prop_g propagate_value (use, val); + fold_stmt_inplace (stmt); + replaced = true; }
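The new testcase relies on a chain of simplifications: for a _Bool, b | !b is always 1, so the whole expression collapses to a != 1, which the normalization then rewrites to a == 0, i.e. !a. That source-level equivalence can be checked directly (foo mirrors the testcase; the expected helper is illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* The expression from the fold-1.c testcase. */
int foo (bool a, bool b) { return a != (b | !b); }

/* What the optimizer is expected to reduce it to. */
int expected (bool a)   { return !a; }
```

The dump check `scan-tree-dump-not "!= 1"` then verifies that no literal comparison against 1 survives in the optimized GIMPLE.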
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote: Sorry, the TRUTH_NOT_EXPR isn't here the point at all. The underlying issue is that fold-const re-introduces TRUTH_AND/OR and co. I'm very sure it doesn't re-construct TRUTH_ variants out of thin air when you present it with BIT_ variants as input. To avoid it, it needs to learn to handle 1-bit precision folding for those bitwise operations on 1-bit integer types specially. As gimple relies on this FE fold for now, it has to learn about that. As soon as gimple_fold (and other passes) don't rely anymore on FE's fold-const, then we can remove those parts again. Otherwise this boolification of compares (and also the transition of TRUTH_NOT -> BIT_NOT) simply doesn't work so long. I do not believe that. Regards, Kai 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, I split my old patch into 8 separate pieces for easier review. These patches are a prerequisite for enabling boolification of comparisons in the gimplifier and the necessary type-cast preserving in gimple from/to boolean-type. This patch adds support to fold_truth_not_expr for one-bit precision typed bitwise-binary and bitwise-not expressions. It seems this is only necessary because we still have TRUTH_NOT_EXPR in our IL and did not replace that with BIT_NOT_EXPR consistently yet. So no, this is not ok. fold-const.c is really mostly supposed to deal with GENERIC where we distinguish TRUTH_* and BIT_* variants. Please instead lower TRUTH_NOT_EXPR to BIT_NOT_EXPR for gimple. Richard. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_not_expr): Add support for one-bit bitwise operations. Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply?
Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 07:48:29.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:59:36.865620200 +0200 @@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre case INTEGER_CST: return constant_boolean_node (integer_zerop (arg), type); + case BIT_AND_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) + return NULL_TREE; + if (integer_onep (TREE_OPERAND (arg, 1))) + return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0)); + /* fall through */ case TRUTH_AND_EXPR: loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc); loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc); - return build2_loc (loc, TRUTH_OR_EXPR, type, + return build2_loc (loc, (code == BIT_AND_EXPR ? BIT_IOR_EXPR + : TRUTH_OR_EXPR), type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)), invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1))); + case BIT_IOR_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) + return NULL_TREE; + /* fall through. */ case TRUTH_OR_EXPR: loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc); loc2 = expr_location_or (TREE_OPERAND (arg, 1), loc); - return build2_loc (loc, TRUTH_AND_EXPR, type, + return build2_loc (loc, (code == BIT_IOR_EXPR ? BIT_AND_EXPR + : TRUTH_AND_EXPR), type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)), invert_truthvalue_loc (loc2, TREE_OPERAND (arg, 1))); + case BIT_XOR_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) + return NULL_TREE; + /* fall through. */ case TRUTH_XOR_EXPR: /* Here we can invert either operand. We invert the first operand unless the second operand is a TRUTH_NOT_EXPR in which case our @@ -3095,10 +3111,14 @@ fold_truth_not_expr (location_t loc, tre negation of the second operand. 
*/ if (TREE_CODE (TREE_OPERAND (arg, 1)) == TRUTH_NOT_EXPR - return build2_loc (loc, TRUTH_XOR_EXPR, type, TREE_OPERAND (arg, 0), + return build2_loc (loc, code, type, TREE_OPERAND (arg, 0), + TREE_OPERAND (TREE_OPERAND (arg, 1), 0)); + else if (TREE_CODE (TREE_OPERAND (arg, 1)) == BIT_NOT_EXPR + && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 1))) == 1) + return build2_loc (loc, code, type, TREE_OPERAND (arg, 0), TREE_OPERAND (TREE_OPERAND (arg, 1), 0)); else - return build2_loc (loc, TRUTH_XOR_EXPR, type, + return build2_loc (loc, code, type, invert_truthvalue_loc (loc, TREE_OPERAND (arg, 0)), TREE_OPERAND (arg, 1)); @@
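The rewrite this patch extends from the TRUTH_ codes to 1-bit BIT_AND_EXPR/BIT_IOR_EXPR is just De Morgan's law: inverting an AND yields the OR of the inverted operands, and inverting an OR yields the AND. For single-bit operands the bitwise and logical forms coincide, which is what makes the fall-through cases valid. A quick check of both identities (illustrative helper names):

```c
#include <assert.h>
#include <stdbool.h>

bool not_and (bool a, bool b)      { return !(a & b); }  /* invert_truthvalue of AND */
bool demorgan_or (bool a, bool b)  { return !a | !b; }   /* De Morgan rewrite */

bool not_or (bool a, bool b)       { return !(a | b); }  /* invert_truthvalue of OR */
bool demorgan_and (bool a, bool b) { return !a & !b; }   /* De Morgan rewrite */
```

Each pair agrees on all four input combinations, mirroring what fold_truth_not_expr builds when it pushes the negation into a 1-bit bitwise binary expression.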
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Wed, Jul 13, 2011 at 11:18 AM, Ilya Enkovich enkovich@gmail.com wrote: 2011/7/13 Jakub Jelinek ja...@redhat.com: I disagree. Widening would result in worse code in most cases, as you need to sign extend all the operands. On the other side, I doubt you can actually usefully use the undefinedness of signed overflow for a series of 3 or more operands of the associative operation. Jakub Sounds reasonable. Type casting to unsigned should be a better solution here. Well, the solution of course lies in the no-undefined-overflow branch where we have separate tree codes for arithmetic with/without undefined overflow. Richard. Ilya
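The "cast to unsigned" suggestion works because unsigned arithmetic wraps modulo 2^N instead of invoking undefined behavior: a reassociated chain may create an intermediate sum that overflows as a signed value, but computed in unsigned arithmetic every intermediate is well defined, and the final conversion back gives the intended result whenever it is representable. A minimal sketch of the idea (this illustrates the principle, not the reassoc pass itself; note the final unsigned-to-int conversion is implementation-defined for out-of-range values):

```c
#include <assert.h>

/* a + b + c reassociated as (a + c) + b, but computed in unsigned
   arithmetic so any intermediate wraparound is well defined. */
int sum3_reassoc (int a, int b, int c)
{
  return (int) (((unsigned) a + (unsigned) c) + (unsigned) b);
}
```

Because -5 is congruent to 2^N - 5 modulo 2^N, the intermediate wraparound in a call like `sum3_reassoc (-5, 10, -5)` cancels out and the in-range final value 0 is recovered exactly.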
Re: [patch tree-optimization]: Normalize compares X ==/!= 1 -> X !=/== 0 for truth valued X
2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 11:00 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, this patch fixes that for replaced uses, we call fold_stmt_inplace. Additionally it adds to fold_gimple_assign the canonical form for X !=/== 1 -> X ==/!= 0 for X with one-bit precision type. ChangeLog gcc/ 2011-07-13 Kai Tietz kti...@redhat.com * gimple-fold.c (fold_gimple_assign): Add normalization for compares of 1-bit integer precision operands. * tree-ssa-propagate.c (replace_uses_in): Call fold_stmt_inplace on modified statement. err - sure not. The caller already does that. Breakpoint 5, substitute_and_fold (get_value_fn=0xc269ae <get_value>, fold_fn=0, do_dce=1 '\001') at /space/rguenther/src/svn/trunk/gcc/tree-ssa-propagate.c:1134 1134 if (get_value_fn) D.2696_8 = a_1(D) != D.2704_10; (gdb) n 1135 did_replace |= replace_uses_in (stmt, get_value_fn); (gdb) 1138 if (did_replace) (gdb) call debug_gimple_stmt (stmt) D.2696_8 = a_1(D) != 1; (gdb) p did_replace $1 = 1 '\001' (gdb) n 1139 fold_stmt (&oldi); so figure out why fold_stmt does not do its work instead. Which I quickly checked in gdb and it dispatches to fold_binary with boolean-typed arguments as a_1 != 1 where you can see the canonical form for this is !(int) a_1 because of a bug I think. /* bool_var != 1 becomes !bool_var. */ if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_onep (arg1) && code == NE_EXPR) return fold_build1_loc (loc, TRUTH_NOT_EXPR, type, fold_convert_loc (loc, type, arg0)); at least I don't see why we need to convert arg0 to the type of the comparison. Well, this type-cast is required by the C specification - integer autopromotion - AFAIR. So I don't think FE maintainers would be happy about this change. Nevertheless I saw this pattern before, and was wondering why we check here for boolean_type at all. This might even be a latent bug in the Ada case due to type precision, and it prevents signed case detection too.
IMHO this check should look like this: /* bool_var != 1 becomes !bool_var. */ if (INTEGRAL_TYPE_P (TREE_TYPE (arg0)) && TYPE_PRECISION (TREE_TYPE (arg0)) == 1 && integer_onep (arg1) && code == NE_EXPR) return fold_build1_loc (loc, TRUTH_NOT_EXPR, type, fold_convert_loc (loc, type, arg0)); For the BIT_NOT_EXPR variant, the cast of arg0 would of course be wrong, as ~(bool) is of course different in result than ~(int) You need to improve your debugging skills and see why existing transformations are not working before adding new ones. I work on that. Kai
RFA: Tighten vector aliasing check
tree-vect-loop-manip.c assumes there is no alias if: ((store_ptr_0 + store_segment_length_0) < load_ptr_0) || (load_ptr_0 + load_segment_length_0) < store_ptr_0)) which means that contiguous arrays are unnecessarily considered to alias. This patch changes the < to <=. Tested on x86_64-linux-gnu (all languages). OK to install? Richard gcc/ * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Tighten overlap check. Index: gcc/tree-vect-loop-manip.c === --- gcc/tree-vect-loop-manip.c 2011-06-22 16:46:34.0 +0100 +++ gcc/tree-vect-loop-manip.c 2011-07-13 11:12:06.0 +0100 @@ -2409,13 +2409,13 @@ vect_create_cond_for_alias_checks (loop_ tree part_cond_expr, length_factor; /* Create expression - ((store_ptr_0 + store_segment_length_0) < load_ptr_0) - || (load_ptr_0 + load_segment_length_0) < store_ptr_0)) + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) ... - ((store_ptr_n + store_segment_length_n) < load_ptr_n) - || (load_ptr_n + load_segment_length_n) < store_ptr_n)) */ + ((store_ptr_n + store_segment_length_n) <= load_ptr_n) + || (load_ptr_n + load_segment_length_n) <= store_ptr_n)) */ if (VEC_empty (ddr_p, may_alias_ddrs)) return; @@ -2484,8 +2484,8 @@ vect_create_cond_for_alias_checks (loop_ part_cond_expr = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, - fold_build2 (LT_EXPR, boolean_type_node, seg_a_max, seg_b_min), - fold_build2 (LT_EXPR, boolean_type_node, seg_b_max, seg_a_min)); + fold_build2 (LE_EXPR, boolean_type_node, seg_a_max, seg_b_min), + fold_build2 (LE_EXPR, boolean_type_node, seg_b_max, seg_a_min)); if (*cond_expr) *cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
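The versioning condition is the classic interval-disjointness test: half-open segments [a, a+la) and [b, b+lb) are independent exactly when one ends at or before the start of the other. With a strict comparison, two back-to-back segments (where a + la == b) fail the test and are wrongly treated as aliasing, which is the case this patch fixes. A sketch of the check with the pointers abstracted to offsets (hypothetical helper, not the GCC code):

```c
#include <assert.h>

/* Non-overlap test for half-open segments [a, a+la) and [b, b+lb),
   mirroring the <= form the patch installs.  Returns nonzero when the
   segments are disjoint (no alias). */
int ranges_dont_overlap (long a, long la, long b, long lb)
{
  return (a + la <= b) || (b + lb <= a);
}
```

With `<=` the contiguous case `ranges_dont_overlap (0, 4, 4, 4)` is correctly reported as disjoint; the old `<` form would have rejected it and forced the scalar fallback loop.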
Re: [patch tree-optimization]: [3 of 3]: Boolify compares more
On Tue, Jul 12, 2011 at 6:55 PM, Kai Tietz ktiet...@googlemail.com wrote: Hello, As discussed on IRC, I reuse here the do_dce flag to choose folding direction within BB. Bootstrapped and regression tested for all standard-languages (plus Ada and Obj-C++) on host x86_64-pc-linux-gnu. Ok for apply? The tree-ssa-propagate.c change is ok on its own. For the tree-vrp.c changes you didn't follow the advice of removing the TRUTH_ op support and instead generalizing the BIT_ op support properly. There should be no 1-bit type thing left. Richard. Regards, Kai ChangeLog gcc/ 2011-07-12 Kai Tietz kti...@redhat.com * tree-ssa-propagate.c (substitute_and_fold): Only use last to first scanning direction if do_dce is true. * tree-vrp.c (extract_range_from_binary_expr): Add handling for BIT_IOR_EXPR, BIT_AND_EXPR, and BIT_NOT_EXPR. (register_edge_assert_for_1): Add handling for 1-bit BIT_IOR_EXPR and BIT_NOT_EXPR. (register_edge_assert_for): Add handling for 1-bit BIT_IOR_EXPR. (ssa_name_get_inner_ssa_name_p): New helper function. (ssa_name_get_cast_to_p): New helper function. (simplify_truth_ops_using_ranges): Handle prefixed cast instruction for result, and add support for one bit precision BIT_IOR_EXPR, BIT_AND_EXPR, BIT_XOR_EXPR, and BIT_NOT_EXPR. (simplify_stmt_using_ranges): Add handling for one bit precision BIT_IOR_EXPR, BIT_AND_EXPR, BIT_XOR_EXPR, and BIT_NOT_EXPR. ChangeLog gcc/testsuite 2011-07-08 Kai Tietz kti...@redhat.com * gcc.dg/tree-ssa/vrp47.c: Remove dom-output and adjust testcase for vrp output analysis. Index: gcc/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c === --- gcc.orig/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c 2011-07-12 15:21:23.793440400 +0200 +++ gcc/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c 2011-07-12 15:27:11.892259100 +0200 @@ -4,7 +4,7 @@ jumps when evaluating an condition. VRP is not able to optimize this. */ /* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-* mn10300-*-* } } } */ -/* { dg-options "-O2 -fdump-tree-vrp -fdump-tree-dom" } */ +/* { dg-options "-O2 -fdump-tree-vrp" } */ /* { dg-options "-O2 -fdump-tree-vrp -fdump-tree-dom -march=i586" { target { i?86-*-* && ilp32 } } } */ int h(int x, int y) @@ -36,13 +36,10 @@ int f(int x) 0 or 1. */ /* { dg-final { scan-tree-dump-times "\[xy\]\[^ \]* != 0" "vrp1" } } */ -/* This one needs more copy propagation that only happens in dom1. */ -/* { dg-final { scan-tree-dump-times "x\[^ \]* & y" 1 "dom1" } } */ -/* { dg-final { scan-tree-dump-times "x\[^ \]* & y" 1 "vrp1" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "x\[^ \]* & y" 1 "vrp1" } } */ /* These two are fully simplified by VRP. */ /* { dg-final { scan-tree-dump-times "x\[^ \]* \[|\] y" 1 "vrp1" } } */ /* { dg-final { scan-tree-dump-times "x\[^ \]* \\^ 1" 1 "vrp1" } } */ /* { dg-final { cleanup-tree-dump "vrp\[0-9\]" } } */ -/* { dg-final { cleanup-tree-dump "dom\[0-9\]" } } */ Index: gcc/gcc/tree-ssa-propagate.c === --- gcc.orig/gcc/tree-ssa-propagate.c 2011-07-12 15:21:23.804440400 +0200 +++ gcc/gcc/tree-ssa-propagate.c 2011-07-12 15:28:22.83100 +0200 @@ -979,6 +979,9 @@ replace_phi_args_in (gimple phi, ssa_pro DO_DCE is true if trivially dead stmts can be removed. + If DO_DCE is true, the statements within a BB are walked from + last to first element. Otherwise we scan from first to last element. + Return TRUE when something changed. */ bool @@ -1059,9 +1062,10 @@ substitute_and_fold (ssa_prop_get_value_ for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i)) replace_phi_args_in (gsi_stmt (i), get_value_fn); - /* Propagate known values into stmts. Do a backward walk to expose - more trivially deletable stmts. */ - for (i = gsi_last_bb (bb); !gsi_end_p (i);) + /* Propagate known values into stmts. Do a backward walk if + do_dce is true. In some case it exposes + more trivially deletable stmts to walk backward. */ + for (i = (do_dce ? gsi_last_bb (bb) : gsi_start_bb (bb)); !gsi_end_p (i);) { bool did_replace; gimple stmt = gsi_stmt (i); @@ -1070,7 +1074,10 @@ substitute_and_fold (ssa_prop_get_value_ gimple_stmt_iterator oldi; oldi = i; - gsi_prev (&i); + if (do_dce) + gsi_prev (&i); + else + gsi_next (&i); /* Ignore ASSERT_EXPRs. They are used by VRP to generate range information for names and they are discarded Index: gcc/gcc/tree-vrp.c === --- gcc.orig/gcc/tree-vrp.c 2011-07-12 15:21:23.838440400 +0200 +++ gcc/gcc/tree-vrp.c
Re: [patch 4/8 tree-optimization]: Bitwise or logic for fold_binary_loc.
On Wed, Jul 13, 2011 at 12:39 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 9:33 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, This patch adds support to fold_binary_loc for one-bit precision typed bitwise-or expression. Seems to be a fallout of the missing TRUTH_NOT conversion as well. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_binary_loc): Add support for one-bit bitwise-or optimization. Bootstrapped and regression tested with prior patches of this series for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 08:23:29.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:59:04.011620200 +0200 @@ -10688,6 +10688,52 @@ fold_binary_loc (location_t loc, return omit_one_operand_loc (loc, type, t1, arg0); } + if (TYPE_PRECISION (type) == 1 && INTEGRAL_TYPE_P (type)) + { + /* If arg0 is constant zero, drop it. */ + if (TREE_CODE (arg0) == INTEGER_CST && integer_zerop (arg0)) + return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg1)); + if (TREE_CODE (arg0) == INTEGER_CST && ! integer_zerop (arg0)) + return omit_one_operand_loc (loc, type, arg0, arg1); + + /* !X | X is always true. ~X | X is always true. */ + if ((TREE_CODE (arg0) == TRUTH_NOT_EXPR + || TREE_CODE (arg0) == BIT_NOT_EXPR) + && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)) + return omit_one_operand_loc (loc, type, integer_one_node, arg1); + /* X | !X is always true. X | ~X is always true. */ + if ((TREE_CODE (arg1) == TRUTH_NOT_EXPR + || TREE_CODE (arg1) == BIT_NOT_EXPR) + && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0)) + return omit_one_operand_loc (loc, type, integer_one_node, arg0); + + /* (X & !Y) | (!X & Y) is X ^ Y */ + if (TREE_CODE (arg0) == BIT_AND_EXPR + && TREE_CODE (arg1) == BIT_AND_EXPR) + { + tree a0, a1, l0, l1, n0, n1; + + a0 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0)); + a1 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1)); + + l0 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0)); + l1 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1)); + + n0 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l0); + n1 = fold_build1_loc (loc, TRUTH_NOT_EXPR, type, l1); + + if ((operand_equal_p (n0, a0, 0) + && operand_equal_p (n1, a1, 0)) + || (operand_equal_p (n0, a1, 0) + && operand_equal_p (n1, a0, 0))) + return fold_build2_loc (loc, BIT_XOR_EXPR, type, l0, n1); + } + + tem = fold_truth_andor (loc, code, type, arg0, arg1, op0, op1); + if (tem) + return tem; + } + /* Canonicalize (X & C1) | C2. */ if (TREE_CODE (arg0) == BIT_AND_EXPR && TREE_CODE (arg1) == INTEGER_CST Well, I wouldn't call it fallout. As by this we are able to handle things like ~(X >= B) and see that it can be converted to X < B. The point here is that we avoid that fold re-introduces here the TRUTH variants for the bitwise ones (for sure some parts are redundant and might be something to be factored out, like we did for the truth_andor function). Also we catch by this patterns like ~X op ~Y and convert them to ~(X op Y), which is just valid for one-bit precision typed X and Y. As in general !x is not the same as ~x, unless x has a one-bit precision integral type. I will adjust the patches so that for one-bit precision types we always use BIT_NOT_EXPR here (instead of TRUTH_NOT). This is reasonable. Sorry, but no. fold-const.c should not look at 1-bitness at all. fold-const.c should special-case BOOLEAN_TYPEs - and it does that already.
This patch series makes me think that it is premature given that on gimple we still mix TRUTH_NOT_EXPR and BIT_*_EXPRs. Richard.
Re: [patch tree-optimization]: Normalize compares X ==/!= 1 -> X !=/== 0 for truth valued X
On Wed, Jul 13, 2011 at 12:56 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 11:00 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, this patch fixes that for replaced uses, we call fold_stmt_inplace. Additionally it adds to fold_gimple_assign the canonical form for X !=/== 1 - X ==/!= 0 for X with one-bit precision type. ChangeLog gcc/ 2011-07-13 Kai Tietz kti...@redhat.com * gimple-fold.c (fold_gimple_assign): Add normalization for compares of 1-bit integer precision operands. * tree-ssa-propagate.c (replace_uses_in): Call fold_stmt_inplace on modified statement. err - sure not. The caller already does that. Breakpoint 5, substitute_and_fold (get_value_fn=0xc269ae get_value, fold_fn=0, do_dce=1 '\001') at /space/rguenther/src/svn/trunk/gcc/tree-ssa-propagate.c:1134 1134 if (get_value_fn) D.2696_8 = a_1(D) != D.2704_10; (gdb) n 1135 did_replace |= replace_uses_in (stmt, get_value_fn); (gdb) 1138 if (did_replace) (gdb) call debug_gimple_stmt (stmt) D.2696_8 = a_1(D) != 1; (gdb) p did_replace $1 = 1 '\001' (gdb) n 1139 fold_stmt (oldi); so figure out why fold_stmt does not do its work instead. Which I quickly checked in gdb and it dispatches to fold_binary with boolean-typed arguments as a_1 != 1 where you can see the canonical form for this is !(int) a_1 because of a bug I think. /* bool_var != 1 becomes !bool_var. */ if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE integer_onep (arg1) code == NE_EXPR) return fold_build1_loc (loc, TRUTH_NOT_EXPR, type, fold_convert_loc (loc, type, arg0)); at least I don't see why we need to convert arg0 to the type of the comparison. Well, this type-cast is required by C specification - integer autopromotion - AFAIR. So I don't think FE maintainer would be happy about this change. ? fold-const.c isn't supposed to perform integer promotion. It's input will have integer promotions if the frontend requires them. 
If they are semantically not needed, fold-const.c strips them away anyway. Nevertheless I saw this pattern before, and was wondering why we check here for boolean_type at all. This might even be a latent bug in the Ada case due to type precision, and it prevents detection of the signed case too. IMHO this check should look like this: /* bool_var != 1 becomes !bool_var. */ if (INTEGRAL_TYPE_P (TREE_TYPE (arg0)) && TYPE_PRECISION (TREE_TYPE (arg0)) == 1 && integer_onep (arg1) && code == NE_EXPR) return fold_build1_loc (loc, TRUTH_NOT_EXPR, type, fold_convert_loc (loc, type, arg0)); No it should not. The BOOLEAN_TYPE check is exactly correct. For the BIT_NOT_EXPR variant, the cast of arg0 would of course be wrong, as ~(bool) is of course different in result than ~(int). You need to improve your debugging skills and see why existing transformations are not working before adding new ones. I work on that. Kai
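The transformation being debated here, `bool_var != 1` becoming `!bool_var`, only holds because a truth value can never be anything but 0 or 1. A minimal C check of that invariant (hypothetical helper names, not GCC code):

```c
#include <assert.h>
#include <stdbool.h>

/* For a truth-valued x, the comparison form and the negated form that
   fold-const.c rewrites it to must agree on both possible values.  */
static int cmp_form (bool x) { return x != 1; }  /* bool_var != 1 */
static int not_form (bool x) { return !x; }      /* !bool_var     */
```
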
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote: Sorry, the TRUTH_NOT_EXPR isn't the point here at all. The underlying issue is that fold-const re-introduces TRUTH_AND/OR and co. I'm very sure it doesn't re-construct TRUTH_ variants out of thin air when you present it with BIT_ variants as input. Well, look into fold-const's fold_binary_loc function and see /* ARG0 is the first operand of EXPR, and ARG1 is the second operand. First check for cases where an arithmetic operation is applied to a compound, conditional, or comparison operation. Push the arithmetic operation inside the compound or conditional to see if any folding can then be done. Convert comparison to conditional for this purpose. This also optimizes non-constant cases that used to be done in expand_expr. Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR, one of the operands is a comparison and the other is a comparison, a BIT_AND_EXPR with the constant 1, or a truth value. In that case, the code below would make the expression more complex. Change it to a TRUTH_{AND,OR}_EXPR. Likewise, convert a similar NE_EXPR to TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR. */ if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code == EQ_EXPR || code == NE_EXPR) && ((truth_value_p (TREE_CODE (arg0)) && (truth_value_p (TREE_CODE (arg1)) || (TREE_CODE (arg1) == BIT_AND_EXPR && integer_onep (TREE_OPERAND (arg1, 1))))) || (truth_value_p (TREE_CODE (arg1)) && (truth_value_p (TREE_CODE (arg0)) || (TREE_CODE (arg0) == BIT_AND_EXPR && integer_onep (TREE_OPERAND (arg0, 1))))))) { tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR : code == BIT_IOR_EXPR ?
TRUTH_OR_EXPR : TRUTH_XOR_EXPR, boolean_type_node, fold_convert_loc (loc, boolean_type_node, arg0), fold_convert_loc (loc, boolean_type_node, arg1)); if (code == EQ_EXPR) tem = invert_truthvalue_loc (loc, tem); return fold_convert_loc (loc, type, tem); } Here TRUTH_AND/TRUTH_OR gets introduced unconditionally if the operands are truth values. This is btw the point why you see that those cases are handled. But as soon as this part is turned off for BIT_IOR/AND, we need to do the folding for the 1-bit precision case explicitly. To avoid it, fold needs to learn to handle 1-bit precision folding for those bitwise operations on 1-bit integer types specially. As gimple relies on this FE fold for now, it has to learn about that. As soon as gimple_fold (and other passes) don't rely anymore on the FE's fold-const, then we can remove those parts again. Otherwise this boolification of compares (and also the transition of TRUTH_NOT -> BIT_NOT) simply doesn't work so long. I do not believe that. Regards, Kai 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 9:32 AM, Kai Tietz ktiet...@googlemail.com wrote: Hello, I split my old patch into 8 separate pieces for easier review. These patches are a prerequisite for enabling boolification of comparisons in the gimplifier and the necessary type-cast preserving in gimple from/to boolean type. This patch adds support to fold_truth_not_expr for one-bit precision typed bitwise-binary and bitwise-not expressions. It seems this is only necessary because we still have TRUTH_NOT_EXPR in our IL and did not replace that with BIT_NOT_EXPR consistently yet. So no, this is not ok. fold-const.c is really mostly supposed to deal with GENERIC where we distinguish TRUTH_* and BIT_* variants. Please instead lower TRUTH_NOT_EXPR to BIT_NOT_EXPR for gimple. Richard. ChangeLog 2011-07-13 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_not_expr): Add support for one-bit bitwise operations.
Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c 2011-07-13 07:48:29.0 +0200 +++ gcc/gcc/fold-const.c 2011-07-13 08:59:36.865620200 +0200 @@ -3074,20 +3074,36 @@ fold_truth_not_expr (location_t loc, tre case INTEGER_CST: return constant_boolean_node (integer_zerop (arg), type); + case BIT_AND_EXPR: + if (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) != 1) + return NULL_TREE; + if (integer_onep (TREE_OPERAND (arg, 1))) + return build2_loc (loc, EQ_EXPR, type, arg, build_int_cst (type, 0)); + /* fall through */ case TRUTH_AND_EXPR: loc1 = expr_location_or (TREE_OPERAND (arg, 0), loc);
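The new BIT_AND_EXPR case in the patch rewrites the negation of `(x & 1)` as the comparison `(x & 1) == 0`. The equivalence is easy to check in plain C (hypothetical helper names, not GCC code):

```c
#include <assert.h>

/* fold_truth_not_expr case above: for the low bit, logically negating
   (x & 1) and comparing (x & 1) against zero are the same predicate.  */
static int negated (unsigned x) { return !(x & 1); }
static int folded  (unsigned x) { return (x & 1) == 0; }
```
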
Re: RFA: Tighten vector aliasing check
On Wed, Jul 13, 2011 at 1:00 PM, Richard Sandiford rdsandif...@googlemail.com wrote: tree-vect-loop-manip.c assumes there is an alias if: ((store_ptr_0 + store_segment_length_0) < load_ptr_0) || (load_ptr_0 + load_segment_length_0) < store_ptr_0)) which means that contiguous arrays are unnecessarily considered to alias. This patch changes the < to <=. Tested on x86_64-linux-gnu (all languages). OK to install? Ok. Thanks, Richard. Richard gcc/ * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Tighten overlap check. Index: gcc/tree-vect-loop-manip.c === --- gcc/tree-vect-loop-manip.c 2011-06-22 16:46:34.0 +0100 +++ gcc/tree-vect-loop-manip.c 2011-07-13 11:12:06.0 +0100 @@ -2409,13 +2409,13 @@ vect_create_cond_for_alias_checks (loop_ tree part_cond_expr, length_factor; /* Create expression - ((store_ptr_0 + store_segment_length_0) < load_ptr_0) - || (load_ptr_0 + load_segment_length_0) < store_ptr_0)) + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) ... - ((store_ptr_n + store_segment_length_n) < load_ptr_n) - || (load_ptr_n + load_segment_length_n) < store_ptr_n)) */ + ((store_ptr_n + store_segment_length_n) <= load_ptr_n) + || (load_ptr_n + load_segment_length_n) <= store_ptr_n)) */ if (VEC_empty (ddr_p, may_alias_ddrs)) return; @@ -2484,8 +2484,8 @@ vect_create_cond_for_alias_checks (loop_ part_cond_expr = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, - fold_build2 (LT_EXPR, boolean_type_node, seg_a_max, seg_b_min), - fold_build2 (LT_EXPR, boolean_type_node, seg_b_max, seg_a_min)); + fold_build2 (LE_EXPR, boolean_type_node, seg_a_max, seg_b_min), + fold_build2 (LE_EXPR, boolean_type_node, seg_b_max, seg_a_min)); if (*cond_expr) *cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
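The effect of the `<` to `<=` change can be modelled directly: with the strict comparison, two back-to-back segments (one ending exactly where the other begins) fail the independence test and are treated as possibly aliasing. A sketch, assuming byte addresses and byte-granular segment lengths:

```c
#include <assert.h>
#include <stdbool.h>

/* Old check: segments [a, a+len_a) and [b, b+len_b) are treated as
   independent only if one ends strictly before the other starts.  */
static bool no_alias_lt (unsigned long a, unsigned long len_a,
                         unsigned long b, unsigned long len_b)
{
  return a + len_a < b || b + len_b < a;
}

/* New check: touching segments (a + len_a == b) are also independent,
   since half-open intervals that merely touch do not overlap.  */
static bool no_alias_le (unsigned long a, unsigned long len_a,
                         unsigned long b, unsigned long len_b)
{
  return a + len_a <= b || b + len_b <= a;
}
```
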
Re: RFA: Tighten vector aliasing check
Richard Sandiford rdsandif...@googlemail.com writes: tree-vect-loop-manip.c assumes there is an alias if: I meant _unless_. ((store_ptr_0 + store_segment_length_0) < load_ptr_0) || (load_ptr_0 + load_segment_length_0) < store_ptr_0)) which means that contiguous arrays are unnecessarily considered to alias. Richard
Re: [google] Backport patch r175881 from gcc-4_6-branch to google/gcc-4_6 (issue4695051)
On Wed, Jul 13, 2011 at 03:12, Guozhi Wei car...@google.com wrote: Hi This patch fixes a testing error on arm backend. It has been tested on both x86 and arm target with following commands. make check-g++ RUNTESTFLAGS=--target_board=arm-sim/thumb/arch=armv7-a dg.exp=anon-ns1.C make check-g++ RUNTESTFLAGS=dg.exp=anon-ns1.C Carrot, did you backport this patch with svnmerge.py? Thanks. Diego.
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
On Wed, Jul 13, 2011 at 1:08 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote: Sorry, the TRUTH_NOT_EXPR isn't the point here at all. The underlying issue is that fold-const re-introduces TRUTH_AND/OR and co. I'm very sure it doesn't re-construct TRUTH_ variants out of thin air when you present it with BIT_ variants as input. Well, look into fold-const's fold_binary_loc function and see /* ARG0 is the first operand of EXPR, and ARG1 is the second operand. First check for cases where an arithmetic operation is applied to a compound, conditional, or comparison operation. Push the arithmetic operation inside the compound or conditional to see if any folding can then be done. Convert comparison to conditional for this purpose. This also optimizes non-constant cases that used to be done in expand_expr. Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR, one of the operands is a comparison and the other is a comparison, a BIT_AND_EXPR with the constant 1, or a truth value. In that case, the code below would make the expression more complex. Change it to a TRUTH_{AND,OR}_EXPR. Likewise, convert a similar NE_EXPR to TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR. */ if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code == EQ_EXPR || code == NE_EXPR) && ((truth_value_p (TREE_CODE (arg0)) && (truth_value_p (TREE_CODE (arg1)) || (TREE_CODE (arg1) == BIT_AND_EXPR && integer_onep (TREE_OPERAND (arg1, 1))))) || (truth_value_p (TREE_CODE (arg1)) && (truth_value_p (TREE_CODE (arg0)) || (TREE_CODE (arg0) == BIT_AND_EXPR && integer_onep (TREE_OPERAND (arg0, 1))))))) { tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR : code == BIT_IOR_EXPR ?
TRUTH_OR_EXPR : TRUTH_XOR_EXPR, boolean_type_node, fold_convert_loc (loc, boolean_type_node, arg0), fold_convert_loc (loc, boolean_type_node, arg1)); if (code == EQ_EXPR) tem = invert_truthvalue_loc (loc, tem); return fold_convert_loc (loc, type, tem); } Here TRUTH_AND/TRUTH_OR gets introduced unconditionally if the operands are truth values. This is btw the point why you see that those cases are handled. But as soon as this part is turned off for BIT_IOR/AND, we need to do the folding for the 1-bit precision case explicitly. First of all this checks for a quite complex pattern - where do we pass such a complex pattern from the gimple level to fold? For the EQ/NE_EXPR case forwprop probably might be able to feed it that, but then how does it go wrong? The above could also simply be guarded by !in_gimple_form. Richard.
Re: Ping: C-family stack check for threads
On Sun, 3 Jul 2011, Thomas Klein wrote: Ye Joey wrote: Thomas, I think you are working on a very useful feature. I have ARM MCU applications running out of stack space and silently producing strange behaviors as a result. I'd like to try your patch and probably give further comments. I also think this will be a very useful feature (not just for threads), and I hope you'll persevere through the review process. ;) Not your first patch, and you have copyright assignments in place, so that's covered. The first thing I see is that you need to fix the issues regarding the GCC coding standards, http://gcc.gnu.org/codingconventions.html as that is a hurdle for reviewers, and you don't want that. Be very careful. I haven't run contrib/check_GNU_style.sh myself but maybe it'll be helpful. The second issue I see is that documentation for the new patterns is missing; that should go in gcc/doc/md.texi, somewhere under @node Standard Names. I can imagine there'll be a thing or two to tweak regarding them, and that's best reviewed through the documentation. Generally, as much as possible should be general and not ARM-specific. If you need helper functions, add them to libgcc. Regards Thomas Klein gcc/ChangeLog 2011-07-03 Thomas Klein th.r.kl...@web.de * opts.c (common_handle_option): introduce additional stack checking parameters direct and indirect * flag-types.h (enum stack_check_type): Likewise * explow.c (allocate_dynamic_stack_space): - suppress stack probing if parameter direct, indirect or if a stack-limit is given - do additional read of limit value if parameter indirect and a stack-limit symbol is given - emit a call to a stack_failure function [as an alternative to a trap call] No bullet list in the changelog, please. Individual sentences. Follow the existing format; full sentences with capitalization and all that. brgds, H-P
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 1:08 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote: Sorry, the TRUTH_NOT_EXPR isn't the point here at all. The underlying issue is that fold-const re-introduces TRUTH_AND/OR and co. I'm very sure it doesn't re-construct TRUTH_ variants out of thin air when you present it with BIT_ variants as input. Well, look into fold-const's fold_binary_loc function and see /* ARG0 is the first operand of EXPR, and ARG1 is the second operand. First check for cases where an arithmetic operation is applied to a compound, conditional, or comparison operation. Push the arithmetic operation inside the compound or conditional to see if any folding can then be done. Convert comparison to conditional for this purpose. This also optimizes non-constant cases that used to be done in expand_expr. Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR, one of the operands is a comparison and the other is a comparison, a BIT_AND_EXPR with the constant 1, or a truth value. In that case, the code below would make the expression more complex. Change it to a TRUTH_{AND,OR}_EXPR. Likewise, convert a similar NE_EXPR to TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR. */ if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code == EQ_EXPR || code == NE_EXPR) && ((truth_value_p (TREE_CODE (arg0)) && (truth_value_p (TREE_CODE (arg1)) || (TREE_CODE (arg1) == BIT_AND_EXPR && integer_onep (TREE_OPERAND (arg1, 1))))) || (truth_value_p (TREE_CODE (arg1)) && (truth_value_p (TREE_CODE (arg0)) || (TREE_CODE (arg0) == BIT_AND_EXPR && integer_onep (TREE_OPERAND (arg0, 1))))))) { tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR : code == BIT_IOR_EXPR ?
TRUTH_OR_EXPR : TRUTH_XOR_EXPR, boolean_type_node, fold_convert_loc (loc, boolean_type_node, arg0), fold_convert_loc (loc, boolean_type_node, arg1)); if (code == EQ_EXPR) tem = invert_truthvalue_loc (loc, tem); return fold_convert_loc (loc, type, tem); } Here TRUTH_AND/TRUTH_OR gets introduced unconditionally if the operands are truth values. This is btw the point why you see that those cases are handled. But as soon as this part is turned off for BIT_IOR/AND, we need to do the folding for the 1-bit precision case explicitly. First of all this checks for a quite complex pattern - where do we pass such a complex pattern from the gimple level to fold? For the EQ/NE_EXPR case forwprop probably might be able to feed it that, but then how does it go wrong? The above could also simply be guarded by !in_gimple_form. Richard. See the reassoc pass as an example, and these hacky maybe_fold_and_comparisons / maybe_fold_or_comparisons functions. As indeed we still want to be able to do comparison foldings without getting back a TRUTH op. Additionally we have a lot of passes - like the vectorizer - which happily try to build new conditions on tree level. This is another place I saw issues and tree-cfg failures. And last but not least those truth ops might be reintroduced in gimple_fold, as soon as we see bitwise ops on one-bit precision integral types as truth values. Kai
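The reason fold_binary_loc can flip BIT_AND_EXPR/BIT_IOR_EXPR into TRUTH_AND_EXPR/TRUTH_OR_EXPR at all is that on truth values (0 or 1, e.g. comparison results) the bitwise and logical operators compute the same thing. A quick C check of that premise (hypothetical helper names):

```c
#include <assert.h>

/* On {0,1} operands the BIT_ and TRUTH_ forms coincide; on wider
   values they do not (2 & 1 is 0 but 2 && 1 is 1), which is why the
   fold is guarded by truth_value_p.  */
static int bit_and   (int a, int b) { return a & b; }
static int truth_and (int a, int b) { return a && b; }
static int bit_ior   (int a, int b) { return a | b; }
static int truth_or  (int a, int b) { return a || b; }
```
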
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Tue, 2011-07-12 at 11:50 -0500, William J. Schmidt wrote: Ilya, thanks for posting this! This patch is also useful on powerpc64. Applying it solved a performance degradation with bwaves due to loss of reassociation somewhere between 4.5 and 4.6 (still tracking it down). When we apply -ftree-reassoc-width=2 to bwaves, the more optimal code generation returns. On further investigation, this is improving the code generation but not reverting all of the performance loss. We'll open a bug on this one once we have it narrowed down a little further. Bill
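For context, what -ftree-reassoc-width=2 buys: instead of one serial dependence chain of additions, the sum is split into two chains whose additions can execute in parallel. A hypothetical sketch of the two shapes (values chosen so floating-point reassociation is exact):

```c
#include <assert.h>

/* Serial chain: each add depends on the previous one (latency-bound).  */
static double sum_width1 (double a, double b, double c, double d)
{
  return ((a + b) + c) + d;
}

/* Width 2: the two inner adds are independent and can issue together,
   at the cost of one extra live value.  */
static double sum_width2 (double a, double b, double c, double d)
{
  return (a + b) + (c + d);
}
```
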
Re: Build failure (Re: [PATCH] Make VRP optimize useless conversions)
On Wed, Jul 13, 2011 at 1:28 AM, Richard Guenther rguent...@suse.de wrote: On Tue, 12 Jul 2011, Ulrich Weigand wrote: Richard Guenther wrote: 2011-07-11 Richard Guenther rguent...@suse.de * tree-vrp.c (simplify_conversion_using_ranges): Manually translate the source value-range through the conversion chain. This causes a build failure in cachemgr.c on spu-elf. A slightly modified simplified test case also fails on i386-linux: void * test (unsigned long long x, unsigned long long y) { return (void *) (unsigned int) (x / y); } compiled with -O2 results in: test.i: In function 'test': test.i:3:1: error: invalid types in nop conversion void * long long unsigned int D.1962_5 = (void *) D.1963_3; test.i:3:1: internal compiler error: verify_gimple failed Any thoughts? Fix in testing. Richard. 2011-07-13 Richard Guenther rguent...@suse.de * tree-vrp.c (simplify_conversion_using_ranges): Make sure the final type is integral. * gcc.dg/torture/20110713-1.c: New testcase. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 176224) +++ gcc/tree-vrp.c (working copy) @@ -7353,6 +7353,8 @@ simplify_conversion_using_ranges (gimple double_int innermin, innermax, middlemin, middlemax; finaltype = TREE_TYPE (gimple_assign_lhs (stmt)); + if (!INTEGRAL_TYPE_P (finaltype)) + return false; middleop = gimple_assign_rhs1 (stmt); def_stmt = SSA_NAME_DEF_STMT (middleop); if (!is_gimple_assign (def_stmt) Index: gcc/testsuite/gcc.dg/torture/20110713-1.c === --- gcc/testsuite/gcc.dg/torture/20110713-1.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/20110713-1.c (revision 0) @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ilp32 } */ + +void * +test (unsigned long long x, unsigned long long y) +{ + return (void *) (unsigned int) (x / y); +} This also fixed: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49731 -- H.J.
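The guard makes sense when you model what simplify_conversion_using_ranges does: it may drop an intermediate narrowing conversion only when the value range provably survives it, and the rewritten conversion must itself be a valid integral one, which a final pointer type is not. A C model of the truncation in the test case (hypothetical helper names):

```c
#include <assert.h>
#include <stdint.h>

/* Going through a 32-bit intermediate type is a no-op only for values
   that fit in 32 bits; for larger values the truncation is visible, so
   the conversion chain cannot simply be collapsed.  */
static uint64_t via_u32 (uint64_t x) { return (uint64_t) (uint32_t) x; }
static uint64_t direct  (uint64_t x) { return x; }
```
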
[Patch, AVR]: Fix PR49487 (ICE for wrong rotate scratch)
This is a patch to fix PR49487. As Denis will be off-line for some time, it'd be great if a global reviewer would review it. It appears that he is the only AVR maintainer who approves patches. The reason for the ICE is as explained in the PR: the rotate patterns use X as the constraint for an operand which is used as a scratch. However, the operand is a match_operand and not a match_scratch. Because the scratch is not needed in all situations, I chose to use match_scratch instead of match_operand and not to fix the constraints. Fixing the constraints would lead to superfluous allocation of a register if no scratch was needed. Tested with 2 FAILs less: gcc.c-torture/compile/pr46883.c ICEs without this patch and passes with it. The test case in the PR passes, too. That test case also passes with the current unpatched 4.7, but it's obvious that the constraint/operand combination is a bug. Ok to commit and back-port to 4.6? Johann PR target/49487 * config/avr/avr.md (rotl<mode>3): Generate SCRATCH instead of REG. (*rotw<mode>): Use const_int_operand for operands[2]. Use match_scratch for operands[3]. (*rotb<mode>): Ditto. * config/avr/avr.c (avr_rotate_bytes): Treat SCRATCH.
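For reference, avr_rotate_bytes handles rotates whose count is a multiple of 8, which reduce to byte moves; only some of the intermediate shapes need a scratch register at all, which is why a match_scratch (allocated only when needed) beats a hard X-constrained match_operand. A portable sketch of the simplest such rotate:

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit rotate left by 8: a plain byte swap, the case that needs no
   scratch at all since the two bytes can be exchanged directly.  */
static uint16_t rotl16_by8 (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}
```
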
[PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
Hi, the widening_mul pass might increase the number of multiplications in the code by transforming a = b * c; d = a + 2; e = a + 3 into: d = b * c + 2; e = b * c + 3 under the assumption that an FMA instruction is not more expensive than a simple add. This certainly isn't always true. While e.g. on s390 an fma is indeed not slower than an add execution-wise, it has disadvantages regarding instruction grouping. It doesn't group with any other instruction, which has a major impact on the instruction dispatch bandwidth. The following patch tries to figure out the costs for adds, mults and fmas by building an RTX and asking the backend's cost function in order to estimate whether it is worthwhile doing the transformation. With that patch the 436.cactus hot loop contains 28 fewer multiplications than before, increasing performance slightly (~2%). Bootstrapped and regtested on x86_64 and s390x. Bye, -Andreas- 2011-07-13 Andreas Krebbel andreas.kreb...@de.ibm.com * tree-ssa-math-opts.c (compute_costs): New function. (convert_mult_to_fma): Take costs into account when propagating multiplications into several additions. * config/s390/s390.c (z196_costs): Adjust costs for madbr and maebr. Index: gcc/tree-ssa-math-opts.c === *** gcc/tree-ssa-math-opts.c.orig --- gcc/tree-ssa-math-opts.c *** convert_plusminus_to_widen (gimple_stmt_ *** 2185,2190 --- 2185,2252 return true; } + /* Computing the costs for calculating RTX with CODE in MODE. */ + + static unsigned + compute_costs (enum machine_mode mode, enum rtx_code code, bool speed) + { + rtx seq; + rtx set; + unsigned cost = 0; + + start_sequence (); + + switch (GET_RTX_LENGTH (code)) + { + case 2: + force_operand (gen_rtx_fmt_ee (code, mode, + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)), +NULL_RTX); + break; + case 3: + /* FMA expressions are not handled by force_operand.
*/ + expand_ternary_op (mode, fma_optab, +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2), +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3), +NULL_RTX, false); + break; + default: + gcc_unreachable (); + } + + seq = get_insns (); + end_sequence (); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Calculating costs of %s in %s mode.  Sequence is:\n", + GET_RTX_NAME (code), GET_MODE_NAME (mode)); + print_rtl (dump_file, seq); + } + + for (; seq; seq = NEXT_INSN (seq)) + { + set = single_set (seq); + if (set) + cost += rtx_cost (set, SET, speed); + else + cost++; + } + + /* If the backend returns a cost of zero it is most certainly lying. + Set this to one in order to notice that we already calculated it + once. */ + cost = cost ? cost : 1; + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "%s in %s costs %d\n\n", + GET_RTX_NAME (code), GET_MODE_NAME (mode), cost); + + return cost; + } + /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2 with uses in additions and subtractions to form fused multiply-add operations. Returns true if successful and MUL_STMT should be removed. */ *** convert_mult_to_fma (gimple mul_stmt, tr *** 2197,2202 --- 2259,2270 gimple use_stmt, neguse_stmt, fma_stmt; use_operand_p use_p; imm_use_iterator imm_iter; + enum machine_mode mode; + int uses = 0; + bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt)); + static unsigned mul_cost[NUM_MACHINE_MODES]; + static unsigned add_cost[NUM_MACHINE_MODES]; + static unsigned fma_cost[NUM_MACHINE_MODES]; if (FLOAT_TYPE_P (type) && flag_fp_contract_mode == FP_CONTRACT_OFF)
as an addition. */ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result) { enum tree_code use_code; --- 2281,2297 if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) return false; + mode = TYPE_MODE (type); + + if (!fma_cost[mode]) + { + fma_cost[mode] = compute_costs (mode, FMA, speed); + add_cost[mode] = compute_costs (mode, PLUS, speed); + mul_cost[mode] = compute_costs (mode, MULT, speed);
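The cost question can be seen in the shapes themselves: the original code does one multiply and two adds, while the transformed code does two multiply-add candidates, i.e. the multiplication is duplicated. A minimal model of the two forms (hypothetical helper names; values chosen so the results are exact in double precision):

```c
#include <assert.h>

/* Before: one multiplication feeding two additions.  */
static double sum2_shared (double b, double c)
{
  double a = b * c;
  return (a + 2.0) + (a + 3.0);
}

/* After: two fma-shaped expressions, but b * c is computed twice, so
   the rewrite only pays off if an fma costs no more than the add it
   replaces - which is exactly what the patch asks rtx_cost about.  */
static double sum2_fma_shape (double b, double c)
{
  return (b * c + 2.0) + (b * c + 3.0);
}
```
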
Re: PATCH [3/n] X32: Promote pointers to Pmode
PING. On Sun, Jul 10, 2011 at 12:43 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sun, Jul 10, 2011 at 7:32 AM, Uros Bizjak ubiz...@gmail.com wrote: On Sat, Jul 9, 2011 at 11:28 PM, H.J. Lu hongjiu...@intel.com wrote: X32 psABI requires promoting pointers to Pmode when passing/returning in registers. OK for trunk? Thanks. H.J. -- 2011-07-09 H.J. Lu hongjiu...@intel.com * config/i386/i386.c (ix86_promote_function_mode): New. (TARGET_PROMOTE_FUNCTION_MODE): Likewise. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 04cb07d..c852719 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -7052,6 +7061,23 @@ ix86_function_value (const_tree valtype, const_tree fntype_or_decl, return ix86_function_value_1 (valtype, fntype_or_decl, orig_mode, mode); } +/* Pointer function arguments and return values are promoted to + Pmode. */ + +static enum machine_mode +ix86_promote_function_mode (const_tree type, enum machine_mode mode, + int *punsignedp, const_tree fntype, + int for_return) +{ + if (for_return != 1 && type != NULL_TREE && POINTER_TYPE_P (type)) + { + *punsignedp = POINTERS_EXTEND_UNSIGNED; + return Pmode; + } + return default_promote_function_mode (type, mode, punsignedp, fntype, + for_return); +} Please rewrite the condition to: if (for_return == 1) /* Do not promote function return values. */ ; else if (type != NULL_TREE ...) Also, please add some comments. Your comment also says that pointer return arguments are promoted to Pmode. The documentation says that: FOR_RETURN allows to distinguish the promotion of arguments and return values. If it is `1', a return value is being promoted and `TARGET_FUNCTION_VALUE' must perform the same promotions done here. If it is `2', the returned mode should be that of the register in which an incoming parameter is copied, or the outgoing result is computed; then the hook should return the same mode as `promote_mode', though the signedness may be different. You bypass promotions when FOR_RETURN is 1. Uros.
Here is the updated patch. OK for trunk? Thanks. -- H.J. -- 2011-07-10 H.J. Lu hongjiu...@intel.com * config/i386/i386.c (ix86_promote_function_mode): New. (TARGET_PROMOTE_FUNCTION_MODE): Likewise. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 04cb07d..1b02312 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -7052,6 +7061,27 @@ ix86_function_value (const_tree valtype, const_tree fntype_or_decl, return ix86_function_value_1 (valtype, fntype_or_decl, orig_mode, mode); } +/* Pointer function arguments and return values are promoted to Pmode. + If FOR_RETURN is 1, this function must behave in the same way with + regard to function returns as TARGET_FUNCTION_VALUE. */ + +static enum machine_mode +ix86_promote_function_mode (const_tree type, enum machine_mode mode, + int *punsignedp, const_tree fntype, + int for_return) +{ + if (for_return == 1) + /* Do not promote function return values. */ + ; + else if (type != NULL_TREE && POINTER_TYPE_P (type)) + { + *punsignedp = POINTERS_EXTEND_UNSIGNED; + return Pmode; + } + return default_promote_function_mode (type, mode, punsignedp, fntype, + for_return); +} + rtx ix86_libcall_value (enum machine_mode mode) { @@ -34810,6 +35157,9 @@ ix86_autovectorize_vector_sizes (void) #undef TARGET_FUNCTION_VALUE_REGNO_P #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p +#undef TARGET_PROMOTE_FUNCTION_MODE +#define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode + #undef TARGET_SECONDARY_RELOAD #define TARGET_SECONDARY_RELOAD ix86_secondary_reload -- H.J.
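What the hook implements, in portable terms: an x32 pointer is 32 bits wide but travels in a 64-bit register, so it is zero-extended (POINTERS_EXTEND_UNSIGNED) to Pmode. A model of that promotion on plain integers:

```c
#include <assert.h>
#include <stdint.h>

/* Promoting a 32-bit pointer value into a 64-bit register: the low 32
   bits are preserved and the upper 32 bits become zero, so even an
   address with the high bit set must not be sign-extended.  */
static uint64_t promote_ptr32 (uint32_t p)
{
  return (uint64_t) p;  /* zero-extension */
}
```
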
Re: PATCH [3/n] X32: Promote pointers to Pmode
On Wed, Jul 13, 2011 at 3:17 PM, H.J. Lu hjl.to...@gmail.com wrote: PING. 2011-07-10 H.J. Lu hongjiu...@intel.com * config/i386/i386.c (ix86_promote_function_mode): New. (TARGET_PROMOTE_FUNCTION_MODE): Likewise. You have discussed this with rth, the final approval should be from him. Uros.
Re: Build failure (Re: [PATCH] Make VRP optimize useless conversions)
Richard Guenther wrote: 2011-07-13 Richard Guenther rguent...@suse.de * tree-vrp.c (simplify_conversion_using_ranges): Make sure the final type is integral. This fixes the spu-elf build failure. Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [build] Move libgcov support to toplevel libgcc
Jan, I would also prefer libgcov to go into its own toplevel directory, especially because there are plans to add non-stdlib I/O into it, i.e. for kernel profiling. That way it would be handy to have libgcov as a toplevel library with its own configure that allows it to be built independently of the rest of GCC. I'm probably not going to try that. There's so much cleanup possible in the toplevel libgcc move as is that it will keep me busy for some time (provided that I can get testing and approval for the parts I can't easily test myself ;-). Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On 07/12/2011 07:46 PM, Jakub Jelinek wrote: The aim is to include just (or primarily) code generation affecting options explicitly passed on the command line. So that the merging actually works, options or arguments which include filenames or paths shouldn't be added, on Roland's request -D*/-U* options aren't added either (that should be covered by .debug_macinfo) ...but only with -g3. Ideally we'd just include explicitly passed options from command line that haven't been overridden by other command line options, and would sort them, so that there are higher chances of DW_AT_producer strings being merged (e.g. -O2 -ffast-math vs. -ffast-math -O2 are now different strings, and similarly -O2 vs. -O3 -O2 vs. -O0 -O1 -Ofast -O2), but I'm not sure if it is easily possible using current option handling framework. Why not? Sorting sounds pretty straightforward to me, though you might want to copy the array first. On the other hand, it probably isn't worthwhile; presumably most relocatables being linked together will share the same CFLAGS, so you'll get a high degree of merging without any sorting. --- gcc/testsuite/lib/dg-pch.exp.jj 2011-01-03 18:58:03.0 +0100 +++ gcc/testsuite/lib/dg-pch.exp2011-07-12 23:13:50.943670171 +0200 - dg-test -keep-output ./$bname$suffix $otherflags $flags + dg-test -keep-output ./$bname$suffix -gno-record-gcc-switches $otherflags $flags Why is this necessary? Jason
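The sorting idea endorsed above is mechanically simple; the open question is only where in the option-handling framework to apply it. A hypothetical sketch of canonicalising the recorded switches so that `-O2 -ffast-math` and `-ffast-math -O2` produce identical DW_AT_producer strings (helper names are made up, this is not GCC's option code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_opt (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Sort the switch list in place so that equivalent command lines
   record identical producer strings and can be merged.  */
static void canonicalize_switches (const char *opts[], size_t n)
{
  qsort (opts, n, sizeof *opts, cmp_opt);
}

/* Demonstration helper: returns whichever of two switches sorts first.  */
static const char *first_of2 (const char *a, const char *b)
{
  const char *v[2] = { a, b };
  canonicalize_switches (v, 2);
  return v[0];
}
```
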
Re: PATCH [3/n] X32: Promote pointers to Pmode
Hi Richard, Is my patch OK? Thanks. H.J. On Sun, Jul 10, 2011 at 6:14 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sun, Jul 10, 2011 at 5:48 PM, Richard Henderson r...@redhat.com wrote: On 07/10/2011 03:01 PM, H.J. Lu wrote: We only want to promote function parameters passed/returned in register. But I can't tell if a value will be passed in register or memory inside of TARGET_FUNCTION_VALUE. So when FOR_RETURN is 1, we don't promote. Even if we don't promote it explicitly, hardware still zero-extends it for us. So it isn't a real issue. The hardware *usually* zero-extends. It all depends on where the data is coming from. Without certainty, i.e. actually representing it properly, you're designing a broken ABI. What you wrote above re T_F_V not being able to tell register or memory doesn't make sense. Did you really mean inside TARGET_PROMOTE_FUNCTION_MODE? And why exactly wouldn't you be able to tell there? Can you not find out via a call to ix86_return_in_memory? TARGET_PROMOTE_FUNCTION_MODE is for passing/returning value in a register and the documentation says that: FOR_RETURN allows to distinguish the promotion of arguments and return values. If it is `1', a return value is being promoted and `TARGET_FUNCTION_VALUE' must perform the same promotions done here. If it is `2', the returned mode should be that of the register in which an incoming parameter is copied, or the outgoing result is computed; then the hook should return the same mode as `promote_mode', though the signedness may be different. But for TARGET_FUNCTION_VALUE, there is no difference for register and memory. That is why I don't promote when FOR_RETURN is 1. mmix/mmix.c and rx/rx.c have similar treatment.
Re: [ARM] Fix PR49641
On 07/07/11 21:02, Bernd Schmidt wrote: This corrects an error in store_multiple_operation. We're only generating the writeback version of the instruction on Thumb-1, so that's where we must make sure the base register isn't also stored. The ARMv7 manual is unfortunately not totally clear that this does in fact produce unpredictable results; it seems to suggest that this is the case only for the T2 encoding. Older documentation makes it clear. Tested on arm-eabi{,mthumb}. I agree that the wording here is unclear, but the pseudo code for the decode makes the situation clearer, and does reflect what I really believe to be the case. Put explicitly: For LDM: - Encoding A1: Unpredictable if writeback and base in list (I believe this is true for all architecture versions, despite what it says in the current ARM ARM -- at least, my v5 copy certainly says unpredictable) - Encoding T1: Not unpredictable, but deprecated (for base in list, the loaded value used and writeback ignored). Note, however, that in UAL the ! operator on the base register must not be used if the base register appears in the list. - Encoding T2: Unpredictable if writeback and base in list For STM: - Encoding T2: Unpredictable if writeback and base in list regardless of the position. - Encodings T1 and A1: Unpredictable if writeback and base in list and not lowest numbered register (note that encoding T1 always has writeback). In the case where the base is the first register in the list, then the original value of base will be stored; deprecated. This is all quite complicated, I hope I've expressed it correctly... :-) R. Bernd pr49641.diff * config/arm/arm.c (store_multiple_sequence): Avoid cases where the base reg is stored iff compiling for Thumb1. * gcc.target/arm/pr49641.c: New test. 
Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c (revision 175906) +++ gcc/config/arm/arm.c (working copy) @@ -9950,7 +9950,10 @@ store_multiple_sequence (rtx *operands, /* If it isn't an integer register, then we can't do this. */ if (unsorted_regs[i] < 0 || (TARGET_THUMB1 && unsorted_regs[i] > LAST_LO_REGNUM) - || (TARGET_THUMB2 && unsorted_regs[i] == base_reg) + /* For Thumb1, we'll generate an instruction with update, + and the effects are unpredictable if the base reg is + stored. */ + || (TARGET_THUMB1 && unsorted_regs[i] == base_reg) || (TARGET_THUMB2 && unsorted_regs[i] == SP_REGNUM) || unsorted_regs[i] > 14) return 0; Index: gcc/testsuite/gcc.target/arm/pr49641.c === --- gcc/testsuite/gcc.target/arm/pr49641.c (revision 0) +++ gcc/testsuite/gcc.target/arm/pr49641.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-mthumb -O2" } */ +/* { dg-require-effective-target arm_thumb1_ok } */ +/* { dg-final { scan-assembler-not "stmia\[\\t \]*r3!\[^\\n]*r3" } } */ +typedef struct { + void *t1, *t2, *t3; +} z; +extern volatile int y; +static inline void foo(z *x) { + x->t1 = x->t2; + x->t2 = ((void *)0); + x->t3 = x->t1; +} +extern z v; +void bar (void) { + y = 0; + foo(&v); +}
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On Wed, Jul 13, 2011 at 09:56:58AM -0400, Jason Merrill wrote: On 07/12/2011 07:46 PM, Jakub Jelinek wrote: The aim is to include just (or primarily) code generation affecting options explicitly passed on the command line. So that the merging actually works, options or arguments which include filenames or paths shouldn't be added, on Roland's request -D*/-U* options aren't added either (that should be covered by .debug_macinfo) ...but only with -g3. Sure. But if we put -D*/-U* into DW_AT_producer, -D_FORTIFY_SOURCE=2 on the command line acts the same as #define _FORTIFY_SOURCE 2 before including the first header and the latter wouldn't be recorded. I'm working on smaller .debug_macinfo right now. Ideally we'd just include explicitly passed options from command line that haven't been overridden by other command line options, and would sort them, so that there are higher chances of DW_AT_producer strings being merged (e.g. -O2 -ffast-math vs. -ffast-math -O2 are now different strings, and similarly -O2 vs. -O3 -O2 vs. -O0 -O1 -Ofast -O2), but I'm not sure if it is easily possible using current option handling framework. Why not? Sorting sounds pretty straightforward to me, though you might want to copy the array first. If the command line options contain options that override each other, then sorting would drop important information what comes last and thus overrides other options. If we would have only options which weren't overridden, we could sort. Otherwise -O2 -O0 would be sorted as -O0 -O2 and we'd think the code was optimized when it wasn't. On the other hand, it probably isn't worthwhile; presumably most relocatables being linked together will share the same CFLAGS, so you'll get a high degree of merging without any sorting. 
--- gcc/testsuite/lib/dg-pch.exp.jj 2011-01-03 18:58:03.0 +0100 +++ gcc/testsuite/lib/dg-pch.exp 2011-07-12 23:13:50.943670171 +0200 -dg-test -keep-output ./$bname$suffix $otherflags $flags +dg-test -keep-output ./$bname$suffix -gno-record-gcc-switches $otherflags $flags Why is this necessary? It is only necessary if somebody wants to make -grecord-gcc-switches the default (for bootstrap/regtest I've tweaked common.opt to do that to test it better). PCH is a big mess and screws debuginfo in many ways, in this case it was just small differences in DW_AT_producer, but we have e.g. ICEs with PCH and -feliminate-dwarf-dups etc. Jakub
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On 07/13/2011 10:06 AM, Jakub Jelinek wrote: --- gcc/testsuite/lib/dg-pch.exp.jj 2011-01-03 18:58:03.0 +0100 +++ gcc/testsuite/lib/dg-pch.exp2011-07-12 23:13:50.943670171 +0200 - dg-test -keep-output ./$bname$suffix $otherflags $flags + dg-test -keep-output ./$bname$suffix -gno-record-gcc-switches $otherflags $flags It is only necessary if somebody wants to make -grecord-gcc-switches the default (for bootstrap/regtest I've tweaked common.opt to do that to test it better). PCH is a big mess and screws debuginfo in many ways, in this case it was just small differences in DW_AT_producer, but we have e.g. ICEs with PCH and -feliminate-dwarf-dups etc. Why would PCH change DW_AT_producer? Because we're restoring single_comp_unit_die from the PCH? Then perhaps we should set DW_AT_producer in output_comp_unit rather than gen_compile_unit_die. Jason
Re: [build] Remove crt0, mcrt0 support
Jan, Rainer Orth r...@cebitec.uni-bielefeld.de 07/12/11 6:46 PM On the other hand, maybe it's time to obsolete or even immediately remove the netware port: there is no listed maintainer, no testsuite results at least back to 2007 (if any were ever posted), and the only netware-related change that hasn't been part of general cleanup is almost two years ago. That would be fine with me. which variant would you prefer: obsoletion now and removal in 4.8 or immediate removal? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote: Hi, the widening_mul pass might increase the number of multiplications in the code by transforming a = b * c d = a + 2 e = a + 3 into: d = b * c + 2 e = b * c + 3 under the assumption that an FMA instruction is not more expensive than a simple add. This certainly isn't always true. While e.g. on s390 an fma is indeed not slower than an add execution-wise, it has disadvantages regarding instruction grouping. It doesn't group with any other instruction, which has a major impact on the instruction dispatch bandwidth. The following patch tries to figure out the costs for adds, mults and fmas by building an RTX and asking the backend's cost function in order to estimate whether it is worthwhile doing the transformation. With that patch the 436.cactus hotloop contains 28 fewer multiplications than before, increasing performance slightly (~2%). Bootstrapped and regtested on x86_64 and s390x. Ick ;) Maybe this is finally the time to introduce target hook(s) to get us back costs for trees? For this case we'd need two actually, or just one - dependent on what finegrained information we pass. Choices: tree_code_cost (enum tree_code) tree_code_cost (enum tree_code, enum machine_mode mode) unary_cost (enum tree_code, tree actual_arg0) // args will be mostly SSA names or constants, but at least they are typed - works for mixed-typed operations binary_cost (...) ... unary_cost (enum tree_code, enum tree_code arg0_kind) // constant vs. non-constant arg, but lacks type/mode Richard. Bye, -Andreas- 2011-07-13 Andreas Krebbel andreas.kreb...@de.ibm.com * tree-ssa-math-opts.c (compute_costs): New function. (convert_mult_to_fma): Take costs into account when propagating multiplications into several additions. * config/s390/s390.c (z196_costs): Adjust costs for madbr and maebr.
Index: gcc/tree-ssa-math-opts.c === *** gcc/tree-ssa-math-opts.c.orig --- gcc/tree-ssa-math-opts.c *** convert_plusminus_to_widen (gimple_stmt_ *** 2185,2190 **** --- 2185,2252 ---- return true; } + /* Computing the costs for calculating RTX with CODE in MODE. */ + + static unsigned + compute_costs (enum machine_mode mode, enum rtx_code code, bool speed) + { + rtx seq; + rtx set; + unsigned cost = 0; + + start_sequence (); + + switch (GET_RTX_LENGTH (code)) + { + case 2: + force_operand (gen_rtx_fmt_ee (code, mode, + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)), + NULL_RTX); + break; + case 3: + /* FMA expressions are not handled by force_operand. */ + expand_ternary_op (mode, fma_optab, + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3), + NULL_RTX, false); + break; + default: + gcc_unreachable (); + } + + seq = get_insns (); + end_sequence (); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Calculating costs of %s in %s mode. Sequence is:\n", + GET_RTX_NAME (code), GET_MODE_NAME (mode)); + print_rtl (dump_file, seq); + } + + for (; seq; seq = NEXT_INSN (seq)) + { + set = single_set (seq); + if (set) + cost += rtx_cost (set, SET, speed); + else + cost++; + } + + /* If the backend returns a cost of zero it is most certainly lying. + Set this to one in order to notice that we already calculated it + once. */ + cost = cost ? cost : 1; + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "%s in %s costs %d\n\n", + GET_RTX_NAME (code), GET_MODE_NAME (mode), cost); + + return cost; + } + /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2 with uses in additions and subtractions to form fused multiply-add operations. Returns true if successful and MUL_STMT should be removed.
*/ *** convert_mult_to_fma (gimple mul_stmt, tr *** 2197,2202 **** --- 2259,2270 ---- gimple use_stmt, neguse_stmt, fma_stmt; use_operand_p use_p; imm_use_iterator imm_iter; + enum machine_mode mode; + int uses = 0; + bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt)); + static unsigned mul_cost[NUM_MACHINE_MODES]; + static unsigned add_cost[NUM_MACHINE_MODES]; + static unsigned fma_cost[NUM_MACHINE_MODES]; if (FLOAT_TYPE_P (type) && flag_fp_contract_mode == FP_CONTRACT_OFF) *** convert_mult_to_fma (gimple mul_stmt, tr *** 2213,
PING: PATCH [4/n] X32: Use ptr_mode for vtable adjustment
Hi Richard, Uros, Is this patch OK? Thanks. H.J. --- On Sun, Jul 10, 2011 at 6:47 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sat, Jul 9, 2011 at 3:58 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sat, Jul 9, 2011 at 3:43 PM, Richard Henderson r...@redhat.com wrote: On 07/09/2011 02:36 PM, H.J. Lu wrote: Hi, Thunk is in ptr_mode, not Pmode. OK for trunk? Thanks. H.J. --- 2011-07-09 H.J. Lu hongjiu...@intel.com * config/i386/i386.c (x86_output_mi_thunk): Use ptr_mode instead of Pmode for vtable adjustment. Not ok. This is incoherent in its treatment of Pmode vs ptr_mode. You're creating an addition (plus:P (reg:ptr tmp) (reg:P tmp2)) It is because thunk is stored in ptr_mode, not Pmode. I have a queued patch that replaces all of this with rtl. I will post it later today. I will update it for x32 after your change is checked in. I am testing this updated patch. OK for trunk if it works? Thanks. -- H.J. --- 2011-07-10 H.J. Lu hongjiu...@intel.com * config/i386/i386.c (x86_output_mi_thunk): Support ptr_mode != Pmode. * config/i386/i386.md (*addsi_1_zext): Renamed to ... (addsi_1_zext): This. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a46101b..d6744be 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -29346,7 +29673,7 @@ x86_output_mi_thunk (FILE *file, /* Adjust the this parameter by a value stored in the vtable. */ if (vcall_offset) { - rtx vcall_addr, vcall_mem; + rtx vcall_addr, vcall_mem, this_mem; unsigned int tmp_regno; if (TARGET_64BIT) @@ -29361,7 +29688,10 @@ x86_output_mi_thunk (FILE *file, } tmp = gen_rtx_REG (Pmode, tmp_regno); - emit_move_insn (tmp, gen_rtx_MEM (ptr_mode, this_reg)); + this_mem = gen_rtx_MEM (ptr_mode, this_reg); + if (Pmode == DImode ptr_mode == SImode) + this_mem = gen_rtx_ZERO_EXTEND (DImode, this_mem); + emit_move_insn (tmp, this_mem); /* Adjust the this parameter. 
*/ vcall_addr = plus_constant (tmp, vcall_offset); @@ -29373,8 +29703,13 @@ x86_output_mi_thunk (FILE *file, vcall_addr = gen_rtx_PLUS (Pmode, tmp, tmp2); } - vcall_mem = gen_rtx_MEM (Pmode, vcall_addr); - emit_insn (ix86_gen_add3 (this_reg, this_reg, vcall_mem)); + vcall_mem = gen_rtx_MEM (ptr_mode, vcall_addr); + if (Pmode == DImode && ptr_mode == SImode) + emit_insn (gen_addsi_1_zext (this_reg, + gen_rtx_REG (SImode, REGNO (this_reg)), + vcall_mem)); + else + emit_insn (ix86_gen_add3 (this_reg, this_reg, vcall_mem)); } /* If necessary, drop THIS back to its stack slot. */ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index a52941b..3136fd0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -5508,11 +5574,11 @@ ;; operands so proper swapping will be done in reload. This allow ;; patterns constructed from addsi_1 to match. -(define_insn "*addsi_1_zext" +(define_insn "addsi_1_zext" [(set (match_operand:DI 0 "register_operand" "=r,r,r") (zero_extend:DI (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r") (match_operand:SI 2 "general_operand" "g,0,li")))) (clobber (reg:CC FLAGS_REG))] "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)" {
Re: More mudflap fixes for Solaris 11
Hi, Rainer - When testing libmudflap on Solaris 8, 9, and 10 with GNU ld, I found a couple of testsuite failures: [...] Ok for mainline? Yes, thank you! - FChE
Re: Define [CD]TORS_SECTION_ASM_OP on Solaris/x86 with Sun ld
Hi, Rainer - On Mon, Jul 11, 2011 at 06:34:27PM +0200, Rainer Orth wrote: [...] On the other hand, there's the question why tree-mudflap.c tries to create a constructor with a non-default priority on a platform with SUPPORTS_INIT_PRIORITY == 0 or at all [...] For the "at all" part, I believe the intent was to make it more likely that mudflap-tracked literals be tracked early enough so that other constructors would find them already available for checking. - FChE
Re: [build] Remove crt0, mcrt0 support
Jan Beulich jbeul...@novell.com writes: Rainer Orth r...@cebitec.uni-bielefeld.de 07/13/11 4:34 PM which variant would you prefer: obsoletion now and removal in 4.8 or immediate removal? Both are fine with me, so unless someone else objects immediate removal would seem better given it had been pretty much unmaintained. Right: it would be a one-time effort to remove the support, but subsequent cleanups wouldn't have to deal with the effectively dead code. I had a quick look and it doesn't seem hard: apart from removing the netware-specific files in gcc and libgcc (and corresponding gcc/config.gcc and libgcc/config.host changes), there's only a small list (apart from netware-related target triplets in the testsuite): config/elf.m4 configure.ac contrib/config-list.mk gcc/config/i386/i386.c gcc/config/i386/i386.h gcc/doc/extend.texi libstdc++-v3/crossconfig.m4 configure.ac may have to stay if binutils/src wants to retain the port, but that's about it. Let's see what the release managers/global reviewers think. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
Richard Guenther wrote: On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote: Hi, the widening_mul pass might increase the number of multiplications in the code by transforming a = b * c d = a + 2 e = a + 3 into: d = b * c + 2 e = b * c + 3 under the assumption that an FMA instruction is not more expensive than a simple add. This certainly isn't always true. While e.g. on s390 an fma is indeed not slower than an add execution-wise, it has disadvantages regarding instruction grouping. It doesn't group with any other instruction, which has a major impact on the instruction dispatch bandwidth. The following patch tries to figure out the costs for adds, mults and fmas by building an RTX and asking the backend's cost function in order to estimate whether it is worthwhile doing the transformation. With that patch the 436.cactus hotloop contains 28 fewer multiplications than before, increasing performance slightly (~2%). Bootstrapped and regtested on x86_64 and s390x. Ick ;) Maybe this is finally the time to introduce target hook(s) to get us back costs for trees? For this case we'd need two actually, or just one - dependent on what finegrained information we pass. Choices: tree_code_cost (enum tree_code) tree_code_cost (enum tree_code, enum machine_mode mode) unary_cost (enum tree_code, tree actual_arg0) // args will be mostly SSA names or constants, but at least they are typed - works for mixed-typed operations binary_cost (...) ... unary_cost (enum tree_code, enum tree_code arg0_kind) // constant vs. non-constant arg, but lacks type/mode Richard. What's bad with rtx_costs? Yet another cost function might duplicate cost computation in a backend -- once on trees and once on RTXs. BTW: For a port I read rtx_costs from insn attributes which helped me to clean up code in rtx_costs to a great extent.
In particular for a target with complex instructions which are synthesized by insn combine, rtx_costs is mostly mechanical and brain-dead retyping of bulk of code that is already present almost identical in insn-recog.c. Johann
Re: Define [CD]TORS_SECTION_ASM_OP on Solaris/x86 with Sun ld
Hi Frank, On Mon, Jul 11, 2011 at 06:34:27PM +0200, Rainer Orth wrote: [...] On the other hand, there's the question why tree-mudflap.c tries to create a constructor with a non-default priority on a platform with SUPPORTS_INIT_PRIORITY == 0 or at all [...] For the "at all" part, I believe the intent was to make it more likely that mudflap-tracked literals be tracked early enough so that other constructors would find them already available for checking. I see. I'm still undecided whose responsibility it is to deal with the !SUPPORTS_INIT_PRIORITY case. On one hand one might argue that only the callers can decide if a non-default priority is strictly required or just an improvement, OTOH silently ignoring the priority and causing constructors not to be run at all doesn't seem a winning proposition either ;-) Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: PATCH [3/n] X32: Promote pointers to Pmode
On 07/13/2011 07:02 AM, H.J. Lu wrote: Hi Richard, Is my patch OK? No, I don't think it is. r~
Re: PATCH [3/n] X32: Promote pointers to Pmode
On Wed, Jul 13, 2011 at 8:27 AM, Richard Henderson r...@redhat.com wrote: On 07/13/2011 07:02 AM, H.J. Lu wrote: Hi Richard, Is my patch OK? No, I don't think it is. What is your suggestion? -- H.J.
Re: PING: PATCH [4/n] X32: Use ptr_mode for vtable adjustment
On 07/13/2011 07:39 AM, H.J. Lu wrote: * config/i386/i386.c (x86_output_mi_thunk): Support ptr_mode != Pmode. * config/i386/i386.md (*addsi_1_zext): Renamed to ... (addsi_1_zext): This. Ok, except, + if (Pmode == DImode && ptr_mode == SImode) if (Pmode != ptr_mode) in two locations. + this_mem = gen_rtx_ZERO_EXTEND (DImode, this_mem); Pmode +gen_rtx_REG (SImode, REGNO (this_reg)), ptr_mode. r~
Re: PATCH [3/n] X32: Promote pointers to Pmode
On 07/13/2011 08:35 AM, H.J. Lu wrote: On Wed, Jul 13, 2011 at 8:27 AM, Richard Henderson r...@redhat.com wrote: On 07/13/2011 07:02 AM, H.J. Lu wrote: Hi Richard, Is my patch OK? No, I don't think it is. What is your suggestion? Promote the return value. If that means it doesn't match function_value, then I suggest that function_value is wrong. r~
[PATCH] bash vs. dash: Avoid unportable shell feature in gcc/configure.ac
Hallo! Diffing the make log of a build of GCC with SHELL not explicitly set (thus /bin/sh, which is bash) and one with SHELL=/bin/dash, I found the following unexpected difference: -checking assembler for eh_frame optimization... yes +checking assembler for eh_frame optimization... buggy This is from gcc/configure, which invokes acinclude.m4:gcc_GAS_CHECK_FEATURE for the ``eh_frame optimization'' check. Latter case, gcc/config.log: configure:22282: checking assembler for eh_frame optimization configure:22327: /usr/bin/as --32 -o conftest.o conftest.s >&5 conftest.s: Assembler messages: conftest.s: Warning: end of file in string; '"' inserted conftest.s:13: Warning: unterminated string; newline inserted There, the following happens: $ sh # This is bash. sh-4.1$ echo '.ascii "z\0"' .ascii "z\0" This is what GCC expects. However, with dash: $ dash $ echo '.ascii "z\0"' .ascii "z The backslash escape and everything after is cut off. The test in gcc/configure.ac: gcc_GAS_CHECK_FEATURE(eh_frame optimization, gcc_cv_as_eh_frame, [elf,2,12,0],, [ .text [...] .byte 0x1 .ascii "z\0" .byte 0x1 [...] As quickly determined in #gcc with Ian's and Ismail's help, this is unportable usage of the echo builtin (and also at least questionable for /bin/echo), so I'm suggesting the following simple fix: gcc/ * configure.ac (eh_frame optimization): Avoid unportable shell feature. diff --git a/gcc/configure.ac b/gcc/configure.ac index c2163bf..73f0209 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -2538,7 +2538,7 @@ __FRAME_BEGIN__: .LSCIE1: .4byte 0x0 .byte 0x1 - .ascii "z\0" + .asciz "z" .byte 0x1 .byte 0x78 .byte 0x1a Alternatively, gcc_GAS_CHECK_FEATURE could be changed to emit the temporary file by using a shell here-doc, which is what AC_TRY_COMPILE is doing, for example. Regards, Thomas
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
On 07/11/2011 05:54 PM, H.J. Lu wrote: The key is the XEXP (x, 1) == convert_memory_address_addr_space (to_mode, XEXP (x, 1), as) test. It ensures basically that the constant has 31-bit precision, because otherwise the constant would change from e.g. (const_int -0x7ffc) to (const_int 0x8004) when zero-extending it from SImode to DImode. But I'm not sure it's safe. You have, (zero_extend:DI (plus:SI FOO:SI (const_int Y))) and you want to convert it to (plus:DI FOO:DI (zero_extend:DI (const_int Y))) (where the zero_extend is folded). Ignore that FOO is a SYMBOL_REF (this piece of code does not assume anything about its shape); if FOO == 0xfffc and Y = 8, the result will be respectively 0x4 (valid) and 0x10004 (invalid). This example contradicts what you said above, "It ensures basically that the constant has 31-bit precision". Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. What happens if you just return NULL instead of the assertion (good idea adding it!)? Of course then you need to: 1) check the return values of convert_memory_address_addr_space_1, and propagate NULL up to simplify_unary_operation; 2) check in simplify-rtx.c whether the return value of convert_memory_address_1 is NULL, and only return if the return value is not NULL. This is not yet necessary (convert_memory_address is the last transformation for both SIGN_EXTEND and ZERO_EXTEND) but it is better to keep code clean. I will give it a try. Thanks, did you get any result? There's no "I think" in this code. So even if I cannot approve it, I'd really like to see a version that I understand and that is clearly conservative, if it works. Paolo
[Patch,AVR]: Cleanup readonly_data_section et al.
This patch removes some special treatment from avr/elf.h which is actually not needed. The only target supported by avr is ELF and the defaults for READONLY_DATA_SECTION_ASM_OP, TARGET_HAVE_SWITCHABLE_BSS_SECTIONS, and TARGET_ASM_SELECT_SECTION are fine. Using the default for TARGET_ASM_SELECT_SECTION brings the additional benefit that constant merging is enabled. AVR is special because it is a Harvard architecture, so all constants have to be in .data, i.e. .rodata is part of .data. This is accomplished by the default linker scripts, so there is no need to set readonly_data_section = data_section in avr_asm_init_sections. Changes in the testsuite run are: * gcc.dg/debug/dwarf2/dwarf-merge.c: UNSUPPORTED -> PASS * gcc.dg/array-quals-1.c: XFAIL -> PASS * g++.dg/opt/const4.C: FAIL -> PASS There's no AVR maintainer to approve at the moment, so approval by a global reviewer is much appreciated. Ok to commit? Johann gcc/ * config/avr/elf.h (TARGET_ASM_SELECT_SECTION): Remove, i.e. use default_elf_select_section. (TARGET_HAVE_SWITCHABLE_BSS_SECTIONS): Remove. (READONLY_DATA_SECTION_ASM_OP): Remove. (TARGET_ASM_NAMED_SECTION): Move from here... * config/avr/avr.c: ...to here. (avr_asm_init_sections): Set unnamed callback of readonly_data_section. (avr_asm_named_section): Make static. testsuite/ * gcc.dg/array-quals-1.c: Don't xfail on AVR. Index: config/avr/elf.h === --- config/avr/elf.h (revision 176136) +++ config/avr/elf.h (working copy) @@ -26,24 +26,12 @@ #undef PREFERRED_DEBUGGING_TYPE #define PREFERRED_DEBUGGING_TYPE DBX_DEBUG -#undef TARGET_ASM_NAMED_SECTION -#define TARGET_ASM_NAMED_SECTION avr_asm_named_section - -/* Use lame default: no string merging, ...
*/ -#undef TARGET_ASM_SELECT_SECTION -#define TARGET_ASM_SELECT_SECTION default_select_section - #undef MAX_OFILE_ALIGNMENT #define MAX_OFILE_ALIGNMENT (32768 * 8) -#undef TARGET_HAVE_SWITCHABLE_BSS_SECTIONS - #undef STRING_LIMIT #define STRING_LIMIT ((unsigned) 64) -/* Setup `readonly_data_section' in `avr_asm_init_sections'. */ -#undef READONLY_DATA_SECTION_ASM_OP /* Take care of `signal' and `interrupt' attributes. */ #undef ASM_DECLARE_FUNCTION_NAME #define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL) \ Index: config/avr/avr.c === --- config/avr/avr.c (revision 176141) +++ config/avr/avr.c (working copy) @@ -194,8 +194,8 @@ static const struct attribute_spec avr_a #undef TARGET_SECTION_TYPE_FLAGS #define TARGET_SECTION_TYPE_FLAGS avr_section_type_flags -/* `TARGET_ASM_NAMED_SECTION' must be defined in avr.h. */ - +#undef TARGET_ASM_NAMED_SECTION +#define TARGET_ASM_NAMED_SECTION avr_asm_named_section #undef TARGET_ASM_INIT_SECTIONS #define TARGET_ASM_INIT_SECTIONS avr_asm_init_sections #undef TARGET_ENCODE_SECTION_INFO @@ -5091,8 +5091,11 @@ avr_asm_init_sections (void) progmem_section = get_unnamed_section (AVR_HAVE_JMP_CALL ? 0 : SECTION_CODE, avr_output_progmem_section_asm_op, NULL); - readonly_data_section = data_section; + /* Override section callbacks to keep track of `avr_need_clear_bss_p' + resp. `avr_need_copy_data_p'. */ + + readonly_data_section->unnamed.callback = avr_output_data_section_asm_op; data_section->unnamed.callback = avr_output_data_section_asm_op; bss_section->unnamed.callback = avr_output_bss_section_asm_op; }
*/ -void +static void avr_asm_named_section (const char *name, unsigned int flags, tree decl) { if (!avr_need_copy_data_p) Index: testsuite/gcc.dg/array-quals-1.c === --- testsuite/gcc.dg/array-quals-1.c (revision 176136) +++ testsuite/gcc.dg/array-quals-1.c (working copy) @@ -4,7 +4,7 @@ /* Origin: Joseph Myers j...@polyomino.org.uk */ /* { dg-do compile } */ /* The MMIX port always switches to the .data section at the end of a file. */ -/* { dg-final { scan-assembler-not \\.data(?!\\.rel\\.ro) { xfail powerpc*-*-aix* mmix-*-* x86_64-*-mingw* picochip--*-* avr-*-*} } } */ +/* { dg-final { scan-assembler-not \\.data(?!\\.rel\\.ro) { xfail powerpc*-*-aix* mmix-*-* x86_64-*-mingw* picochip--*-* } } } */ static const int a[2] = { 1, 2 }; const int a1[2] = { 1, 2 }; typedef const int ci;
Re: [PATCH] bash vs. dash: Avoid unportable shell feature in gcc/configure.ac
On 07/13/2011 06:13 PM, Thomas Schwinge wrote: Alternatively, gcc_GAS_CHECK_FEATURE could be changed to emit the temporary file by using a shell here-doc, which is what AC_TRY_COMPILE is doing, for example. Change instead echo ifelse(...) > conftest.s to AS_ECHO([m4_if(...)]) > conftest.s in gcc_GAS_CHECK_FEATURE. Paolo
[PATCH, testsuite]: Use istarget everywhere
Hello! Attached patch converts several places where string match or regexp on $target_triplet is used with istarget. The patch also removes quotes around target string. 2011-07-13 Uros Bizjak ubiz...@gmail.com * lib/g++.exp (g++_init): Use istarget. Remove target_triplet global. * lib/obj-c++.exp (obj-c++_init): Ditto. * lib/file-format.exp (gcc_target_object_format): Ditto. * lib/target-supports-dg.exp (dg-require-dll): Ditto. * lib/target-supports-dg-exp (check_weak_available): Ditto. (check_visibility_available): Ditto. (check_effective_target_tls_native): Ditto. (check_effective_target_tls_emulated): Ditto. (check_effective_target_function_sections): Ditto. Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: lib/g++.exp === --- lib/g++.exp (revision 176236) +++ lib/g++.exp (working copy) @@ -188,7 +188,6 @@ global TOOL_EXECUTABLE TOOL_OPTIONS global GXX_UNDER_TEST global TESTING_IN_BUILD_TREE -global target_triplet global gcc_warning_prefix global gcc_error_prefix @@ -263,7 +262,7 @@ set gcc_warning_prefix warning: set gcc_error_prefix error: -if { [string match *-*-darwin* $target_triplet] } { +if { [istarget *-*-darwin*] } { lappend ALWAYS_CXXFLAGS ldflags=-multiply_defined suppress } Index: lib/obj-c++.exp === --- lib/obj-c++.exp (revision 176236) +++ lib/obj-c++.exp (working copy) @@ -210,7 +210,6 @@ global TOOL_EXECUTABLE TOOL_OPTIONS global OBJCXX_UNDER_TEST global TESTING_IN_BUILD_TREE -global target_triplet global gcc_warning_prefix global gcc_error_prefix @@ -270,7 +269,7 @@ set gcc_warning_prefix warning: set gcc_error_prefix error: -if { [string match *-*-darwin* $target_triplet] } { +if { [istarget *-*-darwin*] } { lappend ALWAYS_OBJCXXFLAGS ldflags=-multiply_defined suppress } @@ -299,7 +298,7 @@ # we need to add the include path for the gnu runtime if that is in # use. # First, set the default... 
-if { [istarget *-*-darwin*] } { +if { [istarget *-*-darwin*] } { set nextruntime 1 } else { set nextruntime 0 Index: lib/scanasm.exp === --- lib/scanasm.exp (revision 176236) +++ lib/scanasm.exp (working copy) @@ -461,10 +461,10 @@ } } -if { [istarget hppa*-*-*] } { +if { [istarget hppa*-*-*] } { set pattern [format {\t;[^:]+:%d\n(\t[^\t]+\n)+%s:\n\t.PROC} \ $line $symbol] -} elseif { [istarget mips-sgi-irix*] } { +} elseif { [istarget mips-sgi-irix*] } { set pattern [format {\t\.loc [0-9]+ %d 0( [^\n]*)?\n\t\.set\t(no)?mips16\n\t\.ent\t%s\n\t\.type\t%s, @function\n%s:\n} \ $line $symbol $symbol $symbol] } else { Index: lib/file-format.exp === --- lib/file-format.exp (revision 176236) +++ lib/file-format.exp (working copy) @@ -24,17 +24,16 @@ proc gcc_target_object_format { } { global gcc_target_object_format_saved -global target_triplet global tool if [info exists gcc_target_object_format_saved] { verbose gcc_target_object_format returning saved $gcc_target_object_format_saved 2 -} elseif { [string match *-*-darwin* $target_triplet] } { +} elseif { [istarget *-*-darwin*] } { # Darwin doesn't necessarily have objdump, so hand-code it. set gcc_target_object_format_saved mach-o -} elseif { [string match hppa*-*-hpux* $target_triplet] } { +} elseif { [istarget hppa*-*-hpux*] } { # HP-UX doesn't necessarily have objdump, so hand-code it. 
- if { [string match hppa*64*-*-hpux* $target_triplet] } { + if { [istarget hppa*64*-*-hpux*] } { set gcc_target_object_format_saved elf } else { set gcc_target_object_format_saved som Index: lib/target-libpath.exp === --- lib/target-libpath.exp (revision 176236) +++ lib/target-libpath.exp (working copy) @@ -272,11 +272,11 @@ proc get_shlib_extension { } { global shlib_ext -if { [ istarget *-*-darwin* ] } { +if { [istarget *-*-darwin*] } { set shlib_ext dylib -} elseif { [ istarget *-*-cygwin* ] || [ istarget *-*-mingw* ] } { +} elseif { [istarget *-*-cygwin*] || [istarget *-*-mingw*] } { set shlib_ext dll -} elseif { [ istarget hppa*-*-hpux* ] } { +} elseif { [istarget hppa*-*-hpux*] } { set shlib_ext sl } else { set shlib_ext so Index: lib/go-torture.exp === --- lib/go-torture.exp (revision 176236) +++
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini bonz...@gnu.org wrote: On 07/11/2011 05:54 PM, H.J. Lu wrote: The key is the XEXP (x, 1) == convert_memory_address_addr_space (to_mode, XEXP (x, 1), as) test. It ensures basically that the constant has 31-bit precision, because otherwise the constant would change from e.g. (const_int -0x7ffc) to (const_int 0x8004) when zero-extending it from SImode to DImode. But I'm not sure it's safe. You have (zero_extend:DI (plus:SI FOO:SI (const_int Y))) and you want to convert it to (plus:DI FOO:DI (zero_extend:DI (const_int Y))) (where the zero_extend is folded). Ignore that FOO is a SYMBOL_REF (this piece of code does not assume anything about its shape); if FOO == 0xfffc and Y = 8, the results will be, respectively, 0x4 (valid) and 0x10004 (invalid). This example contradicts what you said above: "It ensures basically that the constant has 31-bit precision." Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. And then we permute conversion and addition, which leads to the issue you raised above. In other words, the current code permutes conversion and addition. It leads to different values in the case of symbol (0xfffc) + 8. Basically the current test for 31-bit (or less) precision is bogus. The real question is, for an address computation A + B, whether address wrap-around is supported in convert_memory_address_addr_space. What happens if you just return NULL instead of the assertion (good idea adding it!)? Of course then you need to: 1) check the return values of convert_memory_address_addr_space_1, and propagate NULL up to simplify_unary_operation; 2) check in simplify-rtx.c whether the return value of convert_memory_address_1 is NULL, and only return if the return value is not NULL. This is not yet necessary (convert_memory_address is the last transformation for both SIGN_EXTEND and ZERO_EXTEND) but it is better to keep the code clean.
I will give it a try. Thanks, did you get any result? There's no "I think" in this code. So even if I cannot approve it, I'd really like to see a version that I understand and that is clearly conservative, if it works. I opened a new bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721 My current code looks like:

    case CONST:
      temp = convert_memory_address_addr_space_1 (to_mode, XEXP (x, 0), as,
                                                  no_emit,
                                                  ignore_address_wrap_around);
      return temp ? gen_rtx_CONST (to_mode, temp) : temp;
      break;

    case PLUS:
    case MULT:
      /* For addition we can safely permute the conversion and addition
         operation if one operand is a constant, address wrap-around is
         ignored and we are using a ptr_extend instruction or zero-extending
         (POINTERS_EXTEND_UNSIGNED != 0).  We can always safely permute them
         if we are making the address narrower.  */
      if (GET_MODE_SIZE (to_mode) < GET_MODE_SIZE (from_mode)
          || (GET_CODE (x) == PLUS
              && CONST_INT_P (XEXP (x, 1))
              && POINTERS_EXTEND_UNSIGNED != 0
              && ignore_address_wrap_around))
        return gen_rtx_fmt_ee (GET_CODE (x), to_mode,
                               convert_memory_address_addr_space_1
                                 (to_mode, XEXP (x, 0), as, no_emit,
                                  ignore_address_wrap_around),
                               XEXP (x, 1));
      break;

-- H.J.
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
On Wed, Jul 13, 2011 at 18:39, H.J. Lu hjl.to...@gmail.com wrote: Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. And then we permute conversion and addition, which leads to the issue you raised above. In other words, the current code permutes conversion and addition. No, only if we have ptr_extend. It may be buggy as well, but let's make sure first that x32 is done right, then perhaps whoever cares can fix ptr_extend if it has to be fixed. I don't know the semantics of ia64 addp4 so I cannot tell. I opened a new bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721 Good, thanks. My current code looks like: case CONST: temp = convert_memory_address_addr_space_1 (to_mode, XEXP (x, 0), as, no_emit, ignore_address_wrap_around); Here I stopped reading. It's not what I asked for, so at least you should say clearly _why_. Paolo
Re: RFA: Avoid unnecessary clearing in union initialisers
On Tue, Jul 12, 2011 at 9:34 AM, Richard Sandiford richard.sandif...@linaro.org wrote: PR 48183 is caused by the fact that we don't really support integers (or at least integer constants) wider than 2*HOST_BITS_PER_WIDE_INT: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01220.html However, such constants shouldn't be needed in normal use. They came from an unnecessary zero-initialisation of a union such as:

    union { a f1; b f2; } u = { init_f1 };

where f1 and f2 are the full width of the union. The zero-initialisation gets optimised away for real insns, but persists in debug insns: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01585.html This patch takes up Richard's idea here: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01987.html categorize_ctor_elements currently tries to work out how many scalars a constructor initialises (IE) and how many of those scalars are zero (ZE). Callers can then call count_type_elements to find out how many scalars (TE) ought to be initialised if the constructor is complete (i.e. if it explicitly initialises every meaningful byte, rather than relying on default zero-initialisation). The constructor is complete if TE == IE, except as noted in [A] below. However, count_type_elements can't return the required TE for unions, because it would need to know which of the union's fields was initialised by the constructor (if any). This choice of field is reflected in IE and ZE, so would need to be reflected in TE as well. count_type_elements therefore punts on unions. However, the caller can't easily tell whether it punts because of that, because of overflow, or because of variable-sized types. [A] One particular case of interest is when a union constructor initialises a field that is shorter than the union. In this case, the rest of the union must be zeroed in order to ensure that the other fields have predictable values. categorize_ctor_elements has a special out-parameter to record this situation. This leads to quite a complicated interface.
The patch tries to simplify it by making categorize_ctor_elements keep track of whether a constructor is complete. This also has the minor advantage of avoiding double recursion: first through the constructor, then through its type tree. After this change, ZE and IE are only needed when deciding how best to implement complete initialisers (such as whether to do a bulk zero initialisation anyway, and just write the nonzero elements individually). For cases where a leaf constructor element is itself an aggregate with a union, we can therefore estimate the number of scalars in the union, and hopefully make the heuristic a bit more accurate than the current 1:

    HOST_WIDE_INT tc = count_type_elements (TREE_TYPE (value), true);
    if (tc < 1)
      tc = 1;

cp/typeck2.c also wants to check whether the variable parts of a constructor are complete. The patch uses the same approach to completeness there. This should make it a bit more general than the current code, which only deals with non-nested constructors. Tested on x86_64-linux-gnu (all languages, including Ada), and on arm-linux-gnueabi. OK to install? Richard gcc/ * tree.h (categorize_ctor_elements): Remove comment. Fix long line. (count_type_elements): Delete. (complete_ctor_at_level_p): Declare. * expr.c (flexible_array_member_p): New function, split out from... (count_type_elements): ...here. Make static. Replace allow_flexarr parameter with for_ctor_p. When for_ctor_p is true, return the number of elements that should appear in the top-level constructor, otherwise return an estimate of the number of scalars. (categorize_ctor_elements): Replace p_must_clear with p_complete. (categorize_ctor_elements_1): Likewise. Use complete_ctor_at_level_p. (complete_ctor_at_level_p): New function, borrowing union logic from old categorize_ctor_elements_1. (mostly_zeros_p): Return true if the constructor is not complete. (all_zeros_p): Update call to categorize_ctor_elements.
* gimplify.c (gimplify_init_constructor): Update call to categorize_ctor_elements. Don't call count_type_elements. Unconditionally prevent clearing for variable-sized types, otherwise rely on categorize_ctor_elements to detect incomplete initializers. gcc/cp/ * typeck2.c (split_nonconstant_init_1): Pass the initializer directly, rather than a pointer to it. Return true if the whole of the value was initialized by the generated statements. Use complete_ctor_at_level_p instead of count_type_elements. gcc/testsuite/ 2011-07-12 Chung-Lin Tang clt...@codesourcery.com * gcc.target/arm/pr48183.c: New test. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49736 -- H.J.
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. And then we permute conversion and addition, which leads to the issue you raised above. In other words, the current code permutes conversion and addition. No, only if we have ptr_extend. Oops, hit send too early; I understand now what you mean. But even more so, let's make sure x32 is done right, so that perhaps we can remove the bogus test on XEXP (x, 1) for the other Pmode != ptr_mode targets, the non-ptr_extend ones. Then we can perhaps worry about POINTERS_EXTEND_UNSIGNED < 0. Paolo
Re: [build] Move crtfastmath to toplevel libgcc
Richard Henderson r...@redhat.com writes: On 07/11/2011 10:26 AM, Rainer Orth wrote: There's one other question here: alpha/t-crtfm uses -frandom-seed=gcc-crtfastmath with this comment: # FIXME drow/20061228 - I have preserved this -frandom-seed option # while migrating this rule from the GCC directory, but I do not # know why it is necessary if no other crt file uses it. Is there any particular reason to either keep this or not to use it in the generic file? This way, only i386 needs to stay separate with its use of -msse -minline-all-stringops. This random-seed thing is there for the mangled name we build for the constructor on Tru64. It's not needed for any target for which a .ctors section is supported. It also doesn't hurt, so you could move it to any generic build rule. This is what I've done. Here's the revised patch, currently bootstrapping on alpha-dec-osf5.1b and well into building the target libraries. After committing the Darwin crt[23].o patch and before continuing with the i386/crtprec??.o one, I noticed that this would leave Darwin/x86 in a broken state: gcc/config/i386/t-crtfm still has EXTRA_PARTS += crtfastmath.o which is missing in libgcc/config.host, thus the extra_parts comparison will fail and break bootstrap ;-( Do you think the revised crtfastmath patch is safe enough to commit together to avoid this mess? Thanks. Rainer 2011-07-10 Rainer Orth r...@cebitec.uni-bielefeld.de gcc: * config/alpha/crtfastmath.c: Move to ../libgcc/config/alpha. * config/alpha/t-crtfm: Remove. * config/i386/crtfastmath.c: Move to ../libgcc/config/i386. * config/i386/t-crtfm: Remove. * config/ia64/crtfastmath.c: Move to ../libgcc/config/ia64. * config/mips/crtfastmath.c: Move to ../libgcc/config/mips. * config/sparc/crtfastmath.c: Move to ../libgcc/config/sparc. * config/sparc/t-crtfm: Remove. * config.gcc (alpha*-*-linux*): Remove alpha/t-crtfm from tmake_file. (alpha*-*-freebsd*): Likewise. (i[34567]86-*-darwin*): Remove i386/t-crtfm from tmake_file. 
(x86_64-*-darwin*): Likewise. (i[34567]86-*-linux*): Likewise. (x86_64-*-linux*): Likewise. (x86_64-*-mingw*): Likewise. (ia64*-*-elf*): Remove crtfastmath.o from extra_parts. (ia64*-*-freebsd*): Likewise. (ia64*-*-linux*): Likewise. (mips64*-*-linux*): Likewise. (mips*-*-linux*): Likewise. (sparc-*-linux*): Remove sparc/t-crtfm from tmake_file. (sparc64-*-linux*): Likewise. (sparc64-*-freebsd*): Likewise. libgcc: * config/alpha/crtfastmath.c: New file. * config/i386/crtfastmath.c: New file. * config/ia64/crtfastmath.c: New file. * config/mips/crtfastmath.c: New file. * config/sparc/crtfastmath.c: New file. * config/t-crtfm (crtfastmath.o): Use $(srcdir) to refer to crtfastmath.c. Add -frandom-seed=gcc-crtfastmath. * config/alpha/t-crtfm: Remove. * config/i386/t-crtfm: Use $(srcdir) to refer to crtfastmath.c. * config/ia64/t-ia64 (crtfastmath.o): Remove. * config.host (alpha*-*-linux*): Replace alpha/t-crtfm by t-crtfm. (alpha*-dec-osf5.1*): Likewise. (alpha*-*-freebsd*): Add t-crtfm to tmake_file. Add crtfastmath.o to extra_parts. (i[34567]86-*-darwin*): Add i386/t-crtfm to tmake_file. Add crtfastmath.o to extra_parts. (x86_64-*-darwin*): Likewise. (x86_64-*-mingw*): Likewise. (ia64*-*-elf*): Add t-crtfm to tmake_file. (ia64*-*-freebsd*): Likewise. (ia64*-*-linux*): Likewise. (sparc64-*-freebsd*): Add t-crtfm to tmake_file. Add crtfastmath.o to extra_parts. 
diff --git a/gcc/config.gcc b/gcc/config.gcc --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -756,13 +756,13 @@ alpha*-*-linux*) tm_file=${tm_file} alpha/elf.h alpha/linux.h alpha/linux-elf.h glibc-stdint.h extra_options=${extra_options} alpha/elf.opt target_cpu_default=MASK_GAS - tmake_file=${tmake_file} alpha/t-crtfm alpha/t-alpha alpha/t-ieee alpha/t-linux + tmake_file=${tmake_file} alpha/t-alpha alpha/t-ieee alpha/t-linux ;; alpha*-*-freebsd*) tm_file=${tm_file} ${fbsd_tm_file} alpha/elf.h alpha/freebsd.h extra_options=${extra_options} alpha/elf.opt target_cpu_default=MASK_GAS - tmake_file=${tmake_file} alpha/t-crtfm alpha/t-alpha alpha/t-ieee + tmake_file=${tmake_file} alpha/t-alpha alpha/t-ieee extra_parts=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o ;; alpha*-*-netbsd*) @@ -1208,12 +1208,12 @@ i[34567]86-*-darwin*) need_64bit_isa=yes # Baseline choice for a machine that allows m64 support. with_cpu=${with_cpu:-core2} - tmake_file=${tmake_file} t-slibgcc-dummy i386/t-crtpc i386/t-crtfm + tmake_file=${tmake_file}
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On 07/13/2011 06:13 AM, Andreas Krebbel wrote: + force_operand (gen_rtx_fmt_ee (code, mode, +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)), + NULL_RTX); + break; + case 3: + /* FMA expressions are not handled by force_operand. */ + expand_ternary_op (mode, fma_optab, + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3), + NULL_RTX, false); Why the force_operand? You've got register inputs. Either the target is going to support the operation or it isn't. Seems to me you can check the availability of the operation in the optab and pass that gen_rtx_fmt_ee result to rtx_cost directly. + bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt)); + static unsigned mul_cost[NUM_MACHINE_MODES]; + static unsigned add_cost[NUM_MACHINE_MODES]; + static unsigned fma_cost[NUM_MACHINE_MODES]; ... + if (!fma_cost[mode]) + { + fma_cost[mode] = compute_costs (mode, FMA, speed); + add_cost[mode] = compute_costs (mode, PLUS, speed); + mul_cost[mode] = compute_costs (mode, MULT, speed); + } Saving cost data dependent on speed, which is non-constant. You probably need to make this a two dimensional array. r~
[RFC] More compact (100x) -g3 .debug_macinfo
Hi! Currently .debug_macinfo is prohibitively large, because it doesn't allow for any kind of merging of duplicate debug information. This patch is an RFC for extensions that allow bringing it down to manageable levels. The ideas for the first shrinking come from Jason and/or Roland, I think from last year, and are similar to the introduction of DW_FORM_strp to replace DW_FORM_string in some cases. In particular, if the string in DW_MACINFO_define or DW_MACINFO_undef is larger than 4 bytes including the terminating '\0' and there is a chance the string might occur more than once, an offset into .debug_str is used instead. The usual .debug_str string merging then kicks in and removes the duplicates. The second savings come from merging of identical sequences of DW_MACINFO_define/undef ops. Usually, when you include some header, the macros it defines/undefines are the same. Unfortunately it is hard to merge whole headers, because:

1) DW_MACINFO_start_file uses .debug_line references, which prevent merging - different CUs have different .debug_line content

2) multiple inclusion of headers with single inclusion guards is quite common and results in such merging being less than satisfactory: if some header includes stdio.h and you include that header in one source file without prior inclusion of stdio.h and in a different one after #include <stdio.h>, suddenly the .debug_macinfo sequence for that header is different because of the headers it transitively includes

Unfortunately, as defined in DWARF{2,3,4}, .debug_macinfo does not really allow extensions. DW_MACINFO_vendor_ext doesn't count, because its argument is a string, which certainly can't include the embedded zeros needed for the offsets into other sections or other portions of the same section. The following approach just grabs a range of .debug_macinfo opcodes for vendor use, if the DWARF committee would give such an approach a green light.
.debug_macinfo has 256 possible opcodes and just defines 5 (plus 1 for termination); the remaining 250 are unused. Another alternative would be to come up with a .debug_gnu_macinfo section or similar and define a new DW_AT_GNU_macro_info attribute that would be used instead of DW_AT_macro_info, but I'd prefer to stay with .debug_macinfo. The newly added opcodes:

DW_MACINFO_GNU_define_indirect4         0xe0
This opcode has two arguments, one is a uleb128 lineno and the other is a 4 byte offset into .debug_str. Except for the encoding of the string it is similar to DW_MACINFO_define.

DW_MACINFO_GNU_undef_indirect4          0xe1
This opcode has two arguments, one is a uleb128 lineno and the other is a 4 byte offset into .debug_str. Except for the encoding of the string it is similar to DW_MACINFO_undef.

DW_MACINFO_GNU_transparent_include4     0xe2
This opcode has a single argument, a 4 byte offset into .debug_macinfo. It instructs the debug info consumer that this opcode during reading should be replaced with the sequence of .debug_macinfo opcodes from the mentioned offset, up to a terminating 0 opcode (not including that 0).

DW_MACINFO_GNU_define_opcode            0xe3
This is an opcode for future extensibility through which a debugger could skip unknown opcodes. It has 3 arguments: a 1 byte opcode number, a uleb128 count of arguments and a count bytes long array, with a DW_FORM_* code for how each argument is encoded.

The debug info producers have to ensure that opcodes in DW_MACINFO_GNU_transparent_include4 chains reference the right sections for any .debug_macinfo that includes them (which essentially means that DW_MACINFO_start_file can't be used in the transparent_include4 chain). Perhaps cleaner would be not to encode the offset sizes in the opcode values/names and instead have DW_MACINFO_GNU_define_indirect and DW_MACINFO_GNU_undef_indirect whose arguments would be DW_FORM_udata and DW_FORM_strp (i.e.
offset size) - the producers would need to ensure that .debug_macinfo chains with different assumed offset sizes aren't merged together, which could be done e.g. by using wm4.[filename.]lineno.md5 and wm8.* comdat groups instead of the current wm.*. DW_MACINFO_GNU_transparent_include4 then would have a single DW_FORM_sec_offset argument and DW_MACINFO_GNU_define_opcode would have DW_FORM_data1 and DW_FORM_block arguments, and the implicit opcode definition assumed at the start of every .debug_macinfo would be:

DW_MACINFO_GNU_define_opcode 0, 0 []
DW_MACINFO_GNU_define_opcode DW_MACINFO_define, 2 [DW_FORM_udata, DW_FORM_string]
DW_MACINFO_GNU_define_opcode DW_MACINFO_undef, 2 [DW_FORM_udata, DW_FORM_string]
DW_MACINFO_GNU_define_opcode DW_MACINFO_start_file, 2 [DW_FORM_udata, DW_FORM_sec_offset]
DW_MACINFO_GNU_define_opcode DW_MACINFO_end_file, 1 [DW_FORM_udata]
DW_MACINFO_GNU_define_opcode DW_MACINFO_GNU_define_indirect, 2 [DW_FORM_udata, DW_FORM_strp]
DW_MACINFO_GNU_define_opcode
Re: [build] Move crtfastmath to toplevel libgcc
On 07/13/2011 09:57 AM, Rainer Orth wrote: Do you think the revised crtfastmath patch is safe enough to commit together to avoid this mess? Probably. +# -frandom-seed is necessary to keep the mangled name of the constructor on +# Tru64 Unix stable, but harmless otherwise. Instead of implying permanent stability, I'd mention bootstrap comparison failures specifically. r~
Re: [build] Move crtfastmath to toplevel libgcc
Richard Henderson r...@redhat.com writes: On 07/13/2011 09:57 AM, Rainer Orth wrote: Do you think the revised crtfastmath patch is safe enough to commit together to avoid this mess? Probably. Ok. I'll take this on me to get us out of this mess. It has survived i386-pc-solaris2.11, sparc-sun-solaris2.11, x86_64-unknown-linux-gnu, and i386-apple-darwin9.8.0 bootstraps, so the risk seems acceptable. +# -frandom-seed is necessary to keep the mangled name of the constructor on +# Tru64 Unix stable, but harmless otherwise. Instead of implying permanent stability, I'd mention bootstrap comparison failures specifically. Ok, will do. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] bash vs. dash: Avoid unportable shell feature in gcc/configure.ac
Hello! On Wed, 13 Jul 2011 18:23:50 +0200, Paolo Bonzini bonz...@gnu.org wrote: On 07/13/2011 06:13 PM, Thomas Schwinge wrote: Alternatively, gcc_GAS_CHECK_FEATURE could be changed to emit the temporary file by using a shell here-doc, which is what AC_TRY_COMPILE is doing, for example. Change instead "echo ifelse(...) > conftest.s" to "AS_ECHO([m4_if(...)]) > conftest.s" in gcc_GAS_CHECK_FEATURE. Ah, even better.

gcc/
* acinclude.m4 (gcc_GAS_CHECK_FEATURE): Use AS_ECHO instead of echo.
* configure: Regenerate.

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index ff38682..f092925 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -583,7 +583,7 @@ AC_CACHE_CHECK([assembler for $1], [$2],
 if test $in_tree_gas = yes; then
 gcc_GAS_VERSION_GTE_IFELSE($3, [[$2]=yes])
 el])if test x$gcc_cv_as != x; then
-echo ifelse(m4_substr([$5],0,1),[$], [$5], '[$5]') > conftest.s
+AS_ECHO([ifelse(m4_substr([$5],0,1),[$], [$5], '[$5]')]) > conftest.s
 if AC_TRY_COMMAND([$gcc_cv_as $gcc_cv_as_flags $4 -o conftest.o conftest.s >&AS_MESSAGE_LOG_FD])
 then ifelse([$6],, [$2]=yes, [$6])

The configure differences are strictly s%echo%$as_echo%. Regards, Thomas
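For context (an illustration of mine, not part of the patch): POSIX leaves echo's treatment of backslash escapes and option-like arguments implementation-defined, so bash and dash can write different bytes into conftest.s from the same command; AS_ECHO expands to a portable equivalent (typically printf on modern systems). A stand-alone sketch of the safe spelling:

```shell
# Writing an assembler test input portably.  Under dash, `echo` expands
# backslash escapes, so a directive containing a literal \t could be
# mangled; printf '%s\n' emits its argument byte-for-byte.
s='.section .tdata,"aw",@progbits\t# not an escape'

printf '%s\n' "$s" > conftest.s

# The backslash-t must survive literally in the generated file.
grep -c '\\t' conftest.s
```

This is the same trap as the `echo ... > conftest.s` line in gcc_GAS_CHECK_FEATURE: whichever /bin/sh runs configure must not reinterpret the assembler text.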
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
I just ran a spec 2006 run on the powerpc (32-bit) last night setting the reassociation to 2. I do see a win in bwaves, but unfortunately it is not enough of a win, and it is still a regression to GCC 4.5. However, I see some regressions in 3 other benchmarks (I tend to omit differences of less than 2%):

401.bzip2        97.99%
410.bwaves      113.88%
436.cactusADM    93.96%
444.namd         93.74%

The profile differences are as follows. Unfortunately, I'm not sure I can post sample counts under Spec rules:

Bzip2:
GCC 4.7    GCC 4.7 with patches    Function
=======    ====================    ========
28.96%     28.39%                  mainSort
15.94%     15.49%                  BZ2_decompress
12.56%     12.35%                  mainGtU.part.0
11.59%     11.54%                  generateMTFValues
 8.89%      9.04%                  fallbackSort
 6.60%      8.28%                  BZ2_compressBlock
 7.48%      7.21%                  handle_compress.isra.2
 6.24%      5.95%                  BZ2_bzDecompress
 0.55%      0.58%                  add_pair_to_block
 0.54%      0.54%                  BZ2_hbMakeCodeLengths

Bwaves:
GCC 4.7    GCC 4.7 with patches    Function
=======    ====================    ========
78.70%     74.73%                  mat_times_vec_
11.68%     13.21%                  bi_cgstab_block_
 6.72%      8.47%                  shell_
 2.11%      2.62%                  jacobian_
 0.79%      0.96%                  flux_

CactusADM:
GCC 4.7    GCC 4.7 with patches    Function
=======    ====================    ========
99.67%     99.69%                  bench_staggeredleapfrog2_

Namd:
GCC 4.7    GCC 4.7 with patches    Function
=======    ====================    ========
15.43%     14.71%                  _ZN20ComputeNonbondedUtil26calc_pair_energy_fullelectEP9nonbonded.part.39
11.94%     11.80%                  _ZN20ComputeNonbondedUtil19calc_pair_fullelectEP9nonbonded.part.40
10.18%     11.52%                  _ZN20ComputeNonbondedUtil32calc_pair_energy_merge_fullelectEP9nonbonded.part.37
 9.87%      9.02%                  _ZN20ComputeNonbondedUtil16calc_pair_energyEP9nonbonded.part.41
 9.55%      8.85%                  _ZN20ComputeNonbondedUtil9calc_pairEP9nonbonded.part.42
 9.52%      9.05%                  _ZN20ComputeNonbondedUtil25calc_pair_merge_fullelectEP9nonbonded.part.38
 7.24%      8.72%                  _ZN20ComputeNonbondedUtil26calc_self_energy_fullelectEP9nonbonded.part.31
 6.28%      6.42%                  _ZN20ComputeNonbondedUtil19calc_self_fullelectEP9nonbonded.part.32
 5.23%      6.18%                  _ZN20ComputeNonbondedUtil32calc_self_energy_merge_fullelectEP9nonbonded.part.29
 5.13%      4.66%                  _ZN20ComputeNonbondedUtil16calc_self_energyEP9nonbonded.part.33
 4.72%      4.43%                  _ZN20ComputeNonbondedUtil25calc_self_merge_fullelectEP9nonbonded.part.30
 4.60%      4.37%                  _ZN20ComputeNonbondedUtil9calc_selfEP9nonbonded.part.34

-- Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com    fax +1 (978) 399-6899
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
On Wed, Jul 13, 2011 at 9:54 AM, Paolo Bonzini bonz...@gnu.org wrote: Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. And then we permute conversion and addition, which leads to the issue you raised above. In another word, the current code permutes conversion and addition. No, only if we have ptr_extend. Oops, hit send too early, I understand now what you mean. But even more so, let's make sure x32 is done right so that perhaps we can remove the bogus test on XEXP (x, 1) for other Pmode != ptr_mode targets, non-ptr_extend. Then we can worry perhaps of POINTERS_EXTEND_UNSIGNED 0. Here is the patch. OK for trunk? Thanks. -- H.J. 2011-07-12 H.J. Lu hongjiu...@intel.com PR middle-end/49721 * explow.c (convert_memory_address_addr_space_1): New. (convert_memory_address_addr_space): Use it. * expr.c (convert_modes_1): New. (convert_modes): Use it. * expr.h (convert_modes_1): New. * rtl.h (convert_memory_address_addr_space_1): New. (convert_memory_address_1): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Call convert_memory_address_1 instead of convert_memory_address. 2011-07-12 H.J. Lu hongjiu...@intel.com PR middle-end/49721 * explow.c (convert_memory_address_addr_space_1): New. (convert_memory_address_addr_space): Use it. * expr.c (convert_modes_1): New. (convert_modes): Use it. * expr.h (convert_modes_1): New. * rtl.h (convert_memory_address_addr_space_1): New. (convert_memory_address_1): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Call convert_memory_address_1 instead of convert_memory_address. diff --git a/gcc/explow.c b/gcc/explow.c index 3c692f4..8551fe8 100644 --- a/gcc/explow.c +++ b/gcc/explow.c @@ -320,8 +320,10 @@ break_out_memory_refs (rtx x) arithmetic insns can be used. 
*/ rtx -convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED, - rtx x, addr_space_t as ATTRIBUTE_UNUSED) +convert_memory_address_addr_space_1 (enum machine_mode to_mode ATTRIBUTE_UNUSED, +rtx x, addr_space_t as ATTRIBUTE_UNUSED, +bool no_emit ATTRIBUTE_UNUSED, +bool ignore_address_wrap_around ATTRIBUTE_UNUSED) { #ifndef POINTERS_EXTEND_UNSIGNED gcc_assert (GET_MODE (x) == to_mode || GET_MODE (x) == VOIDmode); @@ -377,28 +379,28 @@ convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED, break; case CONST: - return gen_rtx_CONST (to_mode, - convert_memory_address_addr_space - (to_mode, XEXP (x, 0), as)); + temp = convert_memory_address_addr_space_1 (to_mode, XEXP (x, 0), + as, no_emit, + ignore_address_wrap_around); + return temp ? gen_rtx_CONST (to_mode, temp) : temp; break; case PLUS: case MULT: - /* For addition we can safely permute the conversion and addition -operation if one operand is a constant and converting the constant -does not change it or if one operand is a constant and we are -using a ptr_extend instruction (POINTERS_EXTEND_UNSIGNED 0). -We can always safely permute them if we are making the address -narrower. */ + /* For addition, we can safely permute the conversion and addition +operation if one operand is a constant and we are using a +ptr_extend instruction (POINTERS_EXTEND_UNSIGNED 0) or address +wrap-around is ignored. We can always safely permute them if +we are making the address narrower. 
*/ if (GET_MODE_SIZE (to_mode) GET_MODE_SIZE (from_mode) || (GET_CODE (x) == PLUS CONST_INT_P (XEXP (x, 1)) - (XEXP (x, 1) == convert_memory_address_addr_space - (to_mode, XEXP (x, 1), as) - || POINTERS_EXTEND_UNSIGNED 0))) + (POINTERS_EXTEND_UNSIGNED 0 + || ignore_address_wrap_around))) return gen_rtx_fmt_ee (GET_CODE (x), to_mode, - convert_memory_address_addr_space -(to_mode, XEXP (x, 0), as), + convert_memory_address_addr_space_1 +(to_mode, XEXP (x, 0), as, no_emit, + ignore_address_wrap_around), XEXP (x, 1)); break; @@ -406,10 +408,17 @@ convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED, break; } - return convert_modes (to_mode, from_mode, -
Avoid overriding LIB_THREAD_LDFLAGS_SPEC on Solaris 8 (PR target/49541)
As reported in the PR, LIB_THREAD_LDFLAGS_SPEC (effectively -L/usr/lib/lwp(/64)? -R/usr/lib/lwp(/64)? to make use of the alternate thread library on Solaris 8, which also provides the only implementation of __tls_get_addr) could be overridden by the regular -L flags from the %D spec for 64-bit compilations to find /lib/sparcv9/libthread.so instead, which lacks that function, causing link failures. This patch fixes this by moving the -L/-R flags from LIB_SPEC to LINK_SPEC which is before %D. Bootstrapped without regressions on sparc-sun-solaris2.8 and i386-pc-solaris2.8 by myself (though in a branded zone which doesn't show the problem directly) and by Eric on bare-metal Solaris 8. Installed on mainline, will backport to the 4.6 branch after testing. Rainer 2011-07-08 Rainer Orth r...@cebitec.uni-bielefeld.de PR target/49541 * config/sol2.h (LIB_SPEC): Simplify. Move LIB_THREAD_LDFLAGS_SPEC ... (LINK_SPEC): ... here. diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h --- a/gcc/config/sol2.h +++ b/gcc/config/sol2.h @@ -109,10 +109,8 @@ along with GCC; see the file COPYING3. #undef LIB_SPEC #define LIB_SPEC \ %{!symbolic:\ - %{pthreads|pthread: \ -LIB_THREAD_LDFLAGS_SPEC -lpthread LIB_TLS_SPEC } \ - %{fprofile-generate*: \ -LIB_THREAD_LDFLAGS_SPEC LIB_TLS_SPEC } \ + %{pthreads|pthread:-lpthread} \ + %{pthreads|pthread|fprofile-generate*: LIB_TLS_SPEC } \ %{p|pg:-ldl} -lc} #ifndef CROSS_DIRECTORY_STRUCTURE @@ -175,6 +173,7 @@ along with GCC; see the file COPYING3. %{static:-dn -Bstatic} \ %{shared:-G -dy %{!mimpure-text:-z text}} \ %{symbolic:-Bsymbolic -G -dy -z text} \ + %{pthreads|pthread|fprofile-generate*: LIB_THREAD_LDFLAGS_SPEC } \ %(link_arch) \ %{Qy:} %{!Qn:-Qy} -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] bash vs. dash: Avoid unportable shell feature in gcc/configure.ac
Ok. Paolo On Wed, Jul 13, 2011 at 19:17, Thomas Schwinge tho...@schwinge.name wrote: Hallo! On Wed, 13 Jul 2011 18:23:50 +0200, Paolo Bonzini bonz...@gnu.org wrote: On 07/13/2011 06:13 PM, Thomas Schwinge wrote: Alternatively, gcc_GAS_CHECK_FEATURE could be changed to emit the temporary file by using a shell here-doc, which is what AC_TRY_COMPILE is doing, for example. Change instead echo ifelse(...) conftest.s to AS_ECHO([m4_if(...)]) conftest.s in gcc_GAS_CHECK_FEATURE. Ah, even better. gcc/ * acinclude.m4 (gcc_GAS_CHECK_FEATURE): Use AS_ECHO instead of echo. * configure: Regenerate. diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4 index ff38682..f092925 100644 --- a/gcc/acinclude.m4 +++ b/gcc/acinclude.m4 @@ -583,7 +583,7 @@ AC_CACHE_CHECK([assembler for $1], [$2], if test $in_tree_gas = yes; then gcc_GAS_VERSION_GTE_IFELSE($3, [[$2]=yes]) el])if test x$gcc_cv_as != x; then - echo ifelse(m4_substr([$5],0,1),[$], [$5], '[$5]') conftest.s + AS_ECHO([ifelse(m4_substr([$5],0,1),[$], [$5], '[$5]')]) conftest.s if AC_TRY_COMMAND([$gcc_cv_as $gcc_cv_as_flags $4 -o conftest.o conftest.s AS_MESSAGE_LOG_FD]) then ifelse([$6],, [$2]=yes, [$6]) The configure differences are strictly s%echo%$as_echo%. Grüße, Thomas
Re: [build] Move crtfastmath to toplevel libgcc
On Wed, Jul 13, 2011 at 10:12 AM, Rainer Orth <r...@cebitec.uni-bielefeld.de> wrote:
> Richard Henderson <r...@redhat.com> writes:
>> On 07/13/2011 09:57 AM, Rainer Orth wrote:
>>> Do you think the revised crtfastmath patch is safe enough to commit together to avoid this mess?
>>
>> Probably.
>
> Ok, I'll take this on myself to get us out of this mess.  It has survived i386-pc-solaris2.11, sparc-sun-solaris2.11, x86_64-unknown-linux-gnu, and i386-apple-darwin9.8.0 bootstraps, so the risk seems acceptable.
>
>>> +# -frandom-seed is necessary to keep the mangled name of the constructor on
>>> +# Tru64 Unix stable, but harmless otherwise.
>>
>> Instead of implying permanent stability, I'd mention bootstrap comparison failures specifically.
>
> Ok, will do.

I think your patch caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49739

-- 
H.J.
[commit, spu] Fix regression (Re: [PR debug/47590] rework md option overriding to delay var-tracking)
Alexandre Oliva wrote:

	* config/spu/spu.c (spu_flag_var_tracking): Drop.
	(TARGET_DELAY_VARTRACK): Define.
	(spu_var_tracking): New.
	(spu_machine_dependent_reorg): Call it.
	(asm_file_start): Don't save and override flag_var_tracking.

This change caused crashes under certain circumstances. The problem is that spu_var_tracking calls df_analyze, which assumes the df framework has been kept up to date on instructions; in particular, it assumes that df_scan_insn was called for each insn generated in the meantime. Normally this is not a problem, because the emit_insn family itself calls df_scan_insn. However, this works only as long as the assignment of insns to basic blocks is valid — and in machine-dependent reorg, this is no longer true.

To cope with this, the existing call site of df_analyze in spu_machine_dependent_reorg made sure to re-install the basic-block mappings by calling compute_bb_for_insn first. The new call to df_analyze added by your patch, however, is outside the scope of that existing compute_bb_for_insn call, causing those crashes.

Fixed by adding another compute_bb_for_insn/free_bb_for_insn pair to cover the new call site as well. Also, asm_file_start now no longer does anything interesting and can simply be removed in favor of the default implementation.

Tested on spu-elf, committed to mainline.

Bye,
Ulrich

ChangeLog:

	* config/spu/spu.c (TARGET_ASM_FILE_START): Do not define.
	(asm_file_start): Remove.
	(spu_machine_dependent_reorg): Call compute_bb_for_insn and
	free_bb_for_insn around code that modifies insns before
	restarting df analysis.
Index: gcc/config/spu/spu.c
===================================================================
*** gcc/config/spu/spu.c	(revision 176209)
--- gcc/config/spu/spu.c	(working copy)
*************** static enum machine_mode spu_addr_space_
*** 224,230 ****
  static bool spu_addr_space_subset_p (addr_space_t, addr_space_t);
  static rtx spu_addr_space_convert (rtx, tree, tree);
  static int spu_sms_res_mii (struct ddg *g);
- static void asm_file_start (void);
  static unsigned int spu_section_type_flags (tree, const char *, int);
  static section *spu_select_section (tree, int, unsigned HOST_WIDE_INT);
  static void spu_unique_section (tree, int);
--- 224,229 ----
*************** static void spu_setup_incoming_varargs (
*** 462,470 ****
  #undef TARGET_SCHED_SMS_RES_MII
  #define TARGET_SCHED_SMS_RES_MII spu_sms_res_mii
  
- #undef TARGET_ASM_FILE_START
- #define TARGET_ASM_FILE_START asm_file_start
- 
  #undef TARGET_SECTION_TYPE_FLAGS
  #define TARGET_SECTION_TYPE_FLAGS spu_section_type_flags
--- 461,466 ----
*************** spu_machine_dependent_reorg (void)
*** 2703,2711 ****
--- 2699,2709 ----
      {
        /* We still do it for unoptimized code because an external
           function might have hinted a call or return. */
+       compute_bb_for_insn ();
        insert_hbrp ();
        pad_bb ();
        spu_var_tracking ();
+       free_bb_for_insn ();
        return;
      }
*************** spu_libgcc_shift_count_mode (void)
*** 7039,7052 ****
  return SImode;
  }
  
- /* An early place to adjust some flags after GCC has finished processing
-  * them. */
- static void
- asm_file_start (void)
- {
-   default_file_start ();
- }
- 
  /* Implement targetm.section_type_flags.  */
  static unsigned int
  spu_section_type_flags (tree decl, const char *name, int reloc)
--- 7037,7042 ----

-- 
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
ulrich.weig...@de.ibm.com
[commit, spu] Support clrsb
Hello,

several builtin-bitops-1.c tests have been failing recently on SPU since the new clrsb builtin is not supported. This patch fixes this by:

- installing the libgcc __clrsbdi2 routine into optabs (which doesn't happen automatically on SPU since word_mode is TImode)
- providing an in-line expander for SImode clrsb.

Tested on spu-elf, committed to mainline.

Bye,
Ulrich

ChangeLog:

	* config/spu/spu.c (spu_init_libfuncs): Install __clrsbdi2.
	* config/spu/spu.md ("clrsb<mode>2"): New expander.

Index: gcc/config/spu/spu.c
===================================================================
*** gcc/config/spu/spu.c	(revision 176247)
--- gcc/config/spu/spu.c	(working copy)
*************** spu_init_libfuncs (void)
*** 5630,5635 ****
--- 5630,5636 ----
    set_optab_libfunc (ffs_optab, DImode, "__ffsdi2");
    set_optab_libfunc (clz_optab, DImode, "__clzdi2");
    set_optab_libfunc (ctz_optab, DImode, "__ctzdi2");
+   set_optab_libfunc (clrsb_optab, DImode, "__clrsbdi2");
    set_optab_libfunc (popcount_optab, DImode, "__popcountdi2");
    set_optab_libfunc (parity_optab, DImode, "__paritydi2");
Index: gcc/config/spu/spu.md
===================================================================
*** gcc/config/spu/spu.md	(revision 176209)
--- gcc/config/spu/spu.md	(working copy)
*** 2232,2237 ****
--- 2232,2252 ----
      operands[5] = spu_const(<MODE>mode, 31);
  })
  
+ (define_expand "clrsb<mode>2"
+   [(set (match_dup 2)
+ 	(gt:VSI (match_operand:VSI 1 "spu_reg_operand" "") (match_dup 5)))
+    (set (match_dup 3) (not:VSI (xor:VSI (match_dup 1) (match_dup 2))))
+    (set (match_dup 4) (clz:VSI (match_dup 3)))
+    (set (match_operand:VSI 0 "spu_reg_operand")
+ 	(plus:VSI (match_dup 4) (match_dup 5)))]
+   ""
+   {
+      operands[2] = gen_reg_rtx (<MODE>mode);
+      operands[3] = gen_reg_rtx (<MODE>mode);
+      operands[4] = gen_reg_rtx (<MODE>mode);
+      operands[5] = spu_const(<MODE>mode, -1);
+   })
+ 
  (define_expand "ffs<mode>2"
    [(set (match_dup 2) (neg:VSI (match_operand:VSI 1 "spu_reg_operand" "")))

-- 
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
ulrich.weig...@de.ibm.com
Re: [commit, spu] Support clrsb
On 07/13/11 21:22, Ulrich Weigand wrote: Hello, several builtin-bitops-1.c tests have been failing recently on SPU since the new clrsb builtin is not supported. That's odd, it should just have picked the libgcc function rather than causing test failures. Why didn't that happen? Bernd
Re: [RFC] More compact (100x) -g3 .debug_macinfo
Jakub == Jakub Jelinek <ja...@redhat.com> writes:

Jakub> Currently .debug_macinfo is prohibitively large, because it doesn't
Jakub> allow for any kind of merging of duplicate debug information.
Jakub> This patch is an RFC for extensions that allow bringing it down
Jakub> to manageable levels.

I wrote a gdb patch for this. I've appended it in case you want to try it out; it is against git master. I tried it a little on an executable Jakub sent me and it seems to work fine.

It is no trouble to change this patch if you change the format. It wasn't hard to write in the first place; it is just bigger than it might appear because I moved a bunch of code into a new function.

I don't think I really understood DW_MACINFO_GNU_define_opcode, so the implementation here is probably wrong.

Tom

2011-07-13  Tom Tromey  <tro...@redhat.com>

	* dwarf2read.c (read_indirect_string_at_offset): New function.
	(read_indirect_string): Use it.
	(dwarf_decode_macro_bytes): New function, taken from
	dwarf_decode_macros.  Handle DW_MACINFO_GNU_*.
	(dwarf_decode_macros): Use it.  Handle DW_MACINFO_GNU_*.
diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index fde5b6a..af35f16 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -10182,32 +10182,32 @@ read_direct_string (bfd *abfd, gdb_byte *buf, unsigned int *bytes_read_ptr)
 }
 
 static char *
-read_indirect_string (bfd *abfd, gdb_byte *buf,
-                      const struct comp_unit_head *cu_header,
-                      unsigned int *bytes_read_ptr)
+read_indirect_string_at_offset (bfd *abfd, LONGEST str_offset)
 {
-  LONGEST str_offset = read_offset (abfd, buf, cu_header, bytes_read_ptr);
-
   dwarf2_read_section (dwarf2_per_objfile->objfile, &dwarf2_per_objfile->str);
   if (dwarf2_per_objfile->str.buffer == NULL)
-    {
-      error (_("DW_FORM_strp used without .debug_str section [in module %s]"),
-             bfd_get_filename (abfd));
-      return NULL;
-    }
+    error (_("DW_FORM_strp used without .debug_str section [in module %s]"),
+           bfd_get_filename (abfd));
   if (str_offset >= dwarf2_per_objfile->str.size)
-    {
-      error (_("DW_FORM_strp pointing outside of "
-               ".debug_str section [in module %s]"),
-             bfd_get_filename (abfd));
-      return NULL;
-    }
+    error (_("DW_FORM_strp pointing outside of "
+             ".debug_str section [in module %s]"),
+           bfd_get_filename (abfd));
   gdb_assert (HOST_CHAR_BIT == 8);
   if (dwarf2_per_objfile->str.buffer[str_offset] == '\0')
     return NULL;
   return (char *) (dwarf2_per_objfile->str.buffer + str_offset);
 }
 
+static char *
+read_indirect_string (bfd *abfd, gdb_byte *buf,
+                      const struct comp_unit_head *cu_header,
+                      unsigned int *bytes_read_ptr)
+{
+  LONGEST str_offset = read_offset (abfd, buf, cu_header, bytes_read_ptr);
+
+  return read_indirect_string_at_offset (abfd, str_offset);
+}
+
 static unsigned long
 read_unsigned_leb128 (bfd *abfd, gdb_byte *buf, unsigned int *bytes_read_ptr)
 {
@@ -14576,116 +14576,14 @@ parse_macro_definition (struct macro_source_file *file, int line,
 
 static void
-dwarf_decode_macros (struct line_header *lh, unsigned int offset,
-                     char *comp_dir, bfd *abfd,
-                     struct dwarf2_cu *cu)
+dwarf_decode_macro_bytes (bfd *abfd, gdb_byte *mac_ptr, gdb_byte *mac_end,
+                          struct macro_source_file *current_file,
+                          struct line_header *lh, char *comp_dir,
+                          struct dwarf2_cu *cu)
 {
-  gdb_byte *mac_ptr, *mac_end;
-  struct macro_source_file *current_file = 0;
   enum dwarf_macinfo_record_type macinfo_type;
   int at_commandline;
 
-  dwarf2_read_section (dwarf2_per_objfile->objfile,
-                       &dwarf2_per_objfile->macinfo);
-  if (dwarf2_per_objfile->macinfo.buffer == NULL)
-    {
-      complaint (&symfile_complaints, _("missing .debug_macinfo section"));
-      return;
-    }
-
-  /* First pass: Find the name of the base filename.
-     This filename is needed in order to process all macros whose definition
-     (or undefinition) comes from the command line.  These macros are defined
-     before the first DW_MACINFO_start_file entry, and yet still need to be
-     associated to the base file.
-
-     To determine the base file name, we scan the macro definitions until we
-     reach the first DW_MACINFO_start_file entry.  We then initialize
-     CURRENT_FILE accordingly so that any macro definition found before the
-     first DW_MACINFO_start_file can still be associated to the base file.  */
-
-  mac_ptr = dwarf2_per_objfile->macinfo.buffer + offset;
-  mac_end = dwarf2_per_objfile->macinfo.buffer
-    + dwarf2_per_objfile->macinfo.size;
-
-  do
-    {
-      /* Do we at least have room for a macinfo type byte?  */
-      if (mac_ptr >= mac_end)
-        {
-          /* Complaint is printed during the second pass as GDB will probably
-             stop the first pass earlier
Re: [commit, spu] Support clrsb
Bernd Schmidt wrote: On 07/13/11 21:22, Ulrich Weigand wrote: several builtin-bitops-1.c tests have been failing recently on SPU since the new clrsb builtin is not supported. That's odd, it should just have picked the libgcc function rather than causing test failures. Why didn't that happen? That's the usual word_mode == TImode problem on SPU. By default, only libgcc functions for word_mode and up are installed into the optabs libfunc table. This means that on SPU, the default behaviour of GCC is to call __clrsbti2, which of course does not exist in libgcc ... This means that on SPU, all SImode/DImode libgcc routines that should be called need to be installed into optabs specifically by the back-end. That's what my patch does for __clrsbdi2. (For __clrsbsi2, I'm just providing an in-line expander instead, no need to call a libfunc.) Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com