[PATCH] Add support for lzd and popc instructions on sparc.
Unfortunately, only 64-bit versions of popc and lzd exist, so I have to play some shenanigans to make SImode and v8plus cases work. But it's definitely worth it. I plan to tweak this stuff and perhaps also add some explicit ffs patterns as well later. There are only two sets of VIS3 instructions still unsupported, addxc and addxccc (basically, add with carry but use 64-bit condition codes instead of the 32-bit ones), and the new instructions that allow directly moving between integer and float regs without using memory (movstosw, movstouw, movdtox, movwtos, movxtod). Then I want to seriously look into redoing how v8plus is implemented. I think we should let the compiler know that v8plus-capable integer registers can be used just like in 64-bit mode. And then we have reload patterns for moving DImode values between v8plus-capable and non-v8plus-capable registers. Then we can really get rid of these crazy v8plus patterns that emit more than one real instruction. I'd also like to make -mv8plus a dead option, or if anything the default when V9 and not-ARCH64. Really, there is no V9 capable system in the universe that does not properly preserve the V9 register state in the out/global/float registers across traps when running a 32-bit executable. I also want to look into supporting the vector infrastructure better. VIS2 and VIS3 allow a lot more named patterns and features to be supported as Richard pointed out the other day. Anyways, committed to trunk. gcc/ * config/sparc/sparc.opt (POPC): New option. * doc/invoke.texi: Document it. * config/sparc/sparc.c (sparc_option_override): Enable MASK_POPC by default on Niagara-2 and later. * config/sparc/sparc.h (CLZ_DEFINED_VALUE_AT_ZERO): Define. * config/sparc/sparc.md (SIDI): New mode iterator. (ffsdi2): Delete commented out pattern and comments. (popcountmode2, clzmode2): New expanders. (*popcountmode_sp64, popcountsi_v8plus, popcountdi_v8plus, *clzdi_sp64, clzdi_v8plus, *clzsi_sp64, clzsi_v8plus): New insns. 
gcc/testsuite/ * gcc.target/sparc/lzd.c: New test. * gcc.target/sparc/popc.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179591 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 13 gcc/config/sparc/sparc.c |6 +- gcc/config/sparc/sparc.h |5 ++ gcc/config/sparc/sparc.md | 108 gcc/config/sparc/sparc.opt|4 + gcc/doc/invoke.texi | 11 +++- gcc/testsuite/ChangeLog |5 ++ gcc/testsuite/gcc.target/sparc/lzd.c | 18 ++ gcc/testsuite/gcc.target/sparc/popc.c | 18 ++ 9 files changed, 170 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/sparc/lzd.c create mode 100644 gcc/testsuite/gcc.target/sparc/popc.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 7623ff4..a3cd404 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,16 @@ +2011-10-05 David S. Miller da...@davemloft.net + + * config/sparc/sparc.opt (POPC): New option. + * doc/invoke.texi: Document it. + * config/sparc/sparc.c (sparc_option_override): Enable MASK_POPC by + default on Niagara-2 and later. + * config/sparc/sparc.h (CLZ_DEFINED_VALUE_AT_ZERO): Define. + * config/sparc/sparc.md (SIDI): New mode iterator. + (ffsdi2): Delete commented out pattern and comments. + (popcountmode2, clzmode2): New expanders. + (*popcountmode_sp64, popcountsi_v8plus, popcountdi_v8plus, + *clzdi_sp64, clzdi_v8plus, *clzsi_sp64, clzsi_v8plus): New insns. 
+ 2011-10-06 Artjoms Sinkarovs artyom.shinkar...@gmail.com PR middle-end/50607 diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index b2cbdd2..9606f68 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -774,11 +774,11 @@ sparc_option_override (void) { MASK_ISA, MASK_V9|MASK_DEPRECATED_V8_INSNS}, /* UltraSPARC T2 */ -{ MASK_ISA, MASK_V9|MASK_VIS2}, +{ MASK_ISA, MASK_V9|MASK_POPC|MASK_VIS2}, /* UltraSPARC T3 */ -{ MASK_ISA, MASK_V9|MASK_VIS2|MASK_VIS3|MASK_FMAF}, +{ MASK_ISA, MASK_V9|MASK_POPC|MASK_VIS2|MASK_VIS3|MASK_FMAF}, /* UltraSPARC T4 */ -{ MASK_ISA, MASK_V9|MASK_VIS2|MASK_VIS3|MASK_FMAF}, +{ MASK_ISA, MASK_V9|MASK_POPC|MASK_VIS2|MASK_VIS3|MASK_FMAF}, }; const struct cpu_table *cpu; unsigned int i; diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h index fa94387..0642ff2 100644 --- a/gcc/config/sparc/sparc.h +++ b/gcc/config/sparc/sparc.h @@ -1608,6 +1608,11 @@ do { \ is done just by pretending it is already truncated. */ #define TRULY_NOOP_TRUNCATION(OUTPREC, INPREC) 1 +/* For SImode, we make sure the top 32-bits of the register are clear and + then we subtract 32
Re: [PATCH] Fix memory leak in vect_pattern_recog_1
On 5 October 2011 20:06, Jakub Jelinek ja...@redhat.com wrote: Hi! If vect_recog_func fails (or the other spot where vect_pattern_recog_1 returns early), the vector allocated in the function isn't freed, leading to memory leak. But, more importantly, doing a VEC_alloc + VEC_free num_stmts_in_loop * NUM_PATTERNS times seems to be completely unnecessary, the following patch allocates just one vector for that purpose in the caller and only performs VEC_truncate in each call to make it independent from previous uses of the vector. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Ira 2011-10-05 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_pattern_recog_1): Add stmts_to_replace argument, truncate it at the beginning instead of allocating there and freeing at the end. (vect_pattern_recog): Allocate stmts_to_replace here and free at end, pass its address to vect_pattern_recog_1. --- gcc/tree-vect-patterns.c.jj 2011-09-26 14:06:52.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-05 15:57:38.0 +0200 @@ -1281,7 +1281,8 @@ vect_mark_pattern_stmts (gimple orig_stm static void vect_pattern_recog_1 ( gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *), - gimple_stmt_iterator si) + gimple_stmt_iterator si, + VEC (gimple, heap) **stmts_to_replace) { gimple stmt = gsi_stmt (si), pattern_stmt; stmt_vec_info stmt_info; @@ -1291,14 +1292,14 @@ vect_pattern_recog_1 ( enum tree_code code; int i; gimple next; - VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); - VEC_quick_push (gimple, stmts_to_replace, stmt); - pattern_stmt = (* vect_recog_func) (stmts_to_replace, type_in, type_out); + VEC_truncate (gimple, *stmts_to_replace, 0); + VEC_quick_push (gimple, *stmts_to_replace, stmt); + pattern_stmt = (* vect_recog_func) (stmts_to_replace, type_in, type_out); if (!pattern_stmt) return; - stmt = VEC_last (gimple, stmts_to_replace); + stmt = VEC_last (gimple, *stmts_to_replace); stmt_info = vinfo_for_stmt (stmt); 
loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); @@ -1363,8 +1364,8 @@ vect_pattern_recog_1 ( /* It is possible that additional pattern stmts are created and inserted in STMTS_TO_REPLACE. We create a stmt_info for each of them, and mark the relevant statements. */ - for (i = 0; VEC_iterate (gimple, stmts_to_replace, i, stmt) - (unsigned) i (VEC_length (gimple, stmts_to_replace) - 1); + for (i = 0; VEC_iterate (gimple, *stmts_to_replace, i, stmt) + (unsigned) i (VEC_length (gimple, *stmts_to_replace) - 1); i++) { stmt_info = vinfo_for_stmt (stmt); @@ -1377,8 +1378,6 @@ vect_pattern_recog_1 ( vect_mark_pattern_stmts (stmt, pattern_stmt, NULL_TREE); } - - VEC_free (gimple, heap, stmts_to_replace); } @@ -1468,6 +1467,7 @@ vect_pattern_recog (loop_vec_info loop_v gimple_stmt_iterator si; unsigned int i, j; gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); + VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, === vect_pattern_recog ===); @@ -1483,8 +1483,11 @@ vect_pattern_recog (loop_vec_info loop_v for (j = 0; j NUM_PATTERNS; j++) { vect_recog_func_ptr = vect_vect_recog_func_ptrs[j]; - vect_pattern_recog_1 (vect_recog_func_ptr, si); + vect_pattern_recog_1 (vect_recog_func_ptr, si, + stmts_to_replace); } } } + + VEC_free (gimple, heap, stmts_to_replace); } Jakub
Re: [v3] add max_size and rebind to __alloc_traits
On 6 October 2011 02:57, Paolo Carlini wrote: today I ran the whole testsuite in C++0x mode and I'm pretty sure that 23_containers/vector/modifiers/swap/3.cc, which is now failing, wasn't a couple of days ago (I ran the whole testsuite like that in order to validate my std::list changes). When you have time, could you please double check? (maybe after all we *do* want it to fail in C++0x mode, but I'd like to understand if the behavior changed inadvertently!) I think you're right it wasn't failing before, as I ran the whole testsuite in C++0x mode when I first added alloc_traits - I'll check it today and see how I broke it.
Re: [PATCH] Fix PR46556 (poor address generation)
On 10/05/2011 10:16 PM, William J. Schmidt wrote: OK, I see. If there's a better place downstream to make a swizzle, I'm certainly fine with that. I disabled locally_poor_mem_replacement and added some dump information in should_replace_address to show the costs for the replacement I'm trying to avoid: In should_replace_address: old_rtx = (reg/f:DI 125 [ D.2036 ]) new_rtx = (plus:DI (reg/v/f:DI 126 [ p ]) (reg:DI 128)) address_cost (old_rtx) = 0 address_cost (new_rtx) = 0 set_src_cost (old_rtx) = 0 set_src_cost (new_rtx) = 4 In insn 11, replacing (mem/s:SI (reg/f:DI 125 [ D.2036 ]) [2 p_1(D)-a S4 A32]) with (mem/s:SI (plus:DI (reg/v/f:DI 126 [ p ]) (reg:DI 128)) [2 p_1(D)-a S4 A32]) Changed insn 11 deferring rescan insn with uid = 11. deferring rescan insn with uid = 11. And IIUC the other address is based on pseudo 125 as well, but the combination is (plus (plus (reg 126) (reg 128)) (const_int X)) and cannot be represented on ppc. I think _this_ is the problem, so I'm afraid your patch could cause pessimizations on x86 for example. On x86, which has a cheap REG+REG+CONST addressing mode, it is much better to propagate pseudo 125 so that you can delete the set altogether. However, indeed there is no downstream pass that undoes the transformation. Perhaps we can do it in CSE, since this _is_ CSE after all. :) The attached untested (uncompiled) patch is an attempt. Paolo Index: cse.c === --- cse.c (revision 177688) +++ cse.c (working copy) @@ -3136,6 +3136,75 @@ find_comparison_args (enum rtx_code code return code; } +static rtx +lookup_addr (rtx insn, rtx *loc, enum machine_mode mode) +{ + struct table_elt *elt, *p; + int regno; + int hash; + int base_cost; + rtx addr = *loc; + rtx exp; + + /* Try to reuse existing registers for addresses, in hope of shortening + live ranges for the registers that compose the addresses. 
This happens + when you have + + (set (reg C) (plus (reg A) (reg B)) + (set (reg D) (mem (reg C))) + (set (reg E) (mem (plus (reg C) (const_int X + + In this case fwprop will try to propagate into the addresses, but + if propagation into reg E fails, the only result will have been to + uselessly lengthen the live range of A and B. */ + + if (!REG_P (addr)) +return; + + regno = REGNO (addr); + if (regno == FRAME_POINTER_REGNUM + || regno == HARD_FRAME_POINTER_REGNUM + || regno == ARG_POINTER_REGNUM) +return; + + /* If this address is not in the hash table, we can't look for equivalences + of the whole address. Also, ignore if volatile. */ + + { +int save_do_not_record = do_not_record; +int save_hash_arg_in_memory = hash_arg_in_memory; +int addr_volatile; + +do_not_record = 0; +hash = HASH (addr, Pmode); +addr_volatile = do_not_record; +do_not_record = save_do_not_record; +hash_arg_in_memory = save_hash_arg_in_memory; + +if (addr_volatile) + return; + } + + /* Try to find a REG that holds the same address. */ + + elt = lookup (addr, hash, Pmode); + if (!elt) +return; + + base_cost = address_cost (*loc, mode); + for (p = elt-first_same_value; p; p = p-next_same_value) +{ + exp = p-exp; + if (REG_P (exp) + exp_equiv_p (exp, exp, 1, false) + address_cost (exp, mode) base_cost) +break; +} + + if (p) +validate_change (insn, loc, canon_reg (copy_rtx (exp), NULL_RTX), 0)); +} + /* If X is a nontrivial arithmetic operation on an argument for which a constant value can be determined, return the result of operating on that value, as a constant. 
Otherwise, return X, possibly with @@ -3180,6 +3249,12 @@ fold_rtx (rtx x, rtx insn) switch (code) { case MEM: + if ((new_rtx = equiv_constant (x)) != NULL_RTX) +return new_rtx; + if (insn) +lookup_addr (insn, XEXP (x, 0), GET_MODE (x)); + return x; + case SUBREG: if ((new_rtx = equiv_constant (x)) != NULL_RTX) return new_rtx; Index: passes.c === --- passes.c (revision 177688) +++ passes.c (working copy) @@ -1448,9 +1448,9 @@ init_optimization_passes (void) } NEXT_PASS (pass_web); NEXT_PASS (pass_rtl_cprop); + NEXT_PASS (pass_rtl_fwprop_addr); NEXT_PASS (pass_cse2); NEXT_PASS (pass_rtl_dse1); - NEXT_PASS (pass_rtl_fwprop_addr); NEXT_PASS (pass_inc_dec); NEXT_PASS (pass_initialize_regs); NEXT_PASS (pass_ud_rtl_dce);
[PATCH] Fix PR38884
This handles the case of CSEing part of an SSA name that is stored to memory and defined with a composition like COMPLEX_EXPR or CONSTRUCTOR. This fixes the remaining pieces of PR38884 and PR38885. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-06 Richard Guenther rguent...@suse.de PR tree-optimization/38884 * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle partial reads from aggregate SSA names. * gcc.dg/tree-ssa/ssa-fre-34.c: New testcase. * gcc.dg/tree-ssa/ssa-fre-35.c: Likewise. Index: gcc/tree-ssa-sccvn.c === *** gcc/tree-ssa-sccvn.c(revision 179556) --- gcc/tree-ssa-sccvn.c(working copy) *** vn_reference_lookup_3 (ao_ref *ref, tree *** 1489,1495 } } ! /* 4) For aggregate copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE gimple_assign_single_p (def_stmt) --- 1489,1554 } } ! /* 4) Assignment from an SSA name which definition we may be able ! to access pieces from. */ ! else if (ref-size == maxsize ! is_gimple_reg_type (vr-type) ! gimple_assign_single_p (def_stmt) ! TREE_CODE (gimple_assign_rhs1 (def_stmt)) == SSA_NAME) ! { ! tree rhs1 = gimple_assign_rhs1 (def_stmt); ! gimple def_stmt2 = SSA_NAME_DEF_STMT (rhs1); ! if (is_gimple_assign (def_stmt2) ! (gimple_assign_rhs_code (def_stmt2) == COMPLEX_EXPR ! || gimple_assign_rhs_code (def_stmt2) == CONSTRUCTOR) ! types_compatible_p (vr-type, TREE_TYPE (TREE_TYPE (rhs1 ! { ! tree base2; ! HOST_WIDE_INT offset2, size2, maxsize2, off; ! base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt), ! offset2, size2, maxsize2); ! off = offset - offset2; ! if (maxsize2 != -1 ! maxsize2 == size2 ! operand_equal_p (base, base2, 0) ! offset2 = offset ! offset2 + size2 = offset + maxsize) ! { ! tree val = NULL_TREE; ! HOST_WIDE_INT elsz ! = TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (rhs1; ! if (gimple_assign_rhs_code (def_stmt2) == COMPLEX_EXPR) ! { ! if (off == 0) ! val = gimple_assign_rhs1 (def_stmt2); ! 
else if (off == elsz) ! val = gimple_assign_rhs2 (def_stmt2); ! } ! else if (gimple_assign_rhs_code (def_stmt2) == CONSTRUCTOR ! off % elsz == 0) ! { ! tree ctor = gimple_assign_rhs1 (def_stmt2); ! unsigned i = off / elsz; ! if (i CONSTRUCTOR_NELTS (ctor)) ! { ! constructor_elt *elt = CONSTRUCTOR_ELT (ctor, i); ! if (compare_tree_int (elt-index, i) == 0) ! val = elt-value; ! } ! } ! if (val) ! { ! unsigned int value_id = get_or_alloc_constant_value_id (val); ! return vn_reference_insert_pieces ! (vuse, vr-set, vr-type, ! VEC_copy (vn_reference_op_s, heap, vr-operands), ! val, value_id); ! } ! } ! } ! } ! ! /* 5) For aggregate copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE gimple_assign_single_p (def_stmt) *** vn_reference_lookup_3 (ao_ref *ref, tree *** 1587,1593 return NULL; } ! /* 5) For memcpy copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE is_gimple_reg_type (vr-type) --- 1646,1652 return NULL; } ! /* 6) For memcpy copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE is_gimple_reg_type (vr-type) Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c === *** gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c (revision 0) --- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c (revision 0) *** *** 0 --- 1,18 + /* { dg-do compile } */ + /* { dg-options -O -fdump-tree-fre1-details } */ + + #define vector __attribute__((vector_size(16) )) + + struct { + float i; + vector float global_res; + } s; + float foo(float f) + { + vector float res = (vector float){0.0f,f,0.0f,1.0f}; + s.global_res = res; + return *((float*)s.global_res + 1); + } + + /* { dg-final { scan-tree-dump Replaced BIT_FIELD_REF.*with
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 6:53 PM, Diego Novillo dnovi...@google.com wrote: On Wed, Oct 5, 2011 at 11:28, Diego Novillo dnovi...@google.com wrote: On Wed, Oct 5, 2011 at 10:51, Richard Guenther richard.guent...@gmail.com wrote: Did you also mark the function with always_inline? That's a requirement as artificial only works for inlined function bodies. Yeah. It doesn't quite work as I expect it to. It steps into the function at odd places. So, I played with this some more with this, and there seems to be some inconsistency in how these attributes get handled. http://sourceware.org/bugzilla/show_bug.cgi?id=13263 static inline int foo (int) __attribute__((always_inline,artificial)); static inline int foo (int x) { int y = x - 3; return y; } int bar (int y) { return y == 0; } main () { foo (10); return bar (foo (3)); } With GCC 4.7, the stand alone call foo(10) is not ignored by 'step'. However, the embedded call bar(foo(3)) is ignored as I was expecting. Hm, nothing is ignored for me with gcc 4.6. Diego.
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 8:51 PM, Diego Novillo dnovi...@google.com wrote: On Wed, Oct 5, 2011 at 14:20, Mike Stump mikest...@comcast.net wrote: On Oct 5, 2011, at 6:18 AM, Diego Novillo wrote: I think we need to find a solution for this situation. The solution Apple found and implemented is a __nodebug__ attribute, as can be seen in Apple's gcc. We use it like so: #define __always_inline__ __always_inline__, __nodebug__ #undef __always_inline__ in headers like mmintrn.h: __STATIC_INLINE void __attribute__((__always_inline__)) /* APPLE LOCAL end radar 5618945 */ _mm_empty (void) { __builtin_ia32_emms (); } Ah, nice. Though, one of the things I am liking more and more about the blacklist solution is that it (a) does not need any modifications to the source code, and (b) it works with no-inline functions as well. This gives total control to the developer. I would blacklist a bunch of functions I never care to go into, for instance. Others may choose to blacklist a different set. And you can change that from debug session to the next. I agree with Jakub that artificial functions should be blacklisted automatically, however. Richi, Jakub, if the blacklist solution was implemented in GCC would you agree with promoting these macros into inline functions? This is orthogonal to http://sourceware.org/bugzilla/show_bug.cgi?id=13263, of course. I know you are on to that C++ thing and ending up returning a reference to make it an lvalue. Which I very much don't like (please, if you go that route add _set functions and lower the case of the macros). What's the other advantage of using inline functions? The gdb annoyance with the macros can be solved with the .gdbinit macro defines (which might be nice to commit to SVN btw). Richard. Thanks. Diego.
Re: [patch, arm] Fix PR target/50305 (arm_legitimize_reload_address problem)
On 4 October 2011 16:13, Ulrich Weigand uweig...@de.ibm.com wrote: Ramana Radhakrishnan wrote: On 26 September 2011 15:24, Ulrich Weigand uweig...@de.ibm.com wrote: Is this sufficient, or should I test any other set of options as well? Could you run one set of tests with neon ? Sorry for the delay, but I had to switch to my IGEP board for Neon support, and that's a bit slow ... In any case, I've now completed testing the patch with Neon with no regressions. Just to clarify: in the presence of the other options that are already in dg-options, the test case now fails (with the unpatched compiler) for *any* setting of -mfloat-abi (hard, soft, or softfp). Do you still want me to add a specific setting to the test case? No the mfpu=vfpv3 is fine. OK, thanks. Instead of skipping I was wondering if we could prune the outputs to get this through all the testers we have. Well, the problem is that with certain -march options (e.g. armv7) we get: /home/uweigand/gcc-head/gcc/testsuite/gcc.target/arm/pr50305.c:1:0: error: target CPU does not support ARM mode Ah - ok. Since this is an *error*, pruning the output doesn't really help, the test isn't being run in any case. Otherwise this is OK. Given the above, is the patch now OK as-is? OK by me. Ramana
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c(revision 179592) +++ fold-const.c(working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT + !FLOAT_TYPE_P (TREE_TYPE (arg1)) + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) +{ + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR +: TRUTH_OR_EXPR), + type, arg0, arg1); +} + return NULL_TREE; }
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 6, 2011 at 11:28 AM, Kai Tietz kti...@redhat.com wrote: Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c (revision 179592) +++ fold-const.c (working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT Why only for LOGICAL_OP_NON_SHORT_CIRCUIT? It doesn't make a difference for !LOGICAL_OP_NON_SHORT_CIRCUIT targets, but ... + !FLOAT_TYPE_P (TREE_TYPE (arg1)) ? I hope we don't have || float. + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). + { + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) Err ... so why do you recurse here (and associate)? Even with different predicates than above ... And similar transforms seem to happen in fold_truthop - did you investigate why it didn't trigger there. And I'm missing a testcase. Richard. 
+ { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, arg0, arg1); + } + return NULL_TREE; }
Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
On Wed, Oct 5, 2011 at 11:07 PM, Tom de Vries tom_devr...@mentor.com wrote: On 10/05/2011 10:46 AM, Richard Guenther wrote: On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries tom_devr...@mentor.com wrote: On 10/04/2011 03:03 PM, Richard Guenther wrote: On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries tom_devr...@mentor.com wrote: On 10/01/2011 05:46 PM, Tom de Vries wrote: On 09/30/2011 03:29 PM, Richard Guenther wrote: On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries tom_devr...@mentor.com wrote: On 09/28/2011 11:53 AM, Richard Guenther wrote: On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries tom_devr...@mentor.com wrote: Richard, I got a patch for PR50527. The patch prevents the alignment of vla-related allocas to be set to BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding the alloca. Bootstrapped and regtested on x86_64. OK for trunk? Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that the vectorizer then will no longer see that the arrays are properly aligned. I'm not sure what the best thing to do is here, other than trying to record the alignment requirement of the VLA somewhere. Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT has the issue that it will force stack-realignment which isn't free (and the point was to make the decl cheaper than the alloca). But that might possibly be the better choice. Any other thoughts? How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla for the new array prevents stack realignment for folded vla-allocas, also for large vlas. This will not help in vectorizing large folded vla-allocas, but I think it's not reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has been the case up until we started to fold). If you want to trigger vectorization for a vla, you can still use the aligned attribute on the declaration. Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using an attribute on the decl. 
This patch exploits this by setting it at the end of the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in propagation though, because although the ptr_info of the lhs is propagated via copy_prop afterwards, it's not propagated anymore via ccp. Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and not fold during ccp3. Ugh, somehow I like this the least ;) How about lowering VLAs to p = __builtin_alloca (...); p = __builtin_assume_aligned (p, DECL_ALIGN (vla)); and not assume anything for alloca itself if it feeds a __builtin_assume_aligned? Or rather introduce a __builtin_alloca_with_align () and for VLAs do p = __builtin_alloca_with_align (..., DECL_ALIGN (vla)); that's less awkward to use? Sorry for not having a clear plan here ;) Using assume_aligned is a more orthogonal way to represent this in gimple, but indeed harder to use. Another possibility is to add a 'tree vla_decl' field to struct gimple_statement_call, which is probably the easiest to implement. But I think __builtin_alloca_with_align might have a use beyond vlas, so I decided to try this one. Attached patch implements my first stab at this (now testing on x86_64). Is this an acceptable approach? bootstrapped and reg-tested (including ada) on x86_64. Ok for trunk? The idea is ok I think. But case BUILT_IN_ALLOCA: + case BUILT_IN_ALLOCA_WITH_ALIGN: /* If the allocation stems from the declaration of a variable-sized object, it cannot accumulate. */ target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp)); if (target) return target; + if (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) + == BUILT_IN_ALLOCA_WITH_ALIGN) + { + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp), + built_in_decls[BUILT_IN_ALLOCA], + 1, CALL_EXPR_ARG (exp, 0)); + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp); + exp = new_call; + } Ick. Why can't the rest of the compiler not handle BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA? 
(thus, arrange things so the assembler name of alloca-with-align is alloca?) We can set the assembler name in the local_define_builtin call. But that will still create a call alloca (12, 4). How do we deal with the second argument? I don't see why you still need the special late CCP pass. For alloca_with_align, the align will minimally be the 2nd argument. This is independent of folding, and we can propagate this information in every ccp. If the alloca_with_align is not folded and will not be folded anymore (something we know at the earliest after the propagation phase of the last ccp), the alignment of BIGGEST_ALIGNMENT is guaranteed, because we
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 11:28 AM, Kai Tietz kti...@redhat.com wrote: Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c (revision 179592) +++ fold-const.c (working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT Why only for LOGICAL_OP_NON_SHORT_CIRCUIT? It doesn't make a difference for !LOGICAL_OP_NON_SHORT_CIRCUIT targets, but ... Well, I used this check only for not doing this transformation for targets, which have low-cost branches. This is the same thing as in fold_truthop. It does this transformation only if LOGICAL_OP_NON_SHORT_CIRCUIT is true. + !FLOAT_TYPE_P (TREE_TYPE (arg1)) ? I hope we don't have || float. This can happen. Operands of TRUTH_AND|OR(IF)_EXPR aren't necessarily of integral type. After expansion in gimplifier, we have for sure comparisons, but not in c-tree. + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. 
This check is the same as in fold_truthop. I used this check. The point here is that floats might trap. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). See inner of if condition for those checks. I moved those checks for arg1 out of the inner conditions to avoid double-checking. + { + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) Err ... so why do you recurse here (and associate)? Even with different predicates than above ... See, here is the missing check. Point is that even if arg0 has side-effects and is a (AND|OR)IF expression, we might be able to associate with right-hand argument of arg0, if for it no side-effects are existing. Otherwise we wouldn't catch this case. We have here in maximum a recursion level of one. And similar transforms seem to happen in fold_truthop - did you investigate why it didn't trigger there. This is pretty simple. The point is that only for comparisons this transformation is done. But in c-tree we don't have here necessarily for TRUTH_(AND|OR)[IF]_EXPR comparison arguments, not necessarily integral ones (see above). And I'm missing a testcase. Ok, I'll add one. Effect can be seen best after gimplification. Richard. + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, arg0, arg1); + } + return NULL_TREE; } Regards. Kai
Re: [PATCH] Fix PR46556 (poor address generation)
On Wed, 5 Oct 2011, William J. Schmidt wrote: This patch addresses the poor code generation in PR46556 for the following code: struct x { int a[16]; int b[16]; int c[16]; }; extern void foo (int, int, int); void f (struct x *p, unsigned int n) { foo (p->a[n], p->c[n], p->b[n]); } Prior to the fix for PR32698, gcc calculated the offset for accessing the array elements as: n*4; 64+n*4; 128+n*4. Following that fix, the offsets are calculated as: n*4; (n+16)*4; (n+32)*4. This led to poor code generation on powerpc64 targets, among others. The poor code generation was observed not to occur in loops, as the IVOPTS code does a good job of lowering these expressions to MEM_REFs. It was previously suggested that perhaps a general pass to lower memory accesses to MEM_REFs in GIMPLE would solve not only this, but other similar problems. I spent some time looking into various approaches to this, and reviewing some previous attempts to do similar things. In the end, I've concluded that this is a bad idea in practice because of the loss of useful aliasing information. In particular, early lowering of component references causes us to lose the ability to disambiguate non-overlapping references in the same structure, and there is no simple way to carry the necessary aliasing information along with the replacement MEM_REFs to avoid this. While some performance gains are available with GIMPLE lowering of memory accesses, there are also offsetting performance losses, and I suspect this would just be a continuous source of bug reports into the future. Therefore the current patch is a much simpler approach to solve the specific problem noted in the PR. There are two pieces to the patch: * The offending addressing pattern is matched in GIMPLE and transformed into a restructured MEM_REF that distributes the multiply, so that (n+32)*4 becomes 4*n+128 as before. This is done during the reassociation pass, for reasons described below.
The transformation only occurs in non-loop blocks, since IVOPTS does a good job on such things within loops. * A tweak is added to the RTL forward-propagator to avoid propagating into memory references based on a single base register with no offset, under certain circumstances. This improves sharing of base registers for accesses within the same structure and slightly lowers register pressure. It would be possible to separate these into two patches if that's preferred. I chose to combine them because together they provide the ideal code generation that the new test cases test for. I initially implemented the pattern matcher during expand, but I found that the expanded code for two accesses to the same structure was often not being CSEd well. So I moved it back into the GIMPLE phases prior to DOM to take advantage of its CSE. To avoid an additional complete pass over the IL, I chose to piggyback on the reassociation pass. This transformation is not technically a reassociation, but it is related enough to not be a complete wart. One noob question about this: It would probably be preferable to have this transformation only take place during the second reassociation pass, so the ARRAY_REFs are seen by earlier optimization phases. Is there an easy way to detect that it's the second pass without having to generate a separate pass entry point? One other general question about the pattern-match transformation: Is this an appropriate transformation for all targets, or should it be somehow gated on available addressing modes on the target processor? Bootstrapped and regression tested on powerpc64-linux-gnu. Verified no performance degradations on that target for SPEC CPU2000 and CPU2006. I'm looking for eventual approval for trunk after any comments are resolved. Thanks! People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? 
So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. Now some comments on the patch ... Bill 2011-10-05 Bill Schmidt wschm...@linux.vnet.ibm.com gcc: PR rtl-optimization/46556 * fwprop.c (fwprop_bb_aux_d): New struct. (MEM_PLUS_REGS): New macro. (record_mem_plus_reg): New function. (record_mem_plus_regs): Likewise. (single_def_use_enter_block): Record
Re: Unreviewed libgcc patches
On 10/06/2011 12:21 PM, Rainer Orth wrote: Can you post an updated patch for this one? I'll try to review the others as soon as possible. Do you see a chance to get the other patches reviewed before stage1 closes? I'd like to get them into 4.7 rather than carry them forward for several months. Yes, I'm very sorry for the delay. Paolo
Re: Commit: RX: Codegen bug fixes
Hi Richard, The SMIN pattern has the same problem. *sigh* Fixed. Cheers Nick
Re: Initial shrink-wrapping patch
On 10/06/11 05:17, Ian Lance Taylor wrote: Thinking about it I think this is the wrong approach. The -fsplit-stack code by definition has to wrap the entire function and it can not modify any callee-saved registers. We should do shrink wrapping before -fsplit-stack, not the other way around. Sorry, I'm not following what you're saying here. Can you elaborate? Bernd
[PATCH] Some TLC
Noticed when working on vector/complex folding and simplification. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-06 Richard Guenther rguent...@suse.de * fold-const.c (fold_ternary_loc): Also fold non-constant vector CONSTRUCTORs. Make more efficient. * tree-ssa-dom.c (cprop_operand): Don't handle virtual operands. (cprop_into_stmt): Don't propagate into virtual operands. (optimize_stmt): Really dump original statement. Index: gcc/fold-const.c === *** gcc/fold-const.c (revision 179592) --- gcc/fold-const.c (working copy) *** fold_ternary_loc (location_t loc, enum t *** 13647,13653 case BIT_FIELD_REF: if ((TREE_CODE (arg0) == VECTOR_CST ! || (TREE_CODE (arg0) == CONSTRUCTOR && TREE_CONSTANT (arg0))) && type == TREE_TYPE (TREE_TYPE (arg0))) { unsigned HOST_WIDE_INT width = tree_low_cst (arg1, 1); --- 13647,13653 case BIT_FIELD_REF: if ((TREE_CODE (arg0) == VECTOR_CST ! || TREE_CODE (arg0) == CONSTRUCTOR) && type == TREE_TYPE (TREE_TYPE (arg0))) { unsigned HOST_WIDE_INT width = tree_low_cst (arg1, 1); *** fold_ternary_loc (location_t loc, enum t *** 13659,13682 && (idx = idx / width) < TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))) { - tree elements = NULL_TREE; - if (TREE_CODE (arg0) == VECTOR_CST) - elements = TREE_VECTOR_CST_ELTS (arg0); - else { ! unsigned HOST_WIDE_INT idx; ! tree value; ! ! FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (arg0), idx, value) ! elements = tree_cons (NULL_TREE, value, elements); } ! while (idx-- > 0 && elements) ! elements = TREE_CHAIN (elements); ! if (elements) ! return TREE_VALUE (elements); ! else ! return build_zero_cst (type); } } --- 13659,13675 && (idx = idx / width) < TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))) { if (TREE_CODE (arg0) == VECTOR_CST) { ! tree elements = TREE_VECTOR_CST_ELTS (arg0); ! while (idx-- > 0 && elements) ! elements = TREE_CHAIN (elements); ! if (elements) ! return TREE_VALUE (elements); } ! else if (idx < CONSTRUCTOR_NELTS (arg0)) ! return CONSTRUCTOR_ELT (arg0, idx)->value; !
return build_zero_cst (type); } } Index: gcc/tree-ssa-dom.c === *** gcc/tree-ssa-dom.c (revision 179592) --- gcc/tree-ssa-dom.c (working copy) *** cprop_operand (gimple stmt, use_operand_ *** 1995,2011 val = SSA_NAME_VALUE (op); if (val && val != op) { - /* Do not change the base variable in the virtual operand - tables. That would make it impossible to reconstruct - the renamed virtual operand if we later modify this - statement. Also only allow the new value to be an SSA_NAME - for propagation into virtual operands. */ - if (!is_gimple_reg (op) - && (TREE_CODE (val) != SSA_NAME - || is_gimple_reg (val) - || get_virtual_var (val) != get_virtual_var (op))) - return; - /* Do not replace hard register operands in asm statements. */ if (gimple_code (stmt) == GIMPLE_ASM && !may_propagate_copy_into_asm (op)) --- 1995,2000 *** cprop_into_stmt (gimple stmt) *** 2076,2086 use_operand_p op_p; ssa_op_iter iter; ! FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_ALL_USES) ! { ! if (TREE_CODE (USE_FROM_PTR (op_p)) == SSA_NAME) ! cprop_operand (stmt, op_p); ! } } /* Optimize the statement pointed to by iterator SI. --- 2065,2072 use_operand_p op_p; ssa_op_iter iter; ! FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_USE) ! cprop_operand (stmt, op_p); } /* Optimize the statement pointed to by iterator SI. *** optimize_stmt (basic_block bb, gimple_st *** 2107,2124 old_stmt = stmt = gsi_stmt (si); - if (gimple_code (stmt) == GIMPLE_COND) - canonicalize_comparison (stmt); - - update_stmt_if_modified (stmt); - opt_stats.num_stmts++; - if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Optimizing statement "); print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); } /* Const/copy propagate into USES, VUSES and the RHS of VDEFs. */ cprop_into_stmt (stmt); --- 2093,2110 old_stmt = stmt = gsi_stmt
Re: [Patch] Support DEC-C extensions
On Tue, Oct 4, 2011 at 5:46 AM, Pedro Alves pe...@codesourcery.com wrote: On Tuesday 04 October 2011 11:16:30, Gabriel Dos Reis wrote: Do we need to consider ABIs that have calling conventions that treat unprototyped and varargs functions differently? (is there any?) Could you elaborate on the equivalence of these declarations? I expected that with: extern void foo(); extern void bar(...); foo (1, 2, 0.3f, NULL, 5); bar (1, 2, 0.3f, NULL, 5); the compiler would emit the same for both of those calls (calling convention wise). That is, for example, on x86-64, %rax is set to 1 (the number of floating-point parameters passed to the function in SSE registers) in both cases. Except that variadics use a different kind of calling convention than the rest. But they would not be equivalent at the source level, that is: extern void foo(); extern void foo(int a); extern void bar(...); extern void bar(int a); should be a "conflicting types for 'bar'" error in C. -- Pedro Alves
Re: Vector shuffling
Artem Shinkarov schrieb: Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson r...@redhat.com wrote: On 10/03/2011 05:14 AM, Artem Shinkarov wrote: Hi, can anyone commit it please? Richard? Or may be Richard? Committed. r~ Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. The following test cases cause FAILs because main cannot be found by the linker because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file. Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c The following patch avoids __SIZEOF_INT__. Ok by some maintainer to commit? Johann testsuite/ * lib/target-supports.exp (check_effective_target_int32): New function. * gcc.c-torture/execute/vect-shuffle-1.c: Don't use __SIZEOF_INT__. * gcc.c-torture/execute/vect-shuffle-5.c: Ditto. * gcc.c-torture/execute/vect-shuffle-1.x: New file. * gcc.c-torture/execute/vect-shuffle-5.x: New file. 
Index: lib/target-supports.exp === --- lib/target-supports.exp (revision 179599) +++ lib/target-supports.exp (working copy) @@ -1583,6 +1583,15 @@ proc check_effective_target_int16 { } { }] } +# Returns 1 if we're generating 32-bit integers with the +# default options, 0 otherwise. + +proc check_effective_target_int32 { } { +return [check_no_compiler_messages int32 object { + int dummy[sizeof (int) == 4 ? 1 : -1]; +}] +} + # Return 1 if we're generating 64-bit code using default options, 0 # otherwise. Index: gcc.c-torture/execute/vect-shuffle-1.c === --- gcc.c-torture/execute/vect-shuffle-1.c (revision 179599) +++ gcc.c-torture/execute/vect-shuffle-1.c (working copy) @@ -1,4 +1,3 @@ -#if __SIZEOF_INT__ == 4 typedef unsigned int V __attribute__((vector_size(16), may_alias)); struct S @@ -64,5 +63,3 @@ int main() return 0; } - -#endif /* SIZEOF_INT */ Index: gcc.c-torture/execute/vect-shuffle-1.x === --- gcc.c-torture/execute/vect-shuffle-1.x (revision 0) +++ gcc.c-torture/execute/vect-shuffle-1.x (revision 0) @@ -0,0 +1,7 @@ +load_lib target-supports.exp + +if { [check_effective_target_int32] } { + return 0 +} + +return 1; Index: gcc.c-torture/execute/vect-shuffle-5.c === --- gcc.c-torture/execute/vect-shuffle-5.c (revision 179599) +++ gcc.c-torture/execute/vect-shuffle-5.c (working copy) @@ -1,4 +1,3 @@ -#if __SIZEOF_INT__ == 4 typedef unsigned int V __attribute__((vector_size(16), may_alias)); struct S @@ -60,5 +59,3 @@ int main() return 0; } - -#endif /* SIZEOF_INT */ Index: gcc.c-torture/execute/vect-shuffle-5.x === --- gcc.c-torture/execute/vect-shuffle-5.x (revision 0) +++ gcc.c-torture/execute/vect-shuffle-5.x (revision 0) @@ -0,0 +1,7 @@ +load_lib target-supports.exp + +if { [check_effective_target_int32] } { + return 0 +} + +return 1;
Re: [Patch] Support DEC-C extensions
On Tue, Oct 4, 2011 at 1:24 PM, Douglas Rupp r...@gnat.com wrote: On 10/3/2011 8:35 AM, Gabriel Dos Reis wrote: unnamed variadic functions sounds as if the function itself is unnamed, so not good. -funnamed-variadic-parameter How about -fvariadic-parameters-unnamed? There's already a -fvariadic-macros, so maybe putting variadic first is more consistent? Consistent with what? Consistency would imply -fvariadic-functions. But that does not make much sense since variadic functions already exist in C. -fvariadic-parameters-unnamed sounds as if the function could have several variadic parameters, but that is not what is being proposed.
Re: Vector shuffling
Richard Guenther schrieb: On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay a...@gjlay.de wrote: Artem Shinkarov schrieb: Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson r...@redhat.com wrote: On 10/03/2011 05:14 AM, Artem Shinkarov wrote: Hi, can anyone commit it please? Richard? Or may be Richard? Committed. r~ Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. The following test cases cause FAILs because main cannot be found by the linker because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file. Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c The following patch avoids __SIZEOF_INT__. Ok by some maintainer to commit? On a general note, if you need to add .x files, consider moving the test to gcc.dg/torture instead. So should I move all vect-shuffle-*.c files so that they are kept together? Johann Richard. Johann testsuite/ * lib/target-supports.exp (check_effective_target_int32): New function. * gcc.c-torture/execute/vect-shuffle-1.c: Don't use __SIZEOF_INT__. 
* gcc.c-torture/execute/vect-shuffle-5.c: Ditto. * gcc.c-torture/execute/vect-shuffle-1.x: New file. * gcc.c-torture/execute/vect-shuffle-5.x: New file.
Re: Vector shuffling
On Thu, Oct 6, 2011 at 1:03 PM, Georg-Johann Lay a...@gjlay.de wrote: Richard Guenther schrieb: On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay a...@gjlay.de wrote: Artem Shinkarov schrieb: Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson r...@redhat.com wrote: On 10/03/2011 05:14 AM, Artem Shinkarov wrote: Hi, can anyone commit it please? Richard? Or may be Richard? Committed. r~ Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. The following test cases cause FAILs because main cannot be found by the linker because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file. Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c The following patch avoids __SIZEOF_INT__. Ok by some maintainer to commit? On a general note, if you need to add .x files, consider moving the test to gcc.dg/torture instead. So should I move all vect-shuffle-*.c files so that they are kept together? Yes. Johann Richard. Johann testsuite/ * lib/target-supports.exp (check_effective_target_int32): New function. 
* gcc.c-torture/execute/vect-shuffle-1.c: Don't use __SIZEOF_INT__. * gcc.c-torture/execute/vect-shuffle-5.c: Ditto. * gcc.c-torture/execute/vect-shuffle-1.x: New file. * gcc.c-torture/execute/vect-shuffle-5.x: New file.
Re: Vector shuffling
On Thu, Oct 06, 2011 at 12:51:54PM +0200, Georg-Johann Lay wrote: The following patch avoids __SIZEOF_INT__. Ok by some maintainer to commit? That is unnecessary. You can just add #else int main () { return 0; } before the final #endif in the files instead. Or move around the #ifdefs, so that it ifdefs out for weirdo targets just everything before main and then also main's body except for return 0; at the end. Jakub
[Committed] s390 bootstrap: last_bb_active set but not used
Hi, this fixes a bootstrap problem on s390. s390 doesn't have return nor simple_return expanders so the last_bb_active variable stays unused in thread_prologue_and_epilogue_insns. Committed to mainline as obvious. Bye, -Andreas- 2011-10-06 Andreas Krebbel andreas.kreb...@de.ibm.com * function.c (thread_prologue_and_epilogue_insns): Mark last_bb_active as possibly unused. It is unused for targets which do neither have return nor simple_return expanders. Index: gcc/function.c === *** gcc/function.c.orig --- gcc/function.c *** thread_prologue_and_epilogue_insns (void *** 5453,5459 { bool inserted; basic_block last_bb; ! bool last_bb_active; #ifdef HAVE_simple_return bool unconverted_simple_returns = false; basic_block simple_return_block_hot = NULL; --- 5453,5459 { bool inserted; basic_block last_bb; ! bool last_bb_active ATTRIBUTE_UNUSED; #ifdef HAVE_simple_return bool unconverted_simple_returns = false; basic_block simple_return_block_hot = NULL;
Re: Modify gcc for use with gdb (issue5132047)
On 11-10-06 04:58 , Richard Guenther wrote: I know you are on to that C++ thing and ending up returning a reference to make it an lvalue. Which I very much don't like (please, if you go that route add _set functions and lower the case of the macros). Not necessarily. I'm after making the debugging experience easier (among other things). Only a handful of macros were converted into functions in this patch, not all of them. We may not *need* to convert all of them either. What's the other advantage of using inline functions? The gdb annoyance with the macros can be solved with the .gdbinit macro defines (which might be nice to commit to SVN btw). Static type checking, of course. Ability to set breakpoints, and as time goes on, more inline functions will start showing up. We already have several. The blacklist feature would solve your annoyance with tree_operand_length, too. Additionally, blacklist can deal with non-inline functions, which can be useful. Diego.
Re: [Patch, Fortran] Add c_float128{,_complex} as GNU extension to ISO_C_BINDING
*ping* http://gcc.gnu.org/ml/fortran/2011-09/msg00150.html On 09/28/2011 04:28 PM, Tobias Burnus wrote: This patch makes the GCC extension __float128 (_Complex) available in the C bindings via C_FLOAT128 and C_FLOAT128_COMPLEX. Additionally, I have improved the diagnostic for explicitly use associating -std= versioned symbols. And I have finally added the iso*.def files to the makefile dependencies. As usual, with -std=f2008, the GNU extensions are not loaded. I have also updated the documentation. OK for the trunk? Tobias PS: If you think that C_FLOAT128/C_FLOAT128_COMPLEX are bad names for C's __float128, please speak up before gfortran - and other compilers implement it. (At least one vendor is implementing __float128 support and plans to modify ISO_C_BINDING.) The proper name would be C___FLOAT128, but that looks awkward!
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
On Thu, Oct 6, 2011 at 2:55 PM, Uros Bizjak ubiz...@gmail.com wrote: On Thu, Oct 6, 2011 at 2:51 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Wow, it works! Thank you. New patch attached. ChangeLogs were not touched. Tests pass both on ia32/x86-64 with and without simulator. You are missing closing curly braces in dg-do compile directives. Also, please write: TYPE __attribute__((sseregparm)) test_noneg_sub_noneg_sub (TYPE a, TYPE b, TYPE c) The patch is OK with these changes. BTW, don't you also need -mfmpath=sse in dg-options? Uros.
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 2011-10-06 at 09:47 +0200, Paolo Bonzini wrote: And IIUC the other address is based on pseudo 125 as well, but the combination is (plus (plus (reg 126) (reg 128)) (const_int X)) and cannot be represented on ppc. I think _this_ is the problem, so I'm afraid your patch could cause pessimizations on x86 for example. On x86, which has a cheap REG+REG+CONST addressing mode, it is much better to propagate pseudo 125 so that you can delete the set altogether. However, indeed there is no downstream pass that undoes the transformation. Perhaps we can do it in CSE, since this _is_ CSE after all. :) The attached untested (uncompiled) patch is an attempt. Paolo Thanks, Paolo! This makes good sense. I will play with your (second :) patch and let you know how it goes. Bill
ARM: Fix PR49049
This corrects a brain fart in one of my patches last year: I added another alternative to a subsi pattern for subtraction of a constant. This is bogus because such an operation should be canonicalized to a PLUS with the negated constant. Normally that's what happens, and so testing never showed that the alternative was only half-finished and didn't work. PR49049 is a testcase where we do end up replacing a REG with a constant and produce the bad alternative, leading to a crash. Tested on arm-eabi and committed as obvious. Will do some sanity checks on 4.6 and commit there as well. Bernd Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 179606) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2011-10-06 Bernd Schmidt ber...@codesourcery.com + + PR target/49049 + * config/arm/arm.md (arm_subsi3_insn): Lose the last alternative. + 2011-10-06 Ulrich Weigand ulrich.weig...@linaro.org PR target/50305 Index: gcc/testsuite/gcc.c-torture/compile/pr49049.c === --- gcc/testsuite/gcc.c-torture/compile/pr49049.c (revision 0) +++ gcc/testsuite/gcc.c-torture/compile/pr49049.c (revision 0) @@ -0,0 +1,28 @@ +__extension__ typedef unsigned long long int uint64_t; + +static int +sub (int a, int b) +{ + return a - b; +} + +static uint64_t +add (uint64_t a, uint64_t b) +{ + return a + b; +} + +int *ptr; + +int +foo (uint64_t arg1, int *arg2) +{ + int j; + for (; j < 1; j++) +{ + *arg2 |= sub ( sub (sub (j || 1 ^ 0x1, 1), arg1 & 0x1 >= + sub (1, *ptr & j)), +(sub ( j != 1 || sub (j & j, 1) >= 0, + add (!j & arg1, 0x35DLL)))); +} +} Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 179606) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2011-10-06 Bernd Schmidt ber...@codesourcery.com + + PR target/49049 + * gcc.c-torture/compile/pr49049.c: New test.
+ 2011-10-06 Ulrich Weigand ulrich.weig...@linaro.org PR target/50305 Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 179606) +++ gcc/config/arm/arm.md (working copy) @@ -1213,27 +1213,24 @@ (define_insn thumb1_subsi3_insn ; ??? Check Thumb-2 split length (define_insn_and_split *arm_subsi3_insn - [(set (match_operand:SI 0 s_register_operand =r,r,rk,r,r) - (minus:SI (match_operand:SI 1 reg_or_int_operand rI,r,k,?n,r) - (match_operand:SI 2 reg_or_int_operand r,rI,r, r,?n)))] + [(set (match_operand:SI 0 s_register_operand =r,r,rk,r) + (minus:SI (match_operand:SI 1 reg_or_int_operand rI,r,k,?n) + (match_operand:SI 2 reg_or_int_operand r,rI,r, r)))] TARGET_32BIT @ rsb%?\\t%0, %2, %1 sub%?\\t%0, %1, %2 sub%?\\t%0, %1, %2 - # # - ((GET_CODE (operands[1]) == CONST_INT && !const_ok_for_arm (INTVAL (operands[1]))) - || (GET_CODE (operands[2]) == CONST_INT && !const_ok_for_arm (INTVAL (operands[2])))) + (GET_CODE (operands[1]) == CONST_INT && !const_ok_for_arm (INTVAL (operands[1]))) [(clobber (const_int 0))] arm_split_constant (MINUS, SImode, curr_insn, INTVAL (operands[1]), operands[0], operands[2], 0); DONE; - [(set_attr length 4,4,4,16,16) + [(set_attr length 4,4,4,16) (set_attr predicable yes)] )
Re: Builtin infrastructure change
On 10/06/2011 03:02 PM, Michael Meissner wrote: On the x86 (with Fedora 13), I built and tested the C, C++, Objective C, Java, Ada, and Go languages with no regressions On a power6 box with RHEL 6.1, I have done the same for C, C++, Objective C, Java, and Ada languages with no regressions. Any reason for not building and testing Fortran? Especially as you patch gcc/fortran/{trans*.c,f95-lang.c}? Tobias [gcc/fortran] 2011-10-05 Michael Meissnermeiss...@linux.vnet.ibm.com * trans-expr.c (gfc_conv_power_op): Delete old interface with two parallel arrays to hold standard builtin declarations, and replace it with a function based interface that can support creating builtins on the fly in the future. Change all uses, and poison the old names. Make sure 0 is not a legitimate builtin index. (fill_with_spaces): Ditto. (gfc_trans_string_copy): Ditto. (gfc_trans_zero_assign): Ditto. (gfc_build_memcpy_call): Ditto. (alloc_scalar_allocatable_for_assignment): Ditto. * trans-array.c (gfc_trans_array_constructor_value): Ditto. (duplicate_allocatable): Ditto. (gfc_alloc_allocatable_for_assignment): Ditto. * trans-openmp.c (gfc_omp_clause_copy_ctor): Ditto. (gfc_omp_clause_assign_op): Ditto. (gfc_trans_omp_atomic): Ditto. (gfc_trans_omp_do): Ditto. (gfc_trans_omp_task): Ditto. * trans-stmt.c (gfc_trans_stop): Ditto. (gfc_trans_sync): Ditto. (gfc_trans_allocate): Ditto. (gfc_trans_deallocate): Ditto. * trans.c (gfc_call_malloc): Ditto. (gfc_allocate_using_malloc): Ditto. (gfc_call_free): Ditto. (gfc_deallocate_with_status): Ditto. (gfc_deallocate_scalar_with_status): Ditto. * f95-lang.c (gfc_define_builtin): Ditto. (gfc_init_builtin_functions): Ditto. * trans-decl.c (create_main_function): Ditto. * trans-intrinsic.c (builtin_decl_for_precision): Ditto.
[build] Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804)
As reported in the PR, FreeBSD/SPARC bootstrap is broken by one of my previous libgcc patches. While the crtstuff one will fix it, I'd like to avoid breaking the target. The following patch fixes the problem, as confirmed in the PR. Ok for mainline? Rainer 2011-10-04 Rainer Orth r...@cebitec.uni-bielefeld.de PR bootstrap/49804 * config.host: Add crtbegin.o, crtbeginS.o, crtend.o, crtendS.o to extra_parts. # HG changeset patch # Parent a57e226a2b14812bfa3c37c1aa807f28fac223eb Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804) diff --git a/libgcc/config.host b/libgcc/config.host --- a/libgcc/config.host +++ b/libgcc/config.host @@ -777,7 +777,7 @@ sparc-wrs-vxworks) ;; sparc64-*-freebsd*|ultrasparc-*-freebsd*) tmake_file="$tmake_file t-crtfm" - extra_parts=crtfastmath.o + extra_parts="crtbegin.o crtbeginS.o crtend.o crtendS.o crtfastmath.o" ;; sparc64-*-linux*) # 64-bit SPARC's running GNU/Linux extra_parts="$extra_parts crtfastmath.o" -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [build] Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804)
On 10/06/2011 03:29 PM, Rainer Orth wrote: As reported in the PR, FreeBSD/SPARC bootstrap is broken by one of my previous libgcc patches. While the crtstuff one will fix it, I'd like to avoid breaking the target. The following patch fixes the problem, as confirmed in the PR. Ok for mainline? Rainer 2011-10-04 Rainer Orthr...@cebitec.uni-bielefeld.de PR bootstrap/49804 * config.host: Add crtbegin.o, crtbeginS.o, crtend.o, crtendS.o to extra_parts. Ok. Paolo
Re: [PATCH 0/3] Fix vector shuffle problems
Hi, On Wed, 5 Oct 2011, Richard Henderson wrote: Tested on x86_64 with check-gcc//unix/{,-mssse3,-msse4} Hopefully one of the AMD guys can test on a bulldozer with -mxop? === gcc Summary for unix//-mxop === # of expected passes 160 Ciao, Michael.
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. These are all good ideas. I will think about casting this as a more general strength reduction over extended basic blocks outside of loops. First I'll put together some simple tests to see whether we're currently missing some non-address opportunities. snip + mult_op0 = TREE_OPERAND (offset, 0); + mult_op1 = TREE_OPERAND (offset, 1); + + if (TREE_CODE (mult_op0) != PLUS_EXPR + || TREE_CODE (mult_op1) != INTEGER_CST + || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST) +return NULL_TREE; + + t1 = TREE_OPERAND (base, 0); + t2 = TREE_OPERAND (mult_op0, 0); + c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1)); + c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1)); + c3 = TREE_INT_CST_LOW (mult_op1); Before accessing TREE_INT_CST_LOW you need to make sure the constants fit into a HWI using host_integerp () (which conveniently includes the check for INTEGER_CST). Note that you need to sign-extend the MEM_REF offset, thus use mem_ref_offset (base).low instead of TREE_INT_CST_LOW (TREE_OPERAND (base, 1)). Might be worth to add a testcase with negative offset ;) D'oh! . 
+ c4 = bitpos / BITS_PER_UNIT; + c = c1 + c2 * c3 + c4; And you don't know whether this operation overflows. Thus it's probably easiest to use double_ints instead of HOST_WIDE_INTs in all of the code. OK, thanks, will do. snip + /* Determine whether the expression can be represented with base and + offset components. */ + base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode, + &unsignedp, &volatilep, false); + if (!base || !offset) + return false; + + /* Look for a restructuring opportunity. */ + if ((mem_ref = restructure_base_and_offset (expr, gsi, base, + offset, bitpos)) == NULL_TREE) + return false; What I'm missing is a check whether the old address computation stmts will be dead after the transform. Hm, not quite sure what to do here. Prior to the transformation I'll have an assignment with something like: ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td) on LHS or RHS. Ta and Td will be part of the replacement. What should I be checking for? snip - if (is_gimple_assign (stmt) - && !stmt_could_throw_p (stmt)) + /* Look for restructuring opportunities within an expression + that references memory. We only do this for blocks not + contained in loops, since the ivopts machinery does a + good job on loop expressions, and we don't want to interfere + with other loop optimizations. */ + if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt)) { + tree *lhs, *rhs; + lhs = gimple_assign_lhs_ptr (stmt); + chgd_mem_ref = restructure_mem_ref (lhs, gsi) || chgd_mem_ref; + rhs = gimple_assign_rhs1_ptr (stmt); + chgd_mem_ref = restructure_mem_ref (rhs, gsi) || chgd_mem_ref; It will either be a store or a load, but never both (unless it's an aggregate copy which I think we should not handle). So ... if (gimple_vdef (stmt)) ... lhs else if (gimple_vuse (stmt)) ... rhs OK, with your suggested gating on non-BLKmode I agree.
+ } + + else if (is_gimple_assign (stmt) + && !stmt_could_throw_p (stmt)) + { tree lhs, rhs1, rhs2; enum tree_code rhs_code = gimple_assign_rhs_code (stmt); @@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb) } } } + /* If memory references have been restructured, immediate uses need + to be cleaned up. */ + if (chgd_mem_ref) + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + update_stmt (gsi_stmt (gsi)); ICK. Definitely a no ;) Why does an update_stmt () after the restructure_mem_ref call not work? Ah, yeah, I meant to check again on that before submitting. IIRC, at some point the update_stmt () following restructure_mem_ref was still giving me verify errors. I thought perhaps the statements created by force_gimple_operand_gsi might be giving me
Re: Initial shrink-wrapping patch
On 10/06/11 01:47, Bernd Schmidt wrote: This appears to be because the split prologue contains a jump, which means the find_many_sub_blocks call reorders the block numbers, and our indices into bb_flags are off. Testing of the patch completed - ok? Regardless of split-stack it seems like a cleanup and eliminates a potential source of errors. Bernd
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
BTW, don't you also need -mfpmath=sse in dg-options? According to doc/invoke.texi ... @itemx -mfma ... These options will enable GCC to use these extended instructions in generated code, even without @option{-mfpmath=sse}. Seems -mfpmath=sse is unnecessary. Although, if this is wrong, we probably have to update the doc as well. Thanks, K
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + && TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)))) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The side effect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already, that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. Ciao, Michael.
Re: [Patch] Support DEC-C extensions
On Oct 3, 2011, at 10:23 PM, Joseph S. Myers wrote: On Mon, 3 Oct 2011, Douglas Rupp wrote: On 9/30/2011 8:19 AM, Joseph S. Myers wrote: On Fri, 30 Sep 2011, Tristan Gingold wrote: If you prefer a target hook, I'm fine with that. I will write such a patch. I don't think it must be restricted to system headers, as it is possible that the user 'imports' such a function (and define it in one of VMS favorite languages such as macro-32 or bliss). If it's not restricted to system headers, then probably the option is better than the target hook. I'm not sure I understand the reasoning here. This seems fairly VMS specific so what is the downside for a target hook and user written headers? The language accepted by the compiler in the user's source code (as opposed to in system headers) shouldn't depend on the target except for certain well-defined areas such as target attributes and built-in functions; behaving the same across different systems is an important feature of GCC. This isn't one of those areas of target-dependence; it's generic syntax rather than e.g. exploiting a particular processor feature. So the consensus is for a dedicated option. Which one do you prefer ? -funnamed-variadic-parameter -fpointless-variadic-functions -fallow-parameterless-variadic-functions I will update my patch once this is settled. Thanks, Tristan.
[PATCH, AIX] Add missing macros PR39950
The appended patch adds a few macros that XLC now defines on AIX. - David * config/rs6000/aix.h (TARGET_OS_AIX_CPP_BUILTINS): Define __powerpc__, __PPC__, __unix__. Index: aix.h === --- aix.h (revision 179610) +++ aix.h (working copy) @@ -97,6 +97,9 @@ { \ builtin_define ("_IBMR2"); \ builtin_define ("_POWER"); \ + builtin_define ("__powerpc__"); \ + builtin_define ("__PPC__"); \ + builtin_define ("__unix__"); \ builtin_define ("_AIX"); \ builtin_define ("_AIX32"); \ builtin_define ("_AIX41"); \
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 6 Oct 2011, William J. Schmidt wrote: On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. These are all good ideas. I will think about casting this as a more general strength reduction over extended basic blocks outside of loops. First I'll put together some simple tests to see whether we're currently missing some non-address opportunities. snip + mult_op0 = TREE_OPERAND (offset, 0); + mult_op1 = TREE_OPERAND (offset, 1); + + if (TREE_CODE (mult_op0) != PLUS_EXPR + || TREE_CODE (mult_op1) != INTEGER_CST + || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST) +return NULL_TREE; + + t1 = TREE_OPERAND (base, 0); + t2 = TREE_OPERAND (mult_op0, 0); + c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1)); + c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1)); + c3 = TREE_INT_CST_LOW (mult_op1); Before accessing TREE_INT_CST_LOW you need to make sure the constants fit into a HWI using host_integerp () (which conveniently includes the check for INTEGER_CST). Note that you need to sign-extend the MEM_REF offset, thus use mem_ref_offset (base).low instead of TREE_INT_CST_LOW (TREE_OPERAND (base, 1)). Might be worth to add a testcase with negative offset ;) D'oh! . 
+ c4 = bitpos / BITS_PER_UNIT; + c = c1 + c2 * c3 + c4; And you don't know whether this operation overflows. Thus it's probably easiest to use double_ints instead of HOST_WIDE_INTs in all of the code. OK, thanks, will do. snip + /* Determine whether the expression can be represented with base and + offset components. */ + base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode, + &unsignedp, &volatilep, false); + if (!base || !offset) + return false; + + /* Look for a restructuring opportunity. */ + if ((mem_ref = restructure_base_and_offset (expr, gsi, base, + offset, bitpos)) == NULL_TREE) + return false; What I'm missing is a check whether the old address computation stmts will be dead after the transform. Hm, not quite sure what to do here. Prior to the transformation I'll have an assignment with something like: ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td) on LHS or RHS. Ta and Td will be part of the replacement. What should I be checking for? Doh, I thought you were matching gimple stmts that do the address computation. But now I see you are matching the tree returned from get_inner_reference. So no need to check anything for that case. But that keeps me wondering what you'll do if the accesses were all pointer arithmetic, not arrays. Thus, extern void foo (int, int, int); void f (int *p, unsigned int n) { foo (p[n], p[n+64], p[n+128]); } wouldn't that have the same issue and you wouldn't handle it? Richard.
[PATCH] Remove unnecessary SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c
Hi! If the second argument of gimple_build_assign_with_ops is an SSA_NAME, gimple_build_assign_with_ops_stat calls gimple_assign_set_lhs which does if (lhs && TREE_CODE (lhs) == SSA_NAME) SSA_NAME_DEF_STMT (lhs) = gs; so the SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c aren't needed. Cleaned up thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-06 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_handle_widen_mult_by_const): For lhs don't set SSA_NAME_DEF_STMT that has been already set by gimple_build_assign_with_ops. (vect_recog_pow_pattern, vect_recog_widen_sum_pattern, vect_operation_fits_smaller_type, vect_recog_over_widening_pattern): Likewise. --- gcc/tree-vect-patterns.c.jj 2011-10-06 12:37:34.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-06 13:19:44.0 +0200 @@ -400,7 +400,6 @@ vect_handle_widen_mult_by_const (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, *oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt; VEC_safe_push (gimple, heap, *stmts, def_stmt); *oprnd = new_oprnd; @@ -619,7 +618,6 @@ vect_recog_widen_mult_pattern (VEC (gimp var = vect_recog_temp_ssa_var (type, NULL); pattern_stmt = gimple_build_assign_with_ops (WIDEN_MULT_EXPR, var, oprnd0, oprnd1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; if (vect_print_dump_info (REPORT_DETAILS)) print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM); @@ -703,7 +701,6 @@ vect_recog_pow_pattern (VEC (gimple, hea var = vect_recog_temp_ssa_var (TREE_TYPE (base), NULL); stmt = gimple_build_assign_with_ops (MULT_EXPR, var, base, base); - SSA_NAME_DEF_STMT (var) = stmt; return stmt; } @@ -826,7 +823,6 @@ vect_recog_widen_sum_pattern (VEC (gimpl var = vect_recog_temp_ssa_var (type, NULL); pattern_stmt = gimple_build_assign_with_ops (WIDEN_SUM_EXPR, var, oprnd0, oprnd1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; if (vect_print_dump_info
(REPORT_DETAILS)) { @@ -1016,7 +1012,6 @@ vect_operation_fits_smaller_type (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt; VEC_safe_push (gimple, heap, *stmts, def_stmt); oprnd = new_oprnd; @@ -1038,7 +1033,6 @@ vect_operation_fits_smaller_type (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; oprnd = new_oprnd; *new_def_stmt = new_stmt; } @@ -1141,9 +1135,9 @@ vect_recog_over_widening_pattern (VEC (g VEC_safe_push (gimple, heap, *stmts, prev_stmt); var = vect_recog_temp_ssa_var (new_type, NULL); - pattern_stmt = gimple_build_assign_with_ops ( - gimple_assign_rhs_code (stmt), var, op0, op1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; + pattern_stmt + = gimple_build_assign_with_ops (gimple_assign_rhs_code (stmt), var, + op0, op1); STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt; STMT_VINFO_PATTERN_DEF_STMT (vinfo_for_stmt (stmt)) = new_def_stmt; @@ -1182,7 +1176,6 @@ vect_recog_over_widening_pattern (VEC (g new_oprnd = make_ssa_name (tmp, NULL); pattern_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, var, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = pattern_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt; *type_in = get_vectype_for_scalar_type (new_type); Jakub
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 6, 2011 at 3:49 PM, Michael Matz m...@suse.de wrote: Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + && TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)))) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The side effect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already, that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. It seems to me it should then simply be if (!TREE_SIDE_EFFECTS (arg1) && simple_operand_p (arg1)) return fold-the-not-and-variant (); Richard.
[PATCH] Don't fold always_inline not yet inlined builtins in gimple_fold_builtin
Hi! The 3 functions in builtins.c that dispatch builtin folding give up if avoid_folding_inline_builtin (fndecl) returns true, because we want to wait with those functions until they are inlined (which for -D_FORTIFY_SOURCE contains security checks). Unfortunately gimple_fold_builtin calls fold_builtin_str* etc. directly and thus bypasses this check. This didn't show up often because most of the inlines have __restrict arguments and restrict casts weren't considered useless. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, preapproved by richi on IRC, will commit to trunk momentarily. 2011-10-06 Jakub Jelinek ja...@redhat.com * tree.h (avoid_folding_inline_builtin): New prototype. * builtins.c (avoid_folding_inline_builtin): No longer static. * gimple-fold.c (gimple_fold_builtin): Give up if avoid_folding_inline_builtin returns true. --- gcc/tree.h.jj 2011-10-03 14:27:50.0 +0200 +++ gcc/tree.h 2011-10-06 13:26:32.0 +0200 @@ -5352,6 +5352,7 @@ fold_build_pointer_plus_hwi_loc (locatio fold_build_pointer_plus_hwi_loc (UNKNOWN_LOCATION, p, o) /* In builtins.c */ +extern bool avoid_folding_inline_builtin (tree); extern tree fold_call_expr (location_t, tree, bool); extern tree fold_builtin_fputs (location_t, tree, tree, bool, bool, tree); extern tree fold_builtin_strcpy (location_t, tree, tree, tree, tree); --- gcc/builtins.c.jj 2011-10-05 08:13:55.0 +0200 +++ gcc/builtins.c 2011-10-06 13:25:39.0 +0200 @@ -10360,7 +10360,7 @@ fold_builtin_varargs (location_t loc, tr been inlined, otherwise e.g. -D_FORTIFY_SOURCE checking might not be performed. */ -static bool +bool avoid_folding_inline_builtin (tree fndecl) { return (DECL_DECLARED_INLINE_P (fndecl) --- gcc/gimple-fold.c.jj2011-10-06 09:14:17.0 +0200 +++ gcc/gimple-fold.c 2011-10-06 13:29:08.0 +0200 @@ -828,6 +828,11 @@ gimple_fold_builtin (gimple stmt) if (DECL_BUILT_IN_CLASS (callee) == BUILT_IN_MD) return NULL_TREE; + /* Give up for always_inline inline builtins until they are + inlined. 
*/ + if (avoid_folding_inline_builtin (callee)) + return NULL_TREE; + /* If the builtin could not be folded, and it has no argument list, we're done. */ nargs = gimple_call_num_args (stmt); Jakub
[PATCH] Improve vector lowering a bit
This makes us look up previously generated intermediate vector results when decomposing an operation. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-06 Richard Guenther rguent...@suse.de * tree-vect-generic.c (vector_element): Look at previous generated results. Index: gcc/tree-vect-generic.c === *** gcc/tree-vect-generic.c (revision 179598) --- gcc/tree-vect-generic.c (working copy) *** vector_element (gimple_stmt_iterator *gs *** 536,541 --- 536,552 idx = build_int_cst (TREE_TYPE (idx), index); } + /* When lowering a vector statement sequence do some easy + simplification by looking through intermediate vector results. */ + if (TREE_CODE (vect) == SSA_NAME) + { + gimple def_stmt = SSA_NAME_DEF_STMT (vect); + if (is_gimple_assign (def_stmt) + && (gimple_assign_rhs_code (def_stmt) == VECTOR_CST + || gimple_assign_rhs_code (def_stmt) == CONSTRUCTOR)) + vect = gimple_assign_rhs1 (def_stmt); + } + if (TREE_CODE (vect) == VECTOR_CST) { unsigned i;
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, I modified the patch so that it always just converts two leafs of a TRUTH_(AND|OR)IF chain into a TRUTH_(AND|OR) expression, if branch costs are high and leafs are simple without side-effects. Additionally I added some testcases for it. 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_(AND|OR)_EXPR, if suitable. 2011-10-06 Kai Tietz kti...@redhat.com * gcc.dg/tree-ssa/ssa-ifbranch-1.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-2.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-3.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-4.c: New test. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-* mn10300-*-* } } } */ +/* { dg-options "-O2 -fdump-tree-gimple" } */ +/* { dg-options "-O2 -fdump-tree-gimple -march=i586" { target { i?86-*-* && ilp32 } } } */ + +extern int doo1 (void); +extern int doo2 (void); + +int bar (int a, int b, int c) +{ + if (a && b && c) + return doo1 (); + return doo2 (); +} + +/* { dg-final { scan-tree-dump-times "if" 2 "gimple" } } */ +/* { dg-final { cleanup-tree-dump "gimple" } } */ Index: gcc-head/gcc/fold-const.c === --- gcc-head.orig/gcc/fold-const.c +++ gcc-head/gcc/fold-const.c @@ -8387,6 +8387,45 @@ fold_truth_andor (location_t loc, enum t if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + && !TREE_SIDE_EFFECTS (arg1) + && LOGICAL_OP_NON_SHORT_CIRCUIT + /* floats might trap.
*/ + && !FLOAT_TYPE_P (TREE_TYPE (arg1)) + && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + && TREE_CODE (arg1) != TRUTH_NOT_EXPR + && simple_operand_p (arg1)) + || ((TREE_CODE_CLASS (TREE_CODE (arg1)) == tcc_comparison + || TREE_CODE (arg1) == TRUTH_NOT_EXPR) + /* Float comparison might trap. */ + && !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0))) + && simple_operand_p (TREE_OPERAND (arg1, 0))))) + { + /* We want to combine truth-comparison for + ((W TRUTH-ANDOR X) TRUTH-ANDORIF Y) TRUTH-ANDORIF Z, + if Y and Z are simple operands and have no side-effect to + ((W TRUTH-ANDOR X) TRUTH-IF (Y TRUTH-ANDOR Z)). */ + if (TREE_CODE (arg0) == code + && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + && simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + && !TREE_SIDE_EFFECTS (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, arg0, arg1); + } + return NULL_TREE; } Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-2.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-2.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { !
mips*-*-* s390*-*-* avr-*-* mn10300-*-* } } } */ +/* { dg-options "-O2 -fdump-tree-gimple" } */ +/* { dg-options "-O2 -fdump-tree-gimple -march=i586" { target { i?86-*-* && ilp32 } } } */ + +extern int doo1 (void); +extern int doo2 (void); + +int bar (int a, int b, int c, int d) +{ + if (a && b && c && d) + return doo1 (); + return doo2 (); +} + +/* { dg-final { scan-tree-dump-times "if" 2 "gimple" } } */ +/* { dg-final { cleanup-tree-dump "gimple" } } */ Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-3.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-3.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-*
Re: [PATCH] Remove unnecessary SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c
On Thu, 6 Oct 2011, Jakub Jelinek wrote: Hi! If the second argument of gimple_build_assign_with_ops is an SSA_NAME, gimple_build_assign_with_ops_stat calls gimple_assign_set_lhs which does if (lhs TREE_CODE (lhs) == SSA_NAME) SSA_NAME_DEF_STMT (lhs) = gs; so the SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c aren't needed. Cleaned up thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2011-10-06 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_handle_widen_mult_by_const): For lhs don't set SSA_NAME_DEF_STMT that has been already set by gimple_build_assign_with_ops. (vect_recog_pow_pattern, vect_recog_widen_sum_pattern, vect_operation_fits_smaller_type, vect_recog_over_widening_pattern): Likewise. --- gcc/tree-vect-patterns.c.jj 2011-10-06 12:37:34.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-06 13:19:44.0 +0200 @@ -400,7 +400,6 @@ vect_handle_widen_mult_by_const (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, *oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt; VEC_safe_push (gimple, heap, *stmts, def_stmt); *oprnd = new_oprnd; @@ -619,7 +618,6 @@ vect_recog_widen_mult_pattern (VEC (gimp var = vect_recog_temp_ssa_var (type, NULL); pattern_stmt = gimple_build_assign_with_ops (WIDEN_MULT_EXPR, var, oprnd0, oprnd1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; if (vect_print_dump_info (REPORT_DETAILS)) print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM); @@ -703,7 +701,6 @@ vect_recog_pow_pattern (VEC (gimple, hea var = vect_recog_temp_ssa_var (TREE_TYPE (base), NULL); stmt = gimple_build_assign_with_ops (MULT_EXPR, var, base, base); - SSA_NAME_DEF_STMT (var) = stmt; return stmt; } @@ -826,7 +823,6 @@ vect_recog_widen_sum_pattern (VEC (gimpl var = vect_recog_temp_ssa_var (type, NULL); pattern_stmt = gimple_build_assign_with_ops (WIDEN_SUM_EXPR, var, oprnd0, oprnd1); - 
SSA_NAME_DEF_STMT (var) = pattern_stmt; if (vect_print_dump_info (REPORT_DETAILS)) { @@ -1016,7 +1012,6 @@ vect_operation_fits_smaller_type (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt; VEC_safe_push (gimple, heap, *stmts, def_stmt); oprnd = new_oprnd; @@ -1038,7 +1033,6 @@ vect_operation_fits_smaller_type (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; oprnd = new_oprnd; *new_def_stmt = new_stmt; } @@ -1141,9 +1135,9 @@ vect_recog_over_widening_pattern (VEC (g VEC_safe_push (gimple, heap, *stmts, prev_stmt); var = vect_recog_temp_ssa_var (new_type, NULL); - pattern_stmt = gimple_build_assign_with_ops ( - gimple_assign_rhs_code (stmt), var, op0, op1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; + pattern_stmt + = gimple_build_assign_with_ops (gimple_assign_rhs_code (stmt), var, + op0, op1); STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt; STMT_VINFO_PATTERN_DEF_STMT (vinfo_for_stmt (stmt)) = new_def_stmt; @@ -1182,7 +1176,6 @@ vect_recog_over_widening_pattern (VEC (g new_oprnd = make_ssa_name (tmp, NULL); pattern_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, var, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = pattern_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt; *type_in = get_vectype_for_scalar_type (new_type); Jakub -- Richard Guenther rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
[PATCH] Don't handle CAST_RESTRICT (PR tree-optimization/49279)
Hi! CAST_RESTRICT based disambiguation unfortunately isn't reliable, e.g. to store a non-restrict pointer into a restricted field, we add a non-useless cast to restricted pointer in the gimplifier, and while we don't consider that field to have a special restrict tag because it is unsafe to do so, we unfortunately create it for the CAST_RESTRICT before that and end up with different restrict tags for the same thing. See the PR for more details. This patch turns off CAST_RESTRICT handling for now; in the future we might try to replace it by explicit CAST_RESTRICT stmts in some form, but need to solve problems with multiple inlined copies of the same function with restrict arguments or restrict variables in it and intermixed code from them (or similarly code from different non-overlapping source blocks). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 4.6 too? 2011-10-06 Jakub Jelinek ja...@redhat.com PR tree-optimization/49279 * tree-ssa-structalias.c (find_func_aliases): Don't handle CAST_RESTRICT. * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Allow restrict propagation. * tree-ssa.c (useless_type_conversion_p): Don't return false if TYPE_RESTRICT differs. * gcc.dg/tree-ssa/restrict-4.c: XFAIL. * gcc.c-torture/execute/pr49279.c: New test. --- gcc/tree-ssa-structalias.c.jj 2011-10-04 10:18:29.0 +0200 +++ gcc/tree-ssa-structalias.c 2011-10-05 12:43:42.0 +0200 @@ -4494,15 +4494,6 @@ find_func_aliases (gimple origt) (!in_ipa_mode || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop))) make_escape_constraint (rhsop); - /* If this is a conversion of a non-restrict pointer to a - restrict pointer track it with a new heapvar. */ - else if (gimple_assign_cast_p (t) - && POINTER_TYPE_P (TREE_TYPE (rhsop)) - && POINTER_TYPE_P (TREE_TYPE (lhsop)) - && !TYPE_RESTRICT (TREE_TYPE (rhsop)) - && TYPE_RESTRICT (TREE_TYPE (lhsop))) - make_constraint_from_restrict (get_vi_for_tree (lhsop), - "CAST_RESTRICT"); } /* Handle escapes through return.
*/ else if (gimple_code (t) == GIMPLE_RETURN --- gcc/tree-ssa-forwprop.c.jj 2011-10-04 14:36:00.0 +0200 +++ gcc/tree-ssa-forwprop.c 2011-10-05 12:46:32.0 +0200 @@ -804,11 +804,6 @@ forward_propagate_addr_expr_1 (tree name ((rhs_code == SSA_NAME && rhs == name) || CONVERT_EXPR_CODE_P (rhs_code))) { - /* Don't propagate restrict pointer's RHS. */ - if (TYPE_RESTRICT (TREE_TYPE (lhs)) - && !TYPE_RESTRICT (TREE_TYPE (name)) - && !is_gimple_min_invariant (def_rhs)) - return false; /* Only recurse if we don't deal with a single use or we cannot do the propagation to the current statement. In particular we can end up with a conversion needed for a non-invariant --- gcc/tree-ssa.c.jj 2011-09-15 12:18:54.0 +0200 +++ gcc/tree-ssa.c 2011-10-05 12:44:52.0 +0200 @@ -1270,12 +1270,6 @@ useless_type_conversion_p (tree outer_ty != TYPE_ADDR_SPACE (TREE_TYPE (inner_type))) return false; - /* Do not lose casts to restrict qualified pointers. */ - if ((TYPE_RESTRICT (outer_type) - != TYPE_RESTRICT (inner_type)) - && TYPE_RESTRICT (outer_type)) - return false; - /* If the outer type is (void *), the conversion is not necessary.
*/ if (VOID_TYPE_P (TREE_TYPE (outer_type))) return true; --- gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c.jj 2011-10-04 14:33:08.0 +0200 +++ gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c 2011-10-05 16:22:33.232433231 +0200 @@ -22,5 +22,5 @@ bar (int *x, int y) return p1[y]; } -/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" { xfail *-*-* } } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.c-torture/execute/pr49279.c.jj 2011-10-05 13:32:43.087670846 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr49279.c 2011-10-05 13:32:43.087670846 +0200 @@ -0,0 +1,35 @@ +/* PR tree-optimization/49279 */ +extern void abort (void); + +struct S { int a; int *__restrict p; }; + +__attribute__((noinline, noclone)) +struct S *bar (struct S *p) +{ + struct S *r; + asm volatile ("" : "=r" (r) : "0" (p) : "memory"); + return r; +} + +__attribute__((noinline, noclone)) +int +foo (int *p, int *q) +{ + struct S s, *t; + s.a = 1; + s.p = p; + t = bar (&s); + t->p = q; + s.p[0] = 0; + t->p[0] = 1; + return s.p[0]; +} + +int +main () +{ + int a, b; + if (foo (&a, &b) != 1) + abort (); + return 0; +} Jakub
Re: Builtin infrastructure change
On Thu, Oct 06, 2011 at 03:23:07PM +0200, Tobias Burnus wrote: On 10/06/2011 03:02 PM, Michael Meissner wrote: On the x86 (with Fedora 13), I built and tested the C, C++, Objective C, Java, Ada, and Go languages with no regressions On a power6 box with RHEL 6.1, I have done the same for C, C++, Objective C, Java, and Ada languages with no regressions. Any reason for not building and testing Fortran? Especially as you patch gcc/fortran/{trans*.c,f95-lang.c}? Tobias Brain fault on my part. I tested the previous set of patches with Fortran. Since I had to explicitly add the languages to pick up Ada and Go, I seemed to have dropped Fortran. Sigh. Sorry about that. I just started the powerpc bootstrap, since that is a lot faster. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
[v3] Avoid spurious fails when running the testsuite with -std=gnu++0x
Hi, tested x86_64-linux, committed. Paolo. 2011-10-06 Paolo Carlini paolo.carl...@oracle.com * testsuite/27_io/ios_base/cons/assign_neg.cc: Tidy dg- directives, for C++0x testing too. * testsuite/27_io/ios_base/cons/copy_neg.cc: Likewise. * testsuite/ext/pb_ds/example/hash_resize_neg.cc: Likewise. * testsuite/24_iterators/istreambuf_iterator/requirements/ base_classes.cc: Adjust for C++0x testing. * testsuite/ext/codecvt/char-1.cc: Avoid warnings in C++0x mode. * testsuite/ext/codecvt/char-2.cc: Likewise. * testsuite/ext/codecvt/wchar_t.cc: Likewise. Index: testsuite/27_io/ios_base/cons/assign_neg.cc === --- testsuite/27_io/ios_base/cons/assign_neg.cc (revision 179595) +++ testsuite/27_io/ios_base/cons/assign_neg.cc (working copy) @@ -18,21 +18,18 @@ // with this library; see the file COPYING3. If not see // <http://www.gnu.org/licenses/>. - #include <ios> // Library defect report //50. Copy constructor and assignment operator of ios_base -class test_base : public std::ios_base { }; +class test_base : public std::ios_base { }; // { dg-error "within this context|deleted" } void test01() { // assign test_base io1; test_base io2; - io1 = io2; + io1 = io2; // { dg-error "synthesized|deleted" } } -// { dg-error "synthesized" { target *-*-* } 33 } -// { dg-error "within this context" { target *-*-* } 26 } -// { dg-error "is private" { target *-*-* } 791 } -// { dg-error "operator=" { target *-*-* } 0 } + +// { dg-prune-output "include" } Index: testsuite/27_io/ios_base/cons/copy_neg.cc === --- testsuite/27_io/ios_base/cons/copy_neg.cc (revision 179595) +++ testsuite/27_io/ios_base/cons/copy_neg.cc (working copy) @@ -18,21 +18,18 @@ // with this library; see the file COPYING3. If not see // <http://www.gnu.org/licenses/>. - #include <ios> // Library defect report //50.
Copy constructor and assignment operator of ios_base -struct test_base : public std::ios_base +struct test_base : public std::ios_base // { dg-error within this context|deleted } { }; void test02() { // copy ctor test_base io1; - test_base io2 = io1; + test_base io2 = io1; // { dg-error synthesized|deleted } } -// { dg-error within this context { target *-*-* } 26 } -// { dg-error synthesized { target *-*-* } 33 } -// { dg-error is private { target *-*-* } 788 } -// { dg-error copy constructor { target *-*-* } 0 } + +// { dg-prune-output include } Index: testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc === --- testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc (revision 179595) +++ testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc (working copy) @@ -1,7 +1,8 @@ // { dg-do compile } // 1999-06-28 bkoz -// Copyright (C) 1999, 2001, 2003, 2009 Free Software Foundation, Inc. +// Copyright (C) 1999, 2001, 2003, 2009, 2010, 2011 +// Free Software Foundation, Inc. // // This file is part of the GNU ISO C++ Library. This library is free // software; you can redistribute it and/or modify it under the @@ -31,8 +32,15 @@ // Check for required base class. typedef istreambuf_iteratorchar test_iterator; typedef char_traitschar::off_type off_type; - typedef iteratorinput_iterator_tag, char, off_type, char*, char base_iterator; + typedef iteratorinput_iterator_tag, char, off_type, char*, +#ifdef __GXX_EXPERIMENTAL_CXX0X__ +char +#else +char +#endif +base_iterator; + istringstream isstream(this tag); test_iterator r_it(isstream); base_iterator* base __attribute__((unused)) = r_it; Index: testsuite/ext/pb_ds/example/hash_resize_neg.cc === --- testsuite/ext/pb_ds/example/hash_resize_neg.cc (revision 179595) +++ testsuite/ext/pb_ds/example/hash_resize_neg.cc (working copy) @@ -1,7 +1,8 @@ // { dg-do compile } // -*- C++ -*- -// Copyright (C) 2005, 2006, 2007, 2009 Free Software Foundation, Inc. 
+// Copyright (C) 2005, 2006, 2007, 2009, 2010, 2011 +// Free Software Foundation, Inc. // // This file is part of the GNU ISO C++ Library. This library is free // software; you can redistribute it and/or modify it under the terms @@ -60,4 +61,4 @@ h.resize(20); // { dg-error required from } } -// { dg-error invalid { target *-*-* } 187 } +// { dg-prune-output include } Index: testsuite/ext/codecvt/char-1.cc === --- testsuite/ext/codecvt/char-1.cc (revision 179595) +++ testsuite/ext/codecvt/char-1.cc (working copy) @@ -4,6 +4,7 @@ // 2000-08-22 Benjamin Kosnik b...@cygnus.com // Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009 +// 2010, 2011
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + && TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The side-effect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already, that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. Ciao, Michael. That's not the whole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR is that for TRUTH_(AND|OR)IF_EXPR the gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Regards, Kai
Re: [PATCH] Don't handle CAST_RESTRICT (PR tree-optimization/49279)
On Thu, 6 Oct 2011, Jakub Jelinek wrote: Hi! CAST_RESTRICT based disambiguation unfortunately isn't reliable, e.g. to store a non-restrict pointer into a restricted field, we add a non-useless cast to restricted pointer in the gimplifier, and while we don't consider that field to have a special restrict tag because it is unsafe to do so, we unfortunately create it for the CAST_RESTRICT before that and end up with different restrict tags for the same thing. See the PR for more details. This patch turns off CAST_RESTRICT handling for now, in the future we might try to replace it by explicit CAST_RESTRICT stmts in some form, but need to solve problems with multiple inlined copies of the same function with restrict arguments or restrict variables in it and intermixed code from them (or similarly code from different non-overlapping source blocks). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 4.6 too? Ok for trunk. Ok for 4.6 with the tree-ssa.c change omitted - and the stmt folding patch applied. Thanks, Richard. 2011-10-06 Jakub Jelinek ja...@redhat.com PR tree-optimization/49279 * tree-ssa-structalias.c (find_func_aliases): Don't handle CAST_RESTRICT. * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Allow restrict propagation. * tree-ssa.c (useless_type_conversion_p): Don't return false if TYPE_RESTRICT differs. * gcc.dg/tree-ssa/restrict-4.c: XFAIL. * gcc.c-torture/execute/pr49279.c: New test. --- gcc/tree-ssa-structalias.c.jj 2011-10-04 10:18:29.0 +0200 +++ gcc/tree-ssa-structalias.c2011-10-05 12:43:42.0 +0200 @@ -4494,15 +4494,6 @@ find_func_aliases (gimple origt) (!in_ipa_mode || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop))) make_escape_constraint (rhsop); - /* If this is a conversion of a non-restrict pointer to a - restrict pointer track it with a new heapvar. 
*/ - else if (gimple_assign_cast_p (t) - POINTER_TYPE_P (TREE_TYPE (rhsop)) - POINTER_TYPE_P (TREE_TYPE (lhsop)) - !TYPE_RESTRICT (TREE_TYPE (rhsop)) - TYPE_RESTRICT (TREE_TYPE (lhsop))) - make_constraint_from_restrict (get_vi_for_tree (lhsop), -CAST_RESTRICT); } /* Handle escapes through return. */ else if (gimple_code (t) == GIMPLE_RETURN --- gcc/tree-ssa-forwprop.c.jj2011-10-04 14:36:00.0 +0200 +++ gcc/tree-ssa-forwprop.c 2011-10-05 12:46:32.0 +0200 @@ -804,11 +804,6 @@ forward_propagate_addr_expr_1 (tree name ((rhs_code == SSA_NAME rhs == name) || CONVERT_EXPR_CODE_P (rhs_code))) { - /* Don't propagate restrict pointer's RHS. */ - if (TYPE_RESTRICT (TREE_TYPE (lhs)) -!TYPE_RESTRICT (TREE_TYPE (name)) -!is_gimple_min_invariant (def_rhs)) - return false; /* Only recurse if we don't deal with a single use or we cannot do the propagation to the current statement. In particular we can end up with a conversion needed for a non-invariant --- gcc/tree-ssa.c.jj 2011-09-15 12:18:54.0 +0200 +++ gcc/tree-ssa.c2011-10-05 12:44:52.0 +0200 @@ -1270,12 +1270,6 @@ useless_type_conversion_p (tree outer_ty != TYPE_ADDR_SPACE (TREE_TYPE (inner_type))) return false; - /* Do not lose casts to restrict qualified pointers. */ - if ((TYPE_RESTRICT (outer_type) -!= TYPE_RESTRICT (inner_type)) -TYPE_RESTRICT (outer_type)) - return false; - /* If the outer type is (void *), the conversion is not necessary. 
*/ if (VOID_TYPE_P (TREE_TYPE (outer_type))) return true; --- gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c.jj 2011-10-04 14:33:08.0 +0200 +++ gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c2011-10-05 16:22:33.232433231 +0200 @@ -22,5 +22,5 @@ bar (int *x, int y) return p1[y]; } -/* { dg-final { scan-tree-dump-times return 1; 2 optimized } } */ +/* { dg-final { scan-tree-dump-times return 1; 2 optimized { xfail *-*-* } } } */ /* { dg-final { cleanup-tree-dump optimized } } */ --- gcc/testsuite/gcc.c-torture/execute/pr49279.c.jj 2011-10-05 13:32:43.087670846 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr49279.c 2011-10-05 13:32:43.087670846 +0200 @@ -0,0 +1,35 @@ +/* PR tree-optimization/49279 */ +extern void abort (void); + +struct S { int a; int *__restrict p; }; + +__attribute__((noinline, noclone)) +struct S *bar (struct S *p) +{ + struct S *r; + asm volatile ( : =r (r) : 0 (p) : memory); + return r; +} + +__attribute__((noinline, noclone)) +int +foo (int *p, int *q) +{ + struct S s, *t; + s.a = 1; + s.p = p; + t = bar (s); + t-p = q; + s.p[0] = 0; + t-p[0] = 1; + return s.p[0]; +} + +int +main () +{ + int a, b; + if (foo (a, b) != 1) +abort
Re: rfa: remove get_var_ann (was: Fix PR50260)
Hi, On Sat, 3 Sep 2011, Richard Guenther wrote: OTOH it's a nice invariant that can actually be checked for (that all reachable vars whatsoever have to be in referenced_vars), so I'm going to do that. Yes, until we get rid of referenced_vars (which we still should do at some point...) that's the best. Okay, like so then. Regstrapped on x86_64-linux. (Note that sometimes I use add_referenced_vars, and sometimes find_referenced_vars_in, the latter when I would have to add several add_referenced_vars for one statement). IIRC we have some verification code even, and wonder why it doesn't trigger. Nope, we don't. But with the patch we segfault in case this happens again, which is good enough checking for me. Ciao, Michael. * tree-flow.h (get_var_ann): Don't declare. * tree-flow-inline.h (get_var_ann): Remove. (set_is_used): Use var_ann, not get_var_ann. * tree-dfa.c (add_referenced_var): Inline body of get_var_ann. * tree-profile.c (gimple_gen_edge_profiler): Call find_referenced_var_in. (gimple_gen_interval_profiler): Ditto. (gimple_gen_pow2_profiler): Ditto. (gimple_gen_one_value_profiler): Ditto. (gimple_gen_average_profiler): Ditto. (gimple_gen_ior_profiler): Ditto. (gimple_gen_ic_profiler): Ditto plus call add_referenced_var. (gimple_gen_ic_func_profiler): Call add_referenced_var. * tree-mudflap.c (execute_mudflap_function_ops): Call add_referenced_var. Index: tree-flow.h === --- tree-flow.h (revision 178488) +++ tree-flow.h (working copy) @@ -278,7 +278,6 @@ typedef struct immediate_use_iterator_d typedef struct var_ann_d *var_ann_t; static inline var_ann_t var_ann (const_tree); -static inline var_ann_t get_var_ann (tree); static inline void update_stmt (gimple); static inline int get_lineno (const_gimple); Index: tree-flow-inline.h === --- tree-flow-inline.h (revision 178488) +++ tree-flow-inline.h (working copy) @@ -145,16 +145,6 @@ var_ann (const_tree t) return p ? *p : NULL; } -/* Return the variable annotation for T, which must be a _DECL node. 
- Create the variable annotation if it doesn't exist. */ -static inline var_ann_t -get_var_ann (tree var) -{ - var_ann_t *p = DECL_VAR_ANN_PTR (var); - gcc_checking_assert (p); - return *p ? *p : create_var_ann (var); -} - /* Get the number of the next statement uid to be allocated. */ static inline unsigned int gimple_stmt_max_uid (struct function *fn) @@ -568,7 +558,7 @@ phi_arg_index_from_use (use_operand_p us static inline void set_is_used (tree var) { - var_ann_t ann = get_var_ann (var); + var_ann_t ann = var_ann (var); ann-used = true; } Index: tree-dfa.c === --- tree-dfa.c (revision 178488) +++ tree-dfa.c (working copy) @@ -580,8 +580,9 @@ set_default_def (tree var, tree def) bool add_referenced_var (tree var) { - get_var_ann (var); gcc_assert (DECL_P (var)); + if (!*DECL_VAR_ANN_PTR (var)) +create_var_ann (var); /* Insert VAR into the referenced_vars hash table if it isn't present. */ if (referenced_var_check_and_insert (var)) Index: tree-profile.c === --- tree-profile.c (revision 178408) +++ tree-profile.c (working copy) @@ -224,6 +224,7 @@ gimple_gen_edge_profiler (int edgeno, ed one = build_int_cst (gcov_type_node, 1); stmt1 = gimple_build_assign (gcov_type_tmp_var, ref); gimple_assign_set_lhs (stmt1, make_ssa_name (gcov_type_tmp_var, stmt1)); + find_referenced_vars_in (stmt1); stmt2 = gimple_build_assign_with_ops (PLUS_EXPR, gcov_type_tmp_var, gimple_assign_lhs (stmt1), one); gimple_assign_set_lhs (stmt2, make_ssa_name (gcov_type_tmp_var, stmt2)); @@ -270,6 +271,7 @@ gimple_gen_interval_profiler (histogram_ val = prepare_instrumented_value (gsi, value); call = gimple_build_call (tree_interval_profiler_fn, 4, ref_ptr, val, start, steps); + find_referenced_vars_in (call); gsi_insert_before (gsi, call, GSI_NEW_STMT); } @@ -290,6 +292,7 @@ gimple_gen_pow2_profiler (histogram_valu true, NULL_TREE, true, GSI_SAME_STMT); val = prepare_instrumented_value (gsi, value); call = gimple_build_call (tree_pow2_profiler_fn, 2, ref_ptr, val); + find_referenced_vars_in 
(call); gsi_insert_before (gsi, call, GSI_NEW_STMT); } @@ -310,6 +313,7 @@ gimple_gen_one_value_profiler (histogram true, NULL_TREE, true, GSI_SAME_STMT); val = prepare_instrumented_value (gsi, value); call = gimple_build_call (tree_one_value_profiler_fn, 2, ref_ptr, val); + find_referenced_vars_in (call); gsi_insert_before
[PATCH][ARM] Fix broken shift patterns
This patch is a follow-up both to my patches here: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00049.html and Paul Brook's patch here: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01076.html The patch fixes both the original problem, in which negative shift constants caused an ICE (pr50193), and the problem introduced by Paul's patch, in which a*64+b is not properly optimized. However, it does not attempt to fix Richard Sandiford's observation that there may be a latent problem with the 'M' constraint which could lead reload to cause a recog ICE. I believe this patch to be nothing but an improvement over the current state, and that a fix to the constraint problem should be a separate patch. On that basis, am I OK to commit? Now, let me explain the other problem: As it stands, all shift-related patterns, for ARM or Thumb2 mode, permit a shift to be expressed as either a shift type and amount (register or constant), or as a multiply and power-of-two constant. This is necessary because the canonical form of (plus (ashift x y) z) appears to be (plus (mult x 2^y) z), presumably for the benefit of multiply-and-accumulate optimizations. (Minus is similarly affected, but other shiftable operations are unaffected, and this only applies to left shifts, of course.) The (possible) problem is that the meanings of the constants for mult and ashift are very different, but the arm.md file has these unified into a single pattern using a single 'M' constraint that must allow both types of constant unconditionally. This is safe for the vast majority of passes because they check recog before they make a change, and anyway don't make changes without understanding the logic. But reload has a feature where it can pull a constant from a register and convert it to an immediate if the constraints allow; crucially, it doesn't check the predicates. No doubt it shouldn't need to, but the ARM port appears to be breaking the rules.
Problem scenario 1: Consider pattern (plus (mult r1 r2) r3). It so happens that reload knows that r2 contains a constant, say 20, so reload checks to see if that could be converted to an immediate. Now, 20 is not a power of two, so recog would reject it, but it is in the range 0..31 so it does match the 'M' constraint. Oops! Problem scenario 2: Consider pattern (ashiftrt r1 r2). Again, it so happens that reload knows that r2 contains a constant, in this case let's say 64, so again reload checks to see if that could be converted to an immediate. This time, 64 is not in the range 0..31, so recog would reject it, but it is a power of two, so it does match the 'M' constraint. Again, oops! I see two ways to fix this properly: 1. Duplicate all the patterns in the machine description, once for the mult case, and once for the other cases. This could probably be done with a code iterator, if preferred. 2. Redefine the canonical form of (plus (mult .. 2^n) ..) such that it always uses the (presumably cheaper) shift-and-add option. However, this would require all other targets where madd really is the best option to fix it up. (I'd imagine that two instructions for shift and add would be cheaper speed wise, if properly scheduled, on most targets? That doesn't help the size optimization though.) However, it's not obvious to me that this needs fixing: * The failure mode would be an ICE, and we've not seen any. * There's a comment in arm.c:shift_op that suggests that this can't happen, somehow, at least in the mult case. - I'm not sure exactly how reload works, but it seems reasonable that it will never try to convert a register to an immediate because the pattern does not allow registers in the first place. - This logic doesn't hold in the opposite case though. Have I explained all that clearly? My conclusion after studying all this is that we don't need to do anything until somebody reports an ICE, at which point it becomes worth the effort of fixing it. 
Other opinions welcome! :) Andrew 2011-10-06 Andrew Stubbs a...@codesourcery.com gcc/ * config/arm/predicates.md (shift_amount_operand): Remove constant range check. (shift_operator): Check range of constants for all shift operators. gcc/testsuite/ * gcc.dg/pr50193-1.c: New file. * gcc.target/arm/shiftable.c: New file. --- src/gcc-mainline/gcc/config/arm/predicates.md | 15 ++- src/gcc-mainline/gcc/testsuite/gcc.dg/pr50193-1.c | 10 + .../gcc/testsuite/gcc.target/arm/shiftable.c | 43 3 files changed, 65 insertions(+), 3 deletions(-) create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.dg/pr50193-1.c create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/shiftable.c diff --git a/src/gcc-mainline/gcc/config/arm/predicates.md b/src/gcc-mainline/gcc/config/arm/predicates.md index 27ba603..7307fd5 100644 --- a/src/gcc-mainline/gcc/config/arm/predicates.md +++
Re: rfa: remove get_var_ann (was: Fix PR50260)
On Thu, Oct 6, 2011 at 4:59 PM, Michael Matz m...@suse.de wrote: Hi, On Sat, 3 Sep 2011, Richard Guenther wrote: OTOH it's a nice invariant that can actually be checked for (that all reachable vars whatsoever have to be in referenced_vars), so I'm going to do that. Yes, until we get rid of referenced_vars (which we still should do at some point...) that's the best. Okay, like so then. Regstrapped on x86_64-linux. (Note that sometimes I use add_referenced_vars, and sometimes find_referenced_vars_in, the latter when I would have to add several add_referenced_vars for one statement). IIRC we have some verification code even, and wonder why it doesn't trigger. Nope, we don't. But with the patch we segfault in case this happens again, which is good enough checking for me. Ok. Thanks, Richard. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: That's not the whole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR is that for TRUTH_(AND|OR)IF_EXPR the gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Yes, of course. That is what implements the short-circuit semantics. But as Richard already mentioned I also don't understand why you do the reassociation at that point. Why not simply rewrite ANDIF -> AND (when possible, i.e. no side-effects on arg1, and desirable, i.e. when LOGICAL_OP_NON_SHORT_CIRCUIT, and simple_operand(arg1)) and let other folders do reassociation? I ask because your comment states to transform: ((W AND X) ANDIF Y) ANDIF Z into (W AND X) ANDIF (Y AND Z) (under condition that Y and Z are simple operands). In fact you don't check the form of arg0,0, i.e. the W AND X here. Independent of that it doesn't make sense, because if Y and Z are easy (simple and no side-effects), then Y AND Z is too, and therefore you should transform this (if at all) into: (W AND X) AND (Y AND Z) at which point this reassociation doesn't make sense anymore, as ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: That's not the whole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR is that for TRUTH_(AND|OR)IF_EXPR the gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Yes, of course. That is what implements the short-circuit semantics. But as Richard already mentioned I also don't understand why you do the reassociation at that point. Why not simply rewrite ANDIF -> AND (when possible, i.e. no side-effects on arg1, and desirable, i.e. when LOGICAL_OP_NON_SHORT_CIRCUIT, and simple_operand(arg1)) and let other folders do reassociation? I ask because your comment states to transform: ((W AND X) ANDIF Y) ANDIF Z into (W AND X) ANDIF (Y AND Z) (under condition that Y and Z are simple operands). In fact you don't check the form of arg0,0, i.e. the W AND X here. Independent of that it doesn't make sense, because if Y and Z are easy (simple and no side-effects), then Y AND Z is too, and therefore you should transform this (if at all) into: (W AND X) AND (Y AND Z) at which point this reassociation doesn't make sense anymore, as Sorry, exactly this doesn't happen, because an ANDIF isn't simple, and therefore it isn't transformed into an AND. ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point, that it might not be that good to always modify unconditionally to an AND/OR chain. For example if (a1 && a2 && a3 && ... && a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed much.
But you are right that, from the point of view of reassociation optimization, it could in some cases be more profitable to have packed all elements into one AND-chain. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 06, 2011 at 05:28:36PM +0200, Kai Tietz wrote: None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point, that it might not be that good to always modify unconditionally to an AND/OR chain. For example if (a1 && a2 && a3 && ... && a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed much. But you are right that, from the point of view of reassociation optimization, it could in some cases be more profitable to have packed all elements into one AND-chain. Yeah. Perhaps we should break them up after reassoc2, or on the other hand teach reassoc (or some other pass) to be able to do the optimizations on a series of GIMPLE_COND with no side-effects in between. See e.g. PR46309, return a == 3 || a == 1 || a == 2 || a == 4; isn't optimized into (a - 1U) < 4U, although it could be, when branch costs cause it to be broken up into several GIMPLE_COND stmts. Or if the user writes: if (a == 3) return 1; if (a == 1) return 1; if (a == 2) return 1; if (a == 4) return 1; return 0; (more probably using enums). Jakub
Re: Initial shrink-wrapping patch
On 10/06/2011 06:37 AM, Bernd Schmidt wrote: On 10/06/11 01:47, Bernd Schmidt wrote: This appears to be because the split prologue contains a jump, which means the find_many_sub_blocks call reorders the block numbers, and our indices into bb_flags are off. Testing of the patch completed - ok? Regardless of split-stack it seems like a cleanup and eliminates a potential source of errors. Yes, patch is ok. r~
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: at which point this reassociation doesn't make sense anymore, as Sorry, exactly this doesn't happen, because an ANDIF isn't simple, and therefore it isn't transformed into an AND. Right ... ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point, that it might not be that good to always modify unconditionally to an AND/OR chain. ... and I see that (that's why the transformation should be desirable for some definition of desirable, which probably includes an RHS that is not a too-long chain). As it stands right now your transformation seems to be a fairly ad-hoc try at avoiding this problem. That's why I wonder why do the reassoc at all? Which testcases break _without_ the reassociation, i.e. with only rewriting ANDIF -> AND at the outermost level? Ciao, Michael.
[cxx-mem-model] Add lockfree tests
This patch supplies __sync_mem_is_lock_free (size) and __sync_mem_always_lock_free (size). __sync_mem_always_lock_free requires a compile time constant, and returns true if an object of the specified size will *always* generate lock free instructions on the current architecture. Otherwise false is returned. __sync_mem_is_lock_free also returns true if instructions will always be lock free, but if the answer is not true, it resolves to an external call named '__sync_mem_is_lock_free' which will be supplied externally. Presumably by whatever library or application is providing the other external __sync_mem routines as documented in http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary New tests, documentation are provided, bootstraps on x86_64-unknown-linux-gnu and causes no new testsuite regressions. Andrew * optabs.h (DOI_sync_mem_always_lock_free): New. (DOI_sync_mem_is_lock_free): New. (sync_mem_always_lock_free_optab, sync_mem_is_lock_free_optab): New. * builtins.c (fold_builtin_sync_mem_always_lock_free): New. (expand_builtin_sync_mem_always_lock_free): New. (fold_builtin_sync_mem_is_lock_free): New. (expand_builtin_sync_mem_is_lock_free): New. (expand_builtin): Handle BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE. (fold_builtin_1): Handle BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE. * sync-builtins.def: Add BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE. * builtin-types.def: Add BT_FN_BOOL_SIZE type. * fortran/types.def: Add BT_SIZE and BT_FN_BOOL_SIZE. * doc/extend.texi: Add documentation. * testsuite/gcc.dg/sync-mem-invalid.c: Test for invalid param. * testsuite/gcc.dg/sync-mem-lockfree[-aux].c: New tests. 
Index: optabs.h === *** optabs.h(revision 178916) --- optabs.h(working copy) *** enum direct_optab_index *** 708,713 --- 708,715 DOI_sync_mem_nand, DOI_sync_mem_xor, DOI_sync_mem_or, + DOI_sync_mem_always_lock_free, + DOI_sync_mem_is_lock_free, DOI_sync_mem_thread_fence, DOI_sync_mem_signal_fence, *** typedef struct direct_optab_d *direct_op *** 801,806 --- 803,812 (direct_optab_table[(int) DOI_sync_mem_xor]) #define sync_mem_or_optab \ (direct_optab_table[(int) DOI_sync_mem_or]) + #define sync_mem_always_lock_free_optab \ + (direct_optab_table[(int) DOI_sync_mem_always_lock_free]) + #define sync_mem_is_lock_free_optab \ + (direct_optab_table[(int) DOI_sync_mem_is_lock_free]) #define sync_mem_thread_fence_optab \ (direct_optab_table[(int) DOI_sync_mem_thread_fence]) #define sync_mem_signal_fence_optab \ Index: builtins.c === *** builtins.c (revision 179522) --- builtins.c (working copy) *** expand_builtin_sync_mem_fetch_op (enum m *** 5386,5391 --- 5386,5472 return expand_sync_mem_fetch_op (target, mem, val, code, model, fetch_after); } + /* Return true if size ARG is always lock free on this architecture. */ + static tree + fold_builtin_sync_mem_always_lock_free (tree arg) + { + int size; + enum machine_mode mode; + enum insn_code icode; + + if (TREE_CODE (arg) != INTEGER_CST) + return NULL_TREE; + + /* Check if a compare_and_swap pattern exists for the mode which represents + the required size. The pattern is not allowed to fail, so the existence + of the pattern indicates support is present. */ + size = INTVAL (expand_normal (arg)) * BITS_PER_UNIT; + mode = mode_for_size (size, MODE_INT, 0); + icode = direct_optab_handler (sync_compare_and_swap_optab, mode); + + if (icode == CODE_FOR_nothing) + return integer_zero_node; + + return integer_one_node; + } + + /* Return true if the first argument to call EXP represents a size of +object than will always generate lock-free instructions on this target. +Otherwise return false. 
*/ + static rtx + expand_builtin_sync_mem_always_lock_free (tree exp) + { + tree size; + tree arg = CALL_EXPR_ARG (exp, 0); + + if (TREE_CODE (arg) != INTEGER_CST) + { + error (non-constant argument to __sync_mem_always_lock_free); + return const0_rtx; + } + + size = fold_builtin_sync_mem_always_lock_free (arg); + if (size == integer_one_node) + return const1_rtx; + return const0_rtx; + } + + /* Return a one or zero if it can be determined that size ARG is lock free on +this architecture. */ + static tree + fold_builtin_sync_mem_is_lock_free (tree arg) + { + tree always = fold_builtin_sync_mem_always_lock_free (arg); + + /* If it isnt always lock free, don't generate a result. */ + if (always == integer_one_node) + return always; + + return NULL_TREE; + } + + /* Return one or zero if the first argument to call EXP represents a size of +object than can generate lock-free instructions on
[testsuite] Don't XFAIL gcc.dg/uninit-B.c etc. (PR middle-end/50125)
After almost two months, two tests are still XPASSing everywhere: XPASS: gcc.dg/uninit-B.c uninit i warning (test for warnings, line 12) XPASS: gcc.dg/uninit-pr19430.c (test for warnings, line 32) XPASS: gcc.dg/uninit-pr19430.c uninitialized (test for warnings, line 41) I think it's time to remove the xfail's. Tested with the appropriate runtest invocation on i386-pc-solaris2.10, ok for mainline? Rainer 2011-10-06 Rainer Orth r...@cebitec.uni-bielefeld.de PR middle-end/50125 * gcc.dg/uninit-B.c (baz): Remove xfail *-*-*. * gcc.dg/uninit-pr19430.c (main): Remove xfail *-*-*. (bar3): Likewise. # HG changeset patch # Parent 60c73f26147c2e549be69d750637ed45ca48e93c Don't XFAIL gcc.dg/uninit-B.c etc. (PR middle-end/50125) diff --git a/gcc/testsuite/gcc.dg/uninit-B.c b/gcc/testsuite/gcc.dg/uninit-B.c --- a/gcc/testsuite/gcc.dg/uninit-B.c +++ b/gcc/testsuite/gcc.dg/uninit-B.c @@ -9,7 +9,7 @@ void baz (void) { int i; - if (i) /* { dg-warning is used uninitialized uninit i warning { xfail *-*-* } } */ + if (i) /* { dg-warning is used uninitialized uninit i warning } */ bar (i); foo (i); } diff --git a/gcc/testsuite/gcc.dg/uninit-pr19430.c b/gcc/testsuite/gcc.dg/uninit-pr19430.c --- a/gcc/testsuite/gcc.dg/uninit-pr19430.c +++ b/gcc/testsuite/gcc.dg/uninit-pr19430.c @@ -29,7 +29,7 @@ void frob(int *pi); int main(void) { int i; - printf(i = %d\n, i); /* { dg-warning 'i' is used uninitialized in this function { xfail *-*-* } } */ + printf(i = %d\n, i); /* { dg-warning 'i' is used uninitialized in this function } */ frob(i); return 0; @@ -38,6 +38,6 @@ int main(void) void foo3(int*); void bar3(void) { int x; - if(x) /* { dg-warning 'x' is used uninitialized in this function uninitialized { xfail *-*-* } } */ + if(x) /* { dg-warning 'x' is used uninitialized in this function uninitialized } */ foo3(x); } -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Initial shrink-wrapping patch
Bernd Schmidt ber...@codesourcery.com writes: On 10/06/11 05:17, Ian Lance Taylor wrote: Thinking about it I think this is the wrong approach. The -fsplit-stack code by definition has to wrap the entire function and it can not modify any callee-saved registers. We should do shrink wrapping before -fsplit-stack, not the other way around. Sorry, I'm not following what you're saying here. Can you elaborate? Basically -fsplit-stack wraps the entire function in code that (on x86_64) looks like cmpq %fs:112, %rsp jae .L2 movl $24, %r10d movl $0, %r11d call __morestack ret .L2: There is absolutely no reason to try to shrink wrap that code. It will never help. That code always has to be first. It especially has to be first because the gold linker recognizes the prologue specially when a split-stack function calls a non-split-stack function, in order to request a larger stack. Therefore, it seems to me that we should apply shrink wrapping to the function as it exists *before* the split-stack prologue is created. The flag_split_stack bit should be moved after the flag_shrink_wrap bit. Ian
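In C terms, the split-stack prologue Ian quotes behaves roughly as below — an illustrative model only, with made-up names (the real check compares %rsp against a TCB slot and __morestack maps in a new stack segment):

```c
#include <assert.h>

/* Model of the -fsplit-stack prologue: if the stack pointer has
   already dropped below the per-thread limit, request a larger
   stack before running the function body.  */
static unsigned long stack_limit = 4096;   /* models %fs:112 */
static int morestack_calls;

static void morestack (void)               /* models __morestack */
{
  morestack_calls++;
}

static void split_stack_prologue (unsigned long sp)
{
  if (sp >= stack_limit)   /* the "jae .L2" fast path: enough stack left */
    return;
  morestack ();            /* slow path: grow the stack, then re-enter */
}
```

The fast path is a compare and a taken branch, which is why shrink-wrapping it can never win anything: it must execute on every entry regardless.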
Re: Initial shrink-wrapping patch
On 10/06/11 17:57, Ian Lance Taylor wrote: There is absolutely no reason to try to shrink wrap that code. It will never help. That code always has to be first. It especially has to be first because the gold linker recognizes the prologue specially when a split-stack function calls a non-split-stack function, in order to request a larger stack. Urgh, ok. Therefore, it seems to me that we should apply shrink wrapping to the function as it exists *before* the split-stack prologue is created. The flag_split_stack bit should be moved after the flag_shrink_wrap bit. Sounds like we just need to always emit the split prologue on the original entry edge then. Can you test the following with Go? Bernd * function.c (thread_prologue_and_epilogue_insns): Emit split prologue on the orig_entry_edge. Don't account for it in prologue_clobbered. Index: gcc/function.c === --- gcc/function.c (revision 179619) +++ gcc/function.c (working copy) @@ -5602,10 +5602,6 @@ thread_prologue_and_epilogue_insns (void note_stores (PATTERN (p_insn), record_hard_reg_sets, prologue_clobbered); } - for (p_insn = split_prologue_seq; p_insn; p_insn = NEXT_INSN (p_insn)) - if (NONDEBUG_INSN_P (p_insn)) - note_stores (PATTERN (p_insn), record_hard_reg_sets, - prologue_clobbered); bitmap_initialize (bb_antic_flags, bitmap_default_obstack); bitmap_initialize (bb_on_list, bitmap_default_obstack); @@ -5758,7 +5754,7 @@ thread_prologue_and_epilogue_insns (void if (split_prologue_seq != NULL_RTX) { - insert_insn_on_edge (split_prologue_seq, entry_edge); + insert_insn_on_edge (split_prologue_seq, orig_entry_edge); inserted = true; } if (prologue_seq != NULL_RTX)
[PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands
Hi! Since Richard's changes recently to allow different modes in vcond patterns (so far on i?86/x86_64 only I think) we can vectorize more COND_EXPRs than before, and this patch improves it a tiny bit more - even i?86/x86_64 support vconds only if the sizes of vector element modes are the same. With this patch we can optimize even if it is wider or narrower, by vectorizing it as the COND_EXPR in integer mode matching the size of the comparison operands and then a cast. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-06 Jakub Jelinek ja...@redhat.com PR tree-optimization/50596 * tree-vectorizer.h (vect_is_simple_cond): New prototype. (NUM_PATTERNS): Change to 6. * tree-vect-patterns.c (vect_recog_mixed_size_cond_pattern): New function. (vect_vect_recog_func_ptrs): Add vect_recog_mixed_size_cond_pattern. (vect_mark_pattern_stmts): Don't create stmt_vinfo for def_stmt if it already has one, and don't set STMT_VINFO_VECTYPE in it if it is already set. * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Handle COND_EXPR and VEC_COND_EXPR in pattern stmts. (vect_is_simple_cond): No longer static. * lib/target-supports.exp (check_effective_target_vect_cond_mixed): New. * gcc.dg/vect/vect-cond-8.c: New test. --- gcc/tree-vectorizer.h.jj 2011-09-26 14:06:52.0 +0200 +++ gcc/tree-vectorizer.h 2011-10-06 10:04:03.0 +0200 @@ -1,5 +1,5 @@ /* Vectorizer - Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 + Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
Contributed by Dorit Naishlos do...@il.ibm.com @@ -818,6 +818,7 @@ extern bool vect_transform_stmt (gimple, bool *, slp_tree, slp_instance); extern void vect_remove_stores (gimple); extern bool vect_analyze_stmt (gimple, bool *, slp_tree); +extern bool vect_is_simple_cond (tree, loop_vec_info, tree *); extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *, tree, int); extern void vect_get_load_cost (struct data_reference *, int, bool, @@ -902,7 +903,7 @@ extern void vect_slp_transform_bb (basic Additional pattern recognition functions can (and will) be added in the future. */ typedef gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); -#define NUM_PATTERNS 5 +#define NUM_PATTERNS 6 void vect_pattern_recog (loop_vec_info); /* In tree-vectorizer.c. */ --- gcc/tree-vect-patterns.c.jj 2011-10-06 09:14:17.0 +0200 +++ gcc/tree-vect-patterns.c2011-10-06 14:37:12.0 +0200 @@ -49,12 +49,15 @@ static gimple vect_recog_dot_prod_patter static gimple vect_recog_pow_pattern (VEC (gimple, heap) **, tree *, tree *); static gimple vect_recog_over_widening_pattern (VEC (gimple, heap) **, tree *, tree *); +static gimple vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **, + tree *, tree *); static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = { vect_recog_widen_mult_pattern, vect_recog_widen_sum_pattern, vect_recog_dot_prod_pattern, vect_recog_pow_pattern, -vect_recog_over_widening_pattern}; + vect_recog_over_widening_pattern, + vect_recog_mixed_size_cond_pattern}; /* Function widened_name_p @@ -1218,6 +1214,120 @@ vect_recog_over_widening_pattern (VEC (g } +/* Function vect_recog_mixed_size_cond_pattern + + Try to find the following pattern: + + type x_t, y_t; + TYPE a_T, b_T, c_T; + loop: + S1 a_T = x_t CMP y_t ? b_T : c_T; + + where type 'TYPE' is an integral type which has different size + from 'type'. 
b_T and c_T are constants and if 'TYPE' is wider + than 'type', the constants need to fit into an integer type + with the same width as 'type'. + + Input: + + * LAST_STMT: A stmt from which the pattern search begins. + + Output: + + * TYPE_IN: The type of the input arguments to the pattern. + + * TYPE_OUT: The type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the pattern. + Additionally a def_stmt is added. + + a_it = x_t CMP y_t ? b_it : c_it; + a_T = (TYPE) a_it; */ + +static gimple +vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **stmts, tree *type_in, + tree *type_out) +{ + gimple last_stmt = VEC_index (gimple, *stmts, 0); + tree cond_expr, then_clause, else_clause; + stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt), def_stmt_info; + tree type, vectype, comp_vectype, itype, vecitype; + enum machine_mode cmpmode; + gimple pattern_stmt, def_stmt; + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo); + + if (!is_gimple_assign
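In scalar C terms, the pattern this new recognizer targets is a comparison on a narrow type selecting constants of a wider type; a hypothetical before/after sketch of the rewrite described in the comment above:

```c
#include <stdint.h>

/* Before: the COND_EXPR mixes a 16-bit comparison with 64-bit
   INTEGER_CST operands, so no same-width vcond applies.  */
static int64_t before (int16_t x, int16_t y)
{
  return x < y ? 17 : 42;
}

/* After: since the constants 17 and 42 fit in int16_t, the pattern
   does the selection at the comparison's width and widens afterwards,
   which a same-width vcond plus a vector conversion can implement:
     a_it = x CMP y ? b_it : c_it;
     a_T  = (TYPE) a_it;  */
static int64_t after (int16_t x, int16_t y)
{
  int16_t t = x < y ? 17 : 42;
  return (int64_t) t;
}
```

The two forms are equivalent precisely because of the fits-in-narrow-type condition the comment states; a constant like 100000 would make the rewrite invalid.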
[PATCH] Minor readability improvement in vect_pattern_recog{,_1}
Hi! tree-vectorizer.h already has typedefs for the recog functions, and using that typedef we can make these two functions slightly more readable. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-06 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_pattern_recog_1): Use vect_recog_func_ptr typedef for the first argument. (vect_pattern_recog): Rename vect_recog_func_ptr variable to vect_recog_func, use vect_recog_func_ptr typedef for it. --- gcc/tree-vect-patterns.c.jj 2011-10-06 14:37:12.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-06 15:50:12.0 +0200 @@ -1393,10 +1393,9 @@ vect_mark_pattern_stmts (gimple orig_stm for vect_recog_pattern. */ static void -vect_pattern_recog_1 ( - gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *), - gimple_stmt_iterator si, - VEC (gimple, heap) **stmts_to_replace) +vect_pattern_recog_1 (vect_recog_func_ptr vect_recog_func, + gimple_stmt_iterator si, + VEC (gimple, heap) **stmts_to_replace) { gimple stmt = gsi_stmt (si), pattern_stmt; stmt_vec_info stmt_info; @@ -1580,7 +1579,7 @@ vect_pattern_recog (loop_vec_info loop_v unsigned int nbbs = loop->num_nodes; gimple_stmt_iterator si; unsigned int i, j; - gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); + vect_recog_func_ptr vect_recog_func; VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); if (vect_print_dump_info (REPORT_DETAILS)) @@ -1596,8 +1595,8 @@ vect_pattern_recog (loop_vec_info loop_v /* Scan over all generic vect_recog_xxx_pattern functions. */ for (j = 0; j < NUM_PATTERNS; j++) { - vect_recog_func_ptr = vect_vect_recog_func_ptrs[j]; - vect_pattern_recog_1 (vect_recog_func_ptr, si, + vect_recog_func = vect_vect_recog_func_ptrs[j]; + vect_pattern_recog_1 (vect_recog_func, si, stmts_to_replace); } } Jakub
[PATCH] vshuffle: Use correct mode for mask operand.
--- gcc/ChangeLog |5 + gcc/optabs.c | 16 +++- 2 files changed, 12 insertions(+), 9 deletions(-) * optabs.c (expand_vec_shuffle_expr): Use the proper mode for the mask operand. Tidy the code. This patch is required before I rearrange the testsuite to actually test floating-point shuffle. diff --git a/gcc/optabs.c b/gcc/optabs.c index 3a52fb0..aa233d5 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6650,9 +6650,8 @@ expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target) struct expand_operand ops[4]; enum insn_code icode; enum machine_mode mode = TYPE_MODE (type); - rtx rtx_v0, rtx_mask; - gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask)); + gcc_checking_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask)); if (TREE_CODE (mask) == VECTOR_CST) { @@ -6675,24 +6674,23 @@ expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target) return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL); } -vshuffle: + vshuffle: icode = direct_optab_handler (vshuffle_optab, mode); if (icode == CODE_FOR_nothing) return 0; - rtx_mask = expand_normal (mask); - create_output_operand (ops[0], target, mode); - create_input_operand (ops[3], rtx_mask, mode); + create_input_operand (ops[3], expand_normal (mask), + TYPE_MODE (TREE_TYPE (mask))); if (operand_equal_p (v0, v1, 0)) { - rtx_v0 = expand_normal (v0); - if (!insn_operand_matches(icode, 1, rtx_v0)) + rtx rtx_v0 = expand_normal (v0); + if (!insn_operand_matches (icode, 1, rtx_v0)) rtx_v0 = force_reg (mode, rtx_v0); - gcc_checking_assert(insn_operand_matches(icode, 2, rtx_v0)); + gcc_checking_assert (insn_operand_matches (icode, 2, rtx_v0)); create_fixed_operand (ops[1], rtx_v0); create_fixed_operand (ops[2], rtx_v0); -- 1.7.6.4
[PATCH] Rework vector shuffle tests.
Test vector sizes 8, 16, and 32. Test most data types for each size. This should also solve the problem that Georg reported for AVR. Indeed, I hope that except for the DImode/DFmode tests, these actually execute on that target. r~ Cc: Georg-Johann Lay a...@gjlay.de --- gcc/testsuite/ChangeLog| 29 ++ .../gcc.c-torture/execute/vect-shuffle-1.c | 68 - .../gcc.c-torture/execute/vect-shuffle-2.c | 68 - .../gcc.c-torture/execute/vect-shuffle-3.c | 58 --- .../gcc.c-torture/execute/vect-shuffle-4.c | 51 -- .../gcc.c-torture/execute/vect-shuffle-5.c | 64 .../gcc.c-torture/execute/vect-shuffle-6.c | 64 .../gcc.c-torture/execute/vect-shuffle-7.c | 70 -- .../gcc.c-torture/execute/vect-shuffle-8.c | 55 --- gcc/testsuite/gcc.c-torture/execute/vshuf-16.inc | 81 gcc/testsuite/gcc.c-torture/execute/vshuf-2.inc| 38 gcc/testsuite/gcc.c-torture/execute/vshuf-4.inc| 39 gcc/testsuite/gcc.c-torture/execute/vshuf-8.inc| 101 gcc/testsuite/gcc.c-torture/execute/vshuf-main.inc | 26 + gcc/testsuite/gcc.c-torture/execute/vshuf-v16qi.c |5 + gcc/testsuite/gcc.c-torture/execute/vshuf-v2df.c | 15 +++ gcc/testsuite/gcc.c-torture/execute/vshuf-v2di.c | 15 +++ gcc/testsuite/gcc.c-torture/execute/vshuf-v2sf.c | 21 gcc/testsuite/gcc.c-torture/execute/vshuf-v2si.c | 18 gcc/testsuite/gcc.c-torture/execute/vshuf-v4df.c | 19 gcc/testsuite/gcc.c-torture/execute/vshuf-v4di.c | 19 gcc/testsuite/gcc.c-torture/execute/vshuf-v4hi.c | 15 +++ gcc/testsuite/gcc.c-torture/execute/vshuf-v4sf.c | 25 + gcc/testsuite/gcc.c-torture/execute/vshuf-v4si.c | 22 gcc/testsuite/gcc.c-torture/execute/vshuf-v8hi.c | 23 + gcc/testsuite/gcc.c-torture/execute/vshuf-v8qi.c | 23 + gcc/testsuite/gcc.c-torture/execute/vshuf-v8si.c | 30 ++ 27 files changed, 564 insertions(+), 498 deletions(-) delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c delete mode 100644 
gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-6.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-7.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-8.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-16.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-2.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-4.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-8.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-main.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v16qi.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2df.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2di.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2sf.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2si.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4df.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4di.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4hi.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4sf.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4si.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8hi.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8qi.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8si.c + * gcc.c-torture/execute/vect-shuffle-1.c: Remove. + * gcc.c-torture/execute/vect-shuffle-2.c: Remove. + * gcc.c-torture/execute/vect-shuffle-3.c: Remove. + * gcc.c-torture/execute/vect-shuffle-4.c: Remove. + * gcc.c-torture/execute/vect-shuffle-5.c: Remove. + * gcc.c-torture/execute/vect-shuffle-6.c: Remove. + * gcc.c-torture/execute/vect-shuffle-7.c: Remove. + * gcc.c-torture/execute/vect-shuffle-8.c: Remove. 
+ * gcc.c-torture/execute/vshuf-16.inc: New file. + * gcc.c-torture/execute/vshuf-2.inc: New file. + * gcc.c-torture/execute/vshuf-4.inc: New file. + * gcc.c-torture/execute/vshuf-8.inc: New file. + * gcc.c-torture/execute/vshuf-main.inc: New file. + * gcc.c-torture/execute/vshuf-v16qi.c: New test. + * gcc.c-torture/execute/vshuf-v2df.c: New test. + * gcc.c-torture/execute/vshuf-v2di.c: New test. + * gcc.c-torture/execute/vshuf-v2sf.c: New test. + *
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 2011-10-06 at 16:16 +0200, Richard Guenther wrote: snip Doh, I thought you were matching gimple stmts that do the address computation. But now I see you are matching the tree returned from get_inner_reference. So no need to check anything for that case. But that keeps me wondering what you'll do if the accesses were all pointer arithmetic, not arrays. Thus, extern void foo (int, int, int); void f (int *p, unsigned int n) { foo (p[n], p[n+64], p[n+128]); } wouldn't that have the same issue and you wouldn't handle it? Richard. Good point. This indeed gets missed here, and that's more fuel for doing a generalized strength reduction along with the special cases like p->a[n] that are only exposed with get_inner_reference. (The pointer arithmetic cases were picked up in my earlier big-hammer approach using the aff-comb machinery, but that had too many problems in the end, as you know.) So for the long term I will look into a full strength reducer for non-loop code. For the short term, what do you think about keeping this single transformation in reassoc to make sure it gets into 4.7? I would plan to strip it back out and fold it into the strength reducer thereafter, which might or might not make 4.7 depending on my other responsibilities and how the 4.7 schedule goes. I haven't seen anything official, but I'm guessing we're getting towards the end of 4.7 stage 1?
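For reference, the classic loop form of strength reduction mentioned in this thread replaces a per-iteration multiply with a running addition; a sketch of the identity (not of the reassoc implementation):

```c
/* z = i * x recomputed from scratch on every iteration ...  */
static long with_multiply (int n, long x)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += (long) i * x;
  return sum;
}

/* ... becomes an accumulator bumped by x each time around: z starts
   at 0 and "z += x" replaces the multiply, so only additions remain
   in the loop body.  */
static long strength_reduced (int n, long x)
{
  long sum = 0, z = 0;
  for (int i = 0; i < n; i++)
    {
      sum += z;
      z += x;
    }
  return sum;
}
```

The straight-line analogue discussed here is the same idea applied to a run of related addresses like p[n], p[n+64], p[n+128]: compute the first address once and derive the rest by cheap additions.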
Re: [PATCH][ARM] Fix broken shift patterns
I believe this patch to be nothing but an improvement over the current state, and that a fix to the constraint problem should be a separate patch. On that basis, am I OK to commit? One minor nit: (define_special_predicate "shift_operator" ... + (ior (match_test "GET_CODE (XEXP (op, 1)) == CONST_INT + && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32") + (match_test "REG_P (XEXP (op, 1))" We're already enforcing the REG_P elsewhere, and it's only valid in some contexts, so I'd change this to: (match_test "GET_CODE (XEXP (op, 1)) != CONST_INT || ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32") Now, let me explain the other problem: As it stands, all shift-related patterns, for ARM or Thumb2 mode, permit a shift to be expressed as either a shift type and amount (register or constant), or as a multiply and power-of-two constant. An added complication is that only ARM mode accepts a register. Problem scenario 1: Consider pattern (plus (mult r1 r2) r3). It so happens that reload knows that r2 contains a constant, say 20, so reload checks to see if that could be converted to an immediate. Now, 20 is not a power of two, so recog would reject it, but it is in the range 0..31 so it does match the 'M' constraint. Oops! Though as you mention below, the predicate doesn't allow the second operand to be a register, so this can never happen. Reload may do unexpected things, but if it starts randomly changing valid const_int values then we have much bigger problems. Problem scenario 2: Consider pattern (ashiftrt r1 r2). Again, it so happens that reload knows that r2 contains a constant, in this case let's say 64, so again reload checks to see if that could be converted to an immediate. This time, 64 is not in the range 0..31, so recog would reject it, but it is a power of two, so it does match the 'M' constraint. Again, oops! I see two ways to fix this properly: 1. Duplicate all the patterns in the machine description, once for the mult case, and once for the other cases.
This could probably be done with a code iterator, if preferred. 2. Redefine the canonical form of (plus (mult .. 2^n) ..) such that it always uses the (presumably cheaper) shift-and-add option. However, this would require all other targets where madd really is the best option to fix it up. (I'd imagine that two instructions for shift and add would be cheaper speed-wise, if properly scheduled, on most targets? That doesn't help the size optimization though.) 3. Consistently accept both power-of-two and 0..31 for shifts. Large shift counts give undefined results[1], so replace them with an arbitrary value (e.g. 0) during assembly output. Arguably not an entirely proper fix, but I think it'll keep everything happy. However, it's not obvious to me that this needs fixing: * The failure mode would be an ICE, and we've not seen any. Then again no one noticed the negative-shift ICE until recently :-/ * There's a comment in arm.c:shift_op that suggests that this can't happen, somehow, at least in the mult case. - I'm not sure exactly how reload works, but it seems reasonable that it will never try to convert a register to an immediate because the pattern does not allow registers in the first place. - This logic doesn't hold in the opposite case though. Have I explained all that clearly? I think you've covered most of it. For bonus points we should probably disallow MULT in the arm_shiftsi3 pattern, to stop it interacting with the regular mulsi3 pattern in undesirable ways. Paul [1] Or at least not any result gcc will be expecting.
Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands
On 6 October 2011 18:17, Jakub Jelinek ja...@redhat.com wrote: Hi! Since Richard's changes recently to allow different modes in vcond patterns (so far on i?86/x86_64 only I think) we can vectorize more COND_EXPRs than before, and this patch improves it a tiny bit more - even i?86/x86_64 support vconds only if the sizes of vector element modes are the same. With this patch we can optimize even if it is wider or narrower, by vectorizing it as the COND_EXPR in integer mode matching the size of the comparison operands and then a cast. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK, but... --- gcc/tree-vect-stmts.c.jj 2011-09-29 14:25:46.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-06 12:16:43.0 +0200 @@ -652,9 +652,26 @@ vect_mark_stmts_to_be_vectorized (loop_v have to scan the RHS or function arguments instead. */ if (is_gimple_assign (stmt)) { - for (i = 1; i < gimple_num_ops (stmt); i++) + enum tree_code rhs_code = gimple_assign_rhs_code (stmt); + tree op = gimple_assign_rhs1 (stmt); + + i = 1; + if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR) I don't understand why we need VEC_COND_EXPR here. + && COMPARISON_CLASS_P (op)) + { + if (!process_use (stmt, TREE_OPERAND (op, 0), loop_vinfo, + live_p, relevant, worklist) + || !process_use (stmt, TREE_OPERAND (op, 1), loop_vinfo, + live_p, relevant, worklist)) + { + VEC_free (gimple, heap, worklist); + return false; + } + i = 2; + } + for (; i < gimple_num_ops (stmt); i++) { - tree op = gimple_op (stmt, i); + op = gimple_op (stmt, i); if (!process_use (stmt, op, loop_vinfo, live_p, relevant, worklist)) { Thanks, Ira
Re: [PATCH] Minor readability improvement in vect_pattern_recog{,_1}
On 6 October 2011 18:19, Jakub Jelinek ja...@redhat.com wrote: Hi! tree-vectorizer.h already has typedefs for the recog functions, and using that typedef we can make these two functions slightly more readable. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Ira 2011-10-06 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_pattern_recog_1): Use vect_recog_func_ptr typedef for the first argument. (vect_pattern_recog): Rename vect_recog_func_ptr variable to vect_recog_func, use vect_recog_func_ptr typedef for it. --- gcc/tree-vect-patterns.c.jj 2011-10-06 14:37:12.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-06 15:50:12.0 +0200 @@ -1393,10 +1393,9 @@ vect_mark_pattern_stmts (gimple orig_stm for vect_recog_pattern. */ static void -vect_pattern_recog_1 ( - gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *), - gimple_stmt_iterator si, - VEC (gimple, heap) **stmts_to_replace) +vect_pattern_recog_1 (vect_recog_func_ptr vect_recog_func, + gimple_stmt_iterator si, + VEC (gimple, heap) **stmts_to_replace) { gimple stmt = gsi_stmt (si), pattern_stmt; stmt_vec_info stmt_info; @@ -1580,7 +1579,7 @@ vect_pattern_recog (loop_vec_info loop_v unsigned int nbbs = loop->num_nodes; gimple_stmt_iterator si; unsigned int i, j; - gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); + vect_recog_func_ptr vect_recog_func; VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); if (vect_print_dump_info (REPORT_DETAILS)) @@ -1596,8 +1595,8 @@ vect_pattern_recog (loop_vec_info loop_v /* Scan over all generic vect_recog_xxx_pattern functions. */ for (j = 0; j < NUM_PATTERNS; j++) { - vect_recog_func_ptr = vect_vect_recog_func_ptrs[j]; - vect_pattern_recog_1 (vect_recog_func_ptr, si, + vect_recog_func = vect_vect_recog_func_ptrs[j]; + vect_pattern_recog_1 (vect_recog_func, si, stmts_to_replace); } } Jakub
Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands
On Thu, Oct 06, 2011 at 07:27:28PM +0200, Ira Rosen wrote: + i = 1; + if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR) I don't understand why we need VEC_COND_EXPR here. Only for completeness, as VEC_COND_EXPR is the same weirdo thingie like COND_EXPR. I can leave that out if you want. Jakub
Re: [PATCH] Minor readability improvement in vect_pattern_recog{,_1}
On 10/06/2011 09:19 AM, Jakub Jelinek wrote: * tree-vect-patterns.c (vect_pattern_recog_1): Use vect_recog_func_ptr typedef for the first argument. (vect_pattern_recog): Rename vect_recog_func_ptr variable to vect_recog_func, use vect_recog_func_ptr typedef for it. Ok. r~
Re: Initial shrink-wrapping patch
On 10/06/2011 09:01 AM, Bernd Schmidt wrote: On 10/06/11 17:57, Ian Lance Taylor wrote: There is absolutely no reason to try to shrink wrap that code. It will never help. That code always has to be first. It especially has to be first because the gold linker recognizes the prologue specially when a split-stack function calls a non-split-stack function, in order to request a larger stack. Urgh, ok. Therefore, it seems to me that we should apply shrink wrapping to the function as it exists *before* the split-stack prologue is created. The flag_split_stack bit should be moved after the flag_shrink_wrap bit. Sounds like we just need to always emit the split prologue on the original entry edge then. Can you test the following with Go? Looks reasonable. I wonder if we can have this as a generic feature? I'm thinking about things like the MIPS and Alpha load-gp stuff. That operation also needs to happen exactly at the start of the function, due to the pc-relative nature of the operation. I do see that MIPS works around this by emitting the load-gp as text in the legacy prologue. But Alpha makes some effort to emit this as rtl, so that the scheduler knows about the two pipeline reservations and the latency of any use of the gp register. Would a pre_prologue named pattern seem wrong to anyone? r~
Re: [PATCH] Fix PR38885
On Wed, Oct 5, 2011 at 6:48 AM, Richard Guenther rguent...@suse.de wrote: I'm testing a pair of patches to fix PR38885 (for constants) and PR38884 (for non-constants) stores to complex/vector memory and CSE of component accesses from SCCVN. This is the piece that handles stores from constants and partial reads of it. We can conveniently re-use fold-const native encode/interpret code for this. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-10-05 Richard Guenther rguent...@suse.de PR tree-optimization/38885 * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle partial reads from constants. * gcc.dg/tree-ssa/ssa-fre-33.c: New testcase. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50634 -- H.J.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On 10/06/11 09:37, Jakub Jelinek wrote: On Thu, Oct 06, 2011 at 05:28:36PM +0200, Kai Tietz wrote: None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point that it might not be that good to always modify unconditionally to AND/OR chain. For example if (a1 && a2 && a3 && a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed pretty well. But you are right, that from the point of reassociation optimization it could be in some cases more profitable to have packed all elements into one AND-chain. Yeah. Perhaps we should break them up after reassoc2, or on the other side teach reassoc (or some other pass) to be able to do the optimizations on a series of GIMPLE_COND with no side-effects in between. See e.g. PR46309, return a == 3 || a == 1 || a == 2 || a == 4; isn't optimized into (a - 1U) < 4U, although it could, if branch costs cause it to be broken up into several GIMPLE_COND stmts. Or if user writes: if (a == 3) return 1; if (a == 1) return 1; if (a == 2) return 1; if (a == 4) return 1; return 0; (more probably using enums). I haven't followed this thread as closely as perhaps I should; what I'm seeing discussed now looks a lot like condition merging and I'm pretty sure there's some research in this area that might guide us. "multi-variable condition merging" is the term the authors used.
jeff
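The PR46309 transformation Jakub mentions can be checked in plain C: the chain of equality tests collapses to one subtraction plus one unsigned comparison (a sketch of the identity only, not of any reassoc code):

```c
/* a == 3 || a == 1 || a == 2 || a == 4 ...  */
static int chain (int a)
{
  return a == 3 || a == 1 || a == 2 || a == 4;
}

/* ... is equivalent to a single range test: the values 1..4 map to
   0..3 under a - 1U, while every other value wraps above 3 in
   unsigned arithmetic.  */
static int range_test (int a)
{
  return (a - 1U) < 4U;
}
```

The trick only applies when the tested constants form a contiguous range, which is why the pass has to collect the whole set of GIMPLE_COND stmts before deciding.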
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: at which point this association doesn't make sense anymore, as Sorry, exactly this doesn't happen, due an ANDIF isn't simple, and therefore it isn't transformed into an AND. Right ... ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point that it might not be that good to always modify unconditionally to AND/OR chain. ... and I see that (that's why the transformation should be desirable for some definition of desirable, which probably includes an RHS not too long chain). As it stands right now your transformation seems to be a fairly ad-hoc try at avoiding this problem. That's why I wonder why to do the reassoc at all? Which testcases break _without_ the reassociation, i.e. with only rewriting ANDIF -> AND at the outermost level? I don't do here reassociation in inner. See that patch calls build2_loc, and not fold_build2_loc anymore. So it doesn't retry to associate in inner anymore (which might be something of interest for the issue Jakub mentioned). There is no test actually failing AFAICS. I just noticed size-differences by this. Nevertheless it might be better to enhance reassociation pass to break-up (and repropagate) GIMPLE_CONDs with non-side-effect, as Jakub suggested. The other chance might be here to allow deeper chains than two elements within one AND/OR element, but this would be architecture dependent. For x86 -as example- used instruction cycles for a common for branching would suggest that it might be profitable to have here 3 or 4 leafs within one AND|OR chain. But for sure on other architectures the amount of leafs varies. Regards, Kai
Re: [PATCH] Fix PR46556 (poor address generation)
On 10/06/11 04:13, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another.

There's a variety of literature that uses PRE to detect and optimize straightline code strength reduction. I poked at it at one time (RTL gcse framework) and it looked reasonably promising. Never pushed it all the way through.

jeff
Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands
On 6 October 2011 19:28, Jakub Jelinek ja...@redhat.com wrote: On Thu, Oct 06, 2011 at 07:27:28PM +0200, Ira Rosen wrote: + i = 1; + if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR) I don't understand why we need VEC_COND_EXPR here. Only for completeness, as VEC_COND_EXPR is the same weirdo thingie like COND_EXPR. I can leave that out if you want. But we mark stmts that we want to vectorize here. I think that expecting a vector stmt is confusing. So yes, please, leave it out. Thanks, Ira Jakub
Re: [PATCH] Add support for lzd and popc instructions on sparc.
On 10/05/2011 11:41 PM, David Miller wrote:

+(define_expand "popcount<mode>2"
+  [(set (match_operand:SIDI 0 "register_operand" "")
+        (popcount:SIDI (match_operand:SIDI 1 "register_operand" "")))]
+  "TARGET_POPC"
+{
+  if (! TARGET_ARCH64)
+    {
+      emit_insn (gen_popcount<mode>_v8plus (operands[0], operands[1]));
+      DONE;
+    }
+})
+
+(define_insn "*popcount<mode>_sp64"
+  [(set (match_operand:SIDI 0 "register_operand" "=r")
+        (popcount:SIDI (match_operand:SIDI 1 "register_operand" "r")))]
+  "TARGET_POPC && TARGET_ARCH64"
+  "popc\t%1, %0")

You've said that POPC only operates on the full 64-bit register, but I see no zero-extend of the SImode input? Similarly for the clzsi patterns.

If it weren't for the v8plus ugliness, it would be sufficient to only expose the DImode patterns, and let optabs.c do the work to extend from SImode...

r~
[Patch 0/5] ARM 64 bit sync atomic operations [V3]
Hi,

This is V3 of a series of 5 patches relating to ARM atomic operations; they incorporate most of the feedback from V2. Note the patch numbering/ordering is different from v2; the two simple patches are now first.

  1) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't produce the mcr instruction in Thumb1 (and enable on ARMv6 not just 6k as per the docs).
  2) Fix pr48126 which is a misplaced barrier in the atomic generation.
  3) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k and above.
  4) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel assist is called (as per 32bit and smaller ops).
  5) Add test cases and support for those test cases, for the operations added in (3) and (4).

This code has been tested built on x86-64 cross to ARM, run in ARM and Thumb (C, C++, Fortran). It is against git rev 68a79dfc.

Relative to v2:
  Test cases split out
  Code sharing between the test cases
  More coding style cleanup
  A handful of NULL->NULL_RTX changes

Relative to v1:
  Split the DMB_MCR patch out
  Provide complete changelogs
  Don't emit IT instruction except in Thumb2 mode
  Move iterators to iterators.md (didn't move the table since it was specific to sync.md)
  Remove sync_atleastsi
  Use sync_predtab in as many places as possible
  Avoid headers in libgcc
  Made various libgcc routines I added static
  Used __write instead of write
  Comment the barrier move to explain it more

Note that the kernel interface has remained the same for the helper, and as such I've not changed the way the helper calling in patch 2 is structured.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Dave
[Patch 1/5] ARM 64 bit sync atomic operations [V3]
gcc/
	* config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR not available in Thumb1.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 993e3a0..f6f1da7 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -288,7 +288,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_DMB		(arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR	(arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR	(arm_arch6 && ! TARGET_HAVE_DMB \
+				 && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
[Patch 2/5] ARM 64 bit sync atomic operations [V3]
Michael K. Edwards points out in PR/48126 that the sync is in the wrong place relative to the branch target of the compare, since the load could float up beyond the ldrex.

	PR target/48126
	* config/arm/arm.c (arm_output_sync_loop): Move label before barrier.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5161439..6e7105a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
	}
     }
 
-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: label is before barrier so that in cmp failure case we still get
+     a barrier to stop subsequent loads floating upwards past the ldrex
+     PR 48126.  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx
[Patch 3/5] ARM 64 bit sync atomic operations [V3]
Add support for ARM 64bit sync intrinsics.

gcc/
	* arm.c (arm_output_ldrex): Support ldrexd.
	(arm_output_strex): Support strexd.
	(arm_output_it): New helper to output "it" in Thumb2 mode only.
	(arm_output_sync_loop): Support DI mode.  Change comment to not
	support const_int.
	(arm_expand_sync): Support DI mode.
	* arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD.
	* iterators.md (NARROW): Move from sync.md.
	(QHSD): New iterator for all current ARM integer modes.
	(SIDI): New iterator for SI and DI modes only.
	* sync.md (sync_predtab): New mode_attr.
	(sync_compare_and_swapsi): Fold into sync_compare_and_swap<mode>.
	(sync_lock_test_and_setsi): Fold into sync_lock_test_and_set<mode>.
	(sync_<sync_optab>si): Fold into sync_<sync_optab><mode>.
	(sync_nandsi): Fold into sync_nand<mode>.
	(sync_new_<sync_optab>si): Fold into sync_new_<sync_optab><mode>.
	(sync_new_nandsi): Fold into sync_new_nand<mode>.
	(sync_old_<sync_optab>si): Fold into sync_old_<sync_optab><mode>.
	(sync_old_nandsi): Fold into sync_old_nand<mode>.
	(sync_compare_and_swap<mode>): Support SI & DI.
	(sync_lock_test_and_set<mode>): Likewise.
	(sync_<sync_optab><mode>): Likewise.
	(sync_nand<mode>): Likewise.
	(sync_new_<sync_optab><mode>): Likewise.
	(sync_new_nand<mode>): Likewise.
	(sync_old_<sync_optab><mode>): Likewise.
	(sync_old_nand<mode>): Likewise.
	(arm_sync_compare_and_swapsi): Turn into iterator on SI & DI.
	(arm_sync_lock_test_and_setsi): Likewise.
	(arm_sync_new_<sync_optab>si): Likewise.
	(arm_sync_new_nandsi): Likewise.
	(arm_sync_old_<sync_optab>si): Likewise.
	(arm_sync_old_nandsi): Likewise.
	(arm_sync_compare_and_swap<mode> NARROW): Use sync_predtab, fix indent.
	(arm_sync_lock_test_and_set<mode> NARROW): Likewise.
	(arm_sync_new_<sync_optab><mode> NARROW): Likewise.
	(arm_sync_new_nand<mode> NARROW): Likewise.
	(arm_sync_old_<sync_optab><mode> NARROW): Likewise.
	(arm_sync_old_nand<mode> NARROW): Likewise.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6e7105a..51c0f3f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24039,12 +24039,26 @@ arm_output_ldrex (emit_f emit,
		  rtx target, rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[2];
+  rtx operands[3];
 
   operands[0] = target;
-  operands[1] = memory;
-  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[1] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+    }
+  else
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (target) & 1) == 0);
+      operands[1] = gen_rtx_REG (SImode, REGNO (target) + 1);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrexd\t%%0, %%1, %%C2");
+    }
 }
 
 /* Emit a strex{b,h,d} instruction appropriate for the specified
@@ -24057,14 +24071,41 @@ arm_output_strex (emit_f emit,
		  rtx value, rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[3];
+  rtx operands[4];
 
   operands[0] = result;
   operands[1] = value;
-  operands[2] = memory;
-  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
-		       "cc");
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
+			   suffix, "cc");
+    }
+  else
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
+      operands[3] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
+			   "cc");
+    }
+}
+
+/* Helper to emit an "it" instruction in Thumb2 mode only; although the
+   assembler will ignore it in ARM mode, emitting it will mess up instruction
+   counts we sometimes keep; 'flags' are the extra t's and e's if it's more
+   than one instruction that is conditional.  */
+static void
[Patch 4/5] ARM 64 bit sync atomic operations [V3]
Add ARM 64bit sync helpers for use on older ARMs. Based on 32bit versions but with check for sufficiently new kernel version.

gcc/
	* config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c).
	* config/arm/linux-atomic.c: Change comment to point to 64bit version.
	(SYNC_LOCK_RELEASE): Instantiate 64bit version.
	* config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c.

diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 0000000..6966e66
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,166 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilb...@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+http://www.gnu.org/licenses/.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU had those then the compiler inlines the operation).
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+extern unsigned int __write (int fd, const void *buf, unsigned int count);
+extern void abort (void);
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+				    const long long* newval,
+				    long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number.  */
+#define __kernel_helper_version (*(unsigned int *) 0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load.  */
+static void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+    {
+      const char err[] = "A newer kernel is required to run this binary. "
+			 "(__kernel_cmpxchg64 helper)\n";
+      /* At this point we need a way to crash with some information
+	 for the user - I'm not sure I can rely on much else being
+	 available at this point, so do the same as generic-morestack.c
+	 write () and abort ().  */
+      __write (2 /* stderr.  */, err, sizeof (err));
+      abort ();
+    }
+};
+
+static void (*__sync8_kernelhelper_inithook[]) (void)
+  __attribute__ ((used, section (".init_array"))) = {
+  __check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN						\
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp, tmp2;					\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp;							\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,    , |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement both __sync_<op>_and_fetch and __sync_fetch_and_<op> for
+   subword-sized quantities.  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN
[Patch 5/5] ARM 64 bit sync atomic operations [V3]
Test support for ARM 64bit sync intrinsics. gcc/testsuite/ * gcc.dg/di-longlong64-sync-1.c: New test. * gcc.dg/di-sync-multithread.c: New test. * gcc.target/arm/di-longlong64-sync-withhelpers.c: New test. * gcc.target/arm/di-longlong64-sync-withldrexd.c: New test. * lib/target-supports.exp: (arm_arch_*_ok): Series of effective-target tests for v5, v6, v6k, and v7-a, and add-options helpers. (check_effective_target_arm_arm_ok): New helper. (check_effective_target_sync_longlong): New helper. diff --git a/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c new file mode 100644 index 000..82a4ea2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c @@ -0,0 +1,164 @@ +/* { dg-do run } */ +/* { dg-require-effective-target sync_longlong } */ +/* { dg-options -std=gnu99 } */ +/* { dg-message note: '__sync_fetch_and_nand' changed semantics in GCC 4.4 { target *-*-* } 0 } */ +/* { dg-message note: '__sync_nand_and_fetch' changed semantics in GCC 4.4 { target *-*-* } 0 } */ + + +/* Test basic functionality of the intrinsics. The operations should + not be optimized away if no one checks the return values. */ + +/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit so use long long + (an explicit 64bit type maybe a better bet) and 2) Use values that cross + the 32bit boundary and cause carries since the actual maths are done as + pairs of 32 bit instructions. */ + +/* Note: This file is #included by some of the ARM tests. */ + +__extension__ typedef __SIZE_TYPE__ size_t; + +extern void abort (void); +extern void *memcpy (void *, const void *, size_t); +extern int memcmp (const void *, const void *, size_t); + +/* Temporary space where the work actually gets done. */ +static long long AL[24]; +/* Values copied into AL before we start. 
*/ +static long long init_di[24] = { 0x10002ll, 0x20003ll, 0, 1, + +0x10002ll, 0x10002ll, +0x10002ll, 0x10002ll, + +0, 0x1000e0dell, +42 , 0xc001c0dell, + +-1ll, 0, 0xff00ffll, -1ll, + +0, 0x1000e0dell, +42 , 0xc001c0dell, + +-1ll, 0, 0xff00ffll, -1ll}; +/* This is what should be in AL at the end. */ +static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0, + +0x10002ll, 0x10002ll, +0x10002ll, 0x10002ll, + +1, 0xc001c0dell, +20, 0x1000e0dell, + +0x30007ll , 0x50009ll, +0xf100ff0001ll, ~0xa0007ll, + +1, 0xc001c0dell, +20, 0x1000e0dell, + +0x30007ll , 0x50009ll, +0xf100ff0001ll, ~0xa0007ll }; + +/* First check they work in terms of what they do to memory. */ +static void +do_noret_di (void) +{ + __sync_val_compare_and_swap (AL+0, 0x10002ll, 0x1234567890ll); + __sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll); + __sync_lock_test_and_set (AL+2, 1); + __sync_lock_release (AL+3); + + /* The following tests should not change the value since the + original does NOT match. */ + __sync_val_compare_and_swap (AL+4, 0x2ll, 0x1234567890ll); + __sync_val_compare_and_swap (AL+5, 0x1ll, 0x1234567890ll); + __sync_bool_compare_and_swap (AL+6, 0x2ll, 0x1234567890ll); + __sync_bool_compare_and_swap (AL+7, 0x1ll, 0x1234567890ll); + + __sync_fetch_and_add (AL+8, 1); + __sync_fetch_and_add (AL+9, 0xb000e000ll); /* + to both halves carry. */ + __sync_fetch_and_sub (AL+10, 22); + __sync_fetch_and_sub (AL+11, 0xb000e000ll); + + __sync_fetch_and_and (AL+12, 0x30007ll); + __sync_fetch_and_or (AL+13, 0x50009ll); + __sync_fetch_and_xor (AL+14, 0xe0001ll); + __sync_fetch_and_nand (AL+15, 0xa0007ll); + + /* These should be the same as the fetch_and_* cases except for + return value. */ + __sync_add_and_fetch (AL+16, 1); + /* add to both halves carry. 
*/ + __sync_add_and_fetch (AL+17, 0xb000e000ll); + __sync_sub_and_fetch (AL+18, 22); + __sync_sub_and_fetch (AL+19, 0xb000e000ll); + + __sync_and_and_fetch (AL+20, 0x30007ll); + __sync_or_and_fetch (AL+21, 0x50009ll); + __sync_xor_and_fetch (AL+22, 0xe0001ll); + __sync_nand_and_fetch (AL+23, 0xa0007ll); +} + +/* Now check return values. */ +static void +do_ret_di (void) +{ + if (__sync_val_compare_and_swap (AL+0, 0x10002ll, 0x1234567890ll) != + 0x10002ll) abort (); + if
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote: On 10/06/11 04:13, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. There's a variety of literature that uses PRE to detect and optimize straightline code strength reduction. I poked at it at one time (RTL gcse framework) and it looked reasonably promising. Never pushed it all the way through. jeff

I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this question has come up before. The former also suggested a PRE-based approach.
Re: Initial shrink-wrapping patch
HJ found some more maybe_record_trace_start failures. In one case I debugged, we have

(insn/f 31 0 32 (parallel [
	    (set (reg/f:DI 7 sp)
		 (plus:DI (reg/f:DI 7 sp)
			  (const_int 8 [0x8])))
	    (clobber (reg:CC 17 flags))
	    (clobber (mem:BLK (scratch) [0 A8]))
	]) -1
     (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:DI 7 sp)
		(plus:DI (reg/f:DI 7 sp)
			 (const_int 8 [0x8])))
	(nil)))

The insn pattern is later changed by csa to adjust by 24, and the note is left untouched; that seems to be triggering the problem.

Richard, is there a reason to use REG_CFA_ADJUST_CFA rather than REG_CFA_DEF_CFA? If no, I'll just try to fix i386.c not to emit the former.


Bernd
Re: Vector shuffling
Richard Henderson schrieb: On 10/06/2011 04:46 AM, Georg-Johann Lay wrote: So here it is. Lightly tested on my target: All tests either PASS or are UNSUPPORTED now. Ok? Not ok, but only because I've completely restructured the tests again. Patch coming very shortly... Thanks, I hope your patch fixed the issues addressed in my patch :-) Johann r~
Re: [PATCH] Fix PR46556 (poor address generation)
On 10/06/11 12:02, William J. Schmidt wrote: On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote: On 10/06/11 04:13, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. There's a variety of literature that uses PRE to detect and optimize straightline code strength reduction. I poked at it at one time (RTL gcse framework) and it looked reasonably promising. Never pushed it all the way through. jeff I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this question has come up before. The former also suggested a PRE-based approach.

Yea. We've kicked it around several times over the last 15 or so years. When I briefly looked at it, I was doing so more in the context of eliminating all the optimize_related_values crap, purely as a cleanup, and ultimately couldn't justify the time. IIRC Morgan and Muchnick both had write-ups on the basic concepts. There's probably other literature as well.
jeff
Re: Initial shrink-wrapping patch
On 10/06/2011 11:03 AM, Bernd Schmidt wrote: HJ found some more maybe_record_trace_start failures. In one case I debugged, we have (insn/f 31 0 32 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8]))) (clobber (reg:CC 17 flags)) (clobber (mem:BLK (scratch) [0 A8])) ]) -1 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8]))) (nil))) The insn pattern is later changed by csa to adjust by 24, and the note is left untouched; that seems to be triggering the problem. Hmm. That seems a bit odd, considering this function probably does not use alloca (no frame pointer), and uses accumulated outgoing arguments (x86_64 never uses no-accumulate-outgoing-args, afaik). Richard, is there a reason to use REG_CFA_ADJUST_CFA rather than REG_CFA_DEF_CFA? If no, I'll just try to fix i386.c not to emit the former. Not that I can think of. But if that change makes any difference at all, that's almost certainly another bug. What PR are you looking at here? r~
Re: Initial shrink-wrapping patch
On 10/06/11 20:13, Richard Henderson wrote: What PR are you looking at here? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50632 Testcase is gcc.dg/pr50132.c. Bernd
Re: [PATCH] Add support for lzd and popc instructions on sparc.
From: Richard Henderson r...@redhat.com Date: Thu, 06 Oct 2011 10:47:28 -0700 You've said that POPC only operates on the full 64-bit register, but I see no zero-extend of the SImode input? Similarly for the clzsi patterns. Thanks for catching this. I guess if I emit the zero-extend, the compiler will eliminate it if possible. This is another reason why I want to do v8plus differently. The compiler would take care to optimize away zero and sign extensions instead of how we use that sparc_check_64 () thing now. If it weren't for the v8plus ugliness, it would be sufficient to only expose the DImode patterns, and let optabs.c do the work to extend from SImode... Understood.
Re: Initial shrink-wrapping patch
On Tue, Oct 4, 2011 at 3:10 PM, Bernd Schmidt ber...@codesourcery.com wrote: On 09/30/11 18:51, Richard Henderson wrote: Please do leave out RETURN_ADDR_REGNUM for now. If you remember why, then you could bring it back alongside the patch for the ARM backend. Changed. As for the i386 backend changes, not an objection per se, but I'm trying to understand why we need so many copies of patterns. Also changed. I don't see anything glaringly wrong in the middle end. Although the thread_prologue_and_epilogue_insns function is now gigantic. If there were an easy way to break that up and reduce the amount of conditional compilation at the same time... that'd be great, but not a requirement. I don't think there's an easy way; and it's almost certain to break stuff again, so I'd rather avoid doing it at the same time as this patch if possible. I can see one possible way of tackling it; have an analysis phase that fills up a few basic_block VECs (those which need sibcall returns, those which need plain returns, those which need simple returns) and computes other information, such as the edges on which prologue and epilogue are to be inserted, and then a worker phase (probably split across several functions) which does all the fiddling. Richard S. suggested: ...how about adding a bit to crtl to say whether shrink-wrap occured, and check that instead of flag_shrink_wrap? Good idea, also changed. New version below. Bootstrapped and tested i686-linux. It also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50633 Don't you need to update ix86_expand_prologue? -- H.J.
Re: Initial shrink-wrapping patch
On 10/06/11 20:27, H.J. Lu wrote: It also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50633 Don't you need to update ix86_expand_prologue?

In theory it should just work. It seems the x32 stuff has entertaining properties :-( Haven't quite figured out how to build it yet, but:

-	subq	$136, %rsp
-	.cfi_def_cfa_offset 144
 	movl	$0, %eax
 	movl	%esp, %ecx
 	addl	$60, %ecx
@@ -16,6 +14,8 @@ main:
 	movl	%eax, (%edx)
 	cmpl	$16, %eax
 	jne	.L2
+	subq	$136, %rsp
+	.cfi_def_cfa_offset 144

So, this looks like we have both $esp and $rsp - i.e. not using stack_pointer_rtx in all cases? Is there a way to avoid this?

BTW, one other thing that occurred to me - what about drap_reg? Does that need to be added to the set of registers whose use requires a prologue?

Bernd
Re: Modify gcc for use with gdb (issue5132047)
On Oct 6, 2011, at 1:58 AM, Richard Guenther wrote: On Wed, Oct 5, 2011 at 8:51 PM, Diego Novillo dnovi...@google.com wrote: What's the other advantage of using inline functions? The gdb annoyance with the macros can be solved with the .gdbinit macro defines (which might be nice to commit to SVN btw). http://old.nabble.com/-incremental--Patch%3A-FYI%3A-add-accessor-macros-to-gdbinit-td17370385.html And yet, this still isn't in gcc. :-( I wonder how much programmer productivity we've lost due to it.
Re: Modify gcc for use with gdb (issue5132047)
On 10/06/11 12:46, Mike Stump wrote: On Oct 6, 2011, at 1:58 AM, Richard Guenther wrote: On Wed, Oct 5, 2011 at 8:51 PM, Diego Novillo dnovi...@google.com wrote: What's the other advantage of using inline functions? The gdb annoyance with the macros can be solved with the .gdbinit macro defines (which might be nice to commit to SVN btw). http://old.nabble.com/-incremental--Patch%3A-FYI%3A-add-accessor-macros-to-gdbinit-td17370385.html And yet, this still isn't in gcc. :-( I wonder how much programmer productivity we've lost due to it.

Presumably it hasn't been included because not all gdb's understand those bits and we typically don't build with -g3. Personally, the accessors I use are muscle-memory... Which works great until someone buries everything a level deeper :(

jeff
Re: Initial shrink-wrapping patch
On Thu, Oct 6, 2011 at 11:40 AM, Bernd Schmidt ber...@codesourcery.com wrote: On 10/06/11 20:27, H.J. Lu wrote: It also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50633 Don't you need to update ix86_expand_prologue? In theory it should just work. It seems the x32 stuff has entertaining properties :-( Haven't quite figured out how to build it yet, but: - subq $136, %rsp - .cfi_def_cfa_offset 144 movl $0, %eax movl %esp, %ecx addl $60, %ecx @@ -16,6 +14,8 @@ main: movl %eax, (%edx) cmpl $16, %eax jne .L2 + subq $136, %rsp + .cfi_def_cfa_offset 144 So, this looks like we have both $esp and $rsp - i.e. not using stack_pointer_rtx in all cases? Is there a way to avoid this? X32 has 32bit software stack pointer and 64bit hardware stack pointer. BTW, one other thing that occurred to me - what about drap_reg? Does that need to be added to the set of registers whose use requires a prologue? It should be covered by SP. -- H.J.
[Ada] Fix bad warning for divide by zero
Tested on x86_64-pc-linux-gnu, committed on trunk

If the compiler detects a floating division by zero, it was unconditionally issuing a warning and raising a constraint_error. This is wrong behavior for the case of an unconstrained floating point type. This patch corrects that behavior as shown by the following test program:

     1. procedure TESTfdz is
     2.
     3.    type REAL_T is record
     4.       VALUE : FLOAT;
     5.    end record;
     6.
     7.    function "/" (LEFT, RIGHT : in REAL_T) return REAL_T is
     8.    begin
     9.       return (VALUE => LEFT.VALUE / RIGHT.VALUE);
    10.    end "/";
    11.
    12.    X : REAL_T := (VALUE => 1.0);
    13.    Y : FLOAT := 1.0;
    14.
    15.    type CF is digits 6 range 0.0 .. 1.0;
    16.    Z : CF := 1.0;
    17.
    18. begin
    19.    X := X / (VALUE => 0.0);
    20.    Y := Y / 0.0;
                     |
        >>> warning: float division by zero, may generate '+'/'- infinity
    21.    Z := Z / 0.0;
                     |
        >>> warning: division by zero, Constraint_Error will be raised at run time
    22. end;

2011-10-06  Robert Dewar  de...@adacore.com

	* sem_res.adb (Resolve_Arithmetic_Op): Fix bad warning for
	floating divide by zero.

Index: sem_res.adb
===================================================================
--- sem_res.adb	(revision 179628)
+++ sem_res.adb	(working copy)
@@ -64,6 +64,7 @@
 with Sem_Eval; use Sem_Eval;
 with Sem_Intr; use Sem_Intr;
 with Sem_Util; use Sem_Util;
+with Targparm; use Targparm;
 with Sem_Type; use Sem_Type;
 with Sem_Warn; use Sem_Warn;
 with Sinfo;    use Sinfo;
@@ -4874,14 +4875,34 @@
               (Is_Real_Type (Etype (Rop))
                 and then Expr_Value_R (Rop) = Ureal_0))
          then
-            --  Specialize the warning message according to the operation
+            --  Specialize the warning message according to the operation.
+            --  The following warnings are for the case
 
             case Nkind (N) is
                when N_Op_Divide =>
-                  Apply_Compile_Time_Constraint_Error
-                    (N, "division by zero?", CE_Divide_By_Zero,
-                     Loc => Sloc (Right_Opnd (N)));
+                  --  For division, we have two cases, for float division
+                  --  of an unconstrained float type, on a machine where
+                  --  Machine_Overflows is false, we don't get an exception
+                  --  at run-time, but rather an infinity or Nan. The Nan
+                  --  case is pretty obscure, so just warn about infinities.
+
+                  if Is_Floating_Point_Type (Typ)
+                    and then not Is_Constrained (Typ)
+                    and then not Machine_Overflows_On_Target
+                  then
+                     Error_Msg_N
+                       ("float division by zero, "
+                        & "may generate '+'/'- infinity?", Right_Opnd (N));
+
+                  --  For all other cases, we get a Constraint_Error
+
+                  else
+                     Apply_Compile_Time_Constraint_Error
+                       (N, "division by zero?", CE_Divide_By_Zero,
+                        Loc => Sloc (Right_Opnd (N)));
+                  end if;
+
                when N_Op_Rem =>
                   Apply_Compile_Time_Constraint_Error
                     (N, "rem with zero divisor?", CE_Divide_By_Zero,
Re: Modify gcc for use with gdb (issue5132047)
On Oct 6, 2011, at 11:53 AM, Jeff Law wrote: Presumably it hasn't been included because not all gdb's understand those bits and we typically don't build with -g3. Personally, the accessors I use are muscle-memory... Which works great until someone buries everything a level deeper :( Yeah, which works great for encouraging newcomers that doing anything with gcc is hard. They, by definition, don't have the muscle memory, and the documentation that describes which memory their muscles should have is hard to find, or non-existent. I'd merely favor any approach to make their life easier, -g3, a gdb macro package, all inline functions, no macros...