Re: [5/8] Add narrow_bit_field_mem

2012-10-31 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 This patch splits out a fairly common operation: that of narrowing a MEM
 to a particular mode and adjusting the bit number accordingly.
 
 I've kept with bit_field rather than bitfield for consistency with
 the callers, although we do have bitfield in adjust_bitfield_address.

 My bad.  Feel free to rename it.

TBH, I prefer bit_field, so I'll leave it be :-)

 gcc/
  * expmed.c (narrow_bit_field_mem): New function.
  (store_bit_field_using_insv, store_bit_field_1, store_fixed_bit_field)
  (extract_bit_field_1): Use it.

 This is a good cleanup but...

 Index: gcc/expmed.c
 ===
 --- gcc/expmed.c 2012-10-30 19:25:44.797368678 +
 +++ gcc/expmed.c 2012-10-30 19:25:47.730368671 +
 @@ -387,6 +387,23 @@ mode_for_extraction (enum extraction_pat
   return data->operand[opno].mode;
  }
 
 +/* Adjust bitfield memory MEM so that it points to the first unit of
 +   mode MODE that contains the bitfield at bit position BITNUM.
 +   Set NEW_BITNUM to the bit position of the field within the
 +   new memory.  */
 +
 +static rtx
 +narrow_bit_field_mem (rtx mem, enum machine_mode mode,
 +  unsigned HOST_WIDE_INT bitnum,
 +  unsigned HOST_WIDE_INT &new_bitnum)
 +{
 +  unsigned int unit = GET_MODE_BITSIZE (mode);
 +  unsigned HOST_WIDE_INT offset = bitnum / unit * GET_MODE_SIZE (mode);
 +  mem = adjust_bitfield_address (mem, mode, offset);
 +  new_bitnum = bitnum % unit;
 +  return mem;
 +}

 Ugh. :-)  I cannot see any advantages in using references here except for...

 +/* Get a reference to the first byte of the field.  */
 +xop0 = narrow_bit_field_mem (xop0, byte_mode, bitnum, bitnum);

 +op0 = narrow_bit_field_mem (op0, byte_mode, bitnum, bitnum);

 ... mightily confusing the reader here.

Would it be OK with a pointer, but keeping the interface the same?
That's certainly fine by me.

That's one of the things I'm not sure about after the C++ conversion:
I've noticed some references creep in, but when should we use references
and when pointers?  I think Richard B made a comment about using references
for things that can't be null.
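For readers following along, a minimal sketch of the two interface styles under discussion, with toy stand-in types (the real function operates on rtx MEMs and machine modes; these names are purely illustrative):

```cpp
#include <cstddef>

// Toy stand-in for a memory reference; the real code uses rtx.
struct toy_mem { std::size_t byte_offset; };

// Reference out-parameter: the call site gives no visual hint that
// NEW_BITNUM is written, which is what confused the reader above.
static toy_mem
narrow_ref (toy_mem mem, unsigned unit_bits,
            std::size_t bitnum, std::size_t &new_bitnum)
{
  mem.byte_offset += bitnum / unit_bits * (unit_bits / 8);
  new_bitnum = bitnum % unit_bits;
  return mem;
}

// Pointer out-parameter: the "&nb" at the call site flags the mutation.
static toy_mem
narrow_ptr (toy_mem mem, unsigned unit_bits,
            std::size_t bitnum, std::size_t *new_bitnum)
{
  mem.byte_offset += bitnum / unit_bits * (unit_bits / 8);
  *new_bitnum = bitnum % unit_bits;
  return mem;
}
```

Both compute the same thing; the difference is only how visible the side effect is where the function is called.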

Richard


[ping] Fix unwind/debug info on x86-64/Windows

2012-10-31 Thread Eric Botcazou
Original message at:
  http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00013.html

Thanks in advance.

-- 
Eric Botcazou


Re: [5/8] Add narrow_bit_field_mem

2012-10-31 Thread Eric Botcazou
 Would it be OK with a pointer, but keeping the interface the same?
 That's certainly fine by me.

Yes, a pointer would make things more legible here.

 That's one of the things I'm not sure about after the C++ conversion:
 I've noticed some references creep in, but when should we use references
 and when pointers?  I think Richard B made a comment about using references
 for things that can't be null.

I'm personally dubious about using references, especially in the middle-end of 
a file where all other functions use pointers; consistency should come first.

-- 
Eric Botcazou


Re: [PATCH] Inter-bb range test optimization (PRs tree-optimization/19105, tree-optimization/21643, tree-optimization/46309)

2012-10-31 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 03:56:08PM +0100, Richard Biener wrote:
 Ok, but the code could really really have some more comments - functions
 not fitting in my 80x24 terminal without seeing any comment on what happens
 here are a maintenance nightmare.

Here is the updated patch I'm about to commit:

2012-10-31  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/19105
PR tree-optimization/21643
PR tree-optimization/46309
* tree-ssa-reassoc.c (init_range_entry): Add STMT argument
and use it if EXP is NULL.
(update_range_test): Handle OPCODE equal to ERROR_MARK
and oe->op NULL.
(optimize_range_tests): Likewise.
(final_range_test_p, suitable_cond_bb, no_side_effect_bb, get_ops,
maybe_optimize_range_tests): New functions.
(reassociate_bb): Call maybe_optimize_range_tests if last
stmt of bb is GIMPLE_COND that hasn't been visited yet.

* gcc.dg/pr19105.c: New test.
* gcc.dg/pr21643.c: New test.
* gcc.dg/pr46309-2.c: New test.
* gcc.c-torture/execute/pr46309.c: New test.

--- gcc/tree-ssa-reassoc.c.jj   2012-10-30 18:45:07.581366165 +0100
+++ gcc/tree-ssa-reassoc.c  2012-10-31 00:13:17.173319482 +0100
@@ -1,5 +1,5 @@
 /* Reassociation for trees.
-   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011
+   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011, 2012
Free Software Foundation, Inc.
Contributed by Daniel Berlin d...@dberlin.org
 
@@ -1713,10 +1713,12 @@ struct range_entry
 };
 
 /* This is similar to make_range in fold-const.c, but on top of
-   GIMPLE instead of trees.  */
+   GIMPLE instead of trees.  If EXP is non-NULL, it should be
+   an SSA_NAME and STMT argument is ignored, otherwise STMT
+   argument should be a GIMPLE_COND.  */
 
 static void
-init_range_entry (struct range_entry *r, tree exp)
+init_range_entry (struct range_entry *r, tree exp, gimple stmt)
 {
   int in_p;
   tree low, high;
@@ -1727,7 +1729,8 @@ init_range_entry (struct range_entry *r,
   r->strict_overflow_p = false;
   r->low = NULL_TREE;
   r->high = NULL_TREE;
-  if (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
+  if (exp != NULL_TREE
+      && (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp))))
 return;
 
   /* Start with simply saying EXP != 0 and then look at the code of EXP
@@ -1735,12 +1738,14 @@ init_range_entry (struct range_entry *r,
  happen, but it doesn't seem worth worrying about this.  We continue
  the outer loop when we've changed something; otherwise we break
  the switch, which will break the while.  */
-  low = build_int_cst (TREE_TYPE (exp), 0);
+  low = exp ? build_int_cst (TREE_TYPE (exp), 0) : boolean_false_node;
   high = low;
   in_p = 0;
   strict_overflow_p = false;
   is_bool = false;
-  if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
+  if (exp == NULL_TREE)
+is_bool = true;
+  else if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
 {
   if (TYPE_UNSIGNED (TREE_TYPE (exp)))
is_bool = true;
@@ -1752,25 +1757,35 @@ init_range_entry (struct range_entry *r,
 
   while (1)
 {
-  gimple stmt;
   enum tree_code code;
   tree arg0, arg1, exp_type;
   tree nexp;
   location_t loc;
 
-  if (TREE_CODE (exp) != SSA_NAME)
-   break;
+  if (exp != NULL_TREE)
+   {
+ if (TREE_CODE (exp) != SSA_NAME)
+   break;
 
-  stmt = SSA_NAME_DEF_STMT (exp);
-  if (!is_gimple_assign (stmt))
-   break;
+ stmt = SSA_NAME_DEF_STMT (exp);
+ if (!is_gimple_assign (stmt))
+   break;
+
+ code = gimple_assign_rhs_code (stmt);
+ arg0 = gimple_assign_rhs1 (stmt);
+ arg1 = gimple_assign_rhs2 (stmt);
+ exp_type = TREE_TYPE (exp);
+   }
+  else
+   {
+ code = gimple_cond_code (stmt);
+ arg0 = gimple_cond_lhs (stmt);
+ arg1 = gimple_cond_rhs (stmt);
+ exp_type = boolean_type_node;
+   }
 
-  code = gimple_assign_rhs_code (stmt);
-  arg0 = gimple_assign_rhs1 (stmt);
   if (TREE_CODE (arg0) != SSA_NAME)
break;
-  arg1 = gimple_assign_rhs2 (stmt);
-  exp_type = TREE_TYPE (exp);
   loc = gimple_location (stmt);
   switch (code)
{
@@ -1916,7 +1931,11 @@ range_entry_cmp (const void *a, const vo
[EXP, IN_P, LOW, HIGH, STRICT_OVERFLOW_P] is a merged range for
RANGE and OTHERRANGE through OTHERRANGE + COUNT - 1 ranges,
OPCODE and OPS are arguments of optimize_range_tests.  Return
-   true if the range merge has been successful.  */
+   true if the range merge has been successful.
+   If OPCODE is ERROR_MARK, this is called from within
+   maybe_optimize_range_tests and is performing inter-bb range optimization.
+   Changes should be then performed right away, and whether an op is
   BIT_AND_EXPR or BIT_IOR_EXPR is found in oe->rank.  */
 
 static bool
 update_range_test (struct range_entry *range, struct range_entry *otherrange,
@@ 

Re: Adapt one fold-const optimization for vectors

2012-10-31 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 01:14:40PM +0100, Marc Glisse wrote:
 On Tue, 30 Oct 2012, Marek Polacek wrote:
 
 On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:
 Hello,
 
  one more optimization that needed help for vectors; it crashed on
  (x<y)<0.  Because of PR 55001, testcases are awkward to add (I could
  do an x86-only one if needed).
 
 bootstrap+testsuite.
 
 2012-10-30  Marc Glisse  marc.gli...@inria.fr
 
 * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
 (fold_binary_loc): call it for VEC_COND_EXPR.

Capital C instead of lowercase.

 
 Patch missing?
 
 Indeed, thanks for the notice.

This regressed
FAIL: g++.dg/other/vector-compare.C (internal compiler error)
FAIL: g++.dg/other/vector-compare.C (test for excess errors)
on i686-linux.  The problem is that tree-vect-generic.c doesn't expand
BLKmode VEC_COND_EXPR piecewise.  So, either fold-const needs to test for
this optimization whether expand_vec_cond_expr_p is true for the chosen
pair of comparison type and operand type, or tree-vect-generic.c needs to be
taught to expand VEC_COND_EXPR piecewise if expand_vec_cond_expr_p is not
true for the particular VEC_COND_EXPR.
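For reference, the piecewise strategy amounts to lowering the vector conditional to one scalar selection per element. A sketch with a hypothetical fixed-width 4-element vector type (the real code emits GIMPLE for each element rather than running a C++ loop):

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-element integer vector; the real code works on trees.
using v4 = std::array<int, 4>;

// Piecewise lowering of r = mask ? a : b for a VEC_COND_EXPR the target
// cannot handle with a single vcond pattern: one scalar COND_EXPR per
// element.  A nonzero (all-ones) mask element selects from A.
static v4
vec_cond_piecewise (const v4 &mask, const v4 &a, const v4 &b)
{
  v4 r{};
  for (std::size_t i = 0; i < r.size (); ++i)
    r[i] = mask[i] ? a[i] : b[i];
  return r;
}
```

This is the "taught to expand piecewise" fallback: correct for any mode, just slower than a native vector select.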

Jakub


RE: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Bin Cheng


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
 On Behalf Of Jakub Jelinek
 Sent: Tuesday, October 30, 2012 1:57 AM
 To: g...@gcc.gnu.org
 Cc: gcc-patches@gcc.gnu.org
 Subject: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
 
 Status
 ==
 
 I'd like to close the stage 1 phase of GCC 4.8 development on Monday,
 November 5th.  If you still have patches for new features you'd like to
 see in GCC 4.8, please post them for review soon.  Patches posted before
 the freeze, but reviewed shortly after the freeze, may still go in;
 further changes should be just bugfixes and documentation fixes.
 
 
 Quality Data
 
 
 Priority  #   Change from Last Report
 ---   ---
 P1   23   + 23
 P2   77   +  8
 P3   85   + 84
 ---   ---
 Total   185   +115
 
 
 Previous Report
 ===
 
 http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html
 
 The next report will be sent by me again, announcing end of stage 1.

Hi,
I am working on the register-pressure-directed hoist pass and have committed
the main patch on trunk.  I still have two patches improving it in this
area; I will send them soon and hope they can be included in 4.8 if OK.

Thanks.





Re: [C++] Omit overflow check for new char[n]

2012-10-31 Thread Jakub Jelinek
On Mon, Oct 08, 2012 at 05:50:34PM +0200, Florian Weimer wrote:
 gcc/:
 
 2012-10-08  Florian Weimer  fwei...@redhat.com
 
   * init.c (build_new_1): Do not check for arithmetic overflow if
   inner array size is 1.
 
 gcc/testsuite/:
 
 2012-10-08  Florian Weimer  fwei...@redhat.com
 
   * g++.dg/init/new40.C: New.

 @@ -2450,7 +2450,13 @@
 if (array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type))
   size = size_binop (PLUS_EXPR, size, cookie_size);
 else
 - cookie_size = NULL_TREE;
 + {
 +   cookie_size = NULL_TREE;
 +   /* No size arithmetic necessary, so the size check is
 +  not needed. */
 +   if (outer_nelts_check != NULL && inner_size == double_int_one)
 + outer_nelts_check = NULL_TREE;
 + }

I don't see any == double_int_one (or zero) comparisons in grep,
so I'd say inner_size.is_one () should be used instead (which is used
pretty frequently).  Ditto in the second spot.
Otherwise the patch looks good to me, but I'd like Jason to chime in too.

 /* Perform the overflow check.  */
 if (outer_nelts_check != NULL_TREE)
  size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
 @@ -2486,7 +2492,13 @@
 /* Use a global operator new.  */
 /* See if a cookie might be required.  */
 if (!(array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type)))
 - cookie_size = NULL_TREE;
 + {
 +   cookie_size = NULL_TREE;
 +   /* No size arithmetic necessary, so the size check is
 +  not needed. */
 +   if (outer_nelts_check != NULL && inner_size == double_int_one)
 + outer_nelts_check = NULL_TREE;
 + }
  
 alloc_call = build_operator_new_call (fnname, placement,
   size, cookie_size,
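The reasoning behind the patch can be sketched with a hypothetical helper (the real check is built as a tree expression in build_new_1, not as a runtime function): the allocation size INNER_SIZE * n can overflow only when INNER_SIZE > 1, so for new char[n] the guard is dead code.

```cpp
#include <cstddef>

// Hypothetical model of the overflow guard emitted for `new T[n]`:
// the byte count INNER_SIZE * n overflows size_t exactly when
// n > SIZE_MAX / INNER_SIZE.  For INNER_SIZE == 1 (e.g. new char[n])
// no size arithmetic happens, so the check can be omitted entirely,
// which is what the patch does.
static bool
array_new_overflow_p (std::size_t inner_size, std::size_t n)
{
  if (inner_size == 1)
    return false;               // no multiplication, no overflow possible
  return n > static_cast<std::size_t> (-1) / inner_size;
}
```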

Jakub


Re: Non-dominating loop bounds in tree-ssa-loop-niter 2/4

2012-10-31 Thread Richard Biener
On Tue, 30 Oct 2012, Jan Hubicka wrote:

 Hi,
 this patch implements the second part of the planned change - determining
 loop bounds by shortest path discovery.  This allows bounding the number
 of iterations of loops whose bounds are in statements that do not dominate
 the latch.
 
 I originally planned to implement this as part of maybe_lower_iteration_bound,
 but I found doing it separately is more readable.  While both perform a walk
 of the loop body, they do so for different reasons.
 
 discover_iteration_bound_by_body_walk walks from header to latch, while
 maybe_lower_iteration_bound walks from header to the first statement with
 side effects, looking for an exit on the way.
 
 Both walks can be skipped for many loops, but each for different reasons.
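The shortest-path scheme in the patch below can be sketched in miniature (hypothetical names and plain ints; the real code walks basic_blocks with pointer maps and VECs of double_ints): bounds are compressed to indexes into a sorted array, and one bucket per index replaces a priority queue.

```cpp
#include <algorithm>
#include <vector>

// Maximin path via bucket queues.  node_bound[v] is the index (into a
// sorted array of bounds) of the bound recorded in block v, or INF when
// the block carries no bound.  The result is the largest, over all
// header->latch paths, of the smallest bound index on the path: the
// bound guaranteed to terminate the loop.
static int
best_path_bound (const std::vector<std::vector<int> > &succs,
                 const std::vector<int> &node_bound,
                 int header, int latch, int inf)
{
  std::vector<std::vector<int> > buckets (inf + 1);
  std::vector<int> best (succs.size (), -1);  // best priority seen per node
  best[header] = inf;
  buckets[inf].push_back (header);
  int latch_best = -1;
  for (int p = inf; p >= 0; --p)              // highest priority first
    while (!buckets[p].empty ())
      {
        int v = buckets[p].back ();
        buckets[p].pop_back ();
        if (best[v] > p)
          continue;                           // stale entry, improved later
        int q = std::min (p, node_bound[v]);  // a bound here lowers priority
        for (int s : succs[v])
          {
            if (s == latch)
              latch_best = std::max (latch_best, q);
            else if (q > best[s])
              {
                best[s] = q;
                buckets[q].push_back (s);
              }
          }
      }
  return latch_best;
}
```

Processing buckets from the highest index down is what makes a Fibonacci heap unnecessary, exactly as the comment in the patch explains.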
 
 Bootstrapped/regtested x86_64-linux, OK?
 
   * tree-ssa-loop-niter.c (double_int_cmp, bound_index,
   discover_iteration_bound_by_body_walk): New functions.
   (estimate_numbers_of_iterations_loop): Use it.
 
   * gcc.dg/tree-ssa/loop-38.c: New testcase.
 
 Index: tree-ssa-loop-niter.c
 ===
 --- tree-ssa-loop-niter.c (revision 192989)
 +++ tree-ssa-loop-niter.c (working copy)
 @@ -2955,6 +2955,234 @@ gcov_type_to_double_int (gcov_type val)
return ret;
  }
  
 +/* Compare double ints, callback for qsort.  */
 +
 +int
 +double_int_cmp (const void *p1, const void *p2)
 +{
 +  const double_int *d1 = (const double_int *)p1;
 +  const double_int *d2 = (const double_int *)p2;
 +  if (*d1 == *d2)
 +return 0;
 +  if (d1->ult (*d2))
 +return -1;
 +  return 1;
 +}
 +
 +/* Return index of BOUND in BOUNDS array sorted in increasing order.
 +   Lookup by binary search.  */
 +
 +int
 +bound_index (VEC (double_int, heap) *bounds, double_int bound)
 +{
 +  unsigned int end = VEC_length (double_int, bounds);
 +  unsigned int begin = 0;
 +
 +  /* Find a matching index by means of a binary search.  */
 +  while (begin != end)
 +{
 +  unsigned int middle = (begin + end) / 2;
 +  double_int index = VEC_index (double_int, bounds, middle);
 +
 +  if (index == bound)
 + return middle;
 +  else if (index.ult (bound))
 + begin = middle + 1;
 +  else
 + end = middle;
 +}
 +  gcc_unreachable ();
 +}
 +
 +/* Used to hold vector of queues of basic blocks below.  */
 +typedef VEC (basic_block, heap) *bb_queue;
 +DEF_VEC_P(bb_queue);
 +DEF_VEC_ALLOC_P(bb_queue,heap);
 +
 +/* We recorded loop bounds only for statements dominating loop latch (and 
 thus
 +   executed each loop iteration).  If there are any bounds on statements not
 +   dominating the loop latch we can improve the estimate by walking the loop
 +   body and seeing if every path from loop header to loop latch contains
 +   some bounded statement.  */
 +
 +static void
 +discover_iteration_bound_by_body_walk (struct loop *loop)
 +{
 +  pointer_map_t *bb_bounds;
 +  struct nb_iter_bound *elt;
 +  VEC (double_int, heap) *bounds = NULL;
 +  VEC (bb_queue, heap) *queues = NULL;
 +  bb_queue queue = NULL;
 +  ptrdiff_t queue_index;
 +  ptrdiff_t latch_index = 0;
 +  pointer_map_t *visited;
 +
 +  /* Discover what bounds may interest us.  */
 +  for (elt = loop->bounds; elt; elt = elt->next)
 +{
 +  double_int bound = elt->bound;
 +
 +  /* Exit terminates loop at given iteration, while non-exits produce
 +     undefined effect on the next iteration.  */
 +  if (!elt->is_exit)
 + {
 +   bound += double_int_one;
 +   /* Overflow, give up on this bound.  */
 +   if (bound == double_int_zero)
 + continue;
 + }
 +
 +  if (!loop->any_upper_bound
 +      || bound.ult (loop->nb_iterations_upper_bound))
 +VEC_safe_push (double_int, heap, bounds, bound);
 +}
 +
 +  /* Exit early if there is nothing to do.  */
 +  if (!bounds)
 +return;
 +
 +  if (dump_file && (dump_flags & TDF_DETAILS))
 +    fprintf (dump_file, "Trying to walk loop body to reduce the bound.\n");
 +
 +  /* Sort the bounds in decreasing order.  */
 +  qsort (VEC_address (double_int, bounds), VEC_length (double_int, bounds),
 +  sizeof (double_int), double_int_cmp);
 +
 +  /* For every basic block record the lowest bound that is guaranteed to
 + terminate the loop.  */
 +
 +  bb_bounds = pointer_map_create ();
 +  for (elt = loop->bounds; elt; elt = elt->next)
 +{
 +  double_int bound = elt->bound;
 +  if (!elt->is_exit)
 + {
 +   bound += double_int_one;
 +   /* Overflow, give up on this bound.  */
 +   if (bound == double_int_zero)
 + continue;
 + }
 +
 +  if (!loop->any_upper_bound
 +      || bound.ult (loop->nb_iterations_upper_bound))
 + {
 +   ptrdiff_t index = bound_index (bounds, bound);
 +   void **entry = pointer_map_contains (bb_bounds,
 +gimple_bb (elt->stmt));
 +   if (!entry)
 + *pointer_map_insert (bb_bounds,
 +  gimple_bb 

Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-31 Thread Richard Biener
On Tue, Oct 30, 2012 at 9:23 PM, Peter Bergner berg...@vnet.ibm.com wrote:
 On Tue, 2012-10-30 at 20:37 +0100, Jakub Jelinek wrote:
 On Tue, Oct 30, 2012 at 02:03:44PM -0500, Peter Bergner wrote:
  Ok, then I'll bootstrap and regtest your suggested change while we
  wait for richi to comment.  I'm fine with whatever you and richi
  decide is best.  The ObjC guys should probably test it though too.
 
  I assume you think we should change the current trunk code as well?

 Well, I haven't looked at the ObjC failures (guess they are darwin only
 anyway), so have no idea whether those set section name or not.
 So, if my proposed test instead of the trunk one doesn't work for darwin,
 perhaps it could be DECL_PRESERVED_P (decl) || (DECL_SECTION_NAME (decl)
 && !...).  I think the DECL_USER_ALIGN test is undesirable for that though; it is
 just fine to increase alignment of anything, after all, it is still aligned
 properly then.

 I can confirm it bootstraps and regtests without any errors...and it
 fixes my problem.

I'm fine with that, but please give it some time on trunk before backporting it.

Thanks,
Richard.

 Peter





Re: [1/8] Remove redundant BLKmode test

2012-10-31 Thread Richard Biener
On Tue, Oct 30, 2012 at 11:00 PM, Eric Botcazou ebotca...@adacore.com wrote:
 gcc/
   * expmed.c (store_bit_field_1): Remove test for BLKmode values.

 This looks fine to me.

Btw, I consider Eric the best person to approve changes in this area.
Thus if there is any doubt all patches in this series are ok if Eric
thinks they are ;)

Thanks,
Richard.

 --
 Eric Botcazou


Re: [PATCH] Update source location for PRE inserted stmt

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 12:57 AM, Xinliang David Li davi...@google.com wrote:
 It will make the location info for the newly synthesized stmt more
 deterministic, I think.

Maybe, but it will increase the jumpiness in the debugger without actually
being accurate, no?  For example if the partially redundant expression is

  i + j;

then when computed at the insertion point the values of i and j do not
necessarily reflect the computed value!  Instead we may compute the
result of i + j using completely different components / operation.

Thus I think inserted expressions should not have any debug information
at all because they do not correspond to a source line.

Richard.

 David

 On Tue, Oct 30, 2012 at 4:38 PM, Steven Bosscher stevenb@gmail.com 
 wrote:
 On Wed, Oct 31, 2012 at 12:00 AM, Dehao Chen wrote:
 This patch aims to improve debugging of optimized code. It ensures
 that PRE inserted statements have the same source location as the
 statement at the insertion point, instead of UNKNOWN_LOCATION.

 Wrong patch attached.

 However, is it really better to have the location of the insertion
 point than to have UNKNOWN_LOCATION? It's not where the value is
 computed in the source program...

 Ciao!
 Steven


Re: [5/8] Add narrow_bit_field_mem

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 9:16 AM, Eric Botcazou ebotca...@adacore.com wrote:
 Would it be OK with a pointer, but keeping the interface the same?
 That's certainly fine by me.

 Yes, a pointer would make things more legible here.

 That's one of the things I'm not sure about after the C++ conversion:
 I've noticed some references creep in, but when should we use references
 and when pointers?  I think Richard B made a comment about using references
 for things that can't be null.

 I'm personally dubious about using references, especially in the middle-end of
 a file where all other functions use pointers; consistency should come first.

I agree.  My comment was for isolated code parts that are being rewritten
(I think it was the wide-int class).  Consistency comes first.

Thanks,
Richard.

 --
 Eric Botcazou


Re: Adapt one fold-const optimization for vectors

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 9:24 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Oct 30, 2012 at 01:14:40PM +0100, Marc Glisse wrote:
 On Tue, 30 Oct 2012, Marek Polacek wrote:

 On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:
 Hello,
 
  one more optimization that needed help for vectors; it crashed on
  (x<y)<0.  Because of PR 55001, testcases are awkward to add (I could
  do an x86-only one if needed).
 
 bootstrap+testsuite.
 
 2012-10-30  Marc Glisse  marc.gli...@inria.fr
 
 * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
 (fold_binary_loc): call it for VEC_COND_EXPR.

 Capital C instead of lowercase.

 
 Patch missing?

 Indeed, thanks for the notice.

 This regressed
 FAIL: g++.dg/other/vector-compare.C (internal compiler error)
 FAIL: g++.dg/other/vector-compare.C (test for excess errors)
 on i686-linux.  The problem is that tree-vect-generic.c doesn't expand
 BLKmode VEC_COND_EXPR piecewise.  So, either fold-const needs to test for
 this optimization whether expand_vec_cond_expr_p is true for the chosen
 pair of comparison type and operand type, or tree-vect-generic.c needs to be
 taught to expand VEC_COND_EXPR piecewise if expand_vec_cond_expr_p is not
 true for the particular VEC_COND_EXPR.

The latter (tree-vect-generic.c needs to expand it).

Richard.

 Jakub


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Richard Biener
On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:
 jakub,

  I am hoping to get the rest of my wide integer conversion posted by Nov 5.
  I am under some adverse conditions here: hurricane Sandy hit here pretty
  badly.  My house is hooked up to a small generator, and no one has any power
  for miles around.
 
  So far richi has promised to review them.  He has sent some comments, but
  so far no reviews.  Some time after I get the first round of them posted,
  I will do a second round that incorporates everyone's comments.
 
  But I would like a little slack here if possible.  While this work is a
  show stopper for my private port, the patches address serious problems for
  many of the public ports, especially ones that have very flexible vector
  units.  I believe that there is a significant set of latent problems
  currently with the existing ports that use TImode that these patches will
  fix.
 
  However, I will do everything in my power to get the first round of the
  patches posted by the Nov 5 deadline.

I suppose you are not going to merge your private port for 4.8 and thus
the wide-int changes are not a show-stopper for you.

That said, I consider the main conversion appropriate to defer to the
next stage1.  There is no advantage in disrupting the tree more at this
stage.

Thanks,
Richard.

 kenny


 On 10/29/2012 01:56 PM, Jakub Jelinek wrote:

 Status
 ==

  I'd like to close the stage 1 phase of GCC 4.8 development
  on Monday, November 5th.  If you still have patches for new features you'd
  like to see in GCC 4.8, please post them for review soon.  Patches
  posted before the freeze, but reviewed shortly after the freeze, may
  still go in; further changes should be just bugfixes and documentation
  fixes.


 Quality Data
 

 Priority  #   Change from Last Report
 ---   ---
 P1   23   + 23
 P2   77   +  8
 P3   85   + 84
 ---   ---
 Total   185   +115


 Previous Report
 ===

 http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

 The next report will be sent by me again, announcing end of stage 1.




Re: RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_lenth attribute to support the ARC port)

2012-10-31 Thread Richard Biener
On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke
joern.renne...@embecosm.com wrote:
 Quoting Richard Biener richard.guent...@gmail.com:

 Apart from the iteration_threshold the hookization would be
 straight-forward.
 Now I cannot decipher from the patch what functional change it introduces
 ;)


  The only change occurs if we reach an iteration count of MAX_INT
  iterations - which should already be indicative of a problem.

 At the MAX_INTth iteration, we will apply the length locking logic to
 instructions inside a delay slot sequence as well.

 If we disregard this exceptional case, there should be no functional changes
 unless someone defines TARGET_ADJUST_INSN_LENGTH.

  uid_lock_length gets allocated with XCNEWVEC.  So, in
  particular, the elements corresponding to instructions inside delay slot
  sequences are initialized to zero.

  As the default hook sets *iter_threshold to MAX_INT when inside a delay
  sequence, and doesn't change the length, the max operation with
  uid_lock_length is a no-op, and *niter < iter_threshold is true, hence a
  length change results in updating the length immediately, without changing
  uid_lock_length.

  In the case that we are not inside a delay slot sequence, the default hook
  leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when
  niter is 0 or larger, as is the case for the ordinary looping operation,
  we always find *niter >= iter_threshold, and thus apply the length locking
  mechanism.

  If we are in the preliminary pass, or doing the single !increasing
  iteration, niter is set to -1, so *niter < iter_threshold is always true,
  so again we update the length immediately.
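A compressed model of the locking rule being described, as a hypothetical helper (the real code keys uid_lock_length by insn uid and runs inside the shorten_branches fixed-point loop):

```cpp
#include <algorithm>

// Model of the length-locking step for one insn: the computed length is
// clamped from below by the locked length, and once the iteration count
// reaches the hook-provided threshold, growth is locked in, so lengths
// only ever increase and the fixed-point iteration terminates.
static int
update_length (int computed, int *locked, int niter, int iter_threshold)
{
  computed = std::max (computed, *locked);
  if (niter >= iter_threshold)
    *locked = computed;   // later iterations cannot shrink this length
  return computed;
}
```

With niter == -1 (preliminary pass) or iter_threshold == MAX_INT (inside a delay sequence, per the default hook) the lock never engages and the length just updates immediately, matching the description above.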

Ok, I see.

The patch is ok then.

  There are also only 10 users of ADJUST_INSN_LENGTH, not enough to
  warrant introducing a hook without removing the macro IMHO.


 Ok, I can prepare a patch for that, even though it makes it even a bit
 less obvious that there's no functional change.

That would be nice (you can do that as followup).

 What about the preparatory patch
 http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html ?
 Can I check that in now?

Richard S. had some comments there so I defer to him.

Richard.


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:
 jakub,

  I am hoping to get the rest of my wide integer conversion posted by Nov 5.
  I am under some adverse conditions here: hurricane Sandy hit here pretty
  badly.  My house is hooked up to a small generator, and no one has any power
  for miles around.
 
  So far richi has promised to review them.  He has sent some comments, but
  so far no reviews.  Some time after I get the first round of them posted,
  I will do a second round that incorporates everyone's comments.
 
  But I would like a little slack here if possible.  While this work is a
  show stopper for my private port, the patches address serious problems for
  many of the public ports, especially ones that have very flexible vector
  units.  I believe that there is a significant set of latent problems
  currently with the existing ports that use TImode that these patches will
  fix.
 
  However, I will do everything in my power to get the first round of the
  patches posted by the Nov 5 deadline.

 I suppose you are not going to merge your private port for 4.8 and thus
 the wide-int changes are not a show-stopper for you.

  That said, I consider the main conversion appropriate to defer to the
  next stage1.  There is no advantage in disrupting the tree more at this
  stage.

I would like the wide_int class and rtl stuff to go in 4.8 though.
IMO it's a significant improvement in its own right, and Kenny
submitted it well before the deadline.

Richard


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-31 Thread Richard Biener
On Tue, Oct 30, 2012 at 6:27 PM, Dehao Chen de...@google.com wrote:
 gcc/ChangeLog:
 2012-10-25  Dehao Chen  de...@google.com

 * tree-eh.c (do_return_redirection): Set location for jump statement.
 (do_goto_redirection): Likewise.
 (frob_into_branch_around): Likewise.
 (lower_try_finally_nofallthru): Likewise.
 (lower_try_finally_copy): Likewise.
 (lower_try_finally_switch): Likewise.
 * expr.c (store_expr): Use current insn location instead of expr
 location.
 (expand_expr_real): Likewise.
 (expand_expr_real_1): Likewise.

 gcc/testsuite/ChangeLog:
 2012-10-25  Dehao Chen  de...@google.com

 * g++.dg/debug/dwarf2/block.C: New testcase.

 This patch bootstrapped and passed gcc regression tests.

 Okay for trunk?

It looks sensible to me.  If nobody objects within 24h the patch is ok.

Thanks,
Richard.

 Thanks,
 Dehao


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-31 Thread Jakub Jelinek
On Wed, Oct 31, 2012 at 11:00:26AM +0100, Richard Biener wrote:
 On Tue, Oct 30, 2012 at 6:27 PM, Dehao Chen de...@google.com wrote:
  gcc/ChangeLog:
  2012-10-25  Dehao Chen  de...@google.com
 
  * tree-eh.c (do_return_redirection): Set location for jump 
  statement.
  (do_goto_redirection): Likewise.
  (frob_into_branch_around): Likewise.
  (lower_try_finally_nofallthru): Likewise.
  (lower_try_finally_copy): Likewise.
  (lower_try_finally_switch): Likewise.
  * expr.c (store_expr): Use current insn location instead of expr
  location.
  (expand_expr_real): Likewise.
  (expand_expr_real_1): Likewise.
 
  gcc/testsuite/ChangeLog:
  2012-10-25  Dehao Chen  de...@google.com
 
  * g++.dg/debug/dwarf2/block.C: New testcase.
 
  This patch bootstrapped and passed gcc regression tests.
 
  Okay for trunk?
 
 It looks sensible to me.  If nobody objects within 24h the patch is ok.

Yeah.  But please also check gdb testsuite for this kind of patches.

Jakub


Re: Non-dominating loop bounds in tree-ssa-loop-niter 2/4

2012-10-31 Thread Jan Hubicka
 
 visited is a poor name for a map ...
Hmm, visited_with_priority?

Thanks,
Honza
 
 Otherwise looks ok.
 
 Thanks,
 Richard.
 
  +
  +  /* Perform shortest path discovery loop->header ... loop->latch.
  +
  + The distance is given by the smallest loop bound of basic block
  + present in the path and we look for path with largest smallest bound
  + on it.
  +
  + To avoid the need for a Fibonacci heap on double_ints we simply compress
  + double_ints into indexes to the BOUNDS array and then represent the queue
  + as arrays of queues for every index.
  + Index of VEC_length (BOUNDS) means that the execution of given BB has
  + no bounds determined.
  +
  + VISITED is a pointer map translating basic block into smallest index
  + it was inserted into the priority queue with.  */
  +  latch_index = -1;
  +
  +  /* Start walk in loop header with index set to infinite bound.  */
  +  queue_index = VEC_length (double_int, bounds);
  +  VEC_safe_grow_cleared (bb_queue, heap, queues, queue_index + 1);
  +  VEC_safe_push (basic_block, heap, queue, loop->header);
  +  VEC_replace (bb_queue, queues, queue_index, queue);
  +  *pointer_map_insert (visited, loop->header) = (void *)queue_index;
  +
  +  for (; queue_index >= 0; queue_index--)
  +{
  +  if (latch_index < queue_index)
  +   {
  + while (VEC_length (basic_block,
  +VEC_index (bb_queue, queues, queue_index)))
  +   {
  + basic_block bb;
  + ptrdiff_t bound_index = queue_index;
  + void **entry;
  +  edge e;
  +  edge_iterator ei;
  +
  + queue = VEC_index (bb_queue, queues, queue_index);
  + bb = VEC_pop (basic_block, queue);
  +
  + /* OK, we later increased BB priority, skip it.  */
  + if ((ptrdiff_t)*pointer_map_contains (visited, bb) > queue_index)
  +   continue;
  +
  + /* See if we can improve the bound.  */
  + entry = pointer_map_contains (bb_bounds, bb);
  + if (entry  (ptrdiff_t)*entry  bound_index)
  +   bound_index = (ptrdiff_t)*entry;
  +
  + /* Insert succesors into the queue, watch for latch edge
  +and record greatest index we saw.  */
  + FOR_EACH_EDGE (e, ei, bb-succs)
  +   {
  + bool insert = false;
  + void **entry;
  +
  + if (loop_exit_edge_p (loop, e))
  +   continue;
  +
  + if (e == loop_latch_edge (loop)
  +  latch_index  bound_index)
  +   latch_index = bound_index;
  + else if (!(entry = pointer_map_contains (visited, e-dest)))
  +   {
  + insert = true;
  + *pointer_map_insert (visited, e-dest) = (void 
  *)bound_index;
  +   }
  + else if ((ptrdiff_t)*entry  bound_index)
  +   {
  + insert = true;
  + *entry = (void *)bound_index;
  +   }
  +   
  + if (insert)
  +   {
  + bb_queue queue2 = VEC_index (bb_queue, queues, 
  bound_index);
  + VEC_safe_push (basic_block, heap, queue2, e-dest);
  + VEC_replace (bb_queue, queues, bound_index, queue2);
  +   }
  +   }
  +   }
  +   }
  +  else
  +   VEC_free (basic_block, heap, VEC_index (bb_queue, queues, queue_index));
  +}
  +
  +  gcc_assert (latch_index = 0);
  +  if (latch_index  VEC_length (double_int, bounds))
  +{
  +  if (dump_file  (dump_flags  TDF_DETAILS))
  +   {
  + fprintf (dump_file, Found better loop bound );
  + dump_double_int (dump_file,
  +  VEC_index (double_int, bounds, latch_index), true);
  + fprintf (dump_file, \n);
  +   }
  +  record_niter_bound (loop, VEC_index (double_int, bounds, 
  latch_index),
  + false, true);
  +}
  +
  +  VEC_free (bb_queue, heap, queues);
  +  pointer_map_destroy (bb_bounds);
  +  pointer_map_destroy (visited);
  +}
  +
   /* See if every path crossing the loop goes through a statement that is known
      to not execute at the last iteration.  In that case we can decrease
      iteration count by 1.  */
  @@ -3102,6 +3330,8 @@ estimate_numbers_of_iterations_loop (str
   
 infer_loop_bounds_from_undefined (loop);
   
  +  discover_iteration_bound_by_body_walk (loop);
  +
 maybe_lower_iteration_bound (loop);
   
 /* If we have a measured profile, use it to estimate the number of
  Index: testsuite/gcc.dg/tree-ssa/loop-38.c
  ===
  --- testsuite/gcc.dg/tree-ssa/loop-38.c (revision 0)
  +++ testsuite/gcc.dg/tree-ssa/loop-38.c (revision 0)
  @@ -0,0 +1,18 @@
  +/* { dg-do compile } */
   +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
  +int a[10];
  +int b[11];
  +t(int n)
  +{
  +   int i;
  +   int sum = 0;
  +   for 

Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 10:59 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:
 jakub,

 i am hoping to get the rest of my wide integer conversion posted by nov 5.
 I am under some adverse conditions here: hurricane sandy hit here pretty
 badly.  my house is hooked up to a small generator, and no one has any power
 for miles around.

 So far richi has promised to review them.   he has sent some comments, but
 so far no reviews.  Some time after i get the first round of them posted,
 i will do a second round that incorporates everyone's comments.

 But i would like a little slack here if possible.  While this work is a
 show stopper for my private port, the patches address serious problems for
 many of the public ports, especially ones that have very flexible vector
 units.  I believe that there is a significant set of latent problems
 currently with the existing ports that use ti mode that these patches will
 fix.

 However, i will do everything in my power to get the first round of the
 patches posted by nov 5 deadline.

 I suppose you are not going to merge your private port for 4.8 and thus
 the wide-int changes are not a show-stopper for you.

 That said, I considered the main conversion to be appropriate to be
 defered for the next stage1.  There is no advantage in disrupting the
 tree more at this stage.

 I would like the wide_int class and rtl stuff to go in 4.8 though.
 IMO it's a significant improvement in its own right, and Kenny
 submitted it well before the deadline.

If it gets in as-is then we'll have to live with the IMHO broken API
(yet another one besides the existing double-int).  So _please_
shrink the API down aggressively in favor of using non-member
helper functions with more descriptive names for things that
lump together multiple operations.  Look at double-int and
use the same API ideas as people are familiar with it
(like the unsigned flag stuff) - consistency always trumps.

I'm going to be on vacation for the next three weeks so somebody else
has to pick up the review work.  But I really think that the tree
has to recover from too many changes already.

Richard.


Re: Non-dominating loop bounds in tree-ssa-loop-niter 2/4

2012-10-31 Thread Richard Biener
On Wed, 31 Oct 2012, Jan Hubicka wrote:

  
  visited is a poor name for a map ...
 Hmm, visited_with_priority?

Just block_priority?

Richard.

 Thanks,
 Honza
  
  Otherwise looks ok.
  
  Thanks,
  Richard.
  
    +
    +  /* Perform shortest path discovery loop->header ... loop->latch.
    +
    +     The distance is given by the smallest loop bound of basic block
    +     present in the path and we look for path with largest smallest bound
    +     on it.
    +
    +     To avoid the need for Fibonacci heap on double ints we simply compress
    +     double ints into indexes to BOUNDS array and then represent the queue
    +     as arrays of queues for every index.
    +     Index of VEC_length (BOUNDS) means that the execution of given BB has
    +     no bounds determined.
    +
    +     VISITED is a pointer map translating basic block into smallest index
    +     it was inserted into the priority queue with.  */
    +  latch_index = -1;
    +
    +  /* Start walk in loop header with index set to infinite bound.  */
    +  queue_index = VEC_length (double_int, bounds);
    +  VEC_safe_grow_cleared (bb_queue, heap, queues, queue_index + 1);
    +  VEC_safe_push (basic_block, heap, queue, loop->header);
    +  VEC_replace (bb_queue, queues, queue_index, queue);
    +  *pointer_map_insert (visited, loop->header) = (void *)queue_index;
    +
    +  for (; queue_index >= 0; queue_index--)
    +    {
    +      if (latch_index < queue_index)
    +        {
    +          while (VEC_length (basic_block,
    +                             VEC_index (bb_queue, queues, queue_index)))
    +            {
    +              basic_block bb;
    +              ptrdiff_t bound_index = queue_index;
    +              void **entry;
    +              edge e;
    +              edge_iterator ei;
    +
    +              queue = VEC_index (bb_queue, queues, queue_index);
    +              bb = VEC_pop (basic_block, queue);
    +
    +              /* OK, we later increased BB priority, skip it.  */
    +              if ((ptrdiff_t)*pointer_map_contains (visited, bb) > queue_index)
    +                continue;
    +
    +              /* See if we can improve the bound.  */
    +              entry = pointer_map_contains (bb_bounds, bb);
    +              if (entry && (ptrdiff_t)*entry < bound_index)
    +                bound_index = (ptrdiff_t)*entry;
    +
    +              /* Insert successors into the queue, watch for latch edge
    +                 and record greatest index we saw.  */
    +              FOR_EACH_EDGE (e, ei, bb->succs)
    +                {
    +                  bool insert = false;
    +                  void **entry;
    +
    +                  if (loop_exit_edge_p (loop, e))
    +                    continue;
    +
    +                  if (e == loop_latch_edge (loop)
    +                      && latch_index < bound_index)
    +                    latch_index = bound_index;
    +                  else if (!(entry = pointer_map_contains (visited, e->dest)))
    +                    {
    +                      insert = true;
    +                      *pointer_map_insert (visited, e->dest)
    +                        = (void *)bound_index;
    +                    }
    +                  else if ((ptrdiff_t)*entry < bound_index)
    +                    {
    +                      insert = true;
    +                      *entry = (void *)bound_index;
    +                    }
    +
    +                  if (insert)
    +                    {
    +                      bb_queue queue2 = VEC_index (bb_queue, queues,
    +                                                   bound_index);
    +                      VEC_safe_push (basic_block, heap, queue2, e->dest);
    +                      VEC_replace (bb_queue, queues, bound_index, queue2);
    +                    }
    +                }
    +            }
    +        }
    +      else
    +        VEC_free (basic_block, heap, VEC_index (bb_queue, queues, queue_index));
    +    }
    +
    +  gcc_assert (latch_index >= 0);
    +  if (latch_index < VEC_length (double_int, bounds))
    +    {
    +      if (dump_file && (dump_flags & TDF_DETAILS))
    +        {
    +          fprintf (dump_file, "Found better loop bound ");
    +          dump_double_int (dump_file,
    +                           VEC_index (double_int, bounds, latch_index), true);
    +          fprintf (dump_file, "\n");
    +        }
    +      record_niter_bound (loop, VEC_index (double_int, bounds, latch_index),
    +                          false, true);
    +    }
    +
    +  VEC_free (bb_queue, heap, queues);
    +  pointer_map_destroy (bb_bounds);
    +  pointer_map_destroy (visited);
    +}
   +
 /* See if every path crossing the loop goes through a statement that is known
    to not execute at the last iteration.  In that case we can decrease
    iteration count by 1.  */
   @@ -3102,6 +3330,8 @@ estimate_numbers_of_iterations_loop (str

  infer_loop_bounds_from_undefined (loop);

   +  discover_iteration_bound_by_body_walk (loop);
   +
  maybe_lower_iteration_bound (loop);

  /* If we have a measured profile, use it to estimate the number of
   Index: testsuite/gcc.dg/tree-ssa/loop-38.c
   ===
   --- testsuite/gcc.dg/tree-ssa/loop-38.c   (revision 0)
   +++ testsuite/gcc.dg/tree-ssa/loop-38.c   (revision 0)
   @@ -0,0 +1,18 @@
   +/* { dg-do compile } */
   +/* { dg-options -O2 

Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Richard Biener
On Mon, Oct 29, 2012 at 6:56 PM, Jakub Jelinek ja...@redhat.com wrote:
 Status
 ==

 I'd like to close the stage 1 phase of GCC 4.8 development
 on Monday, November 5th.  If you have still patches for new features you'd
 like to see in GCC 4.8, please post them for review soon.

Reminds me of the stable plugin API for introspection.  David, Micha - what's
the status here?  Adding this is certainly ok during stage3 and I think that
we should have something in 4.8 to kick off further development here.

Richard.


Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke
 joern.renne...@embecosm.com wrote:
 Quoting Richard Biener richard.guent...@gmail.com:

 Apart from the iteration_threshold the hookization would be
 straight-forward.
 Now I cannot decipher from the patch what functional change it introduces
 ;)


 The only change occurs if we reach an iteration count of MAX_INT
 iterations - which should already be indicative of a problem.

 At the MAX_INTth iteration, we will apply the length locking logic to
 instructions inside a delay slot sequence as well.

 If we disregard this exceptional case, there should be no functional changes
 unless someone defines TARGET_ADJUST_INSN_LENGTH.

 uid_lock_length gets initialized to 0, as it is allocated with XCNEWVEC.  So,
 in particular, the elements corresponding to instructions inside delay slot
 sequences are initialized to zero.

 As the default hook sets *iter_threshold to MAX_INT when inside a delay
 sequence, and doesn't change length, the max operation with uid_lock_length
 is a no-op, and *niter < iter_threshold is true, hence a length change
 results in updating the length immediately, without changing uid_lock_length.

 In the case that we are not inside a delay slot sequence, the default hook
 leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when
 niter is 0 or larger, as is the case for the ordinary looping operation, we
 always find *niter >= iter_threshold, and thus apply the length locking
 mechanism.

 If we are in the preliminary pass, or doing the single !increasing
 iteration, niter is set to -1, so *niter < iter_threshold is always true,
 so again we update the length immediately.

 Ok, I see.

 The patch is ok then.

I should probably have piped up earlier, but I'm really not sure about it.
ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
The patch instead ties the new hook directly into the shorten_branches
algorithm, which I think is a bad idea.

IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as
a fused process: every value returned by a length attribute (whether min,
default or current) should be filtered through ADJUST_INSN_LENGTH before
being used.  Whether ADJUST_INSN_LENGTH decreases or increases the original
length doesn't matter, because only the output of ADJUST_INSN_LENGTH should
be used.

Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns,
which I agree is a good thing in itself.  It's the all-important
iteration parameter that feels wrong.

TBH, I still don't understand what property of the target we're trying
to describe here.  I gather it has something to do with alignment.
But I'd really like a description of that target property, independent
of the shorten_branches implementation.

Richard


Re: [PATCH] Fix PR53501

2012-10-31 Thread H.J. Lu
On Fri, Jun 1, 2012 at 3:29 AM, Eric Botcazou ebotca...@adacore.com wrote:
 Well, it would rather be

    TYPE_UNSIGNED (type) == TYPE_UNSIGNED (TREE_TYPE (arg0))
    && TYPE_UNSIGNED (type) == TYPE_UNSIGNED (TREE_TYPE (arg1))

 but only in the !FLOAT_TYPE_P path.

 That works in all cases I think, see existing cases in the folder.

 We could even compare
 TYPE_OVERFLOW_UNDEFINED I think.  Or even just make sure
 that when TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)) also
 TYPE_OVERFLOW_UNDEFINED (type), thus

    !TYPE_OVERFLOW_UNDEFINED (type)
    || ((TREE_CODE (arg0) != MULT_EXPR
         || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)))
        && (TREE_CODE (arg1) != MULT_EXPR
            || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg1))))

 That is, the newly created multiplication in type TYPE should
 either not have undefined overflow or the inner multiplications
 all should already have.  Best done with a comment in
 fold_plusminus_mult_expr.

 I'm a little lost here. :-)  I don't really care about the mainline at this
 point, but the fix on the branches should be the minimal working one.


Your change caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55142

-- 
H.J.


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread JonY
On 10/30/2012 01:56, Jakub Jelinek wrote:
 Status
 ==
 
 I'd like to close the stage 1 phase of GCC 4.8 development
 on Monday, November 5th.  If you have still patches for new features you'd
 like to see in GCC 4.8, please post them for review soon.  Patches
 posted before the freeze, but reviewed shortly after the freeze, may
 still go in, further changes should be just bugfixes and documentation
 fixes.
 

Somebody with commit rights please push [Patch] Remove
_GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h.

Kai has already approved, but is off for the week.







Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 11:22 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Tue, Oct 30, 2012 at 5:14 PM, Joern Rennecke
 joern.renne...@embecosm.com wrote:
 Quoting Richard Biener richard.guent...@gmail.com:

 Apart from the iteration_threshold the hookization would be
 straight-forward.
 Now I cannot decipher from the patch what functional change it introduces
 ;)


 The only change occurs if we reach an iteration count of MAX_INT
 iterations - which should already be indicative of a problem.

 At the MAX_INTth iteration, we will apply the length locking logic to
 instructions inside a delay slot sequence as well.

 If we disregard this exceptional case, there should be no functional changes
 unless someone defines TARGET_ADJUST_INSN_LENGTH.

 uid_lock_length gets initialized to 0, as it is allocated with XCNEWVEC.  So,
 in particular, the elements corresponding to instructions inside delay slot
 sequences are initialized to zero.

 As the default hook sets *iter_threshold to MAX_INT when inside a delay
 sequence, and doesn't change length, the max operation with uid_lock_length
 is a no-op, and *niter < iter_threshold is true, hence a length change
 results in updating the length immediately, without changing uid_lock_length.

 In the case that we are not inside a delay slot sequence, the default hook
 leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when
 niter is 0 or larger, as is the case for the ordinary looping operation, we
 always find *niter >= iter_threshold, and thus apply the length locking
 mechanism.

 If we are in the preliminary pass, or doing the single !increasing
 iteration, niter is set to -1, so *niter < iter_threshold is always true,
 so again we update the length immediately.

 Ok, I see.

 The patch is ok then.

 I should probably have piped up earlier, but I'm really not sure about it.
 ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
 The patch instead ties the new hook directly into the shorten_branches
 algorithm, which I think is a bad idea.

 IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as
 a fused process: every value returned by a length attribute (whether min,
 default or current) should be filtered through ADJUST_INSN_LENGTH before
 being used.  Whether ADJUST_INSN_LENGTH decreases or increases the original
 length doesn't matter, because only the output of ADJUST_INSN_LENGTH should
 be used.

 Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns,
 which I agree is a good thing in itself.  It's the all-important
 iteration parameter that feels wrong.

 TBH, I still don't understand what property of the target we're trying
 to describe here.  I gather it has something to do with alignment.
 But I'd really like a description of that target property, independent
 of the shorten_branches implementation.

Yeah, the iteration parameter doesn't feel right, I agree.  But
ADJUST_INSN_LENGTH
is only used in this code.

Which is why I originally said if the patch would not add the
iteration parameter
it would be obvious ...

So, consider my approval revoked.

Richard.

 Richard


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Jakub Jelinek
On Wed, Oct 31, 2012 at 06:25:45PM +0800, JonY wrote:
 On 10/30/2012 01:56, Jakub Jelinek wrote:
  I'd like to close the stage 1 phase of GCC 4.8 development
  on Monday, November 5th.  If you have still patches for new features you'd
  like to see in GCC 4.8, please post them for review soon.  Patches
  posted before the freeze, but reviewed shortly after the freeze, may
  still go in, further changes should be just bugfixes and documentation
  fixes.
  
 
 Somebody with commit rights please push [Patch] Remove
 _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h.
 
 Kai has already approved, but is off for the week.

That looks like a bugfix (or even regression bugfix).  Bugfixes are
fine through stage 3, regression bugfixes are fine even in stage 4.

Jakub


Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Jan Hubicka
Hi,
this patch implements the logic to remove statements that are known to be
undefined and thus expected to not be executed after unrolling.  It also
removes redundant exits that I originally tried to do at once, but it
does not fly, since the peeling confuses number_of_iterations_exit
and it no longer understands the ivs.

So now we
1) always remove exits that are known to be redundant by the bounds found
2) try to peel/unroll
3) if success remove statements from the last iteration

This silences the array-bounds warnings in my testcase and many cases of
-O3 bootstrap (I am running it now).
Still if one constructs a testcase touching out of bounds in more than one
iteration we will produce a false warning; I will do that incrementally
by similar logic in loop copying.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* tree-ssa-loop-niter.c (free_loop_bounds): Break out from ...
(free_numbers_of_iterations_estimates_loop): ... here.
* tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): New
function.
(remove_redundant_iv_test): New function.
(try_unroll_loop_completely): Pass in MAXITER; use
remove_exits_and_undefined_stmts
(canonicalize_loop_induction_variables): Compute MAXITER;
use remove_redundant_iv_test.
* cfgloop.h (free_loop_bounds): New function.

* gcc.dg/tree-ssa/cunroll-10.c: New testcase.
* gcc.dg/tree-ssa/cunroll-9.c: New testcase.
Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 192989)
@@ -3505,15 +3737,11 @@ scev_probably_wraps_p (tree base, tree s
   return true;
 }
 
-/* Frees the information on upper bounds on numbers of iterations of LOOP.  */
-
 void
-free_numbers_of_iterations_estimates_loop (struct loop *loop)
+free_loop_bounds (struct loop *loop)
 {
   struct nb_iter_bound *bound, *next;
 
-  loop->nb_iterations = NULL;
-  loop->estimate_state = EST_NOT_COMPUTED;
   for (bound = loop->bounds; bound; bound = next)
 {
   next = bound->next;
@@ -3523,6 +3751,16 @@ free_numbers_of_iterations_estimates_loo
   loop->bounds = NULL;
 }
 
+/* Frees the information on upper bounds on numbers of iterations of LOOP.  */
+
+void
+free_numbers_of_iterations_estimates_loop (struct loop *loop)
+{
+  loop->nb_iterations = NULL;
+  loop->estimate_state = EST_NOT_COMPUTED;
+  free_loop_bounds (loop);
+}
+
 /* Frees the information on upper bounds on numbers of iterations of loops.  */
 
 void
Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c (revision 192989)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -389,6 +389,116 @@ loop_edge_to_cancel (struct loop *loop)
   return NULL;
 }
 
+/* Remove all tests for exits that are known to be taken after LOOP was
+   peeled NPEELED times. Put gcc_unreachable before every statement
+   known to not be executed.  */
+
+static bool
+remove_exits_and_undefined_stmts (struct loop *loop, unsigned int npeeled)
+{
+  struct nb_iter_bound *elt;
+  bool changed = false;
+
+  for (elt = loop->bounds; elt; elt = elt->next)
+{
+      /* If statement is known to be undefined after peeling, turn it
+	 into unreachable (or trap when debugging experience is supposed
+	 to be good).  */
+      if (!elt->is_exit
+	  && elt->bound.ule (double_int::from_uhwi (npeeled)))
+   {
+	  gimple_stmt_iterator gsi = gsi_for_stmt (elt->stmt);
+	  gimple stmt = gimple_build_call
+	      (builtin_decl_implicit (optimize_debug
+				      ? BUILT_IN_TRAP : BUILT_IN_UNREACHABLE),
+	       0);
+
+	  gimple_set_location (stmt, gimple_location (elt->stmt));
+	  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+	  changed = true;
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Forced statement unreachable: ");
+	      print_gimple_stmt (dump_file, elt->stmt, 0, 0);
+   }
+   }
+  /* If we know the exit will be taken after peeling, update.  */
+      else if (elt->is_exit
+	       && elt->bound.ule (double_int::from_uhwi (npeeled)))
+	{
+	  basic_block bb = gimple_bb (elt->stmt);
+	  edge exit_edge = EDGE_SUCC (bb, 0);
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Forced exit to be taken: ");
+	      print_gimple_stmt (dump_file, elt->stmt, 0, 0);
+	    }
+	  if (!loop_exit_edge_p (loop, exit_edge))
+	    exit_edge = EDGE_SUCC (bb, 1);
+	  if (exit_edge->flags & EDGE_TRUE_VALUE)
+	    gimple_cond_make_true (elt->stmt);
+	  else
+	    gimple_cond_make_false (elt->stmt);
+	  update_stmt (elt->stmt);
+ changed = true;
+   }
+}
+  return changed;
+}
+
+/* Remove all exits that are known to be never taken because of the loop bound
+   discovered.  */

Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/25/2012 06:42 AM, Richard Biener wrote:

 On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote:

 On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com
 wrote:

 On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/23/2012 10:12 AM, Richard Biener wrote:

 +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
 HOST_BITS_PER_WIDE_INT];

 are we sure this rounds properly?  Consider a port with max byte mode
 size 4 on a 64bit host.

 I do not believe that this can happen.   The core compiler includes all
 modes up to TI mode, so by default we already up to 128 bits.

 And mode bitsizes are always power-of-two?  I suppose so.

 Actually, no, they are not.  Partial int modes can have bit sizes that
 are not power of two, and, if there isn't an int mode that is bigger, we'd
 want to round up the partial int bit size.  Something like ((2 *
 MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
 HOST_BITS_PER_WIDE_INT) should do it.

 I still would like to have the ability to provide specializations of
 wide_int
 for small sizes, thus ideally wide_int would be a template templated
 on the number of HWIs in val.  Interface-wise wide_int<2> should be
 identical to double_int, thus we should be able to do

 typedef wide_int<2> double_int;

 If you want to go down this path after the patches get in, go for it.
 I
 see no use at all for this.
 This was not meant to be a plug in replacement for double int. This
 goal of
 this patch is to get the compiler to do the constant math the way that
 the
 target does it.   Any such instantiation is by definition placing some
 predefined limit that some target may not want.

 Well, what I don't really like is that we now have two implementations
 of functions that perform integer math on two-HWI sized integers.  What
 I also don't like too much is that we have two different interfaces to
 operate
 on them!  Can't you see how I come to not liking this?  Especially the
 latter …

 double_int is logically dead.  Refactoring wide-int and double-int is a
 waste of time, as the time is better spent removing double-int from the
 compiler.  All the necessary semantics and code of double-int _has_ been
 refactored into wide-int already.  Changing wide-int in any way to vend
 anything to double-int is wrong, as once double-int is removed, then all 
 the
 api changes to make double-int share from wide-int is wasted and must then
 be removed.  The path forward is the complete removal of double-int; it is
 wrong, has been wrong and always will be wrong, nothing can change that.

 double_int, compared to wide_int, is fast and lean.  I doubt we will
 get rid of it - you
 will make compile-time math a _lot_ slower.  Just profile when you for
 example
 change get_inner_reference to use wide_ints.

 To be able to remove double_int in favor of wide_int requires _at least_
 templating wide_int on 'len' and providing specializations for 1 and 2.

 It might be a non-issue for math that operates on trees or RTXen due to
 the allocation overhead we pay, but in recent years we transitioned
 important
 paths away from using tree math to using double_ints _for speed reasons_.

 Richard.

 i do not know why you believe this about the speed. double int always
 does synthetic math since you do everything at 128 bit precision.

 the thing about wide int, is that since it does math to the precision's
 size, it almost never does uses synthetic operations since the sizes for
 almost every instance can be done using the native math on the machine.
 almost every call has a check to see if the operation can be done natively.
 I seriously doubt that you are going to do TI mode math much faster than i
 do it and if you do who cares.

 the number of calls does not affect the performance in any negative way and
 in fact is more efficient since common things that require more than one
 operation in double_int are typically done in a single operation.

 Simple double-int operations like

 inline double_int
 double_int::and_not (double_int b) const
 {
   double_int result;
   result.low = low & ~b.low;
   result.high = high & ~b.high;
   return result;
 }

 are always going to be faster than conditionally executing only one operation
 (but inside an offline function).

OK, this is really in reply to the 4.8 thing, but it felt more
appropriate here.

It's interesting that you gave this example, since before you were
complaining about too many fused ops.  Clearly this one could be
removed in favour of separate and() and not() operations, but why
not provide a fused one if there are clients who'll make use of it?
I think Kenny's API is just taking that to its logical conclusion.
There doesn't seem to be anything sacrosanct about the current choice
of what's fused and what isn't.

The speed problem we had using trees 

[AARCH64-4.7] Merge from upstream gcc-4_7-branch r192902

2012-10-31 Thread Sofiane Naci
Hi,

I have just merged upstream gcc-4_7-branch into ARM/aarch64-4.7-branch, up
to r192902.

Thanks
Sofiane






Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Jonathan Wakely
On 31 October 2012 10:25, JonY wrote:
 On 10/30/2012 01:56, Jakub Jelinek wrote:
 Status
 ==

 I'd like to close the stage 1 phase of GCC 4.8 development
 on Monday, November 5th.  If you have still patches for new features you'd
 like to see in GCC 4.8, please post them for review soon.  Patches
 posted before the freeze, but reviewed shortly after the freeze, may
 still go in, further changes should be just bugfixes and documentation
 fixes.


 Somebody with commit rights please push [Patch] Remove
 _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h.

 Kai has already approved, but is off for the week.

I could have done that, if it had been sent to the right lists. All
libstdc++ patches go to both gcc-patches and libstd...@gcc.gnu.org
please.

Let's move this to the libstdc++ list, I have some questions about the patch.


Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Richard Biener
On Wed, 31 Oct 2012, Jan Hubicka wrote:

 Hi,
 this patch implements the logic to remove statements that are known to be
 undefined and thus expected to not be executed after unrolling.  It also
 removes redundant exits that I originally tried to do at once, but it
 does not fly, since the peeling confuses number_of_iterations_exit
 and it no longer understands the ivs.
 
 So now we
 1) always remove exits that are known to be redundant by the bounds found
 2) try to peel/unroll
 3) if success remove statements from the last iteration
 
 This silences the array-bounds warnings in my testcase and many cases of
 -O3 bootstrap (I am running it now).
 Still if one constructs a testcase touching out of bounds in more than one
 iteration we will produce a false warning; I will do that incrementally
 by similar logic in loop copying.
 
 Bootstrapped/regtested x86_64-linux, OK?
 
 Honza
 
   * tree-ssa-loop-niter.c (free_loop_bounds): Break out from ...
   (free_numbers_of_iterations_estimates_loop): ... here.
   * tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): New
   function.
   (remove_redundant_iv_test): New function.
   (try_unroll_loop_completely): Pass in MAXITER; use
   remove_exits_and_undefined_stmts
   (canonicalize_loop_induction_variables): Compute MAXITER;
   use remove_redundant_iv_test.
   * cfgloop.h (free_loop_bounds): New function.
 
   * gcc.dg/tree-ssa/cunroll-10.c: New testcase.
   * gcc.dg/tree-ssa/cunroll-9.c: New testcase.
 Index: tree-ssa-loop-niter.c
 ===
 --- tree-ssa-loop-niter.c (revision 192989)
 @@ -3505,15 +3737,11 @@ scev_probably_wraps_p (tree base, tree s
return true;
  }
  
 -/* Frees the information on upper bounds on numbers of iterations of LOOP.  
 */
 -

Needs a comment.

  void
 -free_numbers_of_iterations_estimates_loop (struct loop *loop)
 +free_loop_bounds (struct loop *loop)
  {
struct nb_iter_bound *bound, *next;
  
 -  loop->nb_iterations = NULL;
 -  loop->estimate_state = EST_NOT_COMPUTED;
    for (bound = loop->bounds; bound; bound = next)
  {
    next = bound->next;
 @@ -3523,6 +3751,16 @@ free_numbers_of_iterations_estimates_loo
    loop->bounds = NULL;
  }
  
 +/* Frees the information on upper bounds on numbers of iterations of LOOP.  
 */
 +
 +void
 +free_numbers_of_iterations_estimates_loop (struct loop *loop)
 +{
 +  loop->nb_iterations = NULL;
 +  loop->estimate_state = EST_NOT_COMPUTED;
 +  free_loop_bounds (loop);
 +}
 +
  /* Frees the information on upper bounds on numbers of iterations of loops.  */
  
  void
 Index: tree-ssa-loop-ivcanon.c
 ===
 --- tree-ssa-loop-ivcanon.c   (revision 192989)
 +++ tree-ssa-loop-ivcanon.c   (working copy)
 @@ -389,6 +389,116 @@ loop_edge_to_cancel (struct loop *loop)
return NULL;
  }
  
 +/* Remove all tests for exits that are known to be taken after LOOP was
 +   peeled NPEELED times. Put gcc_unreachable before every statement
 +   known to not be executed.  */
 +
 +static bool
 +remove_exits_and_undefined_stmts (struct loop *loop, unsigned int npeeled)
 +{
 +  struct nb_iter_bound *elt;
 +  bool changed = false;
 +
 +  for (elt = loop->bounds; elt; elt = elt->next)
 +    {
 +      /* If statement is known to be undefined after peeling, turn it
 +	 into unreachable (or trap when debugging experience is supposed
 +	 to be good).  */
 +      if (!elt->is_exit
 +	  && elt->bound.ule (double_int::from_uhwi (npeeled)))
 +	{
 +	  gimple_stmt_iterator gsi = gsi_for_stmt (elt->stmt);
 +	  gimple stmt = gimple_build_call
 +	      (builtin_decl_implicit (optimize_debug
 +				      ? BUILT_IN_TRAP : BUILT_IN_UNREACHABLE),

I'm not sure we should do different things for -Og here.  In fact there
is no unrolling done for -Og at all, so just use unreachable.

 +	       0);
 +
 +	  gimple_set_location (stmt, gimple_location (elt->stmt));
 +	  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
 +	  changed = true;
 +	  if (dump_file && (dump_flags & TDF_DETAILS))
 +	    {
 +	      fprintf (dump_file, "Forced statement unreachable: ");
 +	      print_gimple_stmt (dump_file, elt->stmt, 0, 0);
 +	    }
 +	}
 +      /* If we know the exit will be taken after peeling, update.  */
 +      else if (elt->is_exit
 +	       && elt->bound.ule (double_int::from_uhwi (npeeled)))
 +	{
 +	  basic_block bb = gimple_bb (elt->stmt);
 +	  edge exit_edge = EDGE_SUCC (bb, 0);
 +
 +	  if (dump_file && (dump_flags & TDF_DETAILS))
 +	    {
 +	      fprintf (dump_file, "Forced exit to be taken: ");
 +	      print_gimple_stmt (dump_file, elt->stmt, 0, 0);
 +	    }
 +	  if (!loop_exit_edge_p (loop, exit_edge))
 +	    exit_edge = EDGE_SUCC (bb, 1);
 +	  if (exit_edge->flags & EDGE_TRUE_VALUE)

I think we can have abnormal/eh exit edges.  So I'm not sure you

[Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)

2012-10-31 Thread Jonathan Wakely
On 31 October 2012 11:01, Jonathan Wakely wrote:
 On 31 October 2012 10:25, JonY wrote:
 On 10/30/2012 01:56, Jakub Jelinek wrote:
 Status
 ==

 I'd like to close the stage 1 phase of GCC 4.8 development
 on Monday, November 5th.  If you have still patches for new features you'd
 like to see in GCC 4.8, please post them for review soon.  Patches
 posted before the freeze, but reviewed shortly after the freeze, may
 still go in, further changes should be just bugfixes and documentation
 fixes.


 Somebody with commit rights please push [Patch] Remove
 _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h.

 Kai has already approved, but is off for the week.

 I could have done that, if it had been sent to the right lists. All
 libstdc++ patches go to both gcc-patches and libstd...@gcc.gnu.org
 please.

 Let's move this to the libstdc++ list, I have some questions about the patch.

It looks like the workaround is in mingw not in GCC, so is it a
problem that it won't be possible to use GCC 4.8 with existing mingw
versions, or are users required to use a brand new mingw to use a new
GCC?  Should that be documented in
http://gcc.gnu.org/gcc-4.8/changes.html ?

Why is the define commented out by the patch, not simply removed?
If it's not needed then it's not needed. We have subversion to track
change history, we don't need to leave dead code lying around with
comments explaining why it's dead.


Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)

2012-10-31 Thread JonY
On 10/31/2012 19:12, Jonathan Wakely wrote:
 On 31 October 2012 11:01, Jonathan Wakely wrote:
 On 31 October 2012 10:25, JonY wrote:
 On 10/30/2012 01:56, Jakub Jelinek wrote:
 Status
 ==

 I'd like to close the stage 1 phase of GCC 4.8 development
 on Monday, November 5th.  If you have still patches for new features you'd
 like to see in GCC 4.8, please post them for review soon.  Patches
 posted before the freeze, but reviewed shortly after the freeze, may
 still go in, further changes should be just bugfixes and documentation
 fixes.


 Somebody with commit rights please push [Patch] Remove
 _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h.

 Kai has already approved, but is off for the week.

 I could have done that, if it had been sent to the right lists. All
 libstdc++ patches go to both gcc-patches and libstd...@gcc.gnu.org
 please.

 Let's move this to the libstdc++ list, I have some questions about the patch.
 
 It looks like the workaround is in mingw not in GCC, so is it a
 problem that it won't be possible to use GCC 4.8 with existing mingw
 versions, or are users required to use a brand new mingw to use a new
 GCC?  Should that be documented in
 http://gcc.gnu.org/gcc-4.8/changes.html ?
 

They are required to use the latest mingw-w64; the problem was that the
vswprintf that libstdc++ expects isn't the same as the one MS provides,
so I wrote a redirector to use vsnwprintf, more precisely the
mingw C99-compliant __mingw_vsnwprintf.

std::to_wstring and std::to_string work according to some simple tests.

I guess the current comment about requiring mingw-w64 trunk at least r5437
is OK for the changes page. It should probably note that this change is
mingw-w64 specific, with w64 as the vendor key.

 Why is the define commented out by the patch, not simply removed?
 If it's not needed then it's not needed. We have subversion to track
 change history, we don't need to leave dead code lying around with
 comments explaining why it's dead.

OK, I will remove it.




signature.asc
Description: OpenPGP digital signature


Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)

2012-10-31 Thread JonY
On 10/31/2012 19:23, JonY wrote:
 
 Why is the define commented out by the patch, not simply removed?
 If it's not needed then it's not needed. We have subversion to track
 change history, we don't need to leave dead code lying around with
 comments explaining why it's dead.
 
 OK, I will remove it.
 

Sorry, forgot to attach the patch.

ChangeLog
2012-10-31  Jonathan Yong  jo...@users.sourceforge.net

* config/os/mingw32-w64/os_defines.h: Remove 
_GLIBCXX_HAVE_BROKEN_VSWPRINTF
as no longer required.


Index: libstdc++-v3/config/os/mingw32-w64/os_defines.h
===
--- libstdc++-v3/config/os/mingw32-w64/os_defines.h (revision 192802)
+++ libstdc++-v3/config/os/mingw32-w64/os_defines.h (working copy)
@@ -63,9 +63,6 @@
 // See libstdc++/20806.
 #define _GLIBCXX_HAVE_DOS_BASED_FILESYSTEM 1
 
-// See  libstdc++/37522.
-#define _GLIBCXX_HAVE_BROKEN_VSWPRINTF 1
-
 // See libstdc++/43738
 // On native windows targets there is no ioctl function. And the existing
 // ioctlsocket function doesn't work for normal file-descriptors.




Re: Minimize downward code motion during reassociation

2012-10-31 Thread Richard Biener
On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman era...@google.com wrote:
 On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote:
 On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote:
 Hi,

 During expression reassociation, statements are conservatively moved
 downwards to ensure that dependences are correctly satisfied after
 reassocation. This could lead to lengthening of live ranges. This
 patch moves statements only to the extent necessary. Bootstraps and no
 test regression on x86_64/linux. OK for trunk?

 Thanks,
 Easwaran

 2012-10-18   Easwaran Raman  era...@google.com
* tree-ssa-reassoc.c (assign_uids): New function.
 (assign_uids_in_relevant_bbs): Likewise.
 (ensure_ops_are_available): Likewise.
 (rewrite_expr_tree): Do not move statements beyond what is
 necessary. Remove call to swap_ops_for_binary_stmt...
 (reassociate_bb): ... and move it here.



 Index: gcc/tree-ssa-reassoc.c
 ===
 --- gcc/tree-ssa-reassoc.c (revision 192487)
 +++ gcc/tree-ssa-reassoc.c (working copy)
 @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea
  }
  }

 +/* Assign UIDs to statements in basic block BB.  */
 +
 +static void
 +assign_uids (basic_block bb)
 +{
 +  unsigned uid = 0;
 +  gimple_stmt_iterator gsi;
 +  /* First assign uids to phis.  */
 +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi))
 +{
 +  gimple stmt = gsi_stmt (gsi);
 +  gimple_set_uid (stmt, uid++);
 +}
 +
 +  /* Then assign uids to stmts.  */
 +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
 +{
 +  gimple stmt = gsi_stmt (gsi);
 +  gimple_set_uid (stmt, uid++);
 +}
 +}
 +
 +/* For each operand in OPS, find the basic block that contains the statement
 +   which defines the operand. For all such basic blocks, assign UIDs.  */
 +
 +static void
 +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops)
 +{
 +  operand_entry_t oe;
 +  int i;
 +  struct pointer_set_t *seen_bbs = pointer_set_create ();
 +
 +  for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++)
 +{
 +  gimple def_stmt;
 +  basic_block bb;
 +  if (TREE_CODE (oe-op) != SSA_NAME)
 +continue;
 +  def_stmt = SSA_NAME_DEF_STMT (oe-op);
 +  bb = gimple_bb (def_stmt);
 +  if (!pointer_set_contains (seen_bbs, bb))
 +{
 +  assign_uids (bb);
 +  pointer_set_insert (seen_bbs, bb);
 +}
 +}
 +  pointer_set_destroy (seen_bbs);
 +}

 Please assign UIDs once using the existing renumber_gimple_stmt_uids ().
 You seem to call the above multiple times and thus do work bigger than
 O(number of basic blocks).

 The reason I call the above multiple times is that gsi_move_before
 might get called between two calls to the above. For instance, after
 rewrite_expr_tree is called once, the following sequence of calls
 could happen:
 reassociate_bb - linearize_expr_tree - linearize_expr -
 gsi_move_before.  So it is not sufficient to call
 renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to
 use renumber_gimple_stmt_uids_in_blocks instead of
 assign_uids_in_relevant_bbs?

 It's sufficient to call it once if you conservatively update UIDs at stmt
 move / insert time (assign the same UID as the stmt before/after).

 I was worried that that would make the code brittle, but looking at
 the places where a new statement is added or moved, I think it is
 fine.  I have made the change in the new patch.

Thanks.



 +/* Ensure that operands in the OPS vector starting from OPINDEXth entry
 +   are live at STMT.  This is accomplished by moving STMT if needed.  */
 +
 +static void
 +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) *ops, int opindex)
 +{
 +  int i;
 +  int len = VEC_length (operand_entry_t, ops);
 +  gimple insert_stmt = stmt;
 +  basic_block insert_bb = gimple_bb (stmt);
 +  gimple_stmt_iterator gsi_insert, gsistmt;
 +  for (i = opindex; i < len; i++)
 +{

 Likewise you call this for each call to rewrite_expr_tree, so it seems to me
 this is quadratic in the number of ops in the op vector.

 The call to ensure_ops_are_available inside rewrite_expr_tree is
 guarded by if (!moved) and I am setting moved = true there to ensure
 that ensure_ops_are_available inside is called once per reassociation
 of a expression tree.

 It would be a lot easier to understand this function if you call it once
 after rewrite_expr_tree finished.
 I think you meant  before rewrite_expr_tree. One thing to note here
 is this function is called only when we first see a statement whose
 operand changes after reassociation. That's the reason the moving of
 statements happens inside rewrite_expr_tree.

I meant after because rewrite_expr_tree ends up 

Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)

2012-10-31 Thread Paolo Carlini

Applied.

Thanks,
Paolo.


Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Joern Rennecke

Quoting Richard Sandiford rdsandif...@googlemail.com:


I should probably have piped up earlier, but I'm really not sure about it.
ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
The patch instead ties the new hook directly into the shorten_branches
algorithm, which I think is a bad idea.

IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as
a fused process: every value returned by a length attribute (whether min,
default or current) should be filtered through ADJUST_INSN_LENGTH before
being used.  Whether ADJUST_INSN_LENGTH decreases or increases the original
length doesn't matter, because only the output of ADJUST_INSN_LENGTH should
be used.

Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns,
which I agree is a good thing in itself.  It's the all-important
iteration parameter that feels wrong.

TBH, I still don't understand what property of the target we're trying
to describe here.  I gather it has something to do with alignment.
But I'd really like a description of that target property, independent
of the shorten_branches implementation.


It's about describing complex interactions of length adjustments that
derive from branch shortening and length added for (un)alignment for
scheduling purposes.  Expressed naively, these can lead to cycles.
I could manage with the combination of length / lock_length attribute,
but there was a fine line between tuning for optimal scheduling and
risking cycles.
So, as Richard Biener has pointed out, that design was not optimal.

What is really needed is a way to express priorities between instructions
for when to inhibit length shrinkage from one iteration to the next.

Ports that have no issues with negative feedback in the length calculations
don't need to worry about priorities; the length locking mechanism
will be a no-op anyways, so the question when it is being applied is moot.

In the case you have negative feedback, the default, to apply length
locking immediately to all instructions, gives the fastest convergence,
although it will likely give a sub-optimal shortening solution.

The length variations for the ARC are not all alike: there are branches that
are subject to branch shortening in the usual way, but they might
shrink when other things shrink.  When we are iterating starting at
minimal length and increasing lengths, these branches should be stopped
from shrinking, as they will likely grow back again in the next iteration.
OTOH there are potentially short instructions that are lengthened to
achieve scheduling-dictated alignment or misalignment.  These should
be allowed to change freely back and forth.  Unless we have a rare
cycle that only depends on alignments.

So, to get this right without having the port writer to worry too much
when a length adjustment causes a cycle, we want all varying length
instructions subject to length locking to break cycles, but not with
the same priority.  Specifying this as an iteration count is an
easy way to implement this, without having to make shorten_branches
identify every cycle and the participating instructions that are
oscillating in length.  The latter would actually run up against
computability problems in distinguishing a single complex cycle and
multiple independent cycles.

If you like, we can describe the iter_threshold parameter in a different
way, just saying that it indicates a priority, 0 being the first to
get length locking applied, and the larger the number, the later.


Re: [PATCH] pass filtering for -fopt-info

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 8:20 AM, Sharad Singhai sing...@google.com wrote:
 I am attaching an updated patch with comments inline.

 On Tue, Oct 30, 2012 at 9:04 AM, Xinliang David Li davi...@google.com wrote:
 On Tue, Oct 30, 2012 at 8:28 AM, Richard Biener
 richard.guent...@gmail.com wrote:

 What I'd expect from that would be both vec.miss and vec.opt being
 populated ... (thus go away from the duality of dump files to
 primary dump file plus a set of alternate dump files).

 Yes, that would be an ideal behavior. Each pass currently has access
 to only two dump files, its primary file and alternate file. If we
 extend that to include a set, it would be cleaner and avoid some of
 the issues like the following:

 During testing, I found that even having any two different files for
 -fopt-info is problematic. For example, consider

 gcc ... -fopt-info-vec-optimized=vec.opt -fopt-info-loop-missed=loop.missed

 Now 'loop' and 'vec' include overlapping passes and since we have only
 one alternate file per pass, it can either be vec.opt or loop.missed,
 and some information would be lost. If we have a set of files, we can
 avoid this problem.

 However, considering the stage 1 deadline, I don't think it is
 feasible for this version. Instead, a set of alternate files can be
 considered as a future extension. For now, having a default of
 'stderr' works for the common use case. To further prevent confusion,
 the current patch allows only one (non-stderr) file for -fopt-info and
 a warning is printed as in  the example below

 gcc ... -fopt-info-vec-optimized=vec.opt -fopt-info-loop-missed=loop.missed
 cc1: warning: ignoring possibly conflicting option
 '-fopt-info-loop-missed=loop.missed' [enabled by default]

 Multiple dumps can be achieved using 'stderr' (default) instead.

 gcc ... -fopt-info-vec-optimized -fopt-info-loop-missed

 I have updated the documentation to include -fopt-info examples, and
 added some details about -fopt-info command line conflicts.

 I like it overall, not sure if we want to pre-populate the OPTGROUP
 set too much at this point.  Like what is 'tree' or 'rtl' to users?
 nothing I think.
 'ipa' yes.  'lto'?  sounds redundant with 'ipa' to me.  'omp'?  we don't have
 any optimizations here.


 OMP is a high level transformation, and it seems to be a good
 candidate group, but this part does not need to be designed now. For
 instance, there are a bunch of FDO related transformation (indirect
 call promotion, memcpy transformation etc), and coverage mismatch
 notes etc a good candidate to be filtered.

 Yes, perhaps later, we can consider something like an 'fdo' group for
 this purpose.

 Thus please drop TREE, RTL, LTO and OMP for now.

 Okay, I have removed 'tree', 'rtl', 'omp', 'lto' groups.


 Otherwise I'm leaving it for comments from other folks.

 Thanks,
 Richard.


 I have updated the documentation and testing is ongoing.

@@ -503,7 +509,7 @@ register_one_dump_file (struct opt_pass *pass)
   name = name ? name + 1 : pass->name;
   dot_name = concat (".", name, num, NULL);
   if (pass->type == SIMPLE_IPA_PASS || pass->type == IPA_PASS)
-    prefix = "ipa-", flags = TDF_IPA;
+    prefix = "ipa-", flags = TDF_IPA, optgroup_flags |= OPTGROUP_IPA;

please do not use compound statements.

Otherwise ok if testing is ok.

Thanks,
Richard.

 Thanks,
 Sharad


Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/25/2012 06:42 AM, Richard Biener wrote:

 On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote:

 On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com
 wrote:

 On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/23/2012 10:12 AM, Richard Biener wrote:

 +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
 HOST_BITS_PER_WIDE_INT];

 are we sure this rounds properly?  Consider a port with max byte mode
 size 4 on a 64bit host.

 I do not believe that this can happen.   The core compiler includes all
 modes up to TI mode, so by default we already up to 128 bits.

 And mode bitsizes are always power-of-two?  I suppose so.

 Actually, no, they are not.  Partial int modes can have bit sizes that
 are not power of two, and, if there isn't an int mode that is bigger, we'd
 want to round up the partial int bit size.  Something like ((2 *
 MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
 HOST_BITS_PER_WIDE_INT should do it.

 I still would like to have the ability to provide specializations of wide_int
 for small sizes, thus ideally wide_int would be a template templated
 on the number of HWIs in val.  Interface-wise wide_int<2> should be
 identical to double_int, thus we should be able to do

 typedef wide_int<2> double_int;

 If you want to go down this path after the patches get in, go for it.  I
 see no use at all for this.
 This was not meant to be a plug in replacement for double int. This
 goal of
 this patch is to get the compiler to do the constant math the way that
 the
 target does it.   Any such instantiation is by definition placing some
 predefined limit that some target may not want.

 Well, what I don't really like is that we now have two implementations
 of functions that perform integer math on two-HWI sized integers.  What
 I also don't like too much is that we have two different interfaces to
 operate
 on them!  Can't you see how I come to not liking this?  Especially the
 latter …

 double_int is logically dead.  Refactoring wide-int and double-int is a
 waste of time, as the time is better spent removing double-int from the
 compiler.  All the necessary semantics and code of double-int _has_ been
 refactored into wide-int already.  Changing wide-int in any way to vend
 anything to double-int is wrong, as once double-int is removed, then all the
 api changes to make double-int share from wide-int are wasted and must then
 be removed.  The path forward is the complete removal of double-int; it is
 wrong, has been wrong and always will be wrong, nothing can change that.

 double_int, compared to wide_int, is fast and lean.  I doubt we will
 get rid of it - you
 will make compile-time math a _lot_ slower.  Just profile when you for
 example
 change get_inner_reference to use wide_ints.

 To be able to remove double_int in favor of wide_int requires _at least_
 templating wide_int on 'len' and providing specializations for 1 and 2.

 It might be a non-issue for math that operates on trees or RTXen due to
 the allocation overhead we pay, but in recent years we transitioned
 important
 paths away from using tree math to using double_ints _for speed reasons_.

 Richard.

 i do not know why you believe this about the speed. double int always
 does synthetic math since you do everything at 128 bit precision.

 the thing about wide int, is that since it does math to the precision's
 size, it almost never does uses synthetic operations since the sizes for
 almost every instance can be done using the native math on the machine.
 almost every call has a check to see if the operation can be done natively.
 I seriously doubt that you are going to do TI mode math much faster than i
 do it and if you do who cares.

 the number of calls does not affect the performance in any negative way and
 in fact is more efficient since common things that require more than one
 operation in double_int are typically done in a single operation.

 Simple double-int operations like

 inline double_int
 double_int::and_not (double_int b) const
 {
   double_int result;
   result.low = low  ~b.low;
   result.high = high  ~b.high;
   return result;
 }

 are always going to be faster than conditionally executing only one operation
 (but inside an offline function).

 OK, this is really in reply to the 4.8 thing, but it felt more
 appropriate here.

 It's interesting that you gave this example, since before you were
 complaining about too many fused ops.  Clearly this one could be
 removed in favour of separate and() and not() operations, but why
 not provide a fused one if there are clients who'll make use of it?

I was more concerned about fused operations that use precision
or bitsize as input.  That is for example

 +  bool 

Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Richard Sandiford
Joern Rennecke joern.renne...@embecosm.com writes:
 Quoting Richard Sandiford rdsandif...@googlemail.com:
 I should probably have piped up earlier, but I'm really not sure about it.
 ADJUST_INSN_LENGTH as defined now is telling us a property of the target.
 The patch instead ties the new hook directly into the shorten_branches
 algorithm, which I think is a bad idea.

 IMO, the length attributes and ADJUST_INSN_LENGTH should be treated as
 a fused process: every value returned by a length attribute (whether min,
 default or current) should be filtered through ADJUST_INSN_LENGTH before
 being used.  Whether ADJUST_INSN_LENGTH decreases or increases the original
 length doesn't matter, because only the output of ADJUST_INSN_LENGTH should
 be used.

 Joern's patch makes us use ADJUST_INSN_LENGTH for delay slot insns,
 which I agree is a good thing in itself.  It's the all-important
 iteration parameter that feels wrong.

 TBH, I still don't understand what property of the target we're trying
 to describe here.  I gather it has something to do with alignment.
 But I'd really like a description of that target property, independent
 of the shorten_branches implementation.

 It's about describing complex interactions of length adjustments that
 derive from branch shortening and length added for (un)alignment for
 scheduling purposes.  Expressed naively, these can lead to cycles.

But shorten_branches should be written to avoid cycles, and I thought
your patch did that by making sure that the address of each instruction
only increases.

 I could manage with the combination of length / lock_length attribute,
 but there was a fine line between tuning for optimal scheduling and
 risking cycles.
 So, as Richard Biener has pointed out, that design was not optimal.

 What is really needed is a way to express priorities between instructions
 for when to inhibit length shrinkage from one iteration to the next.

 Ports that have no issues with negative feedback in the length calculations
 don't need to worry about priorities; the length locking mechanism
 will be a no-op anyways, so the question when it is being applied is moot.

 In the case you have negative feedback, the default, to apply length
 locking immediately to all instructions, gives the fastest convergence,
 although it will likely give a sub-optimal shortening solution.

 The length variations for the ARC are not all alike: there are branches that
 are subject to branch shortening in the usual way, but they might
 shrink when other things shrink.  When we are iterating starting at
 minimal length and increasing lengths, these branches should be stopped
 from shrinking, as they will likely grow back again in the next iteration.
 OTOH there are potentially short instructions that are lengthened to
 achieve scheduling-dictated alignment or misalignment.  These should
 be allowed to change freely back and forth.  Unless we have a rare
 cycle that only depends on alignments.

Hmm, but this is still talking in terms of shorten_branches, rather than
the target property that we're trying to model.  It sounds like the property
is something along the lines of: some instructions have different encodings
(or work-alikes with different encodings), and we want to make all the
encodings available to the optimisers.  Is that right?  If so, that sounds
like a useful thing to model, but I think it should be modelled directly,
rather than as a modification to the shorten_branches algorithm.

Is the alignment/misalignment explicitly modelled in the rtl?  If so, how?
Using labels, or something else?

Richard


Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)

2012-10-31 Thread Jonathan Wakely
On 31 October 2012 11:23, JonY wrote:
 On 10/31/2012 19:12, Jonathan Wakely wrote:

 It looks like the workaround is in mingw not in GCC, so is it a
 problem that it won't be possible to use GCC 4.8 with existing mingw
 versions, or are users required to use a brand new mingw to use a new
 GCC?  Should that be documented in
 http://gcc.gnu.org/gcc-4.8/changes.html ?


 They are required to use the latest mingw-w64; the problem was that the
 vswprintf that libstdc++ expects isn't the same as the one MS provides,
 so I wrote a redirector to use vsnwprintf, more precisely the
 mingw C99-compliant __mingw_vsnwprintf.

 std::to_wstring and std::to_string work according to some simple tests.

Excellent, the testsuite should automatically start running the
relevant tests and we should be able to close
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52015

 I guess the current comment about requiring mingw-w64 trunk at least r5437
 is OK for the changes page. It should probably note that this change is
 mingw-w64 specific, with w64 as the vendor key.

i.e. *-w64-mingw* ?  Or is *-w64-mingw32 more accurate?  (I don't
really know how the mingw target triplets work, sorry!)

I'll put something in the changes page later.

Thanks for fixing this.


Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Jan Hubicka
  Index: tree-ssa-loop-niter.c
  ===
  --- tree-ssa-loop-niter.c   (revision 192989)
  @@ -3505,15 +3737,11 @@ scev_probably_wraps_p (tree base, tree s
 return true;
   }
   
  -/* Frees the information on upper bounds on numbers of iterations of LOOP.  */
  -
 
 Needs a comment.

Yep,
also the reason I export it is that I use it in another patch.
(I think we should not hook the bounds into the loop structure and keep them
dead, so I want the max_iter* friends to free them unless they are asked to
preserve them, and to free them explicitly in the two users.)
  +static bool
  +remove_exits_and_undefined_stmts (struct loop *loop, unsigned int npeeled)
  +{
  +  struct nb_iter_bound *elt;
  +  bool changed = false;
  +
  +  for (elt = loop->bounds; elt; elt = elt->next)
  +{
  +  /* If statement is known to be undefined after peeling, turn it
  +into unreachable (or trap when debugging experience is supposed
  +to be good).  */
  +  if (!elt->is_exit
  +	  && elt->bound.ule (double_int::from_uhwi (npeeled)))
  +	{
  +	  gimple_stmt_iterator gsi = gsi_for_stmt (elt->stmt);
  +	  gimple stmt = gimple_build_call
  +	      (builtin_decl_implicit (optimize_debug
  +				      ? BUILT_IN_TRAP : BUILT_IN_UNREACHABLE),
 
 I'm not sure we should do different things for -Og here.  In fact there
 is no unrolling done for -Og at all, so just use unreachable.

OK, I was thinking about BUILT_IN_TRAP for -O1 too, but I am not sure either.
I guess we will gather some experience on how much users are confused.

Why do we not inline at -Og?
  +  /* If we know the exit will be taken after peeling, update.  */
  +  else if (elt->is_exit
  +	   && elt->bound.ule (double_int::from_uhwi (npeeled)))
  +	{
  +	  basic_block bb = gimple_bb (elt->stmt);
  +	  edge exit_edge = EDGE_SUCC (bb, 0);
  +
  +	  if (dump_file && (dump_flags & TDF_DETAILS))
  +	    {
  +	      fprintf (dump_file, "Forced exit to be taken: ");
  +	      print_gimple_stmt (dump_file, elt->stmt, 0, 0);
  +	    }
  +	  if (!loop_exit_edge_p (loop, exit_edge))
  +	    exit_edge = EDGE_SUCC (bb, 1);
  +	  if (exit_edge->flags & EDGE_TRUE_VALUE)
 
 I think we can have abnormal/eh exit edges.  So I'm not sure you
 can, without checking, assume the block ends in a GIMPLE_COND.

We can't - those are only edges that are found by the IV analysis and they
always test an IV against some bound.  (It is all done by number_of_iterations_exit.)
 
 See above.  Otherwise the overall idea sounds fine.

Similarly here, simple exits are always conditionals. 


Thanks a lot!
Honza


Re: Ping / update: RFA: replace #ifdef with if/#if for HAVE_ATTR_*

2012-10-31 Thread Joern Rennecke

Quoting Richard Sandiford rdsandif...@googlemail.com:


I can't approve the whole thing of course, but I like the idea.
However...

Joern Rennecke joern.renne...@embecosm.com writes:

+@deftypevr {Target Hook} bool TARGET_HAVE_CC0
+@deftypevrx {Target Hook} {bool} TARGET_AUTO_INC_DEC
+@deftypevrx {Target Hook} {bool} TARGET_STACK_REGS
+@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_ENABLED
+@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH
+These flags are automatically generated;  you should not override   
them in @file{tm.c}.

+@end deftypevr


Unless this goes against something already discussed, I think it'd be
better to leave these out until there's a concerted attempt to use them
somewhere.  On its own this isn't even a partial transition. :-)


Well, I was tempted to do a full transition, but then realized that that'd
never get reviewed, so I thought a good start to get out of the hole we
got ourselves into is to stop digging.  Well, maybe we can't even do that
before 4.8 ... whatever.  I have removed that bit from the patch.


+  /* We make an exception here to provide stub definitions for
+ insn_*_length* / get_attr_enabled functions.  */
+  puts ("#if !HAVE_ATTR_length\n"
+   "extern int hook_int_rtx_0 (rtx);\n"
+   "#define insn_default_length hook_int_rtx_0\n"
+   "#define insn_min_length hook_int_rtx_0\n"
+   "#define insn_variable_length_p hook_int_rtx_0\n"
+   "#define insn_current_length hook_int_rtx_0\n"
+   "#include \"insn-addr.h\"\n"
+   "#endif\n"


I'd prefer defaults that call gcc_unreachable, rather than silently
return an arbitrary value.


I'm using a new hook hook_int_rtx_unreachable for this purpose.

tested with config-list.mk on x86_64-unknown-linux-gnu.

2012-10-30  Joern Rennecke  joern.renne...@embecosm.com

* doc/md.texi (Defining Attributes): Document that we are defining
HAVE_ATTR_name macros as 1 for defined attributes, and as 0
for undefined special attributes.
* final.c (asm_insn_count, align_fuzz): Always define.
(insn_current_reference_address): Likewise.
(init_insn_lengths): Use if (HAVE_ATTR_length) instead of
#ifdef HAVE_ATTR_length.
(get_attr_length_1, shorten_branches, final): Likewise.
(final_scan_insn, output_asm_name): Likewise.
* genattr.c (gen_attr): Define HAVE_ATTR_name macros for
defined attributes as 1.
Remove ancient get_attr_alternative compatibility code.
For special purpose attributes not provided, define HAVE_ATTR_name
as 0.
In case no length attribute is given, provide stub definitions
for insn_*_length* functions, and also include insn-addr.h.
In case no enabled attribute is given, provide stub definition.
* genattrtab.c (write_length_unit_log): Always write a definition.
* hooks.c (hook_int_rtx_1, hook_int_rtx_unreachable): New functions.
* hooks.h (hook_int_rtx_1, hook_int_rtx_unreachable): Declare.
* lra-int.h (struct lra_insn_recog_data): Make member
alternative_enabled_p unconditional.
* lra.c (free_insn_recog_data): Use if (HAVE_ATTR_length) instead of
#ifdef HAVE_ATTR_length.
(lra_set_insn_recog_data): Likewise.  Make initialization of
alternative_enabled_p unconditional.
(lra_update_insn_recog_data): Use #if instead of #ifdef for
HAVE_ATTR_enabled.
* recog.c [!HAVE_ATTR_enabled] (get_attr_enabled): Don't define.
(extract_insn): Check HAVE_ATTR_enabled.
(gate_handle_split_before_regstack): Use #if instead of
#if defined for HAVE_ATTR_length.

Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi (revision 193015)
+++ gcc/doc/md.texi (working copy)
@@ -7566,7 +7566,7 @@ (define_attr "type" "branch,fp,load,stor
 the following lines will be written to the file @file{insn-attr.h}.
 
 @smallexample
-#define HAVE_ATTR_type
+#define HAVE_ATTR_type 1
 enum attr_type @{TYPE_BRANCH, TYPE_FP, TYPE_LOAD,
  TYPE_STORE, TYPE_ARITH@};
 extern enum attr_type get_attr_type ();
@@ -7591,6 +7591,10 @@ extern enum attr_type get_attr_type ();
 generation. @xref{Disable Insn Alternatives}.
 @end table
 
+For each of these special attributes, the corresponding
+@samp{HAVE_ATTR_@var{name}} @samp{#define} is also written when the
+attribute is not defined; in that case, it is defined as @samp{0}.
+
 @findex define_enum_attr
 @anchor{define_enum_attr}
 Another way of defining an attribute is to use:
Index: gcc/hooks.c
===
--- gcc/hooks.c (revision 193015)
+++ gcc/hooks.c (working copy)
@@ -203,6 +203,18 @@ hook_int_rtx_0 (rtx a ATTRIBUTE_UNUSED)
 }
 
 int
+hook_int_rtx_1 (rtx)
+{
+  return 1;
+}
+
+int
+hook_int_rtx_unreachable (rtx)
+{
+  gcc_unreachable ();
+}
+
+int
 hook_int_rtx_bool_0 (rtx a ATTRIBUTE_UNUSED, bool b ATTRIBUTE_UNUSED)
 {
   

Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/25/2012 06:42 AM, Richard Biener wrote:

 On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote:

 On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com
 wrote:

 On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/23/2012 10:12 AM, Richard Biener wrote:

 +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
 HOST_BITS_PER_WIDE_INT];

 are we sure this rounds properly?  Consider a port with max byte mode
 size 4 on a 64bit host.

 I do not believe that this can happen.   The core compiler includes all
 modes up to TI mode, so by default we already go up to 128 bits.

 And mode bitsizes are always power-of-two?  I suppose so.

 Actually, no, they are not.  Partial int modes can have bit sizes that
 are not power of two, and, if there isn't an int mode that is bigger, 
 we'd
 want to round up the partial int bit size.  Something like ((2 *
 MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
 HOST_BITS_PER_WIDE_INT should do it.

 I still would like to have the ability to provide specializations of
 wide_int
 for small sizes, thus ideally wide_int would be a template templated
 on the number of HWIs in val.  Interface-wise wide_int2 should be
 identical to double_int, thus we should be able to do

 typedef wide_int2 double_int;

 If you want to go down this path after the patches get in, go for it.
 I
 see no use at all for this.
 This was not meant to be a plug-in replacement for double_int.  The
 goal of
 this patch is to get the compiler to do the constant math the way that
 the
 target does it.   Any such instantiation is by definition placing some
 predefined limit that some target may not want.

 Well, what I don't really like is that we now have two implementations
 of functions that perform integer math on two-HWI sized integers.  What
 I also don't like too much is that we have two different interfaces to
 operate
 on them!  Can't you see how I come to not liking this?  Especially the
 latter …

 double_int is logically dead.  Refactoring wide-int and double-int is a
 waste of time, as the time is better spent removing double-int from the
 compiler.  All the necessary semantics and code of double-int _has_ been
 refactored into wide-int already.  Changing wide-int in any way to vend
 anything to double-int is wrong, as once double-int is removed,
 then all the
 api changes to make double-int share from wide-int is wasted and must 
 then
 be removed.  The path forward is the complete removal of double-int; it 
 is
 wrong, has been wrong and always will be wrong, nothing can change that.

 double_int, compared to wide_int, is fast and lean.  I doubt we will
 get rid of it - you
 will make compile-time math a _lot_ slower.  Just profile when you for
 example
 change get_inner_reference to use wide_ints.

 To be able to remove double_int in favor of wide_int requires _at least_
 templating wide_int on 'len' and providing specializations for 1 and 2.

 It might be a non-issue for math that operates on trees or RTXen due to
 the allocation overhead we pay, but in recent years we transitioned
 important
 paths away from using tree math to using double_ints _for speed reasons_.

 Richard.

 I do not know why you believe this about the speed.  double_int always
 does synthetic math since you do everything at 128-bit precision.

 the thing about wide_int is that, since it does math to the precision's
 size, it almost never uses synthetic operations, since the sizes for
 almost every instance can be done using the native math on the machine.
 Almost every call has a check to see if the operation can be done natively.
 I seriously doubt that you are going to do TI mode math much faster than I
 do it, and if you do, who cares.

 the number of calls does not affect the performance in any negative way, and
 in fact it is more efficient since common things that require more than one
 operation in double_int are typically done in a single operation.

 Simple double-int operations like

 inline double_int
 double_int::and_not (double_int b) const
 {
   double_int result;
   result.low = low & ~b.low;
   result.high = high & ~b.high;
   return result;
 }

 are always going to be faster than conditionally executing only one 
 operation
 (but inside an offline function).

 OK, this is really in reply to the 4.8 thing, but it felt more
 appropriate here.

 It's interesting that you gave this example, since before you were
 complaining about too many fused ops.  Clearly this one could be
 removed in favour of separate and() and not() operations, but why
 not provide a fused one if there are clients who'll make use of it?

 I was more concerned about fused operations that use 

[Patch] Update libquadmath from GLIBC

2012-10-31 Thread Tobias Burnus

Hi all,

libquadmath's math functions are based on (but not identical to) GLIBC's 
sysdeps/ieee754/ldbl-128 functions. In the attached patch, I have ported 
the bug fixes from GLIBC over to libquadmath. Hopefully, the port is 
complete and correct.


I intend to commit the patch soon, unless there are other comments or 
suggestions.


(It was built and tested on x86-64-gnu-linux. Unfortunately, we do not 
have a test suite for it, except for two minor tests in gfortran.dg.)


Tobias
2012-10-31  Tobias Burnus  bur...@net-b.de
	Joseph Myers jos...@codesourcery.com
	David S. Miller da...@davemloft.net
	Ulrich Drepper drep...@redhat.com
	Marek Polacek pola...@redhat.com
	Petr Baudis pa...@suse.cz

	* math/expm1q.c (expm1q): Changes from GLIBC. Use expq for
	large parameters. Fix errno for boundary conditions.
	* math/finiteq.c (finiteq): Add comment.
	* math/fmaq.c (fmaq): Changes from GLIBC. Fix missing underflows
	and bad results for some subnormal results. Fix sign of inexact
	zero return. Fix sign of exact zero return.
	Ensure additions are not scheduled after fetestexcept.
	* math/jnq.c (jnq): Changes from GLIBC. Set up errno properly
	for ynq. Fix jnq precision.
	* math/nearbyintq.c (nearbyintq): Changes from GLIBC. Do not
	manipulate bits before adding and subtracting TWO112[sx].
	* math/rintq.c (rintq): Ditto.
	* math/scalbnq.c (scalbnq): Changes from GLIBC. Fix integer
	overflow.

diff --git a/libquadmath/math/expm1q.c b/libquadmath/math/expm1q.c
index 510c98f..cd254b5 100644
--- a/libquadmath/math/expm1q.c
+++ b/libquadmath/math/expm1q.c
@@ -53,6 +53,7 @@
 
 
 
+#include errno.h
 #include quadmath-imp.h
 
 /* exp(x) - 1 = x + 0.5 x^2 + x^3 P(x)/Q(x)
@@ -100,6 +101,11 @@ expm1q (__float128 x)
   ix = u.words32.w0;
   sign = ix & 0x80000000;
   ix &= 0x7fffffff;
+  if (!sign && ix >= 0x40060000)
+{
+  /* If num is positive and exp >= 6 use plain exp.  */
+  return expq (x);
+}
   if (ix >= 0x7fff0000)
 {
   /* Infinity. */
@@ -120,7 +126,10 @@ expm1q (__float128 x)
 
   /* Overflow.  */
   if (x > maxlog)
-return (HUGE_VALQ * HUGE_VALQ);
+{
+  __set_errno (ERANGE);
+  return (HUGE_VALQ * HUGE_VALQ);
+}
 
   /* Minimum value.  */
   if (x < minarg)
diff --git a/libquadmath/math/finiteq.c b/libquadmath/math/finiteq.c
index f22e9d7..663c928 100644
--- a/libquadmath/math/finiteq.c
+++ b/libquadmath/math/finiteq.c
@@ -15,6 +15,11 @@
 
 #include quadmath-imp.h
 
+/*
+ * finiteq(x) returns 1 is x is finite, else 0;
+ * no branching!
+ */
+
 int
 finiteq (const __float128 x)
 {
diff --git a/libquadmath/math/fmaq.c b/libquadmath/math/fmaq.c
index 126b0a2..cd1b232 100644
--- a/libquadmath/math/fmaq.c
+++ b/libquadmath/math/fmaq.c
@@ -1,5 +1,5 @@
 /* Compute x * y + z as ternary operation.
-   Copyright (C) 2010 Free Software Foundation, Inc.
+   Copyright (C) 2010-2012 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Jakub Jelinek ja...@redhat.com, 2010.
 
@@ -57,6 +57,11 @@ fmaq (__float128 x, __float128 y, __float128 z)
 	   && u.ieee.exponent != 0x7fff
	   && v.ieee.exponent != 0x7fff)
 	return (z + x) + y;
+  /* If z is zero and x and y are nonzero, compute the result
+	 as x * y to avoid the wrong sign of a zero result if x * y
+	 underflows to 0.  */
+  if (z == 0 && x != 0 && y != 0)
+	return x * y;
   /* If x or y or z is Inf/NaN, or if fma will certainly overflow,
 	 or if x * y is less than half of FLT128_DENORM_MIN,
 	 compute as x * y + z.  */
@@ -136,6 +141,11 @@ fmaq (__float128 x, __float128 y, __float128 z)
   y = v.value;
   z = w.value;
 }
+
+  /* Ensure correct sign of exact 0 + 0.  */
+  if (__builtin_expect ((x == 0 || y == 0) && z == 0, 0))
+return x * y + z;
+
   /* Multiplication m1 + m2 = x * y using Dekker's algorithm.  */
#define C ((1LL << (FLT128_MANT_DIG + 1) / 2) + 1)
   __float128 x1 = x * C;
@@ -191,7 +201,7 @@ fmaq (__float128 x, __float128 y, __float128 z)
 #endif
   v.value = a1 + u.value;
   /* Ensure the addition is not scheduled after fetestexcept call.  */
-  asm volatile ("" : : "m" (v));
+  asm volatile ("" : : "m" (v.value));
 #ifdef USE_FENV_H
   int j = fetestexcept (FE_INEXACT) != 0;
   feupdateenv (env);
@@ -220,20 +230,14 @@ fmaq (__float128 x, __float128 y, __float128 z)
 	{
	  /* v.ieee.mant_low & 2 is LSB bit of the result before rounding,
	 v.ieee.mant_low & 1 is the round bit and j is our sticky
-	 bit.  In round-to-nearest 001 rounds down like 00,
-	 011 rounds up, even though 01 rounds down (thus we need
-	 to adjust), 101 rounds down like 10 and 111 rounds up
-	 like 11.  */
-	  if ((v.ieee.mant_low & 3) == 1)
-	{
-	  v.value *= 0x1p-226Q;
-	  if (v.ieee.negative)
-		return v.value - 0x1p-16494Q /* __FLT128_DENORM_MIN__ */;
-	  else
-		return v.value + 0x1p-16494Q /* __FLT128_DENORM_MIN__ */;
-	}
-	  else
-	return v.value * 0x1p-226Q;
+	 bit. */
+	  w.value = 0.0Q;
+	  

Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/25/2012 06:42 AM, Richard Biener wrote:

 On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net 
 wrote:

 On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com
 wrote:

 On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/23/2012 10:12 AM, Richard Biener wrote:

 +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
 HOST_BITS_PER_WIDE_INT];

 are we sure this rounds properly?  Consider a port with max byte mode
 size 4 on a 64bit host.

 I do not believe that this can happen.   The core compiler includes 
 all
  modes up to TI mode, so by default we already go up to 128 bits.

 And mode bitsizes are always power-of-two?  I suppose so.

 Actually, no, they are not.  Partial int modes can have bit sizes that
 are not power of two, and, if there isn't an int mode that is bigger, 
 we'd
 want to round up the partial int bit size.  Something like ((2 *
 MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
 HOST_BITS_PER_WIDE_INT should do it.

 I still would like to have the ability to provide specializations of
 wide_int
 for small sizes, thus ideally wide_int would be a template 
 templated
 on the number of HWIs in val.  Interface-wise wide_int2 should be
 identical to double_int, thus we should be able to do

 typedef wide_int2 double_int;

 If you want to go down this path after the patches get in, go for it.
 I
 see no use at all for this.
 This was not meant to be a plug-in replacement for double_int.  The
 goal of
 this patch is to get the compiler to do the constant math the way that
 the
 target does it.   Any such instantiation is by definition placing some
 predefined limit that some target may not want.

 Well, what I don't really like is that we now have two implementations
 of functions that perform integer math on two-HWI sized integers.  What
 I also don't like too much is that we have two different interfaces to
 operate
 on them!  Can't you see how I come to not liking this?  Especially the
 latter …

 double_int is logically dead.  Refactoring wide-int and double-int is a
 waste of time, as the time is better spent removing double-int from the
 compiler.  All the necessary semantics and code of double-int _has_ been
 refactored into wide-int already.  Changing wide-int in any way to vend
 anything to double-int is wrong, as once double-int is removed,
 then all the
 api changes to make double-int share from wide-int is wasted and must 
 then
 be removed.  The path forward is the complete removal of double-int; it 
 is
 wrong, has been wrong and always will be wrong, nothing can change that.

 double_int, compared to wide_int, is fast and lean.  I doubt we will
 get rid of it - you
 will make compile-time math a _lot_ slower.  Just profile when you for
 example
 change get_inner_reference to use wide_ints.

 To be able to remove double_int in favor of wide_int requires _at least_
 templating wide_int on 'len' and providing specializations for 1 and 2.

 It might be a non-issue for math that operates on trees or RTXen due to
 the allocation overhead we pay, but in recent years we transitioned
 important
 paths away from using tree math to using double_ints _for speed reasons_.

 Richard.

 I do not know why you believe this about the speed.  double_int always
 does synthetic math since you do everything at 128-bit precision.

 the thing about wide_int is that, since it does math to the precision's
 size, it almost never uses synthetic operations, since the sizes for
 almost every instance can be done using the native math on the machine.
 Almost every call has a check to see if the operation can be done 
 natively.
 I seriously doubt that you are going to do TI mode math much faster than I
 do it, and if you do, who cares.

 the number of calls does not affect the performance in any negative way 
 and
 in fact is more efficient since common things that require more than one
 operation in double_int are typically done in a single operation.

 Simple double-int operations like

 inline double_int
 double_int::and_not (double_int b) const
 {
   double_int result;
   result.low = low & ~b.low;
   result.high = high & ~b.high;
   return result;
 }

 are always going to be faster than conditionally executing only one 
 operation
 (but inside an offline function).

 OK, this is really in reply to the 4.8 thing, but it felt more
 appropriate here.

 It's interesting that you gave this example, since before you were
 complaining about too many fused ops.  Clearly this one could be
 removed in favour of separate and() and not() operations, but why
 not provide a fused one if 

Re: Ping / update: RFA: replace #ifdef with if/#if for HAVE_ATTR_*

2012-10-31 Thread Richard Sandiford
Joern Rennecke joern.renne...@embecosm.com writes:
 Quoting Richard Sandiford rdsandif...@googlemail.com:

 I can't approve the whole thing of course, but I like the idea.
 However...

 Joern Rennecke joern.renne...@embecosm.com writes:
 +@deftypevr {Target Hook} bool TARGET_HAVE_CC0
 +@deftypevrx {Target Hook} {bool} TARGET_AUTO_INC_DEC
 +@deftypevrx {Target Hook} {bool} TARGET_STACK_REGS
 +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_ENABLED
 +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH
 +These flags are automatically generated;  you should not override   
 them in @file{tm.c}.
 +@end deftypevr

 Unless this goes against something already discussed, I think it'd be
 better to leave these out until there's a concerted attempt to use them
 somewhere.  On its own this isn't even a partial transition. :-)

 Well, I was tempted to do a full transition, but then realized that that'd
 never get reviewed, so I thought a good start to get out of the hole we
 got ourselves into is to stop digging.  Well, maybe we can't even do that
 before 4.8 ... whatever.  I have removed that bit from the patch.

Thanks.

Sorry, a couple of minor nits that I should probably have picked
up on before:

 @@ -2853,12 +2837,13 @@ final_scan_insn (rtx insn, FILE *file, i
   if (new_rtx == insn  PATTERN (new_rtx) == body)
 fatal_insn (could not split insn, insn);
  
 -#ifdef HAVE_ATTR_length
 - /* This instruction should have been split in shorten_branches,
 -to ensure that we would have valid length info for the
 -splitees.  */
 - gcc_unreachable ();
 -#endif
 + if (HAVE_ATTR_length)
 +   {
 + /* This instruction should have been split in shorten_branches,
 +to ensure that we would have valid length info for the
 +splitees.  */
 + gcc_unreachable ();
 +   }

probably gcc_assert (!HAVE_ATTR_length);

 @@ -3816,7 +3809,7 @@ struct rtl_opt_pass pass_split_after_rel
  static bool
  gate_handle_split_before_regstack (void)
  {
  -#if defined (HAVE_ATTR_length) && defined (STACK_REGS)
  +#if (HAVE_ATTR_length) && defined (STACK_REGS)
/* If flow2 creates new instructions which need splitting
   and scheduling after reload is not done, they might not be
   split until final which doesn't allow splitting
 @@ -3900,7 +3893,7 @@ struct rtl_opt_pass pass_split_before_sc
  static bool
  gate_do_final_split (void)
  {
  -#if defined (HAVE_ATTR_length) && !defined (STACK_REGS)
  +#if (HAVE_ATTR_length) && !defined (STACK_REGS)
return 1;
  #else
return 0;

Redundant brackets around HAVE_ATTR_length.

OK with those changes for the rtl bits.  Can't approve the generator
stuff though.

Richard


Re: [Patch] Update libquadmath from GLIBC

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 1:06 PM, Tobias Burnus bur...@net-b.de wrote:
 Hi all,

 libquadmath's math functions are based on (but not identical to) GLIBC's
 sysdeps/ieee754/ldbl-128 functions.

Heh, last time I copied things from sysdeps/ieee754 into GCC rms objected
and I had to revert ... (libgccmath).

Richard.

 In the attached patch, I have ported the
 bug fixes from GLIBC over to libquadmath. Hopefully, the port is complete
 and correct.

 I intend to commit the patch soon, unless there are other comments or
 suggestions.

 (It was built and tested on x86-64-gnu-linux. Unfortunately, we do not have
 a test suite for it, except for two minor tests in gfortran.dg.)

 Tobias


Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Richard Biener
On Wed, 31 Oct 2012, Jan Hubicka wrote:

   Index: tree-ssa-loop-niter.c
   ===
   --- tree-ssa-loop-niter.c (revision 192989)
   @@ -3505,15 +3737,11 @@ scev_probably_wraps_p (tree base, tree s
  return true;
}

   -/* Frees the information on upper bounds on numbers of iterations of 
   LOOP.  */
   -
  
  Needs a comment.
 
 Yep,
 also the reason I export it is because I use it in another patch.
 (I think we should not hook the bounds into the loop structure and keep them 
 dead,
 so I want the max_iter* friends to free them unless they are asked to preserve 
 and
 explicitly free them in the two users of them).
   +static bool
   +remove_exits_and_undefined_stmts (struct loop *loop, unsigned int 
   npeeled)
   +{
   +  struct nb_iter_bound *elt;
   +  bool changed = false;
   +
   +  for (elt = loop->bounds; elt; elt = elt->next)
   +{
   +  /* If statement is known to be undefined after peeling, turn it
   +  into unreachable (or trap when debugging experience is supposed
   +  to be good).  */
   +  if (!elt->is_exit
   +	  && elt->bound.ule (double_int::from_uhwi (npeeled)))
   + {
   +   gimple_stmt_iterator gsi = gsi_for_stmt (elt->stmt);
   +   gimple stmt = gimple_build_call
   +   (builtin_decl_implicit (optimize_debug
   +   ? BUILT_IN_TRAP : BUILT_IN_UNREACHABLE),
  
  I'm not sure we should do different things for -Og here.  In fact there
  is no unrolling done for -Og at all, so just use unreachable.
 
 OK, I was thinking about BUILT_IN_TRAP for -O1 too, but I am not sure either.
 I guess we will gather some experience on how much users are confused.
 
 Why do we not inline at -Og?

unroll you mean.  Because unrolling mutates the CFG too much.
Well - it was just a starting point, populating -Og with as little
as possible and 100% profitable transforms (in both debug and speed
metric).  In late opts we only do (early opt queue is shared):

  NEXT_PASS (pass_all_optimizations_g);
{
  struct opt_pass **p = pass_all_optimizations_g.pass.sub;
  NEXT_PASS (pass_remove_cgraph_callee_edges);
  NEXT_PASS (pass_strip_predict_hints);
  /* Lower remaining pieces of GIMPLE.  */
  NEXT_PASS (pass_lower_complex);
  NEXT_PASS (pass_lower_vector_ssa);
  /* Perform simple scalar cleanup which is constant/copy propagation.  
*/
  NEXT_PASS (pass_ccp);
  NEXT_PASS (pass_object_sizes);
  /* Copy propagation also copy-propagates constants, this is 
necessary
 to forward object-size results properly.  */
  NEXT_PASS (pass_copy_prop);
  NEXT_PASS (pass_rename_ssa_copies);
  NEXT_PASS (pass_dce);
  /* Fold remaining builtins.  */
  NEXT_PASS (pass_fold_builtins);
  /* ???  We do want some kind of loop invariant motion, but we 
possibly
 need to adjust LIM to be more friendly towards preserving 
accurate
 debug information here.  */
  NEXT_PASS (pass_late_warn_uninitialized);
  NEXT_PASS (pass_uncprop);
  NEXT_PASS (pass_local_pure_const);
}

    +  /* If we know the exit will be taken after peeling, update.  */
    +  else if (elt->is_exit
    +	 && elt->bound.ule (double_int::from_uhwi (npeeled)))
    + {
    +   basic_block bb = gimple_bb (elt->stmt);
    +   edge exit_edge = EDGE_SUCC (bb, 0);
    +
    +   if (dump_file && (dump_flags & TDF_DETAILS))
    + {
    +   fprintf (dump_file, "Forced exit to be taken: ");
    +   print_gimple_stmt (dump_file, elt->stmt, 0, 0);
    + }
    +   if (!loop_exit_edge_p (loop, exit_edge))
    + exit_edge = EDGE_SUCC (bb, 1);
    +   if (exit_edge->flags & EDGE_TRUE_VALUE)
  
  I think we can have abnormal/eh exit edges.  So I'm not sure you
  can, without checking, assume the block ends in a GIMPLE_COND.
 
 We can't - those are only edges that are found by the IV analysis and they
 always test IV with some bound. (it is all done by number_of_iterations_exit)
  
  See above.  Otherwise the overall idea sounds fine.
 
 Similarly here, simple exits are always conditionals. 

I see.  Then the patch is ok (with the comment added).

Thanks,
Richard.


Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/25/2012 06:42 AM, Richard Biener wrote:

 On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
 mikest...@comcast.net wrote:

 On Oct 24, 2012, at 2:43 AM, Richard Biener 
 richard.guent...@gmail.com
 wrote:

 On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/23/2012 10:12 AM, Richard Biener wrote:

 +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
 HOST_BITS_PER_WIDE_INT];

 are we sure this rounds properly?  Consider a port with max byte 
 mode
 size 4 on a 64bit host.

 I do not believe that this can happen.  The core compiler
 includes all
  modes up to TI mode, so by default we already go up to 128 bits.

 And mode bitsizes are always power-of-two?  I suppose so.

 Actually, no, they are not.  Partial int modes can have bit sizes that
 are not power of two, and, if there isn't an int mode that is
 bigger, we'd
 want to round up the partial int bit size.  Something like ((2 *
 MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
 HOST_BITS_PER_WIDE_INT should do it.

 I still would like to have the ability to provide specializations of
 wide_int
 for small sizes, thus ideally wide_int would be a template
 templated
 on the number of HWIs in val.  Interface-wise wide_int2 should be
 identical to double_int, thus we should be able to do

 typedef wide_int2 double_int;

 If you want to go down this path after the patches get in, go for it.
 I
 see no use at all for this.
  This was not meant to be a plug-in replacement for double_int.  The
  goal of
 this patch is to get the compiler to do the constant math the way 
 that
 the
 target does it.   Any such instantiation is by definition placing 
 some
 predefined limit that some target may not want.

 Well, what I don't really like is that we now have two implementations
 of functions that perform integer math on two-HWI sized integers.  
 What
 I also don't like too much is that we have two different interfaces to
 operate
 on them!  Can't you see how I come to not liking this?  Especially the
 latter …

  double_int is logically dead.  Refactoring wide-int and double-int is a
 waste of time, as the time is better spent removing double-int from the
 compiler.  All the necessary semantics and code of double-int _has_ 
 been
 refactored into wide-int already.  Changing wide-int in any way to vend
 anything to double-int is wrong, as once double-int is removed,
 then all the
 api changes to make double-int share from wide-int is wasted and
 must then
 be removed.  The path forward is the complete removal of
 double-int; it is
 wrong, has been wrong and always will be wrong, nothing can change 
 that.

 double_int, compared to wide_int, is fast and lean.  I doubt we will
 get rid of it - you
 will make compile-time math a _lot_ slower.  Just profile when you for
 example
 change get_inner_reference to use wide_ints.

 To be able to remove double_int in favor of wide_int requires _at least_
 templating wide_int on 'len' and providing specializations for 1 and 2.

 It might be a non-issue for math that operates on trees or RTXen due to
 the allocation overhead we pay, but in recent years we transitioned
 important
 paths away from using tree math to using double_ints _for speed 
 reasons_.

 Richard.

  I do not know why you believe this about the speed.  double_int always
  does synthetic math since you do everything at 128-bit precision.

  the thing about wide_int is that, since it does math to the precision's
  size, it almost never uses synthetic operations, since the sizes for
  almost every instance can be done using the native math on the machine.
  Almost every call has a check to see if the operation can be done
  natively.
  I seriously doubt that you are going to do TI mode math much faster than
  I do it, and if you do, who cares.

  the number of calls does not affect the performance in any
  negative way and
  in fact is more efficient since common things that require more than one
  operation in double_int are typically done in a single operation.

 Simple double-int operations like

 inline double_int
 double_int::and_not (double_int b) const
 {
   double_int result;
   result.low = low & ~b.low;
   result.high = high & ~b.high;
   return result;
 }

 are always going to be faster than conditionally executing only one
 operation
 (but inside an offline function).

 OK, this is really in reply to the 4.8 thing, but it felt more
 appropriate here.

 It's interesting that you gave this example, since before you were
 complaining about too many fused ops.  Clearly this one could be
 removed in favour of separate and() 

Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Jan Hubicka
 
 unroll you mean.  Because unrolling mutates the CFG too much.
 Well - it was just a starting point, populating -Og with as little
 as possible and 100% profitable transforms (in both debug and speed
 metric).  In late opts we only do (early opt queue is shared):

Well, and what about early cunrolli?
 
   NEXT_PASS (pass_all_optimizations_g);
 {
   struct opt_pass **p = pass_all_optimizations_g.pass.sub;
   NEXT_PASS (pass_remove_cgraph_callee_edges);
   NEXT_PASS (pass_strip_predict_hints);
   /* Lower remaining pieces of GIMPLE.  */
   NEXT_PASS (pass_lower_complex);
   NEXT_PASS (pass_lower_vector_ssa);
   /* Perform simple scalar cleanup which is constant/copy propagation.  
 */
   NEXT_PASS (pass_ccp);
   NEXT_PASS (pass_object_sizes);
   /* Copy propagation also copy-propagates constants, this is 
 necessary
  to forward object-size results properly.  */
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_rename_ssa_copies);
   NEXT_PASS (pass_dce);
   /* Fold remaining builtins.  */
   NEXT_PASS (pass_fold_builtins);
   /* ???  We do want some kind of loop invariant motion, but we 
 possibly
  need to adjust LIM to be more friendly towards preserving 
 accurate
  debug information here.  */
   NEXT_PASS (pass_late_warn_uninitialized);
   NEXT_PASS (pass_uncprop);
   NEXT_PASS (pass_local_pure_const);
 }
 
+  /* If we know the exit will be taken after peeling, update.  */
+  else if (elt->is_exit
+	   && elt->bound.ule (double_int::from_uhwi (npeeled)))
+	{
+	  basic_block bb = gimple_bb (elt->stmt);
+	  edge exit_edge = EDGE_SUCC (bb, 0);
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Forced exit to be taken: ");
+	      print_gimple_stmt (dump_file, elt->stmt, 0, 0);
+	    }
+	  if (!loop_exit_edge_p (loop, exit_edge))
+	    exit_edge = EDGE_SUCC (bb, 1);
+	  if (exit_edge->flags & EDGE_TRUE_VALUE)
   
   I think we can have abnormal/eh exit edges.  So I'm not sure you
   can, without checking, assume the block ends in a GIMPLE_COND.
  
  We can't - those are only edges that are found by the IV analysis and they
  always test IV with some bound. (it is all done by 
  number_of_iterations_exit)
   
   See above.  Otherwise the overall idea sounds fine.
  
  Similarly here, simple exits are always conditionals. 
 
 I see.  Then the patch is ok (with the comment added).

Thanks, I will add the comment and privatize free_loop_bounds for now.

Honza
 
 Thanks,
 Richard.


Modernize loop_finite_p

2012-10-31 Thread Jan Hubicka
Hi,
this patch changes finite_loop_p to use max_loop_iterations.  A long time
ago I made finite_loop_p as a rip-off of max_loop_iterations, skipping the
parts that are not exactly related to the number-of-iterations estimates.
It has gone out of date since then, completely missing the bounds derived
from overflows and array bounds that are quite useful.

So I guess these days it is better to simply implement it using 
max_loop_iterations,
we do not save that much of effort and we can store the result.

Bootstrapped/regtested x86_64, OK?

Honza

* tree-ssa-loop-niter.c (finite_loop_p): Reorg to use 
max_loop_iterations.
Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 192991)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -1993,11 +1990,7 @@ find_loop_niter (struct loop *loop, edge
 bool
 finite_loop_p (struct loop *loop)
 {
-  unsigned i;
-  VEC (edge, heap) *exits;
-  edge ex;
-  struct tree_niter_desc desc;
-  bool finite = false;
+  double_int nit;
   int flags;
 
   if (flag_unsafe_loop_optimizations)
@@ -2011,26 +2004,22 @@ finite_loop_p (struct loop *loop)
   return true;
 }
 
-  exits = get_loop_exit_edges (loop);
-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+  if (loop->any_upper_bound)
 {
-  if (!just_once_each_iteration_p (loop, ex->src))
-	continue;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Found loop %i to be finite: upper bound is recorded.\n",
+		 loop->num);
+  return true;
+}
 
-  if (number_of_iterations_exit (loop, ex, &desc, false))
-    {
-	if (dump_file && (dump_flags & TDF_DETAILS))
-	  {
-	    fprintf (dump_file, "Found loop %i to be finite: iterating ", loop->num);
-	    print_generic_expr (dump_file, desc.niter, TDF_SLIM);
-	    fprintf (dump_file, " times\n");
-	  }
-	finite = true;
-	break;
-   }
+  if (max_loop_iterations (loop, &nit))
+    {
+  if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Found loop %i to be finite: upper bound found.\n",
+		 loop->num);
+  return true;
 }
-  VEC_free (edge, heap, exits);
-  return finite;
+  return false;
 }
 
 /*


Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Richard Biener
On Wed, 31 Oct 2012, Jan Hubicka wrote:

  
  unroll you mean.  Because unrolling mutates the CFG too much.
  Well - it was just a starting point, populating -Og with as little
  as possible and 100% profitable transforms (in both debug and speed
  metric).  In late opts we only do (early opt queue is shared):
 
 Well, and what about early cunrolli?

It's not there (and I would disable it for -Og).

Richard.


Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Joern Rennecke

Quoting Richard Sandiford rdsandif...@googlemail.com:


It's about describing complex interactions of length adjustments that
derive from branch shortening and length added for (un)alignment for
scheduling purposes.  Expressed naively, these can lead to cycles.


But shorten_branches should be written to avoid cycles, and I thought
your patch did that by making sure that the address of each instruction
only increases.


This actually gives a very poor branch shortening result for ARC.
What I did before with the lock_length attribute was only locking in increases
for the length of branches, and preserving a complex set of invariants to
avoid cycles from the (mis)alignment padding.

Saying that, for some instructions, the length locking comes into effect
only after a few iterations of not permanently increasing the size of
other instructions is still cycle-safe.


The length variations for the ARC are not all alike: there are branches
that are subject to branch shortening in the usual way, but they might
shrink when other things shrink.  When we are iterating starting at
minimal lengths and increasing them, these branches should be stopped
from shrinking, as they will likely grow back again in the next iteration.
OTOH, there are potentially short instructions that are lengthened to
achieve scheduling-dictated alignment or misalignment.  These should
be allowed to change freely back and forth, unless we have a rare
cycle that only depends on alignments.


Hmm, but this is still talking in terms of shorten_branches, rather than
the target property that we're trying to model.  It sounds like the property
is something along the lines of: some instructions have different encodings
(or work-alikes with different encodings), and we want to make all the
encodings available to the optimisers.  Is that right?


No, not really.  There are lots of redundant encodings, but we have only
one favourite encoding for each size.  We usually want the shortest version,
unless we want to tweak the alignment of a following instruction, or in
the case of unaligned conditional branches with unfilled delay slot, to
hide the branch penalty (don't ask me why it has that effect...).


If so, that sounds
like a useful thing to model, but I think it should be modelled directly,
rather than as a modification to the shorten_branches algorithm.


The shorten_branches algorithm can't give the correct result without
taking these things into account.  Also, instruction lengths are needed
to know the requirements.  So, do you want me to run a scheduling
pass in each iteration of shorten_branches?


Is the alignment/misalignment explicitly modelled in the rtl?  If so, how?
Using labels, or something else?


No, it works with instruction addresses during branch shortening, and
extra bits in cfun->machine during instruction output to keep track of
the current alignment.
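[For readers following the termination argument in this thread: the scheme Richard describes, where shorten_branches only ever lets lengths grow, can be modeled in a few lines. All names and the length rule below are invented for the example; it shows why monotonically non-decreasing lengths give a guaranteed fixed point, which is also the behaviour Joern says locks in poor lengths for ARC.]

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Toy model: one "branch" whose length depends on instruction addresses.
// Invented rule: a branch needs 4 bytes when its target is > 16 bytes away.
static int
branch_len (int from_addr, int to_addr)
{
  int dist = std::abs (to_addr - from_addr);
  return dist > 16 ? 4 : 2;
}

// Iterate to a fixed point.  Lengths are only allowed to grow, so the
// total size is non-decreasing and bounded, and the loop must terminate.
static int
shorten (std::vector<int> &lens, int branch_idx, int target_idx)
{
  int iterations = 0;
  for (bool changed = true; changed;)
    {
      changed = false;
      iterations++;
      // Recompute instruction addresses from the current lengths.
      std::vector<int> addr (lens.size () + 1, 0);
      for (size_t i = 0; i < lens.size (); i++)
	addr[i + 1] = addr[i] + lens[i];
      int want = branch_len (addr[branch_idx], addr[target_idx]);
      if (want > lens[branch_idx])	// grow only: never shrink back
	{
	  lens[branch_idx] = want;
	  changed = true;
	}
    }
  return iterations;
}
```

[With ten 2-byte insns and a branch at index 0 targeting index 9, the first pass sees an 18-byte distance, grows the branch to 4 bytes, and the second pass confirms the fixed point. Allowing the branch to shrink again could oscillate, which is the cycle being discussed.]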


Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Jakub Jelinek
On Wed, Oct 31, 2012 at 01:30:02PM +0100, Richard Biener wrote:
 On Wed, 31 Oct 2012, Jan Hubicka wrote:
   unroll you mean.  Because unrolling mutates the CFG too much.
   Well - it was just a starting point, populating -Og with as little
   as possible and 100% profitable transforms (in both debug and speed
   metric).  In late opts we only do (early opt queue is shared):
  
  Well, and what about early cunrolli?
 
 It's not there (and I would disable it for -Og).

Generally, most of the loop transforms are undesirable for -Og.

Jakub


Re: Modernize loop_finite_p

2012-10-31 Thread Richard Biener
On Wed, 31 Oct 2012, Jan Hubicka wrote:

 Hi,
 this patch changes finite_loop_p to use max_loop_iterations.  Long time ago I 
 made
 finite_loop_p as rip-off from the max_loop_iterations skipping parts that are 
 not
 exactly related to the number of iteration estimates.  It went out of date 
 since then
 completelly missing the bounds derived from overflows and array bounds that 
 are quite
 useful.
 
 So I guess these days it is better to simply implement it using 
 max_loop_iterations,
 we do not save that much of effort and we can store the result.
 
 Bootstrapped/regtested x86_64, OK?
 
 Honza
 
   * tree-ssa-loop-niter.c (finite_loop_p): Reorg to use 
 max_loop_iterations.
 Index: tree-ssa-loop-niter.c
 ===
 --- tree-ssa-loop-niter.c (revision 192991)
 +++ tree-ssa-loop-niter.c (working copy)
 @@ -1993,11 +1990,7 @@ find_loop_niter (struct loop *loop, edge
  bool
  finite_loop_p (struct loop *loop)
  {
 -  unsigned i;
 -  VEC (edge, heap) *exits;
 -  edge ex;
 -  struct tree_niter_desc desc;
 -  bool finite = false;
 +  double_int nit;
int flags;
  
if (flag_unsafe_loop_optimizations)
 @@ -2011,26 +2004,22 @@ finite_loop_p (struct loop *loop)
return true;
  }
  
 -  exits = get_loop_exit_edges (loop);
 -  FOR_EACH_VEC_ELT (edge, exits, i, ex)
 +  if (loop->any_upper_bound)
  {
 -  if (!just_once_each_iteration_p (loop, ex->src))
 -	continue;
 +  if (dump_file && (dump_flags & TDF_DETAILS))
 +	fprintf (dump_file, "Found loop %i to be finite: upper bound is recorded.\n",
 +		 loop->num);
 +  return true;
 +}

This looks redundant with ...

 -  if (number_of_iterations_exit (loop, ex, &desc, false))
 -    {
 -	if (dump_file && (dump_flags & TDF_DETAILS))
 -	  {
 -	    fprintf (dump_file, "Found loop %i to be finite: iterating ", loop->num);
 -	    print_generic_expr (dump_file, desc.niter, TDF_SLIM);
 -	    fprintf (dump_file, " times\n");
 -	  }
 -	finite = true;
 -	break;
 -	}
 +  if (max_loop_iterations (loop, &nit))
 +    {
 +{

... this.  If you want to short-circuit max_loop_iterations ()
then why not

   if (loop->any_upper_bound
       || max_loop_iterations (loop, &nit))
 {
   ...

?  The different dumping seems not a good reason to obfuscate it.

Richard.

 +  if (dump_file && (dump_flags & TDF_DETAILS))
 +	fprintf (dump_file, "Found loop %i to be finite: upper bound found.\n",
 +		 loop->num);
 +  return true;
  }
 -  VEC_free (edge, heap, exits);
 -  return finite;
 +  return false;
  }
  
  /*


Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Richard Biener
On Wed, 31 Oct 2012, Jakub Jelinek wrote:

 On Wed, Oct 31, 2012 at 01:30:02PM +0100, Richard Biener wrote:
  On Wed, 31 Oct 2012, Jan Hubicka wrote:
unroll you mean.  Because unrolling mutates the CFG too much.
Well - it was just a starting point, populating -Og with as little
as possible and 100% profitable transforms (in both debug and speed
metric).  In late opts we only do (early opt queue is shared):
   
   Well, and what about early cunrolli?
  
  It's not there (and I would disable it for -Og).
 
 Generally, most of the loop transforms are undesirable for -Og.

Maybe with the exception of un-looping once rolling loops.  But
yes, even loop header copying is undesirable in general.

Richard.


Re: [PATCH] Fix PR53501

2012-10-31 Thread Eric Botcazou
 Your change caused:
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55142

Please check whether it worked before Richard's fix (r188009).

-- 
Eric Botcazou


Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/25/2012 06:42 AM, Richard Biener wrote:

 On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
 mikest...@comcast.net wrote:

 On Oct 24, 2012, at 2:43 AM, Richard Biener 
 richard.guent...@gmail.com
 wrote:

 On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/23/2012 10:12 AM, Richard Biener wrote:

 +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
 HOST_BITS_PER_WIDE_INT];

 are we sure this rounds properly?  Consider a port with max byte 
 mode
 size 4 on a 64bit host.

 I do not believe that this can happen.  The core compiler
 includes all
 modes up to TI mode, so by default we already up to 128 bits.

 And mode bitsizes are always power-of-two?  I suppose so.

 Actually, no, they are not.  Partial int modes can have bit sizes that
 are not power of two, and, if there isn't an int mode that is
 bigger, we'd
 want to round up the partial int bit size.  Something like (2 *
 MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
 HOST_BITS_PER_WIDE_INT should do it.
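[The rounding expression above is a plain ceiling division. A minimal sketch; the 65-bit width is an invented stand-in for a non-power-of-two partial integer mode, not a real target mode:]

```cpp
#include <cassert>

// Round a bit count up to whole host words.  With a power-of-two bitsize
// plain division happens to work, but a partial int mode (say, 65 bits)
// would lose a word without the round-up.
static constexpr int
hwis_needed (int bits, int bits_per_hwi)
{
  return (bits + bits_per_hwi - 1) / bits_per_hwi;
}
```

[For a hypothetical 65-bit mode on a 64-bit host, 2 * 65 = 130 bits needs 3 HWIs; truncating division would allocate only 2.]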

 I still would like to have the ability to provide specializations 
 of
 wide_int
 for small sizes, thus ideally wide_int would be a template
 templated
 on the number of HWIs in val.  Interface-wise wide_int2 should be
 identical to double_int, thus we should be able to do

 typedef wide_int2 double_int;

 If you want to go down this path after the patches get in, go for 
 it.
 I
 see no use at all for this.
 This was not meant to be a plug in replacement for double int. This
 goal of
 this patch is to get the compiler to do the constant math the way 
 that
 the
 target does it.   Any such instantiation is by definition placing 
 some
 predefined limit that some target may not want.

 Well, what I don't really like is that we now have two 
 implementations
 of functions that perform integer math on two-HWI sized integers.  
 What
 I also don't like too much is that we have two different interfaces 
 to
 operate
 on them!  Can't you see how I come to not liking this?  Especially 
 the
 latter …

 double_int is logically dead.  Refactoring wide-int and double-int is a
 waste of time, as the time is better spent removing double-int from 
 the
 compiler.  All the necessary semantics and code of double-int _has_ 
 been
 refactored into wide-int already.  Changing wide-int in any way to 
 vend
 anything to double-int is wrong, as once double-int is removed,
 then all the
 api changes to make double-int share from wide-int are wasted and
 must then
 be removed.  The path forward is the complete removal of
 double-int; it is
 wrong, has been wrong and always will be wrong, nothing can change 
 that.

 double_int, compared to wide_int, is fast and lean.  I doubt we will
 get rid of it - you
 will make compile-time math a _lot_ slower.  Just profile when you for
 example
 change get_inner_reference to use wide_ints.

 To be able to remove double_int in favor of wide_int requires _at 
 least_
 templating wide_int on 'len' and providing specializations for 1 and 2.

 It might be a non-issue for math that operates on trees or RTXen due to
 the allocation overhead we pay, but in recent years we transitioned
 important
 paths away from using tree math to using double_ints _for speed 
 reasons_.

 Richard.

 i do not know why you believe this about the speed.  double_int always
 does synthetic math, since you do everything at 128-bit precision.

 the thing about wide_int is that, since it does math at the precision's
 size, it almost never uses synthetic operations: the sizes for
 almost every instance can be handled using the native math on the machine.
 almost every call has a check to see if the operation can be done
 natively.
 I seriously doubt that you are going to do TImode math much faster than
 I do it, and if you do, who cares.

 the number of calls does not affect the performance in any negative way,
 and in fact it is more efficient, since common things that require more
 than one operation in double_int are typically done in a single operation.

 Simple double-int operations like

 inline double_int
 double_int::and_not (double_int b) const
 {
   double_int result;
   result.low = low & ~b.low;
   result.high = high & ~b.high;
   return result;
 }

 are always going to be faster than conditionally executing only one
 operation
 (but inside an offline function).

 OK, this is really in reply to the 4.8 thing, but it felt more
 appropriate here.

 It's interesting that you gave this example, since before you were

Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 1:33 PM, Joern Rennecke
joern.renne...@embecosm.com wrote:
 Quoting Richard Sandiford rdsandif...@googlemail.com:

 It's about describing complex interactions of length adjustments that
 derive from branch shortening and length added for (un)alignment for
 scheduling purposes.  Expressed naively, these can lead to cycles.


 But shorten_branches should be written to avoid cycles, and I thought
 your patch did that by making sure that the address of each instruction
 only increases.


 This actually gives a very poor branch shortening result for ARC.
 What I did before with the lock_length attribute was only locking in
 increases
 for the length of branches, and preserving a complex set of invariants to
 avoid cycles from the (mis)alignment padding.

 Saying that, for some instructions, the length locking comes into effect
 only after a few iterations of not permanently increasing the size of
 other instructions is still cycle-safe.


 The length variations for the ARC are not all alike: there are branches
 that are subject to branch shortening in the usual way, but they might
 shrink when other things shrink.  When we are iterating starting at
 minimal lengths and increasing them, these branches should be stopped
 from shrinking, as they will likely grow back again in the next iteration.
 OTOH, there are potentially short instructions that are lengthened to
 achieve scheduling-dictated alignment or misalignment.  These should
 be allowed to change freely back and forth, unless we have a rare
 cycle that only depends on alignments.


 Hmm, but this is still talking in terms of shorten_branches, rather than
 the target property that we're trying to model.  It sounds like the
 property
 is something along the lines of: some instructions have different
 encodings
 (or work-alikes with different encodings), and we want to make all the
 encodings available to the optimisers.  Is that right?


 No, not really.  There are lots of redundant encodings, but we have only
 one favourite encoding for each size.  We usually want the shortest version,
 unless we want to tweak the alignment of a following instruction, or in
 the case of unaligned conditional branches with unfilled delay slot, to
 hide the branch penalty (don't ask me why it has that effect...).

Maybe we should split the thing then into a adjust_insn_length attribute
without the iteration parameter and a hook, similar to the various scheduler
hooks, that is special for branch shortening.

Richard.


 If so, that sounds
 like a useful thing to model, but I think it should be modelled directly,
 rather than as a modification to the shorten_branches algorithm.


 The shorten_branches algorithm can't give the correct result without
 taking these things into account.  Also, instruction lengths are needed
 to know the requirements.  So, do you want me to run a scheduling
 pass in each iteration of shorten_branches?


 Is the alignment/misalignment explicitly modelled in the rtl?  If so, how?
 Using labels, or something else?


 No, it works with instruction addresses during branch shortening, and
 extra bits in cfun->machine during instruction output to keep track of
 the current alignment.


Re: Fix bugs introduced by switch-case profile propagation

2012-10-31 Thread Oleg Endo
On Fri, 2012-10-26 at 17:05 +0200, Jan Hubicka wrote:
  Hi,
  
  On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka hubi...@ucw.cz wrote:
   Ping.
  
  
   On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com 
   wrote:
Hi,
 This patch fixes bugs introduced by my previous patch to propagate
profiles during switch expansion. Bootstrap and profiledbootstrap
successful on x86_64. Confirmed that it fixes the crashes reported in
PR middle-end/54957. OK for trunk?
   
- Easwaran
   
2012-10-17   Easwaran Raman  era...@google.com
   
PR target/54938
PR middle-end/54957
* optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note
only if it doesn't already exist.
* except.c (sjlj_emit_function_enter): Remove unused variable.
* stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.
  
   Seems fine, but under what conditions you get NULL here?
  Wasn't sure if this is an OK for the patch or if I need to address
  anything else.
 
 Actually I think you should make except.c set current_bb when expanding
 the switch instead.
 OK with this change.

Is there any progress regarding this issue?
It makes testing on SH against current trunk difficult.  Would it be OK
to at least commit this hunk?

Index: gcc/optabs.c
===
--- gcc/optabs.c(revision 192963)
+++ gcc/optabs.c(working copy)
@@ -4270,8 +4270,8 @@
       && JUMP_P (insn)
       && any_condjump_p (insn))
 {
-  gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0));
-  add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
+  if (!find_reg_note (insn, REG_BR_PROB, 0))
+add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
 }
 }
 

I've tested this on rev 192983 with
make -k check RUNTESTFLAGS=--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}

and it looks OK.

Cheers,
Oleg



Re: [PATCH] Fix PR53501

2012-10-31 Thread H.J. Lu
On Wed, Oct 31, 2012 at 5:42 AM, Eric Botcazou ebotca...@adacore.com wrote:
 Your change caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55142

 Please check whether it worked before Richard's fix (r188009).


It failed with revision 188008.

-- 
H.J.


Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from (was Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon)

2012-10-31 Thread JonY
On 10/31/2012 20:01, Jonathan Wakely wrote:
 On 31 October 2012 11:23, JonY wrote:
 On 10/31/2012 19:12, Jonathan Wakely wrote:

 It looks like the workaround is in mingw not in GCC, so is it a
 problem that it won't be possible to use GCC 4.8 with existing mingw
 versions, or are users required to use a brand new mingw to use a new
 GCC?  Should that be documented in
 http://gcc.gnu.org/gcc-4.8/changes.html ?


 They are required to use the latest mingw-w64; the problem was that the
 vswprintf that libstdc++ expects isn't the same as the one MS provides,
 so I've written a redirector to use vsnwprintf, more precisely, the
 mingw C99-compliant __mingw_vsnwprintf.

 std::to_wstring and std::to_string work according to some simple tests.
 
 Excellent, the testsuite should automatically start running the
 relevant tests and we should be able to close
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52015
 

Yes, the correct way would be to check if the prototype given matches
the ISO version.  The MS version has one less argument.  The workaround is
only active when C++11 is used, though, to maintain compatibility with
older code.

The actual guard in the mingw-w64:

#if defined(__cplusplus) && (__cplusplus >= 201103L) && \
    !defined(_GLIBCXX_HAVE_BROKEN_VSWPRINTF)
...Redirect to ISO C99 compliant adapter with 4 args...
#else
...Use MS version with 3 args...
#endif

So the tests and checks would just need to use -std=c++11 or gnu++11.

 I guess the current comment about requiring mingw-w64 trunk at least r5437
 is OK for the changes page.  It should probably note that this change is
 mingw-w64 specific, with w64 as the vendor key.
 
 i.e. *-w64-mingw* ?  Or is *-w64-mingw32 more accurate?  (I don't
 really know how the mingw target triplets work, sorry!)
 

In practice, *-w64-mingw32 is used more commonly when specifying
triplet, though for flexibility, checking should use *-w64-mingw*.

 I'll put something in the changes page later.
 
 Thanks for fixing this.
 

Thanks to Paolo for pushing it in too.




signature.asc
Description: OpenPGP digital signature


Re: Adapt one fold-const optimization for vectors

2012-10-31 Thread Marc Glisse

On Wed, 31 Oct 2012, Jakub Jelinek wrote:


On Tue, Oct 30, 2012 at 01:14:40PM +0100, Marc Glisse wrote:

On Tue, 30 Oct 2012, Marek Polacek wrote:


On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:

Hello,

one more optimization that needed help for vectors; it crashed on
(x<y)<0.  Because of PR 55001, testcases are awkward to add (I could
do an x86-only one if needed).

bootstrap+testsuite.

2012-10-30  Marc Glisse  marc.gli...@inria.fr

* fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
(fold_binary_loc): call it for VEC_COND_EXPR.


Capital C instead of lowercase.


Ok.


Patch missing?


Indeed, thanks for the notice.


This regressed
FAIL: g++.dg/other/vector-compare.C (internal compiler error)
FAIL: g++.dg/other/vector-compare.C (test for excess errors)
on i686-linux.


Thanks. I am surprised, I expected this problem to happen only for code 
where the optimization was previously crashing. Actually, before my patch, 
the code was apparently silently miscompiled. We appear to be missing code 
that would check that the result of a vector operation is a vector (in 
particular not a boolean).



The problem is that tree-vect-generic.c doesn't expand
BLKmode VEC_COND_EXPR piecewise.  So, either fold-const on this optimization
needs to test whether expand_vec_cond_expr_p is true for the chosen
pair of comparison type and operand type, or tree-vect-generic.c needs to be
taught to expand VEC_COND_EXPR piecewise if expand_vec_cond_expr_p is not
true for the particular VEC_COND_EXPR.


The second one is needed anyway (that's PR 55001, which I mention in my 
email). One quick hack as long as it isn't done would be to replace:

cond_code = VEC_COND_EXPR;
with
return NULL_TREE;
(and a comment linking to PR55001 and this conversation).

Now I am a bit afraid that the vector lowering pass is not the last one, 
and some later pass might call fold and re-create unexpandable vector 
operations, in which case the first option might also be needed... but I 
hope not.


--
Marc Glisse


Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Joern Rennecke

Quoting Richard Biener richard.guent...@gmail.com:


Maybe we should split the thing then into a adjust_insn_length attribute
without the iteration parameter


Attributes don't get any parameter but the instruction, and don't apply
to delay slot SEQUENCEs.


Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 2:06 PM, Joern Rennecke
joern.renne...@embecosm.com wrote:
 Quoting Richard Biener richard.guent...@gmail.com:

 Maybe we should split the thing then into a adjust_insn_length attribute
 without the iteration parameter


 Attributes don't get any parameter but the instruction, and don't apply
 to delay slot SEQUENCEs.

hook I meant, not attribute.

Richard.


Re: [Ada] Fix Makefile dependencies of C files

2012-10-31 Thread Diego Novillo
On Tue, Oct 30, 2012 at 6:50 PM, Eric Botcazou ebotca...@adacore.com wrote:
 This will hopefully fix the build failure reported by Diego.  Apart from
 adding missing dependencies, this also removes redundant command lines.

 Tested on x86_64-suse-linux, applied on the mainline and 4.7 branch.


 2012-10-30  Eric Botcazou  ebotca...@adacore.com

 * gcc-interface/Make-lang.in: Fix and clean up rules for C files.

Thanks!  I'll update my tree and give it a try.


Diego.


Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 But that means that wide_int has to model a P-bit operation as a
 normal len*HOST_WIDE_INT operation and then fix up the result
 after the fact, which seems unnecessarily convoluted.

 It does that right now.  The operations are carried out in a loop
 over len HOST_WIDE_INT parts; the last HWI is then special-treated
 to account for precision/size.  (Yes, 'len' is also used as an
 optimization - the fact that len ends up being mutable is another thing
 I dislike about wide-int.  If wide-ints are cheap then all ops should
 be non-mutating, at least with respect to 'len'.)

But the point of having a mutating len is that things like zero and -1
are common even for OImode values.  So if you're doing something potentially
expensive like OImode multiplication, why do it on the number of
HOST_WIDE_INTs needed for an OImode value when the value we're
processing has only one significant HOST_WIDE_INT?
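[To make the mutable-len point concrete, here is a sketch, not wide-int's actual routine, of the canonicalization that lets common values like 0 and -1 occupy a single HWI even at OImode (256-bit) precision:]

```cpp
#include <cassert>
#include <cstdint>

// Drop trailing words that are pure sign extension of the word below,
// so e.g. a 256-bit zero or -1 is stored with len == 1 and later
// operations loop once instead of four times.
static int
canonize_len (const int64_t *val, int len)
{
  while (len > 1
	 && val[len - 1] == (val[len - 2] >> 63))
    len--;
  return len;
}
```

[Note the guard matters for positive values whose top stored bit is set: such a value keeps an explicit zero word above it, since dropping it would change the sign.]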

  I still don't
 see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
 X*HOST_WIDE_INT operation for any X) has any special meaning.

 Well, the same reason as a HOST_WIDE_INT variable has a meaning.
 We use it to constrain what we (efficiently) want to work on.  For example
 CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
 doing bit-constant-propagation in loops (for TImode integers on a x86_64 
 host).

But what about targets with modes wider than TImode?  Would double_int
still be appropriate then?  If not, why does CCP have to use a templated
type with a fixed number of HWIs (and all arithmetic done to a fixed
number of HWIs) rather than one that can adapt to the runtime values,
like wide_int can?

 Oh, and I don't necessary see a use of double_int in its current form
 but for an integer representation on the host that is efficient to manipulate
 integer constants of a target dependent size.  For example the target
 detail that we have partial integer modes with bitsize > precision and that
 the bits > precision apparently have a meaning when looking at the
 bit-representation of a constant should not be part of the base class
 of wide-int (I doubt it belongs to wide-int at all, but I guess you know more
 about the reason we track bitsize in addition to precision - I think it's
 abstraction at the wrong level, the tree level does fine without knowing
 about bitsize).

TBH I'm uneasy about the bitsize thing too.  I think bitsize is only
tracked for shift truncation, and if so, I agree it makes sense
to do that separately.

But anyway, this whole discussion seems to have reached a stalemate.
Or I suppose a de-facto rejection, since you're the only person in
a position to approve the thing :-)

Richard


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Kenneth Zadeck

Richi,

Let me explain to you what a broken API is.  I have spent the last week
screwing around with tree-vrp and as of last night I finally got it to
work.  In tree-vrp, it is clear that double-int is the precise
definition of a broken API.


tree-vrp uses an infinite-precision view of arithmetic.  However,
that infinite precision is implemented on top of a finite, CARVED IN
STONE, base that is, and without a patch like this always will be, 128
bits on an x86-64.  However, as was pointed out earlier, tree-vrp
needs 2 * the size of a type + 1 bits to work correctly.  Until
yesterday I did not fully understand the significance of that 1 bit:
what this means is that tree-vrp does not work on an x86-64 with __int128
variables.


There are no checks in tree-vrp to back off when it sees something too 
large; tree-vrp simply gets the wrong answer.   To me, this is a broken 
api and is GCC at its very worst.   The patches that required this 
SHOULD HAVE NEVER GONE INTO GCC.   What you have with my patches is 
someone who is willing to fix a large and complex problem that should 
have been fixed years ago.
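As a toy illustration of the 2 * size + 1 requirement (an editor's sketch, not GCC code, scaled down from 128 bits to 8): VRP reasons about N-bit values in signed "infinite" precision, and already the square of the unsigned maximum of an N-bit type needs 2N magnitude bits plus a sign bit, i.e. 2N + 1 in total. By the same arithmetic, __int128 ranges can need 257 signed bits, while double_int's fixed base offers only 128.

```cpp
// Smallest number of bits needed to hold V (V >= 0) as a signed value:
// its magnitude bits plus one sign bit.
static int
signed_bits_needed (long long v)
{
  int bits = 1;                         // the sign bit
  while (v > 0)
    {
      bits++;
      v >>= 1;
    }
  return bits;
}

// Largest value a signed "infinite precision" view must represent when
// multiplying two bounds of an N-bit unsigned type: its maximum, squared.
static long long
square_of_umax (int n)
{
  long long umax = (1LL << n) - 1;
  return umax * umax;
}
```

For N = 8, 255 * 255 = 65025 fits in 16 unsigned bits but needs 17 signed bits; the helper names above are illustrative only.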


I understand that you do not like several aspects of the wide-int api 
and i am willing to make some of those improvements.   However, what i 
am worried about is that you are in some ways really attached to the 
style of programming where everything is dependent on the size of a 
HWI.  I will continue to push back on those comments but have been 
working the rest in as i have been going along.


To answer your other question, it will be a significant problem if i 
cannot get these patches in.   They are very prone to patch rot and my 
customer wants a product without many patches to the base code.
Also, i fear that your real reason that you want to wait is because you 
really do not like the fact these patches get rid of double_int and that 
style of programming, and putting off that day serves no one well.


kenny


On 10/31/2012 05:59 AM, Richard Sandiford wrote:

Richard Biener richard.guent...@gmail.com writes:

On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck
zad...@naturalbridge.com  wrote:

jakub,

i am hoping to get the rest of my wide integer conversion posted by nov 5.
I am under some adverse conditions here: hurricane sandy hit here pretty
badly.  my house is hooked up to a small generator, and no one has any power
for miles around.

So far richi has promised to review them.  he has sent some comments, but
so far no reviews.  Some time after i get the first round of them posted,
i will do a second round that incorporates everyone's comments.

But i would like a little slack here if possible.  While this work is a
show stopper for my private port, the patches address serious problems for
many of the public ports, especially ones that have very flexible vector
units.  I believe that there is a significant set of latent problems
currently with the existing ports that use TImode that these patches will
fix.

However, i will do everything in my power to get the first round of the
patches posted by the nov 5 deadline.

I suppose you are not going to merge your private port for 4.8 and thus
the wide-int changes are not a show-stopper for you.

That said, I considered the main conversion to be appropriate to be
defered for the next stage1.  There is no advantage in disrupting the
tree more at this stage.

I would like the wide_int class and rtl stuff to go in 4.8 though.
IMO it's a significant improvement in its own right, and Kenny
submitted it well before the deadline.

Richard





Re: [C++ Patch] PR 54583

2012-10-31 Thread Jason Merrill

On 10/30/2012 07:44 PM, Paolo Carlini wrote:

straightforward fix. The below, which regtests fine, simply acts on any
MULT_EXPR as TYPE_SIZE, which I think should be fine, but, in case the
idea is basically Ok, we could also consider the more complex but more
sophisticated variably_modified_type_p?


I think I'd use EXPR_P rather than check for a particular tree code.

Jason




Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-31 Thread Peter Bergner
On Wed, 2012-10-31 at 10:12 +0100, Richard Biener wrote:
 On Tue, Oct 30, 2012 at 9:23 PM, Peter Bergner berg...@vnet.ibm.com wrote:
  On Tue, 2012-10-30 at 20:37 +0100, Jakub Jelinek wrote:
  On Tue, Oct 30, 2012 at 02:03:44PM -0500, Peter Bergner wrote:
   Ok, then I'll bootstrap and regtest your suggested change while we
   wait for richi to comment.  I'm fine with whatever you and richi
   decide is best.  The ObjC guys should probably test it though too.
  
   I assume you think we should change the current trunk code as well?
 
  Well, I haven't looked at the ObjC failures (guess they are darwin only
  anyway), so have no idea whether those set section name or not.
  So, if my proposed test instead of the trunk one doesn't work for darwin,
  perhaps it could be DECL_PRESERVED_P (decl) || (DECL_SECTION_NAME (decl) &&
  !...).  I think DECL_USER_ALIGN test is undesirable for that though, it is
  just fine to increase alignment of anything, after all, it is still aligned
  properly then.
 
  I can confirm it bootstraps and regtests without any errors...and it
  fixes my problem.
 
 I'm fine with that, but please give it some time on trunk before backporting 
 it.

Great.  Jakub, were you going to commit your change or did you want me
to do that?

Peter




Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Kenneth Zadeck


On 10/31/2012 08:11 AM, Richard Biener wrote:

On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/25/2012 06:42 AM, Richard Biener wrote:

On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net wrote:

On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com
wrote:

On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/23/2012 10:12 AM, Richard Biener wrote:

+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly?  Consider a port with max byte mode
size 4 on a 64bit host.

I do not believe that this can happen.   The core compiler includes all
modes up to TImode, so by default we already go up to 128 bits.

And mode bitsizes are always power-of-two?  I suppose so.

Actually, no, they are not.  Partial int modes can have bit sizes that
are not power of two, and, if there isn't an int mode that is bigger, we'd
want to round up the partial int bit size.  Something like ((2 *
MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
HOST_BITS_PER_WIDE_INT) should do it.
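Mike's rounding expression is plain ceiling division; a sketch (hypothetical helper name, 64-bit host assumed) showing that a doubled 40-bit partial mode needs 2 HWIs, which truncating division would miscount as 1:

```cpp
// Ceiling division: how many host wide ints are needed to hold BITS bits.
// 64 stands in for HOST_BITS_PER_WIDE_INT on a typical 64-bit host.
static const int host_bits_per_wide_int = 64;

static int
hwis_for_bits (int bits)
{
  return (bits + host_bits_per_wide_int - 1) / host_bits_per_wide_int;
}
```

For example, a 40-bit partial int mode doubled gives 80 bits: plain `80 / 64` truncates to 1 HWI, while the ceiling form correctly yields 2.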


I still would like to have the ability to provide specializations of
wide_int
for small sizes, thus ideally wide_int would be a template templated
on the number of HWIs in val.  Interface-wise wide_int<2> should be
identical to double_int, thus we should be able to do

typedef wide_int<2> double_int;

If you want to go down this path after the patches get in, go for it.
I
see no use at all for this.
This was not meant to be a plug-in replacement for double_int.  The
goal of
this patch is to get the compiler to do the constant math the way that
the
target does it.   Any such instantiation is by definition placing some
predefined limit that some target may not want.
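For the record, the templated variant Richard sketches above might look roughly like this (an editor's sketch with hypothetical names; the posted wide_int is not templated and this is not its API):

```cpp
// Hypothetical fixed-width integer templated on the number of HWIs, in
// the direction suggested above.
typedef long long hwi;                  // stand-in for HOST_WIDE_INT

template <int N>
struct fixed_wide_int
{
  hwi val[N];

  // The same fused operation double_int provides, generalized over N.
  fixed_wide_int and_not (const fixed_wide_int &b) const
  {
    fixed_wide_int r;
    for (int i = 0; i < N; i++)
      r.val[i] = val[i] & ~b.val[i];
    return r;
  }
};

// An interface-compatible two-HWI instantiation could then stand in
// for double_int.
typedef fixed_wide_int<2> double_int_like;
```

With N fixed at compile time the loop is trivially unrollable, which is the performance argument for specializing small sizes.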

Well, what I don't really like is that we now have two implementations
of functions that perform integer math on two-HWI sized integers.  What
I also don't like too much is that we have two different interfaces to
operate
on them!  Can't you see how I come to not liking this?  Especially the
latter …

double_int is logically dead.  Refactoring wide-int and double-int is a
waste of time, as the time is better spent removing double-int from the
compiler.  All the necessary semantics and code of double-int _has_ been
refactored into wide-int already.  Changing wide-int in any way to vend
anything to double-int is wrong, as once double-int is removed,
then all the
api changes to make double-int share from wide-int is wasted and must then
be removed.  The path forward is the complete removal of double-int; it is
wrong, has been wrong and always will be wrong, nothing can change that.

double_int, compared to wide_int, is fast and lean.  I doubt we will
get rid of it - you
will make compile-time math a _lot_ slower.  Just profile when you for
example
change get_inner_reference to use wide_ints.

To be able to remove double_int in favor of wide_int requires _at least_
templating wide_int on 'len' and providing specializations for 1 and 2.

It might be a non-issue for math that operates on trees or RTXen due to
the allocation overhead we pay, but in recent years we transitioned
important
paths away from using tree math to using double_ints _for speed reasons_.

Richard.

i do not know why you believe this about the speed. double int always
does synthetic math since you do everything at 128 bit precision.

the thing about wide int, is that since it does math to the precision's
size, it almost never uses synthetic operations since the sizes for
almost every instance can be done using the native math on the machine.
almost every call has a check to see if the operation can be done natively.
I seriously doubt that you are going to do TI mode math much faster than i
do it and if you do who cares.

the number of calls does not affect the performance in any negative way and
in fact is more efficient since common things that require more than one
operation in double_int are typically done in a single operation.

Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
   double_int result;
   result.low = low & ~b.low;
   result.high = high & ~b.high;
   return result;
}

are always going to be faster than conditionally executing only one operation
(but inside an offline function).

OK, this is really in reply to the 4.8 thing, but it felt more
appropriate here.

It's interesting that you gave this example, since before you were
complaining about too many fused ops.  Clearly this one could be
removed in favour of separate and() and not() operations, but why
not provide a fused one if there are clients who'll make use of it?
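A fused wide-int-style and_not with the single-HWI fast path Kenny describes might be sketched as follows (hypothetical types and names, not the posted API; assumes both operands share the same len and are kept in sign-extended canonical form):

```cpp
// Sketch: fused and_not with a native single-HWI fast path.
typedef long long hwi;

struct wint
{
  int len;                              // number of significant HWIs
  hwi val[4];
};

static wint
wint_and_not (const wint &a, const wint &b)
{
  wint r;
  r.len = a.len;
  if (a.len == 1 && b.len == 1)
    {
      // The common case: one native machine operation, no loop.
      r.val[0] = a.val[0] & ~b.val[0];
      return r;
    }
  for (int i = 0; i < a.len; i++)       // general multi-HWI path
    r.val[i] = a.val[i] & ~b.val[i];
  return r;
}
```

The trade-off under debate is visible here: double_int's unconditional two-HWI version has no branch, while the len-checked version does only as much work as the value's significant length requires.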

I was more 

Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 2:30 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 But that means that wide_int has to model a P-bit operation as a
 normal len*HOST_WIDE_INT operation and then fix up the result
 after the fact, which seems unnecessarily convoluted.

 It does that right now.  The operations are carried out in a loop
 over len HOST_WIDE_INT parts, the last HWI is then special-treated
 to account for precision/size.  (yes, 'len' is also used as optimization - 
 the
 fact that len ends up being mutable is another thing I dislike about
 wide-int.  If wide-ints are cheap then all ops should be non-mutating
 (at least to 'len')).

 But the point of having a mutating len is that things like zero and -1
 are common even for OImode values.  So if you're doing something potentially
 expensive like OImode multiplication, why do it to the number of
 HOST_WIDE_INTs needed for an OImode value when the value we're
 processing has only one significant HOST_WIDE_INT?

I don't propose doing that.  I propose that no wide-int member function
may _change_ its len (to something larger).  Only that way you can
avoid allocating wasted space for zero and -1.  That way also the
artificial limit on 2 * largest-int-mode-hwis goes.
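The "special-treating" of the last HWI mentioned above amounts to sign-extending the top element from the precision after a full-HWI operation; a portable sketch (hypothetical helper, 64-bit HWI assumed):

```cpp
// Sketch: fix up the top element by sign-extending X from BITS bits to
// a full 64-bit HWI.  The mask/xor/subtract idiom avoids shifting
// negative values, which C and C++ leave implementation-defined.
typedef long long hwi;

static hwi
sext_hwi (hwi x, int bits)
{
  unsigned long long sign = 1ULL << (bits - 1);
  unsigned long long ux = (unsigned long long) x & ((sign << 1) - 1);
  return (hwi) ((ux ^ sign) - sign);
}
```

A precision-aware representation would apply this to `val[len - 1]` after each operation so that bits above the precision stay copies of the sign bit.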

  I still don't
 see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
 X*HOST_WIDE_INT operation for any X) has any special meaning.

 Well, the same reason as a HOST_WIDE_INT variable has a meaning.
 We use it to constrain what we (efficiently) want to work on.  For example
 CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
 doing bit-constant-propagation in loops (for TImode integers on an x86_64
 host).

 But what about targets with modes wider than TImode?  Would double_int
 still be appropriate then?  If not, why does CCP have to use a templated
 type with a fixed number of HWIs (and all arithmetic done to a fixed
 number of HWIs) rather than one that can adapt to the runtime values,
 like wide_int can?

Because nobody cares about accurate bit-tracking for modes larger than
TImode.  And because no convenient abstraction was available ;)

 Oh, and I don't necessary see a use of double_int in its current form
 but for an integer representation on the host that is efficient to manipulate
 integer constants of a target dependent size.  For example the target
 detail that we have partial integer modes with bitsize > precision and that
 the bits > precision apparently have a meaning when looking at the
 bit-representation of a constant should not be part of the base class
 of wide-int (I doubt it belongs to wide-int at all, but I guess you know more
 about the reason we track bitsize in addition to precision - I think it's
 abstraction at the wrong level, the tree level does fine without knowing
 about bitsize).

 TBH I'm uneasy about the bitsize thing too.  I think bitsize is only
 tracked for shift truncation, and if so, I agree it makes sense
 to do that separately.

So, can we please remove all traces of bitsize from wide-int then?

 But anyway, this whole discussion seems to have reached a stalemate.
 Or I suppose a de-facto rejection, since you're the only person in
 a position to approve the thing :-)

There are many (silent) people that are able to approve the thing.  But the
point is I have too many issues with the current patch that I'm unable
to point at a specific thing I want Kenny to change after which the patch
would be fine.  So I rely on some guesswork from Kenny giving my
advices ("leaner API, less fused ops, get rid of bitsize, think of
abstracting the core HWI[len] operation, there should be no tree or
RTL dependencies in the wide-int API") to produce an updated variant.
Which of course takes time, which of course crosses my vacation, which
in the end means it isn't going to make 4.8 (I _do_ like the idea of not
having a dependence on host properties for integer constant representation).

Btw, a good hint at what a minimal wide-int API would look like is if
you _just_ replace double-int users with it.  Then you obviously have to
implement only the double-int interface and conversion from/to double-int.
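Concretely, that exercise would begin with conversions between double_int's two-HWI layout and a variable-length representation, e.g. (an editor's sketch with hypothetical types; only the low/high field names follow double_int):

```cpp
// Hypothetical round-trip between double_int's two-HWI layout and a
// variable-length wide-int-style payload.
typedef long long hwi;

struct di { unsigned long long low; hwi high; };   // double_int-like layout
struct w2 { int len; hwi val[2]; };

static w2
from_double_int (di d)
{
  w2 r;
  r.val[0] = (hwi) d.low;
  r.val[1] = d.high;
  // Canonical len: drop a top element that is pure sign extension.
  r.len = (d.high == (r.val[0] < 0 ? -1 : 0)) ? 1 : 2;
  return r;
}

static di
to_double_int (w2 w)
{
  di d;
  d.low = (unsigned long long) w.val[0];
  d.high = w.len > 1 ? w.val[1] : (w.val[0] < 0 ? -1 : 0);
  return d;
}
```

Small values like 0 and -1 canonicalize to len 1, which is exactly where the mutable-len representation saves work over a fixed two-HWI one.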

Richard.


 Richard


Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Joern Rennecke

Quoting Richard Biener richard.guent...@gmail.com:


On Wed, Oct 31, 2012 at 2:06 PM, Joern Rennecke
joern.renne...@embecosm.com wrote:

Quoting Richard Biener richard.guent...@gmail.com:


Maybe we should split the thing then into a adjust_insn_length attribute
without the iteration parameter



Attributes don't get any parameter but the instruction, and don't apply
to delay slot SEQUENCEs.


hook I meant, not attribute.


All right.
So I gather first is called again adjust_insn_length.  What do we call the
one for the iteration count (or priority, if you want to put it that way)
for the lock_length mechanism to become effective?

int lock_length_iter_count (rtx_insn, bool in_delay_slot) ?
int lock_length_priority (rtx_insn, bool iter_count) ?


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Jakub Jelinek
On Wed, Oct 31, 2012 at 09:44:50AM -0400, Kenneth Zadeck wrote:
 The tree-vrp code uses an infinite-precision view of arithmetic. However,
 that infinite precision is implemented on top of a finite, CARVED IN
 STONE, base that is and will always be without a patch like this,
 128 bits on an x86-64.  However, as was pointed out earlier,
 tree-vrp needs 2 * the size of a type + 1 bit to work correctly.
 Until yesterday i did not fully understand the significance of that
 1 bit.  what this means is that tree-vrp does not work on an x86-64
 with __int128 variables.

If you see a VRP bug, please file a PR with a testcase, or point to existing
PR.  I agree with richi that it would be better to add a clean wide_int
implementation for 4.9, rather than rushing something in, introducing
lots of bugs, just for a port that hasn't been submitted, nor I understand
why __int128_t integer types are so crucial to your port, the vector
support doesn't generally need very large integers, even if your
vector unit is 256-bit, 512-bit or larger.

Jakub


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-31 Thread Jakub Jelinek
On Wed, Oct 31, 2012 at 08:53:31AM -0500, Peter Bergner wrote:
 Great.  Jakub, were you going to commit your change or did you want me
 to do that?

I haven't tested it, you did, so please do that yourself.  Thanks.

Jakub


Re: [Patch] Potential fix for PR55033

2012-10-31 Thread Sebastian Huber

Hello Alan,

maybe it is better to use a require effective target instead of the

{ target powerpc*-*-eabi* powerpc*-*-elf* powerpc*-*-linux* }

patterns scattered around in the testsuite?  One problem with this is that test 
cases for one of these will likely also work with powerpc*-*-rtems*.  I want to 
look at the remaining 104 test cases which are marked as UNSUPPORTED for RTEMS 
some time in the future.  Please have a look at the attached patches.


On 10/31/2012 01:12 AM, Alan Modra wrote:

On Tue, Oct 30, 2012 at 02:45:40PM +0100, Sebastian Huber wrote:

On 10/26/2012 02:22 PM, Sebastian Huber wrote:

Hello,

here is a test case for PR55033.



Is there something wrong with this test case?  It compiles well with Alan's 
patch.


It looks OK to me if you replace your gd-do compile line with the
following two lines to avoid failures on powerpc targets that don't
support -meabi -msdata.

/* { dg-do compile { target powerpc*-*-eabi* powerpc*-*-elf* powerpc*-*-linux* 
} } */
/* { dg-require-effective-target ilp32 } */





--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.


From a57f179c89b5ac1bb6bf8ecfa8fa47a6e8e7c6c0 Mon Sep 17 00:00:00 2001
From: Sebastian Huber sebastian.hu...@embedded-brains.de
Date: Wed, 31 Oct 2012 10:36:49 +0100
Subject: [PATCH 1/2] PowerPC testsuite clean up

2012-10-31  Sebastian Huber  sebastian.hu...@embedded-brains.de

	* lib/target-supports.exp
	(check_effective_target_powerpc_canonical): New.
	(check_effective_target_powerpc_eabi_ok): New.
	* gcc.target/powerpc/ppc-eabi.c: Use require effective target
	powerpc_eabi_ok.
	* gcc.target/powerpc/ppc-sdata-1.c: Likewise.
	* gcc.target/powerpc/spe-small-data-2.c: Likewise.
	* gcc.target/powerpc/ppc-sdata-2.c: Use require effective target
	powerpc_canonical and ilp32.
	* gcc.target/powerpc/pr51623.c: Likewise.
	* gcc.target/powerpc/ppc-stackalign-1.c: Use require effective target
	powerpc_canonical.
	* gcc.target/powerpc/ppc-ldstruct.c: Run for powerpc*-*-rtems*.
	* gcc.target/powerpc/spe-small-data-2.c: Do not run, compile only.
---
 gcc/testsuite/gcc.target/powerpc/ppc-eabi.c|3 +-
 gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c|2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c |3 +-
 gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c |4 ++-
 .../gcc.target/powerpc/ppc-stackalign-1.c  |3 +-
 gcc/testsuite/gcc.target/powerpc/pr51623.c |4 ++-
 .../gcc.target/powerpc/spe-small-data-2.c  |3 +-
 gcc/testsuite/lib/target-supports.exp  |   23 
 8 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-eabi.c b/gcc/testsuite/gcc.target/powerpc/ppc-eabi.c
index 47ba1a7..cd15586 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-eabi.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-eabi.c
@@ -1,4 +1,5 @@
 /* PR target/16952 */
-/* { dg-do compile { target { powerpc*-*-linux* && ilp32 } } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_eabi_ok } */
 /* { dg-options "-meabi -mrelocatable" } */
 char *s = "boo";
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c b/gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c
index da6001f..ffb4264 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-ldstruct.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target powerpc*-*-eabi* powerpc*-*-elf* powerpc*-*-linux* } } */
+/* { dg-do run { target powerpc*-*-eabi* powerpc*-*-elf* powerpc*-*-linux* powerpc*-*-rtems* } } */
 /* { dg-options "-O -mlong-double-128" } */
 
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c b/gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c
index 5f39d86..efd5a5e 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-sdata-1.c
@@ -1,5 +1,6 @@
-/* { dg-do compile { target { { powerpc*-*-linux* && ilp32 } || { powerpc-*-eabi* } } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fno-common -G 8 -meabi -msdata=eabi" } */
+/* { dg-require-effective-target powerpc_eabi_ok } */
 /* { dg-require-effective-target nonpic } */
 /* { dg-final { scan-assembler "\\.section\[ \t\]\\.sdata," } } */
 /* { dg-final { scan-assembler "\\.section\[ \t\]\\.sdata2," } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c b/gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c
index f102b1d..fdc4030 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-sdata-2.c
@@ -1,5 +1,7 @@
-/* { dg-do compile { target { { powerpc*-*-linux* && ilp32 } || { powerpc-*-eabi* } } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fno-common -G 8 -msdata=sysv" } */
+/* { 

Re: Modernize loop_finite_p

2012-10-31 Thread Jan Hubicka
  -  FOR_EACH_VEC_ELT (edge, exits, i, ex)
  +  if (loop->any_upper_bound)
   {
  -  if (!just_once_each_iteration_p (loop, ex->src))
  -   continue;
  +  if (dump_file && (dump_flags & TDF_DETAILS))
  +   fprintf (dump_file, "Found loop %i to be finite: upper bound is 
  recorded.\n",
  +loop->num);
  +  return true;
  +}
 
 This looks redundant with ...
 
  -  if (number_of_iterations_exit (loop, ex, &desc, false))
  -{
  - if (dump_file && (dump_flags & TDF_DETAILS))
  -   {
  - fprintf (dump_file, "Found loop %i to be finite: iterating ", 
  loop->num);
  - print_generic_expr (dump_file, desc.niter, TDF_SLIM);
  - fprintf (dump_file, " times\n");
  -   }
  - finite = true;
  - break;
  -   }
  +  if (max_loop_iterations (loop, &nit))
  +{
 
 ... this.  If you want to short-circuit max_loop_iterations ()
 then why not
 
   if (loop->any_upper_bound
   || max_loop_iterations (loop, &nit))
  {
...
 
 ?  The different dumping seems not a good reason to obfuscate it.
 

Sounds good to me.  I only wanted to avoid max_loop_iterations re-trying to
derive the bound when it is already known to be finite.  I think I will
eventually make max_loop_iterations and friends do the hard work only once
per loop optimizer queue or when asked to collect the current bounds (since
the pointers to statements are hard to maintain).

OK with that change?
Honza


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Kenneth Zadeck

jakub

my port has 256 bit integers.   They are done by strapping together all 
of the elements of a vector unit.
if one looks at where intel is going, they are doing exactly the same 
thing.The difference is that they like to add the operations one at 
a time rather than just do a clean implementation like we did.   Soon 
they will get there, it is just a matter of time.


i understand the tree-vrp code well enough to say that this operation 
does not work if you have TImode, but i do not know how to translate 
that back into c to generate a test case.  My patch to tree-vrp is 
adaptable in that it looks at the types in the program and adjusts its 
definition of infinite precision based on the code that it sees.  I can 
point people to that code in tree vrp and am happy to do that, but that 
is not my priority now.


also, richi pointed out that there are places in the tree level constant 
propagators that require infinite precision so he is really the person 
who both should know about this and generate proper tests.


kenny

On 10/31/2012 09:55 AM, Jakub Jelinek wrote:

On Wed, Oct 31, 2012 at 09:44:50AM -0400, Kenneth Zadeck wrote:

The tree-vrp code uses an infinite-precision view of arithmetic. However,
that infinite precision is implemented on top of a finite, CARVED IN
STONE, base that is and will always be without a patch like this,
128 bits on an x86-64.  However, as was pointed out earlier,
tree-vrp needs 2 * the size of a type + 1 bit to work correctly.
Until yesterday i did not fully understand the significance of that
1 bit.  what this means is that tree-vrp does not work on an x86-64
with __int128 variables.

If you see a VRP bug, please file a PR with a testcase, or point to existing
PR.  I agree with richi that it would be better to add a clean wide_int
implementation for 4.9, rather than rushing something in, introducing
lots of bugs, just for a port that hasn't been submitted, nor I understand
why __int128_t integer types are so crucial to your port, the vector
support doesn't generally need very large integers, even if your
vector unit is 256-bit, 512-bit or larger.

Jakub




Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Jan Hubicka
 On Wed, Oct 31, 2012 at 01:30:02PM +0100, Richard Biener wrote:
  On Wed, 31 Oct 2012, Jan Hubicka wrote:
unroll you mean.  Because unrolling mutates the CFG too much.
Well - it was just a starting point, populating -Og with as little
as possible and 100% profitable transforms (in both debug and speed
metric).  In late opts we only do (early opt queue is shared):
   
   Well, and what about early cunrolli?
  
  It's not there (and I would disable it for -Og).
 
 Generally, most of the loop transforms are undesirable for -Og.

Hmm, do we have some guidelines on what is and is not desirable for -Og?
Unrolling/header copying, being code-duplicating optimizations, do not really
throw away any info by themselves; they only require the debugger to understand
that some code may get duplicated, which is no different from e.g. inlining.
Of course the subsequent const prop will make it impossible to write into the
iv variable, but do we want to support reliable writing into user vars (i.e.
disable cprop on user vars) at -Og?

Honza
 
   Jakub


Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

 On 10/31/2012 08:11 AM, Richard Biener wrote:

 On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
 rdsandif...@googlemail.com wrote:

 Richard Biener richard.guent...@gmail.com writes:

 On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:

 Richard Biener richard.guent...@gmail.com writes:

 On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/25/2012 06:42 AM, Richard Biener wrote:

 On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net
 wrote:

 On Oct 24, 2012, at 2:43 AM, Richard Biener
 richard.guent...@gmail.com
 wrote:

 On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/23/2012 10:12 AM, Richard Biener wrote:

 +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
 HOST_BITS_PER_WIDE_INT];

 are we sure this rounds properly?  Consider a port with max byte
 mode
 size 4 on a 64bit host.

 I do not believe that this can happen.   The core compiler
 includes all
  modes up to TImode, so by default we already go up to 128 bits.

 And mode bitsizes are always power-of-two?  I suppose so.

 Actually, no, they are not.  Partial int modes can have bit sizes
 that
 are not power of two, and, if there isn't an int mode that is
 bigger, we'd
 want to round up the partial int bit size.  Something like ((2 *
 MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
  HOST_BITS_PER_WIDE_INT) should do it.

 I still would like to have the ability to provide
 specializations of
 wide_int
 for small sizes, thus ideally wide_int would be a template
 templated
  on the number of HWIs in val.  Interface-wise wide_int<2> should
 be
 identical to double_int, thus we should be able to do

  typedef wide_int<2> double_int;

 If you want to go down this path after the patches get in, go for
 it.
 I
 see no use at all for this.
  This was not meant to be a plug-in replacement for double_int.
 This
 goal of
 this patch is to get the compiler to do the constant math the way
 that
 the
 target does it.   Any such instantiation is by definition placing
 some
 predefined limit that some target may not want.

 Well, what I don't really like is that we now have two
 implementations
 of functions that perform integer math on two-HWI sized integers.
 What
 I also don't like too much is that we have two different
 interfaces to
 operate
 on them!  Can't you see how I come to not liking this?  Especially
 the
 latter …

  double_int is logically dead.  Refactoring wide-int and double-int
 is a
 waste of time, as the time is better spent removing double-int from
 the
 compiler.  All the necessary semantics and code of double-int _has_
 been
 refactored into wide-int already.  Changing wide-int in any way to
 vend
 anything to double-int is wrong, as once double-int is removed,
 then all the
 api changes to make double-int share from wide-int is wasted and
 must then
 be removed.  The path forward is the complete removal of
 double-int; it is
 wrong, has been wrong and always will be wrong, nothing can change
 that.

 double_int, compared to wide_int, is fast and lean.  I doubt we will
 get rid of it - you
 will make compile-time math a _lot_ slower.  Just profile when you
 for
 example
 change get_inner_reference to use wide_ints.

 To be able to remove double_int in favor of wide_int requires _at
 least_
 templating wide_int on 'len' and providing specializations for 1 and
 2.

 It might be a non-issue for math that operates on trees or RTXen due
 to
 the allocation overhead we pay, but in recent years we transitioned
 important
 paths away from using tree math to using double_ints _for speed
 reasons_.

 Richard.

 i do not know why you believe this about the speed. double int
 always
 does synthetic math since you do everything at 128 bit precision.

 the thing about wide int, is that since it does math to the
 precision's
  size, it almost never uses synthetic operations since the sizes
 for
 almost every instance can be done using the native math on the
 machine.
 almost every call has a check to see if the operation can be done
 natively.
 I seriously doubt that you are going to do TI mode math much faster
 than i
 do it and if you do who cares.

  the number of calls does not affect the performance in any negative
  way and
  in fact is more efficient since common things that require more than
  one
  operation in double_int are typically done in a single operation.

 Simple double-int operations like

 inline double_int
 double_int::and_not (double_int b) const
 {
double_int result;
 result.low = low & ~b.low;
 result.high = high & ~b.high;
return result;
 }

 are always going to be faster than conditionally executing only one
 operation
 (but inside an offline function).

 OK, this is really in reply to the 4.8 thing, but it felt more
 appropriate here.

 It's interesting that you gave this example, since before you were
 

Re: Modernize loop_finite_p

2012-10-31 Thread Richard Biener
On Wed, 31 Oct 2012, Jan Hubicka wrote:

   -  FOR_EACH_VEC_ELT (edge, exits, i, ex)
   +  if (loop->any_upper_bound)
{
   -  if (!just_once_each_iteration_p (loop, ex->src))
   - continue;
   +  if (dump_file && (dump_flags & TDF_DETAILS))
   + fprintf (dump_file, "Found loop %i to be finite: upper bound is 
   recorded.\n",
   +  loop->num);
   +  return true;
   +}
  
  This looks redundant with ...
  
   -  if (number_of_iterations_exit (loop, ex, &desc, false))
   -{
   -   if (dump_file && (dump_flags & TDF_DETAILS))
   - {
   -   fprintf (dump_file, "Found loop %i to be finite: iterating ", 
   loop->num);
   -   print_generic_expr (dump_file, desc.niter, TDF_SLIM);
   -   fprintf (dump_file, " times\n");
   - }
   -   finite = true;
   -   break;
   - }
   +  if (max_loop_iterations (loop, &nit))
   +{
  
  ... this.  If you want to short-circuit max_loop_iterations ()
  then why not
  
 if (loop->any_upper_bound
     || max_loop_iterations (loop, &nit))
   {
 ...
  
  ?  The different dumping seems not a good reason to obfuscate it.
  
 
 Sounds good to me.  I only wanted to avoid max_loop_iterations re-trying to
 derive the bound when it is already known to be finite.  I think I will
 eventually make max_loop_iterations and friends do the hard work only once
 per loop optimizer queue or when asked to collect the current bounds (since
 the pointers to statements are hard to maintain).
 
 OK with that change?

Ok.

Thanks,
Richard.


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-31 Thread Peter Bergner
On Wed, 2012-10-31 at 14:55 +0100, Jakub Jelinek wrote:
 On Wed, Oct 31, 2012 at 08:53:31AM -0500, Peter Bergner wrote:
  Great.  Jakub, were you going to commit your change or did you want me
  to do that?
 
 I haven't tested it, you did, so please do that yourself.  Thanks.

I tested it on FSF 4.7.  I'll do a quick trunk bootstrap/regtest and
commit it when it passes.  Thanks.

Peter





Re: Non-dominating loop bounds in tree-ssa-loop-niter 3/4

2012-10-31 Thread Richard Biener
On Wed, 31 Oct 2012, Jan Hubicka wrote:

  On Wed, Oct 31, 2012 at 01:30:02PM +0100, Richard Biener wrote:
   On Wed, 31 Oct 2012, Jan Hubicka wrote:
 unroll you mean.  Because unrolling mutates the CFG too much.
 Well - it was just a starting point, populating -Og with as little
 as possible and 100% profitable transforms (in both debug and speed
 metric).  In late opts we only do (early opt queue is shared):

Well, and what about early cunrolli?
   
   It's not there (and I would disable it for -Og).
  
  Generally, most of the loop transforms are undesirable for -Og.
 
 Hmm, do we have some guidelines on what is and is not desirable for -Og?
 Unrolling/header copying, being code-duplicating optimizations, do not really
 throw away any info by themselves; they only require the debugger to understand
 that some code may get duplicated, which is no different from e.g. inlining.
 Of course the subsequent const prop will make it impossible to write into the
 iv variable, but do we want to support reliable writing into user vars (i.e.
 disable cprop on user vars) at -Og?

No, we don't support reliable writing.

We should first sort out issues with the current passes and debug info,
look at compile-time and runtime performance and then add new passes.

Richard.


Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Kenneth Zadeck


On 10/31/2012 10:05 AM, Richard Biener wrote:

On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/31/2012 08:11 AM, Richard Biener wrote:

On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/25/2012 06:42 AM, Richard Biener wrote:

On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump mikest...@comcast.net
wrote:

On Oct 24, 2012, at 2:43 AM, Richard Biener
richard.guent...@gmail.com
wrote:

On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/23/2012 10:12 AM, Richard Biener wrote:

+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly?  Consider a port with max byte
mode
size 4 on a 64bit host.

I do not believe that this can happen.   The core compiler
includes all
modes up to TI mode, so by default we already go up to 128 bits.

And mode bitsizes are always power-of-two?  I suppose so.

Actually, no, they are not.  Partial int modes can have bit sizes
that
are not power of two, and, if there isn't an int mode that is
bigger, we'd
want to round up the partial int bit size.  Something like ((2 *
MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
HOST_BITS_PER_WIDE_INT) should do it.


I still would like to have the ability to provide
specializations of
wide_int
for small sizes, thus ideally wide_int would be a template
templated
on the number of HWIs in val.  Interface-wise wide_int<2> should
be
identical to double_int, thus we should be able to do

typedef wide_int<2> double_int;

If you want to go down this path after the patches get in, go for
it.
I
see no use at all for this.
This was not meant to be a plug-in replacement for double_int.
The
goal of
this patch is to get the compiler to do the constant math the way
that
the
target does it.   Any such instantiation is by definition placing
some
predefined limit that some target may not want.

Well, what I don't really like is that we now have two
implementations
of functions that perform integer math on two-HWI sized integers.
What
I also don't like too much is that we have two different
interfaces to
operate
on them!  Can't you see how I come to not liking this?  Especially
the
latter …

double_int is logically dead.  Refactoring wide-int and double-int
is a
waste of time, as the time is better spent removing double-int from
the
compiler.  All the necessary semantics and code of double-int _has_
been
refactored into wide-int already.  Changing wide-int in any way to
vend
anything to double-int is wrong, as once double-int is removed,
then all the
api changes to make double-int share from wide-int is wasted and
must then
be removed.  The path forward is the complete removal of
double-int; it is
wrong, has been wrong and always will be wrong, nothing can change
that.

double_int, compared to wide_int, is fast and lean.  I doubt we will
get rid of it - you
will make compile-time math a _lot_ slower.  Just profile when you
for
example
change get_inner_reference to use wide_ints.

To be able to remove double_int in favor of wide_int requires _at
least_
templating wide_int on 'len' and providing specializations for 1 and
2.

It might be a non-issue for math that operates on trees or RTXen due
to
the allocation overhead we pay, but in recent years we transitioned
important
paths away from using tree math to using double_ints _for speed
reasons_.

Richard.

i do not know why you believe this about the speed. double int
always
does synthetic math since you do everything at 128 bit precision.

the thing about wide int, is that since it does math to the
precision's
size, it almost never uses synthetic operations since the sizes
for
almost every instance can be done using the native math on the
machine.
almost every call has a check to see if the operation can be done
natively.
I seriously doubt that you are going to do TI mode math much faster
than i
do it and if you do who cares.

the number of calls does not affect the performance in any negative
way and
in fact is more efficient since common things that require more than
one
operation in double_int are typically done in a single operation.

Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
double_int result;
result.low = low & ~b.low;
result.high = high & ~b.high;
return result;
}

are always going to be faster than conditionally executing only one
operation
(but inside an offline function).

OK, this is really in reply to the 4.8 thing, but it felt more
appropriate here.

It's interesting that you gave this example, since before you were
complaining about too many fused ops.  Clearly this one could be
removed in favour 

Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Richard Biener
On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

 On 10/31/2012 10:05 AM, Richard Biener wrote:

 On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/31/2012 08:11 AM, Richard Biener wrote:

 On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
 rdsandif...@googlemail.com wrote:

 Richard Biener richard.guent...@gmail.com writes:

 On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:

 Richard Biener richard.guent...@gmail.com writes:

 On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/25/2012 06:42 AM, Richard Biener wrote:

 On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
 mikest...@comcast.net
 wrote:

 On Oct 24, 2012, at 2:43 AM, Richard Biener
 richard.guent...@gmail.com
 wrote:

 On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:

 On 10/23/2012 10:12 AM, Richard Biener wrote:

 +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
 HOST_BITS_PER_WIDE_INT];

 are we sure this rounds properly?  Consider a port with max
 byte
 mode
 size 4 on a 64bit host.

 I do not believe that this can happen.   The core compiler
 includes all
 modes up to TI mode, so by default we already go up to 128 bits.

 And mode bitsizes are always power-of-two?  I suppose so.

 Actually, no, they are not.  Partial int modes can have bit sizes
 that
 are not power of two, and, if there isn't an int mode that is
 bigger, we'd
 want to round up the partial int bit size.  Something like ((2 *
 MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
 HOST_BITS_PER_WIDE_INT) should do it.

 I still would like to have the ability to provide
 specializations of
 wide_int
 for small sizes, thus ideally wide_int would be a template
 templated
 on the number of HWIs in val.  Interface-wise wide_int<2>
 should
 be
 identical to double_int, thus we should be able to do

 typedef wide_int<2> double_int;

 If you want to go down this path after the patches get in, go
 for
 it.
 I
 see no use at all for this.
 This was not meant to be a plug-in replacement for double_int.
 The
 goal of
 this patch is to get the compiler to do the constant math the
 way
 that
 the
 target does it.   Any such instantiation is by definition
 placing
 some
 predefined limit that some target may not want.

 Well, what I don't really like is that we now have two
 implementations
 of functions that perform integer math on two-HWI sized
 integers.
 What
 I also don't like too much is that we have two different
 interfaces to
 operate
 on them!  Can't you see how I come to not liking this?
 Especially
 the
 latter …

 double_int is logically dead.  Refactoring wide-int and double-int
 is a
 waste of time, as the time is better spent removing double-int
 from
 the
 compiler.  All the necessary semantics and code of double-int
 _has_
 been
 refactored into wide-int already.  Changing wide-int in any way
 to
 vend
 anything to double-int is wrong, as once double-int is removed,
 then all the
 api changes to make double-int share from wide-int is wasted and
 must then
 be removed.  The path forward is the complete removal of
 double-int; it is
 wrong, has been wrong and always will be wrong, nothing can
 change
 that.

 double_int, compared to wide_int, is fast and lean.  I doubt we
 will
 get rid of it - you
 will make compile-time math a _lot_ slower.  Just profile when you
 for
 example
 change get_inner_reference to use wide_ints.

 To be able to remove double_int in favor of wide_int requires _at
 least_
 templating wide_int on 'len' and providing specializations for 1
 and
 2.

 It might be a non-issue for math that operates on trees or RTXen
 due
 to
 the allocation overhead we pay, but in recent years we
 transitioned
 important
 paths away from using tree math to using double_ints _for speed
 reasons_.

 Richard.

 i do not know why you believe this about the speed. double int
 always
 does synthetic math since you do everything at 128 bit precision.

 the thing about wide int, is that since it does math to the
 precision's
 size, it almost never uses synthetic operations since the
 sizes
 for
 almost every instance can be done using the native math on the
 machine.
 almost every call has a check to see if the operation can be done
 natively.
 I seriously doubt that you are going to do TI mode math much faster
 than i
 do it and if you do who cares.

 the number of calls does not affect the performance in any negative
 way and
 in fact is more efficient since common things that require more
 than
 one
 operation in double_int are typically done in a single operation.

 Simple double-int operations like

 inline double_int
 double_int::and_not (double_int b) const
 {
 double_int result;
 result.low = low & ~b.low;
 result.high = high & ~b.high;
 return result;
 }

 are always going to be faster than conditionally executing only one
 operation
 (but inside an offline function).

 OK, 

Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Kenneth Zadeck


On 10/31/2012 09:54 AM, Richard Biener wrote:

On Wed, Oct 31, 2012 at 2:30 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

But that means that wide_int has to model a P-bit operation as a
normal len*HOST_WIDE_INT operation and then fix up the result
after the fact, which seems unnecessarily convoluted.

It does that right now.  The operations are carried out in a loop
over len HOST_WIDE_INT parts, the last HWI is then special-treated
to account for precision/size.  (yes, 'len' is also used as an optimization - the
fact that len ends up being mutable is another thing I dislike about
wide-int.  If wide-ints are cheap then all ops should be non-mutating
(at least to 'len')).

But the point of having a mutating len is that things like zero and -1
are common even for OImode values.  So if you're doing something potentially
expensive like OImode multiplication, why do it to the number of
HOST_WIDE_INTs needed for an OImode value when the value we're
processing has only one significant HOST_WIDE_INT?

I don't propose doing that.  I propose that no wide-int member function
may _change_ it's len (to something larger).  Only that way you can
avoid allocating wasted space for zero and -1.  That way also the
artificial limit on 2 * largest-int-mode-hwis goes.

it is now 4x, not 2x, to accommodate the extra bit in tree-vrp.

remember that the space burden is minimal.  Wide-ints are not 
persistent and there are never more than a handful at a time.



  I still don't
see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
X*HOST_WIDE_INT operation for any X) has any special meaning.

Well, the same reason as a HOST_WIDE_INT variable has a meaning.
We use it to constrain what we (efficiently) want to work on.  For example
CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
doing bit-constant-propagation in loops (for TImode integers on a x86_64 host).

But what about targets with modes wider than TImode?  Would double_int
still be appropriate then?  If not, why does CCP have to use a templated
type with a fixed number of HWIs (and all arithmetic done to a fixed
number of HWIs) rather than one that can adapt to the runtime values,
like wide_int can?

Because nobody cares about accurate bit-tracking for modes larger than
TImode.  And because no convenient abstraction was available ;)
Yes, but tree-vrp does not even work for TImode, and there are no 
tests to scale it back when it does see TImode.  I understand that 
these can be added, but so far they have not been.


I would also point out that I was corrected on this point by (I believe) 
Lawrence.  He points out that tree-vrp is still important for 
converting signed to unsigned for larger modes.




Oh, and I don't necessary see a use of double_int in its current form
but for an integer representation on the host that is efficient to manipulate
integer constants of a target dependent size.  For example the target
detail that we have partial integer modes with bitsize > precision and that
the bits > precision apparently have a meaning when looking at the
bit-representation of a constant should not be part of the base class
of wide-int (I doubt it belongs to wide-int at all, but I guess you know more
about the reason we track bitsize in addition to precision - I think it's
abstraction at the wrong level, the tree level does fine without knowing
about bitsize).

TBH I'm uneasy about the bitsize thing too.  I think bitsize is only
tracked for shift truncation, and if so, I agree it makes sense
to do that separately.

So, can we please remove all traces of bitsize from wide-int then?


But anyway, this whole discussion seems to have reached a stalemate.
Or I suppose a de-facto rejection, since you're the only person in
a position to approve the thing :-)

There are many (silent) people that are able to approve the thing.  But the
point is I have too many issues with the current patch that I'm unable
to point at a specific thing I want Kenny to change after which the patch
would be fine.  So I rely on some guesswork from Kenny, giving my
advice "leaner API, less fused ops, get rid of bitsize, think of
abstracting the core HWI[len] operation, there should be no tree or
RTL dependencies in the wide-int API" to produce an updated variant.
Which of course takes time, which of course crosses my vacation, which
in the end means it isn't going to make 4.8 (I _do_ like the idea of not
having a dependence on host properties for integer constant representation).

Btw, a good hint at what a minimal wide-int API would look like is if
you _just_ replace double-int users with it.  Then you obviously have to
implement only the double-int interface and conversion from/to double-int.

Richard.



Richard




Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-10-31 Thread Jason Merrill

On 10/30/2012 05:49 PM, Sriraman Tallam wrote:

AFAIU, this should not be a problem. For duplicate declarations,
duplicate_decls should merge them and they should never be seen here.
Did I miss something?


With extern "C" functions you can have multiple declarations of the same 
function in different namespaces that are not duplicates, but still 
match.  And I can't think what that test is supposed to be catching, anyway.



No, I thought about this but I did not want to handle this case in
this iteration. The dispatcher is created only once and if more
functions are declared later, they will not be dispatched, at least in
this iteration.


I still think that instead of collecting the set of functions in 
overload resolution, they should be collected at declaration time and 
added to a vector in the cgraph information for use when generating the 
body of the dispatcher.



You talked about doing the dispatcher
building later, but I did it here since I am doing it only once.


I still don't think this is the right place for it.


dispatcher_node does not have a body  until it is generated in
cgraphunit.c, so cgraph does not mark this field before this is
processed in cgraph_analyze_function.


That seems like something to address in your cgraph changes.

Jason



Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Kenneth Zadeck


On 10/31/2012 10:24 AM, Richard Biener wrote:

On Wed, Oct 31, 2012 at 3:18 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/31/2012 10:05 AM, Richard Biener wrote:

On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/31/2012 08:11 AM, Richard Biener wrote:

On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/25/2012 06:42 AM, Richard Biener wrote:

On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
mikest...@comcast.net
wrote:

On Oct 24, 2012, at 2:43 AM, Richard Biener
richard.guent...@gmail.com
wrote:

On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/23/2012 10:12 AM, Richard Biener wrote:

+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly?  Consider a port with max
byte
mode
size 4 on a 64bit host.

I do not believe that this can happen.   The core compiler
includes all
modes up to TI mode, so by default we already go up to 128 bits.

And mode bitsizes are always power-of-two?  I suppose so.

Actually, no, they are not.  Partial int modes can have bit sizes
that
are not power of two, and, if there isn't an int mode that is
bigger, we'd
want to round up the partial int bit size.  Something like ((2 *
MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
HOST_BITS_PER_WIDE_INT) should do it.


I still would like to have the ability to provide
specializations of
wide_int
for small sizes, thus ideally wide_int would be a template
templated
on the number of HWIs in val.  Interface-wise wide_int<2>
should
be
identical to double_int, thus we should be able to do

typedef wide_int<2> double_int;

If you want to go down this path after the patches get in, go
for
it.
I
see no use at all for this.
This was not meant to be a plug-in replacement for double_int.
The
goal of
this patch is to get the compiler to do the constant math the
way
that
the
target does it.   Any such instantiation is by definition
placing
some
predefined limit that some target may not want.

Well, what I don't really like is that we now have two
implementations
of functions that perform integer math on two-HWI sized
integers.
What
I also don't like too much is that we have two different
interfaces to
operate
on them!  Can't you see how I come to not liking this?
Especially
the
latter …

double_int is logically dead.  Refactoring wide-int and double-int
is a
waste of time, as the time is better spent removing double-int
from
the
compiler.  All the necessary semantics and code of double-int
_has_
been
refactored into wide-int already.  Changing wide-int in any way
to
vend
anything to double-int is wrong, as once double-int is removed,
then all the
api changes to make double-int share from wide-int is wasted and
must then
be removed.  The path forward is the complete removal of
double-int; it is
wrong, has been wrong and always will be wrong, nothing can
change
that.

double_int, compared to wide_int, is fast and lean.  I doubt we
will
get rid of it - you
will make compile-time math a _lot_ slower.  Just profile when you
for
example
change get_inner_reference to use wide_ints.

To be able to remove double_int in favor of wide_int requires _at
least_
templating wide_int on 'len' and providing specializations for 1
and
2.

It might be a non-issue for math that operates on trees or RTXen
due
to
the allocation overhead we pay, but in recent years we
transitioned
important
paths away from using tree math to using double_ints _for speed
reasons_.

Richard.

i do not know why you believe this about the speed. double int
always
does synthetic math since you do everything at 128 bit precision.

the thing about wide int, is that since it does math to the
precision's
size, it almost never uses synthetic operations since the
sizes
for
almost every instance can be done using the native math on the
machine.
almost every call has a check to see if the operation can be done
natively.
I seriously doubt that you are going to do TI mode math much faster
than i
do it and if you do who cares.

the number of calls does not affect the performance in any negative
way and
in fact is more efficient since common things that require more
than
one
operation in double_int are typically done in a single operation.

Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
 double_int result;
 result.low = low & ~b.low;
 result.high = high & ~b.high;
 return result;
}

are always going to be faster than conditionally executing only one
operation
(but inside an offline function).

OK, this is really in reply to the 4.8 thing, but it felt more
appropriate here.

It's interesting 

Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Jakub Jelinek
On Wed, Oct 31, 2012 at 10:04:58AM -0400, Kenneth Zadeck wrote:
 if one looks at where Intel is going, they are doing exactly the
 same thing.  The difference is that they like to add the
 operations one at a time rather than just do a clean implementation
 like we did.   Soon they will get there, it is just a matter of
 time.

All I see on Intel is whole vector register shifts (and like on many other
ports and/or/xor/andn could be considered whole register too).
And, even if your port has 256-bit integer arithmetics, there is no mangling
for __int256_t or similar, so I don't see how you can offer such data type
as supported in the 4.8 timeframe.

Jakub


Re: [C++ Patch] PR 54583

2012-10-31 Thread Paolo Carlini

Hi,

On 10/31/2012 02:50 PM, Jason Merrill wrote:

On 10/30/2012 07:44 PM, Paolo Carlini wrote:

straightforward fix. The below, which regtests fine, simply acts on any
MULT_EXPR as TYPE_SIZE, which I think should be fine, but, in case the
idea is basically Ok, we could also consider the more complex but more
sophisticated variably_modified_type_p?

I think I'd use EXPR_P rather than check for a particular tree code.
Yes, I also wondered whether that would be better. The below also passes 
testing.


Thanks,
Paolo.

/
Index: testsuite/g++.dg/ext/vla13.C
===
--- testsuite/g++.dg/ext/vla13.C(revision 0)
+++ testsuite/g++.dg/ext/vla13.C(working copy)
@@ -0,0 +1,8 @@
+// PR c++/54583
+// { dg-options "-Wunused-value" }
+
+void fred()
+{
+  int n=10;
+  double (*x)[n];
+}
Index: cp/tree.c
===
--- cp/tree.c   (revision 193033)
+++ cp/tree.c   (working copy)
@@ -824,6 +824,10 @@ build_cplus_array_type (tree elt_type, tree index_
}
 }
 
+  /* Avoid spurious warnings with VLAs (c++/54583).  */
+  if (TYPE_SIZE (t) && EXPR_P (TYPE_SIZE (t)))
+TREE_NO_WARNING (TYPE_SIZE (t)) = 1;
+
   /* Push these needs up so that initialization takes place
  more easily.  */
   TYPE_NEEDS_CONSTRUCTING (t)


Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Kenneth Zadeck


On 10/31/2012 08:44 AM, Richard Biener wrote:

On Wed, Oct 31, 2012 at 1:22 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:

Richard Biener richard.guent...@gmail.com writes:

On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/25/2012 06:42 AM, Richard Biener wrote:

On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump
mikest...@comcast.net wrote:

On Oct 24, 2012, at 2:43 AM, Richard Biener richard.guent...@gmail.com
wrote:

On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
zad...@naturalbridge.com wrote:

On 10/23/2012 10:12 AM, Richard Biener wrote:

+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly?  Consider a port with max byte mode
size 4 on a 64bit host.

I do not believe that this can happen.  The core compiler
includes all
modes up to TI mode, so by default we already go up to 128 bits.

And mode bitsizes are always power-of-two?  I suppose so.

Actually, no, they are not.  Partial int modes can have bit sizes that
are not power of two, and, if there isn't an int mode that is
bigger, we'd
want to round up the partial int bit size.  Something like ((2 *
MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
HOST_BITS_PER_WIDE_INT) should do it.


I still would like to have the ability to provide specializations of
wide_int
for small sizes, thus ideally wide_int would be a template
templated
on the number of HWIs in val.  Interface-wise wide_int<2> should be
identical to double_int, thus we should be able to do

typedef wide_int<2> double_int;

If you want to go down this path after the patches get in, go for it.
I
see no use at all for this.
This was not meant to be a plug-in replacement for double_int.  The
goal of
this patch is to get the compiler to do the constant math the way that
the
target does it.   Any such instantiation is by definition placing some
predefined limit that some target may not want.

Well, what I don't really like is that we now have two implementations
of functions that perform integer math on two-HWI sized integers.  What
I also don't like too much is that we have two different interfaces to
operate
on them!  Can't you see how I come to not liking this?  Especially the
latter …

double_int is logically dead.  Refactoring wide-int and double-int is a
waste of time, as the time is better spent removing double-int from the
compiler.  All the necessary semantics and code of double-int _has_ been
refactored into wide-int already.  Changing wide-int in any way to vend
anything to double-int is wrong, as once double-int is removed,
then all the
api changes to make double-int share from wide-int is wasted and
must then
be removed.  The path forward is the complete removal of
double-int; it is
wrong, has been wrong and always will be wrong, nothing can change that.

double_int, compared to wide_int, is fast and lean.  I doubt we will
get rid of it - you
will make compile-time math a _lot_ slower.  Just profile when you for
example
change get_inner_reference to use wide_ints.

To be able to remove double_int in favor of wide_int requires _at least_
templating wide_int on 'len' and providing specializations for 1 and 2.

It might be a non-issue for math that operates on trees or RTXen due to
the allocation overhead we pay, but in recent years we transitioned
important
paths away from using tree math to using double_ints _for speed reasons_.

Richard.

i do not know why you believe this about the speed. double int always
does synthetic math since you do everything at 128 bit precision.

the thing about wide int, is that since it does math to the precision's
size, it almost never uses synthetic operations since the sizes for
almost every instance can be done using the native math on the machine.
almost every call has a check to see if the operation can be done
natively.
I seriously doubt that you are going to do TI mode math much faster than i
do it and if you do who cares.

the number of calls does not affect the performance in any
negative way and
in fact is more efficient since common things that require more than one
operation in double_int are typically done in a single operation.

Simple double-int operations like

inline double_int
double_int::and_not (double_int b) const
{
   double_int result;
   result.low = low & ~b.low;
   result.high = high & ~b.high;
   return result;
}

are always going to be faster than conditionally executing only one
operation
(but inside an offline function).

OK, this is really in reply to the 4.8 thing, but it felt more
appropriate here.

It's interesting that you gave this example, since before you were
complaining about too many fused ops.  Clearly this one could be
removed in 

Re: [PATCH] Fix PR53501

2012-10-31 Thread Eric Botcazou
 It failed with revision 188008.

OK, thanks.  So the testcase never compiled on the trunk (except for about 24 
hours between 188009 & 188118) or did it compile before 188008 at some point?

-- 
Eric Botcazou


committed: fix compilation failure for ports that don't use LEN parameter of MOVE_BY_PIECES_P

2012-10-31 Thread Joern Rennecke

Committed as obvious.
2012-10-31  Joern Rennecke  joern.renne...@embecosm.com

* expr.c (can_move_by_pieces): Apply ATTRIBUTE_UNUSED to len.

Index: expr.c
===
--- expr.c  (revision 193034)
+++ expr.c  (working copy)
@@ -841,7 +841,7 @@ widest_int_mode_for_size (unsigned int s
succeed.  */
 
 int
-can_move_by_pieces (unsigned HOST_WIDE_INT len,
+can_move_by_pieces (unsigned HOST_WIDE_INT len ATTRIBUTE_UNUSED,
unsigned int align ATTRIBUTE_UNUSED)
 {
   return MOVE_BY_PIECES_P (len, align);
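The reason the annotation is needed can be shown with a small self-contained
sketch.  The macro definition below is illustrative only (a stand-in for a
port that ignores LEN), not the actual expr.c or target code: on such a port
the macro expansion drops `len` entirely, so without ATTRIBUTE_UNUSED the
parameter would trigger -Wunused-parameter:

```c
/* Illustrative sketch of the pattern the patch uses.  */
#define ATTRIBUTE_UNUSED __attribute__ ((__unused__))

/* Hypothetical port definition that does not use LEN at all.  */
#define MOVE_BY_PIECES_P(LEN, ALIGN) ((ALIGN) >= 4)

int
can_move_by_pieces (unsigned long len ATTRIBUTE_UNUSED,
                    unsigned int align ATTRIBUTE_UNUSED)
{
  /* On ports whose macro ignores LEN, the attribute keeps the
     unused-parameter warning (and -Werror build failure) away.  */
  return MOVE_BY_PIECES_P (len, align);
}
```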


Re: [PATCH] Update source location for PRE inserted stmt

2012-10-31 Thread Dehao Chen
On Wed, Oct 31, 2012 at 2:34 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Wed, Oct 31, 2012 at 12:57 AM, Xinliang David Li davi...@google.com 
 wrote:
 It will make the location info for the newly synthesized stmt more
 deterministic, I think.

 Maybe, but it will increase the jumpiness in the debugger without actually
 being accurate, no?  For example if the partially redundant expression is

Sometimes, it is necessary to have a jump in gdb, especially when
control flow changes.  For the test case I gave, people may want to
know whether the branch in line 10 is taken or not.  Without this patch,
gdb simply jumps from line 10 to line 15 if the branch is not taken.
But with the patch, people get the correct control flow.
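A minimal reconstruction of the kind of situation being discussed (the line
numbers and function are illustrative, not the actual test case from the
thread):

```c
/* Hypothetical example: PRE finds i + j partially redundant.  On the
   path where the branch is taken, i + j is already computed; PRE then
   inserts a computation of i + j on the fall-through edge.  Without a
   source location on that inserted statement, stepping in gdb appears
   to skip straight over the branch.  */
int
f (int i, int j, int flag)
{
  int x = 0;
  if (flag)            /* "line 10": the branch the user steps over */
    x = i + j;         /* i + j computed on the taken path */
  return x + (i + j);  /* "line 15": i + j partially redundant here */
}
```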

One more thing: for AutoFDO to work, we want to make sure debug info
is correct at control flow boundaries.  That is why we would prefer
deterministic location info, instead of randomly inheriting debug info
from the previous BB.

Thanks,
Dehao


   i + j;

 then when computed at the insertion point the values of i and j do not
 necessarily reflect the computed value!  Instead we may compute the
 result of i + j using completely different components / operation.

 Thus I think inserted expressions should not have any debug information
 at all because they do not correspond to a source line.

 Richard.

 David

 On Tue, Oct 30, 2012 at 4:38 PM, Steven Bosscher stevenb@gmail.com 
 wrote:
 On Wed, Oct 31, 2012 at 12:00 AM, Dehao Chen wrote:
 This patch aims to improve debugging of optimized code. It ensures
 that PRE inserted statements have the same source location as the
 statement at the insertion point, instead of UNKNOWN_LOCATION.

 Wrong patch attached.

 However, is it really better to have the location of the insertion
 point than to have UNKNOWN_LOCATION? It's not where the value is
 computed in the source program...

 Ciao!
 Steven


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-31 Thread Dehao Chen
 Yeah.  But please also check gdb testsuite for this kind of patches.

This patch also passed gdb testsuite.

Thanks,
Dehao


 Jakub


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-31 Thread Kenneth Zadeck
I was not planning to do that mangling for 4.8.  My primary 
justification for getting it in publicly now is that there are a large 
number of places where the current compiler (both at the tree and rtl 
levels) does not do optimization if the value is larger than a single 
hwi.  My code generalizes all of these places so that they do the 
transformations independent of the size of the hwi.  (In some cases at 
the rtl level, the transformations were only done on 32-bit or smaller 
types, but I have seen nothing like that at the tree level.)  This 
provides benefits for cross compilers and for ports that support TImode 
now.


The fact that I have chosen to do it in such a way that we will never 
have this problem again is the part of the patch that richi seems to 
object to.


We have patches that do the mangling for 256-bit for the front ends, but we 
figured that we would post those for comments.  These are likely to be 
controversial because they require extensions to the syntax to accept 
large constants.


But there is no reason why the patches that fix the existing problems in 
a general way should not be considered for this release.


Kenny

On 10/31/2012 10:27 AM, Jakub Jelinek wrote:

On Wed, Oct 31, 2012 at 10:04:58AM -0400, Kenneth Zadeck wrote:

If one looks at where Intel is going, they are doing exactly the
same thing.  The difference is that they like to add the
operations one at a time rather than just doing a clean implementation
like we did.  Soon they will get there; it is just a matter of time.

All I see on Intel is whole vector register shifts (and like on many other
ports and/or/xor/andn could be considered whole register too).
And, even if your port has 256-bit integer arithmetics, there is no mangling
for __int256_t or similar, so I don't see how you can offer such data type
as supported in the 4.8 timeframe.

Jakub




Re: RFA: hookize ADJUST_INSN_LENGTH

2012-10-31 Thread Richard Sandiford
Joern Rennecke joern.renne...@embecosm.com writes:
 Quoting Richard Sandiford rdsandif...@googlemail.com:
 The length variations for the ARC are not all alike: there are branches
 that are subject to branch shortening in the usual way, but they might
 shrink when other things shrink.  When we are iterating starting at
 minimal length and increasing lengths, these branches should be stopped
 from shrinking, as they will likely grow back again in the next iteration.
 OTOH there are potentially short instructions that are lengthened to
 achieve scheduling-dictated alignment or misalignment.  These should
 be allowed to change freely back and forth, unless we have a rare
 cycle that only depends on alignments.

 Hmm, but this is still talking in terms of shorten_branches, rather than
 the target property that we're trying to model.  It sounds like the property
 is something along the lines of: some instructions have different encodings
 (or work-alikes with different encodings), and we want to make all the
 encodings available to the optimisers.  Is that right?

 No, not really.  There are lots of redundant encodings, but we have only
 one favourite encoding for each size.  We usually want the shortest version,
 unless we want to tweak the alignment of a following instruction, or in
 the case of unaligned conditional branches with unfilled delay slot, to
 hide the branch penalty (don't ask me why it has that effect...).

But what I'm trying to get at is: why can't the backend tell
shorten_branches about the amount of alignment/misalignment
that the target wants, and where?  Via an attribute or a hook,
I don't mind which.  But it should be declarative, rather than
a part of the shorten_branches algorithm.

Also...

 If so, that sounds
 like a useful thing to model, but I think it should be modelled directly,
 rather than as a modification to the shorten_branches algorithm.

 The shorten_branches algorithm can't give the correct result without
 taking these things into account.  Also, instruction lengths are needed
 to know the requirements.  So, do you want me to run a scheduling
 pass in each iteration of shorten_branches?

...it looks like arc_reorg already runs shorten_branches in a loop.
Is that right?  So in a sense we already have iteration involving
shorten_branches and the backend.

I realise we want to cut down the number of iterations where possible,
but at the same time, it seems like you're trying to do conditional
execution conversion on the fly during shorten_branches, such that the
rtl doesn't actually represent the instructions we output.  And the
performance problem you're seeing is that those behind-the-scenes
changes are too confusing for the current shorten_branches code to
handle well.  E.g. sometimes you decide to delete branches that stay
in the rtl stream as branches, and that got a length in a previous
iteration.

At some point, as you say, you have to commit to using a particular
form (conditionally executed or not) in order to avoid cycles.  But that
seems like a conditional execution decision rather than a shorten_branches
decision.  (It'd also be good to represent it in the rtl if possible.)

Richard


Re: patch to fix constant math - 4th patch - the wide-int class.

2012-10-31 Thread Kenneth Zadeck


On 10/31/2012 09:30 AM, Richard Sandiford wrote:

Richard Biener richard.guent...@gmail.com writes:

But that means that wide_int has to model a P-bit operation as a
normal len*HOST_WIDE_INT operation and then fix up the result
after the fact, which seems unnecessarily convoluted.

It does that right now.  The operations are carried out in a loop
over len HOST_WIDE_INT parts, the last HWI is then special-treated
to account for precision/size.  (Yes, 'len' is also used as an optimization - the
fact that len ends up being mutable is another thing I dislike about
wide-int.  If wide-ints are cheap then all ops should be non-mutating
(at least to 'len')).

But the point of having a mutating len is that things like zero and -1
are common even for OImode values.  So if you're doing something potentially
expensive like OImode multiplication, why do it to the number of
HOST_WIDE_INTs needed for an OImode value when the value we're
processing has only one significant HOST_WIDE_INT?
I think with a little thought I can add some special constructors and 
get rid of the mutating aspects of the interface.
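The point about zero and -1 being common can be made concrete with a small
sketch of how a canonical 'len' might be computed.  This is a hypothetical
helper for illustration, not the actual wide-int code:

```cpp
#include <cstdint>

typedef int64_t HOST_WIDE_INT;

// Hypothetical helper: count the significant host words of a
// two's-complement multi-word value, so that 0 and -1 collapse to
// len == 1 even when the mode's precision would need four words
// (e.g. OImode on a 64-bit host).  A word of all zeros or all ones
// may be dropped only if it matches the sign of the word below it.
static unsigned
canonical_len (const HOST_WIDE_INT *val, unsigned full_len)
{
  unsigned len = full_len;
  while (len > 1
         && (val[len - 1] == 0 || val[len - 1] == -1)
         && val[len - 1] == (val[len - 2] < 0 ? -1 : 0))
    len--;
  return len;
}
```

With such a helper, an OImode multiplication of two small constants only ever
loops over one word per operand, which is the efficiency argument for keeping
'len' small even if the interface itself is made non-mutating.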





  I still don't
see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision
X*HOST_WIDE_INT operation for any X) has any special meaning.

Well, the same reason as a HOST_WIDE_INT variable has a meaning.
We use it to constrain what we (efficiently) want to work on.  For example
CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when
doing bit-constant-propagation in loops (for TImode integers on a x86_64 host).

But what about targets with modes wider than TImode?  Would double_int
still be appropriate then?  If not, why does CCP have to use a templated
type with a fixed number of HWIs (and all arithmetic done to a fixed
number of HWIs) rather than one that can adapt to the runtime values,
like wide_int can?


Oh, and I don't necessary see a use of double_int in its current form
but for an integer representation on the host that is efficient to manipulate
integer constants of a target dependent size.  For example the target
detail that we have partial integer modes with bitsize > precision and that
the bits > precision apparently have a meaning when looking at the
bit-representation of a constant should not be part of the base class
of wide-int (I doubt it belongs to wide-int at all, but I guess you know more
about the reason we track bitsize in addition to precision - I think it's
abstraction at the wrong level; the tree level does fine without knowing
about bitsize).

TBH I'm uneasy about the bitsize thing too.  I think bitsize is only
tracked for shift truncation, and if so, I agree it makes sense
to do that separately.

But anyway, this whole discussion seems to have reached a stalemate.
Or I suppose a de-facto rejection, since you're the only person in
a position to approve the thing :-)

Richard



