Re: [RFC, ivopts] fix bugs in ivopts address cost computation

2012-07-05 Thread Jiangning Liu
Hi,

For the following code change,

@@ -4212,11 +4064,6 @@ get_computation_cost_at (struct ivopts_d
 cost.cost += adjust_setup_cost (data,
add_cost (TYPE_MODE (ctype), speed));

-  /* Having offset does not affect runtime cost in case it is added to
- symbol, but it increases complexity.  */
-  if (offset)
-cost.complexity++;
-
   cost.cost += add_cost (TYPE_MODE (ctype), speed);

   aratio = ratio > 0 ? ratio : -ratio;

I think this hunk shouldn't be removed. The offset may be affected by the
position at which the reduction-variable accumulation statement is inserted:
the cases before and after the accumulation are different, so the cost of
replacing the use point with the reduction variable should differ
accordingly.

BTW, I personally think the current ivopts cost modelling basically works
fine, although some tuning may be needed. The most difficult part is that
the choice of reduction-variable candidates interacts with the
register-pressure cost, while the register-cost estimate is not accurate
enough at this stage because we don't have the back-end's live-range
interference graph at all. We can always find holes in particular cases or
benchmarks, but we shouldn't tune just to get an optimal result for them;
the tuning needs to be backed by more comprehensive results.

Thanks,
-Jiangning

2012/6/6 Sandra Loosemore san...@codesourcery.com:
 My colleagues and I have been working on the GCC port for the Qualcomm
 Hexagon.  Along the way I noticed that we were getting poor results
 from the ivopts pass no matter how we adjusted the target-specific RTX
 costs.  In many cases ivopts was coming up with candidate use costs
 that seemed completely inconsistent with the target cost model.  On
 further inspection, I found what appears to be a whole bunch of bugs
 in the way ivopts is computing address costs:

 (1) While the address cost computation is assuming in some situations
 that pre/post increment/decrement addressing will be used if
 supported by the target, it isn't actually using the target's address
 cost for such forms -- instead, just the cost of the form that would
 be used if autoinc weren't available/applicable.

 (2) The computation to determine which multiplier values are supported
 by target addressing modes is constructing an address rtx of the form
 (reg * ratio) to do the tests.  This isn't a valid address RTX on
 Hexagon, although both (reg + reg * ratio) and (sym + reg * ratio)
 are.  Because it's choosing the wrong address form to probe with, it
 thinks that the target doesn't support multipliers at all and is
 incorrectly tacking on an extra cost for them.  I also note that it's
 assuming that the same set of ratios are supported by all three
 address forms that can potentially include them, and that all valid
 ratios have the same cost.

 (3) The computation to determine the range of valid constant offsets
 for address forms that can include them is probing the upper end of
 the range using constants of the form ((1 << n) - 1).  On Hexagon, the
 offsets have to be aligned appropriately for the mode, so it's
 incorrectly rejecting all positive offsets for non-char modes.  And
 again, it's assuming that the same range of offsets are supported by
 all address forms that can legitimately include them, and that all
 valid offsets have the same cost.  The latter isn't true on Hexagon.

 (4) The cost adjustment for converting the symbol_present address to a
 var_present address seems overly optimistic in assuming that the
 symbol load will be hoisted outside the loop.  I looked at a lot of
 code where this was not happening no matter how expensive I made the
 absolute addressing forms in the target-specific costs.  Also, if
 subsequent passes actually do hoist the symbol load, this requires an
 additional register to be available over the entire loop, which ivopts
 isn't accounting for in any way.  It seems to me that this adjustment
 shouldn't be attempted when the symbol_present form is a legitimate
 address (because subsequent passes are not doing the anticipated
 optimization in that case).  There's also a bug present in the cost
 accounting: it's adding the full cost of the symbol + var addition,
 whereas it should be pro-rated across iterations (see the way this is
 handled for the non-address cases in get_computation_cost_at).

 (5) The documentation of TARGET_ADDRESS_COST says that when two
 (legitimate) address expressions have the same lowest cost, the one
 with the higher complexity is used.  But, the ivopts code is doing the
 opposite of this and choosing the lower complexity as the tie-breaker.
 (Actually, it's also computing the complexity without regard to
 whether the address rtx is even legitimate on the target.)

 (6) The way get_address_costs is precomputing and caching a complete
 set of cost data for each addressing mode seems incorrect to me, in
 the general case.  For example, consider MIPS where MIPS16 
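 
 To make the probing problems in (2) and (3) concrete, here is a minimal
 sketch -- not the actual ivopts code; the helper name and shape are made
 up for illustration -- of how an offset probe that only tries values of
 the form ((1 << i) - 1) goes wrong on a target that requires mode-aligned
 offsets:
 
 /* Probe the maximal positive constant offset the target accepts.
    Probing only with ((HOST_WIDE_INT) 1 << i) - 1 -- always an odd
    value -- means a target like Hexagon, which requires offsets aligned
    to the access mode, rejects every candidate for non-char modes, so no
    positive offset is ever recorded.  */
 static HOST_WIDE_INT
 probe_max_offset (enum machine_mode mem_mode, addr_space_t as, rtx base_reg)
 {
   HOST_WIDE_INT max_off = 0;
   int i;
 
   for (i = 0; i < HOST_BITS_PER_WIDE_INT - 2; i++)
     {
       HOST_WIDE_INT cand = ((HOST_WIDE_INT) 1 << i) - 1;
       rtx addr = plus_constant (base_reg, cand);
 
       if (!memory_address_addr_space_p (mem_mode, addr, as))
         break;
       max_off = cand;
     }
 
   /* A probe that also tried mode-aligned candidates, e.g.
      (1 << i) - GET_MODE_SIZE (mem_mode), would not give up early.  */
   return max_off;
 }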

[PATCH][Testsuite] XFAIL scev-3/4.c and add scev-5.c

2012-03-21 Thread Jiangning Liu
Hi,

This patch XFAILs scev-3.c and scev-4.c on lp64 and adds a new test, scev-5.c.

The bug is going to be fixed after Richard Guenther fixes a series of
problems related to POINTER_PLUS_EXPR and sizetype precision.

Thanks,
-Jiangning 

ChangeLog for testsuite:

2012-03-21  Jiangning Liu  jiangning@arm.com

PR tree-optimization/52563
* gcc.dg/tree-ssa/scev-3.c: XFAIL on lp64.
* gcc.dg/tree-ssa/scev-4.c: XFAIL on lp64.
* gcc.dg/tree-ssa/scev-5.c: New.

Thanks,
-Jiangning

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
index 28d5c93..ed63a18 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -14,5 +14,5 @@ f(int k)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
index 6c1e530..a538c32 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -19,5 +19,5 @@ f(int k)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
new file mode 100644
index 000..b9de36a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  long long i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */





RE: [PATCH] Improve SCEV for array element

2012-03-07 Thread Jiangning Liu


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, March 06, 2012 9:12 PM
 To: Jiangning Liu
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Improve SCEV for array element
 
 On Fri, Jan 20, 2012 at 10:06 AM, Jiangning Liu jiangning@arm.com
 wrote:
  It's definitely not ok at this stage but at most for next stage1.
 
  OK. I may wait until next stage1.
 
  This is a very narrow pattern-match.  It doesn't allow for a[i].x
 for
  example,
  even if a[i] is a one-element structure.  I think the canonical way
 of
  handling
  ADDR_EXPR is to use sth like
 
  base = get_inner_reference (TREE_OPERAND (rhs1, 0), ...,
 offset,  ...);
  base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
          chrec1 = analyze_scalar_evolution (loop, base);
          chrec2 = analyze_scalar_evolution (loop, offset);
          chrec1 = chrec_convert (type, chrec1, at_stmt);
          chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
          res = chrec_fold_plus (type, chrec1, chrec2);
 
  where you probably need to handle scev_not_known when analyzing
 offset
  (which might be NULL).  You also need to add bitpos to the base
 address
  (in bytes, of course).  Note that the MEM_REF case would naturally
  work
  with this as well.
 
  OK. New patch is like below, and bootstrapped on x86-32.
 
 You want instead of
 
 +  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF
 +  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
 +  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF)
 +{
 
 if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
 || handled_component_p (TREE_OPERAND (rhs1, 0)))
   {
 
 + base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
 + chrec1 = analyze_scalar_evolution (loop, base);
 
 can you please add a wrapper
 
 tree
 analyze_scalar_evolution_for_address_of (struct loop *loop, tree var)
 {
   return analyze_scalar_evolution (loop, build_fold_addr_expr (var));
 }
 
 and call that instead of building the ADDR_EXPR there?  We want
 to avoid building that tree node, but even such a simple wrapper would
 be preferred.
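 
 With the wrapper, the call site then becomes simply:
 
   chrec1 = analyze_scalar_evolution_for_address_of (loop, base);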
 
 + if (bitpos)
 
 if (bitpos != 0)
 
 + chrec3 = build_int_cst (integer_type_node,
 + bitpos / BITS_PER_UNIT);
 
 please use size_int (bitpos / BITS_PER_UNIT) instead.  Using
 integer_type_node is definitely wrong.
 
 Ok with that changes.
 

Richard,

Modified as all you suggested, and new code is like below. Bootstrapped on
x86-32. OK for trunk now?

Thanks,
-Jiangning

ChangeLog:

2012-03-08  Jiangning Liu  jiangning@arm.com

* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
array reference and component reference.
(analyze_scalar_evolution_for_address_of): New.

ChangeLog for testsuite:

2012-03-08  Jiangning Liu  jiangning@arm.com

* gcc.dg/tree-ssa/scev-3.c: New.
* gcc.dg/tree-ssa/scev-4.c: New.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+   int i;
+
+   for (i=k; i<1000; i+=k) {
+   a_p = &a[i];
+   *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
new file mode 100644
index 000..6c1e530
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef struct {
+   int x;
+   int y;
+} S;
+
+int *a_p;
+S a[1000];
+
+f(int k)
+{
+   int i;
+
+   for (i=k; i<1000; i+=k) {
+   a_p = &a[i].y;
+   *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..c719984
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -266,6 +266,8 @@ along with GCC; see the file COPYING3.  If not see
 #include params.h
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree, tree);
+static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
+tree var);
 
 /* The cached information about an SSA name VAR, claiming that below
basic block INSTANTIATED_BELOW, the value of VAR can be expressed
@@ -1712,16 +1714,59 @@ interpret_rhs_expr (struct loop *loop, gimple
at_stmt,
   switch (code)
 {
 case ADDR_EXPR:
-  /* Handle MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR

RE: [PATCH] Improve SCEV for array element

2012-02-13 Thread Jiangning Liu


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
 Sent: Friday, January 20, 2012 5:07 PM
 To: 'Richard Guenther'
 Cc: gcc-patches@gcc.gnu.org
 Subject: RE: [PATCH] Improve SCEV for array element
 
  It's definitely not ok at this stage but at most for next stage1.
 
 OK. I may wait until next stage1.
 
  This is a very narrow pattern-match.  It doesn't allow for a[i].x
 for
  example, even if a[i] is a one-element structure.  I think the
  canonical way of handling ADDR_EXPR is to use sth like
 
   base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., offset, ...);
   base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
  chrec1 = analyze_scalar_evolution (loop, base);
  chrec2 = analyze_scalar_evolution (loop, offset);
  chrec1 = chrec_convert (type, chrec1, at_stmt);
  chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
  res = chrec_fold_plus (type, chrec1, chrec2);
 
  where you probably need to handle scev_not_known when analyzing
 offset
  (which might be NULL).  You also need to add bitpos to the base
  address (in bytes, of course).  Note that the MEM_REF case would
  naturally work with this as well.
 
 OK. New patch is like below, and bootstrapped on x86-32.
 
 ChangeLog:
 
 2012-01-20  Jiangning Liu  jiangning@arm.com
 
 * tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
 array reference and component reference.
 
 
 ChangeLog for testsuite:
 
 2012-01-20  Jiangning Liu  jiangning@arm.com
 
 * gcc.dg/tree-ssa/scev-3.c: New.
 * gcc.dg/tree-ssa/scev-4.c: New.
 

Richard,

PING... Is this patch OK after branch 4.7 is created and trunk is open
again?

Thanks,
-Jiangning

 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
 new file mode 100644
 index 000..28d5c93
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
 @@ -0,0 +1,18 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-optimized" } */
 +
 +int *a_p;
 +int a[1000];
 +
 +f(int k)
 +{
 + int i;
 +
 + for (i=k; i<1000; i+=k) {
 + a_p = &a[i];
 + *a_p = 100;
 +}
 +}
 +
 +/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
 +/* { dg-final { cleanup-tree-dump "optimized" } } */
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
 new file mode 100644
 index 000..6c1e530
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
 @@ -0,0 +1,23 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-optimized" } */
 +
 +typedef struct {
 + int x;
 + int y;
 +} S;
 +
 +int *a_p;
 +S a[1000];
 +
 +f(int k)
 +{
 + int i;
 +
 + for (i=k; i<1000; i+=k) {
 + a_p = &a[i].y;
 + *a_p = 100;
 +}
 +}
 +
 +/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
 +/* { dg-final { cleanup-tree-dump "optimized" } } */
 diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
 index 2077c8d..4e06b75
 --- a/gcc/tree-scalar-evolution.c
 +++ b/gcc/tree-scalar-evolution.c
 @@ -1712,16 +1712,61 @@ interpret_rhs_expr (struct loop *loop, gimple
 at_stmt,
switch (code)
  {
  case ADDR_EXPR:
 -  /* Handle MEM[ptr + CST] which is equivalent to
 POINTER_PLUS_EXPR.
 */
 -  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
 - {
 -   res = chrec_dont_know;
 -   break;
 - }
 +  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF
 +  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
 +  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF)
 +{
 +   enum machine_mode mode;
 +   HOST_WIDE_INT bitsize, bitpos;
 +   int unsignedp;
 +   int volatilep = 0;
 +   tree base, offset;
 +   tree chrec3;
 +
 +   base = get_inner_reference (TREE_OPERAND (rhs1, 0),
 +   bitsize, bitpos, offset,
 +   mode, unsignedp, volatilep, false);
 +
 +   if (TREE_CODE (base) == MEM_REF)
 + {
 +   rhs2 = TREE_OPERAND (base, 1);
 +   rhs1 = TREE_OPERAND (base, 0);
 +
 +   chrec1 = analyze_scalar_evolution (loop, rhs1);
 +   chrec2 = analyze_scalar_evolution (loop, rhs2);
 +   chrec1 = chrec_convert (type, chrec1, at_stmt);
 +   chrec2 = chrec_convert (TREE_TYPE (rhs2), chrec2, at_stmt);
 +   res = chrec_fold_plus (type, chrec1, chrec2);
 + }
 +   else
 + {
 +   base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
 +   chrec1 = analyze_scalar_evolution (loop, base);
 +   chrec1 = chrec_convert (type, chrec1, at_stmt);
 +   res = chrec1;
 + }
 
 -  rhs2 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 1);
 -  rhs1 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 0);
 -  /* Fall through.  */
 +   if (offset != NULL_TREE

RE: [PATCH] Improve SCEV for array element

2012-01-20 Thread Jiangning Liu
 It's definitely not ok at this stage but at most for next stage1.

OK. I may wait until next stage1.

 This is a very narrow pattern-match.  It doesn't allow for a[i].x for
 example,
 even if a[i] is a one-element structure.  I think the canonical way of
 handling
 ADDR_EXPR is to use sth like
 
 base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., offset,  ...);
 base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
 chrec1 = analyze_scalar_evolution (loop, base);
 chrec2 = analyze_scalar_evolution (loop, offset);
 chrec1 = chrec_convert (type, chrec1, at_stmt);
 chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
 res = chrec_fold_plus (type, chrec1, chrec2);
 
 where you probably need to handle scev_not_known when analyzing offset
 (which might be NULL).  You also need to add bitpos to the base address
 (in bytes, of course).  Note that the MEM_REF case would naturally
 work
 with this as well.

OK. New patch is like below, and bootstrapped on x86-32.

ChangeLog:

2012-01-20  Jiangning Liu  jiangning@arm.com

* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
array reference and component reference.


ChangeLog for testsuite:

2012-01-20  Jiangning Liu  jiangning@arm.com

* gcc.dg/tree-ssa/scev-3.c: New.
* gcc.dg/tree-ssa/scev-4.c: New.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+   int i;
+
+   for (i=k; i<1000; i+=k) {
+   a_p = &a[i];
+   *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
new file mode 100644
index 000..6c1e530
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef struct {
+   int x;
+   int y;
+} S;
+
+int *a_p;
+S a[1000];
+
+f(int k)
+{
+   int i;
+
+   for (i=k; i<1000; i+=k) {
+   a_p = &a[i].y;
+   *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..4e06b75
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,16 +1712,61 @@ interpret_rhs_expr (struct loop *loop, gimple
at_stmt,
   switch (code)
 {
 case ADDR_EXPR:
-  /* Handle MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
-  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
-   {
- res = chrec_dont_know;
- break;
-   }
+  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF
+  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
+  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF)
+{
+ enum machine_mode mode;
+ HOST_WIDE_INT bitsize, bitpos;
+ int unsignedp;
+ int volatilep = 0;
+ tree base, offset;
+ tree chrec3;
+
+ base = get_inner_reference (TREE_OPERAND (rhs1, 0),
+ bitsize, bitpos, offset,
+ mode, unsignedp, volatilep, false);
+
+ if (TREE_CODE (base) == MEM_REF)
+   {
+ rhs2 = TREE_OPERAND (base, 1);
+ rhs1 = TREE_OPERAND (base, 0);
+
+ chrec1 = analyze_scalar_evolution (loop, rhs1);
+ chrec2 = analyze_scalar_evolution (loop, rhs2);
+ chrec1 = chrec_convert (type, chrec1, at_stmt);
+ chrec2 = chrec_convert (TREE_TYPE (rhs2), chrec2, at_stmt);
+ res = chrec_fold_plus (type, chrec1, chrec2);
+   }
+ else
+   {
+ base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
+ chrec1 = analyze_scalar_evolution (loop, base);
+ chrec1 = chrec_convert (type, chrec1, at_stmt);
+ res = chrec1;
+   }
 
-  rhs2 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 1);
-  rhs1 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 0);
-  /* Fall through.  */
+ if (offset != NULL_TREE)
+   {
+ chrec2 = analyze_scalar_evolution (loop, offset);
+ chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+ res = chrec_fold_plus (type, res, chrec2);
+   }
+
+ if (bitpos)
+   {
+ gcc_assert ((bitpos % BITS_PER_UNIT) == 0);
+
+ chrec3 = build_int_cst (integer_type_node,
+ bitpos / BITS_PER_UNIT

[PATCH] Improve SCEV for array element

2012-01-17 Thread Jiangning Liu
This code change intends to improve SCEV for array elements and reduce the
common subexpressions in loops that may be introduced by multiple references
to an expression like &a[i]. With this optimization the register pressure in
loops can be reduced.

The problem originally comes from a real benchmark; the test case only
tries to detect the GIMPLE-level changes.
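
As an illustration (a hedged sketch in C, not actual compiler output): once
SCEV can analyze &a[i] as an affine chrec, ivopts can rewrite the scev-1.c
loop below into a form with a single pointer induction variable, roughly:

/* Hypothetical post-ivopts shape of scev-1.c's f; assumes k > 0, as the
   test intends.  */
void
f2 (int k)
{
  int *p;

  for (p = a + k; p < a + 1000; p += k)
    *p = 100;
}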

Bootstrapped on x86-32. OK for trunk?

ChangeLog:

2012-01-05  Jiangning Liu  jiangning@arm.com

* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec for
array reference.

ChangeLog for testsuite:

2012-01-05  Jiangning Liu  jiangning@arm.com

* gcc.dg/scev-1.c: New.

diff --git a/gcc/testsuite/gcc.dg/scev-1.c b/gcc/testsuite/gcc.dg/scev-1.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/scev-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+   int i;
+
+   for (i=k; i<1000; i+=k) {
+   a_p = &a[i];
+   *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..de89fc4
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,6 +1712,42 @@ interpret_rhs_expr (struct loop *loop, gimple
at_stmt,
   switch (code)
 {
 case ADDR_EXPR:
+  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
+{
+ tree array_ref;
+ tree var_decl, base, offset;
+ tree low_bound;
+ tree unit_size;
+ tree index;
+
+ array_ref = TREE_OPERAND (rhs1, 0);
+ var_decl = TREE_OPERAND (array_ref, 0);
+ index = TREE_OPERAND (array_ref, 1);
+
+ low_bound = array_ref_low_bound (array_ref);
+ unit_size = array_ref_element_size (array_ref);
+
+ /* We assume all arrays have sizes that are a multiple of a byte.
+First subtract the lower bound, if any, in the type of the
+index, then convert to sizetype and multiply by the size of
+the array element.  */
+ if (! integer_zerop (low_bound))
+   index = fold_build2 (MINUS_EXPR, TREE_TYPE (index),
+index, low_bound);
+
+ offset = size_binop (MULT_EXPR,
+  fold_convert (sizetype, index),
+  unit_size);
+
+ base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), var_decl);
+ chrec1 = analyze_scalar_evolution (loop, base);
+ chrec2 = analyze_scalar_evolution (loop, offset);
+  chrec1 = chrec_convert (type, chrec1, at_stmt);
+  chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+ res = chrec_fold_plus (type, chrec1, chrec2);
+ break;
+}
+
   /* Handle MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
   if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
{

Thanks,
-Jiangning



RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)

2012-01-11 Thread Jiangning Liu


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
 Sent: Wednesday, December 21, 2011 2:48 PM
 To: 'Richard Henderson'
 Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther'
 Subject: RE: [RFC] Use REG_EXPR in back-end (introduced by optimization
 to conditional and/or in ARM back-end)
 
 
  -Original Message-
  From: Richard Henderson [mailto:r...@redhat.com]
  Sent: Tuesday, November 22, 2011 9:55 AM
  To: Jiangning Liu
  Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther'
  Subject: Re: [RFC] Use REG_EXPR in back-end (introduced by
 optimization
  to conditional and/or in ARM back-end)
 
  On 11/21/2011 05:31 PM, Jiangning Liu wrote:
   My question essentially is: may I really use REG_EXPR in back-end
   code, like the patch I gave below?
 
  I suppose.
 
  Another alternative is to use BImode for booleans.  Dunno how much of
  that
  you'd be able to gleen from mere rtl expansion or if you'd need
 boolean
  decls to be expanded with BImode.
 
 Hi Richard,
 
 The output of the expand pass requires that the operands meet the
 general_operand requirement for binary operations, i.e. all top-level
 RTX operations must match a hardware instruction. For hardware that does
 not directly support single-bit logic operations, we generally can't
 generate a BImode rtx destination.
 
 So if I simply generate BImode for NE in the expand pass, like
 (subreg:SI (ne:BI (reg:SI xxx) (const_int 0))), there would be an
 assertion failure due to failing to find an appropriate instruction code
 to operate on a single bit.
 
 This looks like a GCC design-level issue. How would you suggest
 generating a legitimate rtx expression containing BImode?
 

Can anybody give me useful suggestions on this issue?

My question essentially is: may I really use BImode to solve my original
problem?

Thanks,
-Jiangning

 Thanks,
 -Jiangning
 
 
  The later would probably need a target hook.  I've often wondered how
  much
  ia64 would benefit from that too, being able to store bool variables
  directly
  in predicate registers.
 
  All of this almost certainly must wait until stage1 opens up again
  though...
 
 
  r~
 
 
 
 






[RFC][patch] improve scev for array element

2012-01-05 Thread Jiangning Liu
This code change intends to improve SCEV for array elements and reduce the
common subexpressions in loops that may be introduced by multiple references
to an expression like &a[i]. In the end the register pressure in the loop may
be reduced. The test case is simplified from a real benchmark and only wants
to show the GIMPLE-level changes.

Any suggestions?

diff --git a/gcc/testsuite/gcc.dg/scev-1.c b/gcc/testsuite/gcc.dg/scev-1.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/scev-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+   int i;
+
+   for (i=k; i<1000; i+=k) {
+   a_p = &a[i];
+   *a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..de89fc4
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,6 +1712,42 @@ interpret_rhs_expr (struct loop *loop, gimple
at_stmt,
   switch (code)
 {
 case ADDR_EXPR:
+  if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
+{
+ tree array_ref;
+ tree var_decl, base, offset;
+ tree low_bound;
+ tree unit_size;
+ tree index;
+
+ array_ref = TREE_OPERAND (rhs1, 0);
+ var_decl = TREE_OPERAND (array_ref, 0);
+ index = TREE_OPERAND (array_ref, 1);
+
+ low_bound = array_ref_low_bound (array_ref);
+ unit_size = array_ref_element_size (array_ref);
+
+ /* We assume all arrays have sizes that are a multiple of a byte.
+First subtract the lower bound, if any, in the type of the
+index, then convert to sizetype and multiply by the size of
+the array element.  */
+ if (! integer_zerop (low_bound))
+   index = fold_build2 (MINUS_EXPR, TREE_TYPE (index),
+index, low_bound);
+
+ offset = size_binop (MULT_EXPR,
+  fold_convert (sizetype, index),
+  unit_size);
+
+ base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), var_decl);
+ chrec1 = analyze_scalar_evolution (loop, base);
+ chrec2 = analyze_scalar_evolution (loop, offset);
+  chrec1 = chrec_convert (type, chrec1, at_stmt);
+  chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
+ res = chrec_fold_plus (type, chrec1, chrec2);
+ break;
+}
+
   /* Handle MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
   if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
{

Another, bigger case is shown below, where lower register pressure can be
observed.

int a[512] ;
int b[512] ;
int *ax ;
int *bx ;
int *ay ;
int *by ;
int i ;
int j ;

int f(int k)
{
for( j = 0 ; j  k ; j++)
{
for( i = j ; i  512 ; i += k)
{
ax = &a[i+k] ;
bx = &b[i+k] ;
ay = &a[i] ;
by = &b[i] ;

*ax = 7 ;
*bx = 7 ;

a[i] = 8;
b[i] = 8;
}
}
}

Thanks,
-Jiangning






RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)

2011-12-20 Thread Jiangning Liu

 -Original Message-
 From: Richard Henderson [mailto:r...@redhat.com]
 Sent: Tuesday, November 22, 2011 9:55 AM
 To: Jiangning Liu
 Cc: gcc-patches@gcc.gnu.org; 'Richard Guenther'
 Subject: Re: [RFC] Use REG_EXPR in back-end (introduced by optimization
 to conditional and/or in ARM back-end)
 
 On 11/21/2011 05:31 PM, Jiangning Liu wrote:
  My question essentially is: may I really use REG_EXPR in back-end
  code, like the patch I gave below?
 
 I suppose.
 
 Another alternative is to use BImode for booleans.  Dunno how much of
 that
 you'd be able to gleen from mere rtl expansion or if you'd need boolean
 decls to be expanded with BImode.

Hi Richard,

The output of the expand pass requires that the operands meet the
general_operand requirement for binary operations, i.e. all top-level RTX
operations must match a hardware instruction. For hardware that does not
directly support single-bit logic operations, we generally can't generate a
BImode rtx destination.

So if I simply generate BImode for NE in the expand pass, like
(subreg:SI (ne:BI (reg:SI xxx) (const_int 0))), there would be an assertion
failure due to failing to find an appropriate instruction code to operate on
a single bit.

This looks like a GCC design-level issue. How would you suggest generating a
legitimate rtx expression containing BImode?

Thanks,
-Jiangning

 
 The later would probably need a target hook.  I've often wondered how
 much
 ia64 would benefit from that too, being able to store bool variables
 directly
 in predicate registers.
 
 All of this almost certainly must wait until stage1 opens up again
 though...
 
 
 r~






RE: [RFC] Optimization to conditional and/or in ARM back-end

2011-11-22 Thread Jiangning Liu


 -Original Message-
 From: Andrew Pinski [mailto:pins...@gmail.com]
 Sent: Tuesday, November 22, 2011 1:14 PM
 To: Jiangning Liu
 Cc: gcc-patches@gcc.gnu.org; Richard Guenther; Richard Henderson
 Subject: Re: [RFC] Optimization to conditional and/or in ARM back-end
 
 On Sun, Nov 20, 2011 at 6:17 PM, Jiangning Liu jiangning@arm.com
 wrote:
  Hi,
 
  This patch is to implement a peephole like optimization in ARM back-
 end.
 
  If we have an if condition expression like ((r3 != 0) && r1) != 0,
 
 So this is the same as:
 int f1(int r1, int r3)
 {
    if (((r3 != 0) && r1) != 0)
 return g();
   return 1;
 }
 --- CUT ---
 Can't you do this instead:
 Combine the following two instructions:
 (insn 17 15 18 2 (parallel [
 (set (reg:SI 150)
 (and:SI (ne:SI (reg:SI 0 r0 [ r1 ])
 (const_int 0 [0]))
 (reg:SI 1 r1 [ r3 ])))
 (clobber (reg:CC 24 cc))
 ]) t24.c:11 274 {*cond_arith}
  (expr_list:REG_UNUSED (reg:CC 24 cc)
 (expr_list:REG_DEAD (reg:SI 1 r1 [ r3 ])
 (expr_list:REG_DEAD (reg:SI 0 r0 [ r1 ])
 (nil)
 
 (insn 18 17 19 2 (set (reg:CC 24 cc)
 (compare:CC (reg:SI 150)
 (const_int 0 [0]))) t24.c:11 211 {*arm_cmpsi_insn}
  (expr_list:REG_DEAD (reg:SI 150)
 (nil)))
 
 And then have a pattern which expands it to (note the r3 here is the
 r3 in the function, likewise for r1):
 andi r1, r1, #1
 cmp r3, #0
 it  ne
 cmpne   r1, #0
 
 Yes it is one extra instruction but it is still better than before
 right?

Hi Andrew,

Yes, and even the code generated can be like

cmp r3, #0
itt ne
andine  r1, r1, #1
cmpne   r1, #0

But this is not what I want. For performance purposes, I want to remove that
andi instruction as well.

Also, if we don't modify r1, there might be further optimization
opportunities; for example, register pressure can be reduced in some cases,
I guess.

Thanks,
-Jiangning





RE: [RFC] Use REG_EXPR in back-end (introduced by optimization to conditional and/or in ARM back-end)

2011-11-21 Thread Jiangning Liu
The original subject didn't catch the key point, so I changed it to get
more people's attention.

My question essentially is: may I really use REG_EXPR in back-end code,
like the patch I gave below?

Thanks,
-Jiangning

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
 Sent: Monday, November 21, 2011 10:18 AM
 To: gcc-patches@gcc.gnu.org
 Cc: 'Richard Guenther'; Richard Henderson
 Subject: [RFC] Optimization to conditional and/or in ARM back-end
 
 Hi,
 
 This patch is to implement a peephole like optimization in ARM back-end.
 
  If we have an if condition expression like ((r3 != 0) && r1) != 0,
 originally the binary code to be generated is like,
 
 cmp r3, #0
 ite eq
 moveq   r1, #0
 andne   r1, r1, #1
 cmp r1, #0
 
  But actually this expression could be transformed into
  ((r3 != x) && (r1 != 0)) != 0, if we know r1 is of bool type. This way
  we could generate new binary code like,
 
 cmp r3, #0
 it  ne
 cmpne   r1, #0
 
  The question is how to judge that r1 is a bool variable in the
  back-end. I'm using REG_EXPR in function arm_check_logic_with_bool_reg
  to check if it is a bool; this function is being invoked in pattern
  *cond_<code>.
 
 May I really use REG_EXPR this way in GCC back-end code? I posted a
 related
 topic at http://gcc.gnu.org/ml/gcc/2011-11/msg00417.html.
 
 If the answer is No, is there any suggestions about either how to judge
 this
 bool type or how to implement this optimization?
 
 Appreciate your comments in advance!
 
 Thanks,
 -Jiangning
 
 
 diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
 index 23a29c6..8b12d48
 --- a/gcc/config/arm/arm-protos.h
 +++ b/gcc/config/arm/arm-protos.h
 @@ -114,6 +114,7 @@ extern rtx arm_gen_load_multiple (int *, int, rtx,
 int,
 rtx, HOST_WIDE_INT *);
  extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx,
 HOST_WIDE_INT
 *);
  extern int arm_gen_movmemqi (rtx *);
  extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 +extern bool arm_check_logic_with_bool_reg (RTX_CODE, rtx, rtx);
  extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
  HOST_WIDE_INT);
  extern rtx arm_gen_compare_reg (RTX_CODE, rtx, rtx);
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index a429c19..e96f24a
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -10910,6 +10910,41 @@ arm_gen_movmemqi (rtx *operands)
return 1;
  }
 
 +/* Check whether the expression rc cmp1 bool_reg can be transformed
 +   to use conditional comparison or not.  */
 +bool
 +arm_check_logic_with_bool_reg (RTX_CODE rc, rtx bool_reg, rtx cmp1)
 +{
 +  rtx cmp2;
 +  HOST_WIDE_INT dom_cc;
 +
 +  if (!REG_P (bool_reg))
 +return false;
 +
 +  if (REG_EXPR (bool_reg) == NULL)
 +return false;
 +
 +  if (TREE_CODE (REG_EXPR (bool_reg)) != VAR_DECL)
 +return false;
 +
 +  if (TREE_CODE (TREE_TYPE (REG_EXPR (bool_reg))) != BOOLEAN_TYPE)
 +return false;
 +
 +  cmp2 = gen_rtx_NE (GET_MODE (bool_reg), bool_reg, const0_rtx);
 +
 +  if (rc == AND)
 +dom_cc = DOM_CC_X_AND_Y;
 +  else if (rc == IOR)
 +dom_cc = DOM_CC_X_OR_Y;
 +  else
 +gcc_unreachable ();
 +
 +  if (arm_select_dominance_cc_mode (cmp1, cmp2, dom_cc) == CCmode)
 +return false;
 +
 +  return true;
 +}
 +
 +
  /* Select a dominance comparison mode if possible for a test of the
 general
 form (OP (COND_OR (X) (Y)) (const_int 0)).  We support three forms.
 COND_OR == DOM_CC_X_AND_Y => (X && Y)
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index a78ba88..e90a78e
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -9172,6 +9172,64 @@
 (set_attr length 4,4,8)]
  )
 
 +; This pattern matches expression like example ((r1 != x) && r2) != 0.
 +; Here r2 is a register with data type boolean.
 +; This pattern can transform to code matching patterns cmp_and.
 +; Likewise, code matching pattern cmp_ior could be generated, if it is ||
 +; rather than && in the example.
 +(define_insn_and_split "*cond_<code>"
 +  [(set (match_operand 0 cc_register )
 + (compare
 + (ior_and:SI
 +   (match_operator:SI 4 arm_comparison_operator
 +   [(match_operand:SI 2 s_register_operand r,r)
 + (match_operand:SI 3 arm_add_operand rI,L)])
 +  (match_operand:SI 1 s_register_operand r,r))
 +  (const_int 0)))]
 +  "TARGET_32BIT
 +   && arm_check_logic_with_bool_reg (<CODE>, operands[1], operands[4])"
 +  "#"
 +  "&& 1"
 +  [(set (match_dup 7)
 + (compare
 +  (match_op_dup 5
 +   [(match_op_dup 4 [(match_dup 2) (match_dup 3)])
 +(match_op_dup 6 [(match_dup 1) (const_int 0)])])
 +  (const_int 0)))]
 +  
 +  {
 +HOST_WIDE_INT dom_cc;
 +
 +operands[6] = gen_rtx_NE (SImode, operands[1], const0_rtx);
 +
 +    if (<CODE> == AND)
 +  {
 +dom_cc = DOM_CC_X_AND_Y

RE: [RFC] Use which_alternative in preparation-statements of define_insn_and_split

2011-11-21 Thread Jiangning Liu


 -Original Message-
 From: Richard Henderson [mailto:r...@redhat.com]
 Sent: Tuesday, November 22, 2011 7:55 AM
 To: Jiangning Liu
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [RFC] Use which_alternative in preparation-statements of
 define_insn_and_split
 
 On 11/20/2011 07:34 PM, Jiangning Liu wrote:
  Hi,
 
  I find which_alternative can't really be used in preparation-
 statements of
  define_insn_and_split, so can this be fixed like below?
 
  For example, I want to use which_alternative in the pattern below,
 
  (define_insn_and_split *thumb2_movsicc_insn
[(set (match_operand:SI 0 s_register_operand =r,r,r,r,r,r,r,r)
  (if_then_else:SI
   (match_operator 3 arm_comparison_operator
[(match_operand 4 cc_register ) (const_int 0)])
   (match_operand:SI 1 arm_not_operand 0,0,rI,K,rI,rI,K,K)
   (match_operand:SI 2 arm_not_operand rI,K,0,0,rI,K,rI,K)))]
TARGET_THUMB2
@
 it\\t%D3\;mov%D3\\t%0, %2
 it\\t%D3\;mvn%D3\\t%0, #%B2
 it\\t%d3\;mov%d3\\t%0, %1
 it\\t%d3\;mvn%d3\\t%0, #%B1
 ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
 ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
 ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
 ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
 reload_completed
[(cond_exec (match_dup 5)
(set (match_dup 0) (match_dup 6)))]

{
   if (which_alternative >= 2 && which_alternative < 4)
{
  ...
}
  else if (which_alternative >= 4)
 
 Hmm, I guess.  It seems a bit weird.
 
 It seems like you'd be better off *not* using define_insn_and_split,
 actually, and instead using more specific tests on the separate
 define_split than you would on the define_insn.
 

Yes. That would be an alternative solution I can accept.

But I still want to know why we don't want to support this. I don't see any
GCC documentation saying this usage is not allowed.

Or do you think this change is not safe enough?

Thanks,
-Jiangning

 I don't feel strongly about it though.  I won't object if some other
 rtl maintainer wants to approve this.
 
 
 r~






[RFC] Optimization to conditional and/or in ARM back-end

2011-11-20 Thread Jiangning Liu
Hi,

This patch is to implement a peephole like optimization in ARM back-end.

If we have an if condition expression like ((r3 != 0) && r1) != 0,
originally the binary code to be generated is like,

cmp r3, #0
ite eq
moveq   r1, #0
andne   r1, r1, #1
cmp r1, #0

But actually this expression could be transformed into
((r3 != x) && (r1 != 0)) != 0, if we know r1 is of bool type. This way we
could generate new binary code like,

cmp r3, #0
it  ne
cmpne   r1, #0
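
A minimal C reproducer for this shape (taken from Andrew Pinski's reply
elsewhere in this thread; the bool operand is the parameter r1):

int g(void);

int f1(int r1, int r3)
{
  if (((r3 != 0) && r1) != 0)
    return g();
  return 1;
}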

The question is how to judge that r1 is a bool variable in the back-end. I'm
using REG_EXPR in function arm_check_logic_with_bool_reg to check if it is a
bool; this function is being invoked in pattern *cond_<code>.

May I really use REG_EXPR this way in GCC back-end code? I posted a related
topic at http://gcc.gnu.org/ml/gcc/2011-11/msg00417.html.

If the answer is No, is there any suggestions about either how to judge this
bool type or how to implement this optimization?

Appreciate your comments in advance!

Thanks,
-Jiangning


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 23a29c6..8b12d48
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -114,6 +114,7 @@ extern rtx arm_gen_load_multiple (int *, int, rtx, int,
rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT
*);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
+extern bool arm_check_logic_with_bool_reg (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
   HOST_WIDE_INT);
 extern rtx arm_gen_compare_reg (RTX_CODE, rtx, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a429c19..e96f24a
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10910,6 +10910,41 @@ arm_gen_movmemqi (rtx *operands)
   return 1;
 }
 
+/* Check whether the expression rc cmp1 bool_reg can be transformed
+   to use conditional comparison or not.  */
+bool
+arm_check_logic_with_bool_reg (RTX_CODE rc, rtx bool_reg, rtx cmp1)
+{
+  rtx cmp2;
+  HOST_WIDE_INT dom_cc;
+
+  if (!REG_P (bool_reg))
+return false;
+
+  if (REG_EXPR (bool_reg) == NULL)
+return false;
+
+  if (TREE_CODE (REG_EXPR (bool_reg)) != VAR_DECL)
+return false;
+
+  if (TREE_CODE (TREE_TYPE (REG_EXPR (bool_reg))) != BOOLEAN_TYPE)
+return false;
+
+  cmp2 = gen_rtx_NE (GET_MODE (bool_reg), bool_reg, const0_rtx);
+
+  if (rc == AND)
+dom_cc = DOM_CC_X_AND_Y;
+  else if (rc == IOR)
+dom_cc = DOM_CC_X_OR_Y;
+  else
+gcc_unreachable ();
+
+  if (arm_select_dominance_cc_mode (cmp1, cmp2, dom_cc) == CCmode)
+return false;
+
+  return true;
+}
+
+
 /* Select a dominance comparison mode if possible for a test of the general
form (OP (COND_OR (X) (Y)) (const_int 0)).  We support three forms.
COND_OR == DOM_CC_X_AND_Y => (X && Y)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a78ba88..e90a78e
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9172,6 +9172,64 @@
(set_attr length 4,4,8)]
 )
 
+; This pattern matches expression like example ((r1 != x) && r2) != 0.
+; Here r2 is a register with data type boolean.
+; This pattern can transform to code matching patterns cmp_and.
+; Likewise, code matching pattern cmp_ior could be generated, if it is ||
+; rather than && in the example.
+(define_insn_and_split "*cond_<code>"
+  [(set (match_operand 0 cc_register )
+   (compare
+ (ior_and:SI
+ (match_operator:SI 4 arm_comparison_operator
+   [(match_operand:SI 2 s_register_operand r,r)
+   (match_operand:SI 3 arm_add_operand rI,L)])
+  (match_operand:SI 1 s_register_operand r,r))
+(const_int 0)))]
+  "TARGET_32BIT
+   && arm_check_logic_with_bool_reg (<CODE>, operands[1], operands[4])"
+  "#"
+  "&& 1"
+  [(set (match_dup 7)
+   (compare
+(match_op_dup 5
+ [(match_op_dup 4 [(match_dup 2) (match_dup 3)])
+  (match_op_dup 6 [(match_dup 1) (const_int 0)])])
+(const_int 0)))]
+  
+  {
+HOST_WIDE_INT dom_cc;
+
+operands[6] = gen_rtx_NE (SImode, operands[1], const0_rtx);
+
+    if (<CODE> == AND)
+      {
+        dom_cc = DOM_CC_X_AND_Y;
+        operands[5] = gen_rtx_fmt_ee (<CODE>, SImode,
+                                      operands[4], operands[6]);
+      }
+    else if (<CODE> == IOR)
+      {
+        dom_cc = DOM_CC_X_OR_Y;
+        operands[5] = gen_rtx_fmt_ee (<CODE>, SImode,
+                                      operands[4], operands[6]);
+      }
+    else
+      gcc_unreachable ();
+
+operands[7]
+  = gen_rtx_REG (arm_select_dominance_cc_mode (operands[4],
+   operands[6],
+  dom_cc),
+CC_REGNUM);
+  }
+  [(set_attr conds clob)
+   (set (attr length)
+   (if_then_else 

[RFC] Use which_alternative in preparation-statements of define_insn_and_split

2011-11-20 Thread Jiangning Liu
Hi,

I find which_alternative can't really be used in preparation-statements of
define_insn_and_split, so can this be fixed like below?

For example, I want to use which_alternative in the pattern below,

(define_insn_and_split *thumb2_movsicc_insn
  [(set (match_operand:SI 0 s_register_operand =r,r,r,r,r,r,r,r)
(if_then_else:SI
 (match_operator 3 arm_comparison_operator
  [(match_operand 4 cc_register ) (const_int 0)])
 (match_operand:SI 1 arm_not_operand 0,0,rI,K,rI,rI,K,K)
 (match_operand:SI 2 arm_not_operand rI,K,0,0,rI,K,rI,K)))]
  TARGET_THUMB2
  @
   it\\t%D3\;mov%D3\\t%0, %2
   it\\t%D3\;mvn%D3\\t%0, #%B2
   it\\t%d3\;mov%d3\\t%0, %1
   it\\t%d3\;mvn%d3\\t%0, #%B1
   ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
   ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
   reload_completed
  [(cond_exec (match_dup 5)
  (set (match_dup 0) (match_dup 6)))]
  
  {
  if (which_alternative >= 2 && which_alternative < 4)
  {
...
  }
  else if (which_alternative >= 4)
  {
...
  }
  }
  [(set_attr length 6,6,6,6,10,10,10,10)
   (set_attr conds use)]
)

Thanks,
-Jiangning

Patch:

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index c94e743..df6a3df
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -3502,6 +3502,10 @@ try_split (rtx pat, rtx trial, int last)
 split_branch_probability = INTVAL (XEXP (note, 0));
   probability = split_branch_probability;

+  extract_insn (trial);
+  if (!constrain_operands (reload_completed))
+return trial;
+
   seq = split_insns (pat, trial);

   split_branch_probability = -1;
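
The intent of the two added lines, annotated (the comments are explanatory
and not part of the patch): extract_insn fills recog_data for the insn, and
constrain_operands then selects the matching constraint alternative, so
which_alternative is valid by the time the preparation statements of a
define_insn_and_split run.

  extract_insn (trial);                        /* set up recog_data */
  if (!constrain_operands (reload_completed))  /* sets which_alternative */
    return trial;                              /* no alternative matches:
                                                  leave the insn unsplit */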





MAINTAINERS: add myself

2011-11-17 Thread Jiangning Liu
Just committed the following:

   * MAINTAINERS (Write After Approval): Add myself.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 181466)
+++ MAINTAINERS (working copy)
@@ -419,6 +419,7 @@
 Marc Lehmann   p...@goof.com
 James Lemkejwle...@juniper.net
 Kriang Lerdsuwanakij
lerds...@users.sourceforge.net
+Jiangning Liu  jiangning@arm.com
 Sa Liu sa...@de.ibm.com
 Ralph Loader   r...@ihug.co.nz
 Gabor Loki l...@inf.u-szeged.hu





[PATCH, ARM] Fix stack red zone bug (PR38644)

2011-11-01 Thread Jiangning Liu
Hi,

This patch is to fix PR38644 in ARM back-end. OK for trunk?

For all the details, please refer to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644.

ChangeLog:

2011-11-2  Jiangning Liu  jiangning@arm.com

PR rtl-optimization/38644
* config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier
for epilogue having stack adjustment.

ChangeLog of testsuite:

2011-11-2  Jiangning Liu  jiangning@arm.com

PR rtl-optimization/38644
* gcc.target/arm/stack-red-zone.c: New.

Thanks,
-Jiangning

Patch:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1ada6f..1f6fc26
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22215,6 +22215,8 @@ thumb1_expand_epilogue (void)
   gcc_assert (amount >= 0);
   if (amount)
 {
+  emit_insn (gen_blockage ());
+
   if (amount < 512)
emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
   GEN_INT (amount)));
diff --git a/gcc/testsuite/gcc.target/arm/stack-red-zone.c b/gcc/testsuite/gcc.target/arm/stack-red-zone.c
new file mode 100644
index 000..b9f0f99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/stack-red-zone.c
@@ -0,0 +1,12 @@
+/* No stack red zone.  PR38644.  */
+/* { dg-options "-mthumb -O2" } */
+/* { dg-final { scan-assembler "ldrb\[^\n\]*\\n\[\t \]*add\[\t \]*sp" } } */
+
+extern int doStreamReadBlock (int *, char *, int size, int);
+
+char readStream (int *s)
+{
+   char c = 0;
+   doStreamReadBlock (s, &c, 1, *s);
+   return c;
+}
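
For reference, the wrong code this test guards against looks roughly like
the following (a hedged reconstruction of the PR38644 failure mode, not
actual compiler output): without the barrier, the scheduler may move the
stack deallocation above the load of c, and an interrupt taken between the
two instructions can clobber the slot below the new stack pointer before it
is read:

	add	sp, sp, #8	@ frame released first
	ldrb	r3, [sp, #-5]	@ c then read from below sp -- may already
				@ have been clobbered by an interrupt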







RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-31 Thread Jiangning Liu


 -Original Message-
 From: Kai Tietz [mailto:ktiet...@googlemail.com]
 Sent: Thursday, October 27, 2011 5:36 PM
 To: Jiangning Liu
 Cc: Michael Matz; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org;
 Richard Henderson
 Subject: Re: [patch tree-optimization]: Improve handling of
 conditional-branches on targets with high branch costs
 
 2011/10/27 Jiangning Liu jiangning@arm.com:
 
 
  -Original Message-
  From: Michael Matz [mailto:m...@suse.de]
  Sent: Wednesday, October 26, 2011 11:47 PM
  To: Kai Tietz
  Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-
 patc...@gcc.gnu.org;
  Richard Henderson
  Subject: Re: [patch tree-optimization]: Improve handling of
  conditional-branches on targets with high branch costs
 
  Hi,
 
  On Wed, 26 Oct 2011, Kai Tietz wrote:
 
   So you would mean that memory dereferencing shouldn't be
 considered
  as
   side-effect at all?
 
  No.  I haven't said this at all.  Of course it's a side-effect, but
  we're
  allowed to remove existing ones (under some circumstances).  We're
 not
  allowed to introduce new ones, which means that this ...
 
  So we would happily cause a crash by code 'if (i && *i != 0)', as
   memory-dereference has for you no side-effect?
 
  ... is not allowed.  But in the original example the memread was on
 the
  left side, hence occured always, therefore we can move it to the
 right
  side, even though it might occur less often.
 
   In you special case it might be valid that, if first (and C-fold-
  const
   doesn't know if the side-effect condition is really the first, as
 it
   might be a sub-sequence of a condition) condition might trap or
 not,
  to
   combine it.  But branching has to cover the general cases.  If you
  find
   a way to determine that left-hand operand in fold_const's
 branching
  code
   is really the left-most condition in chain, then we can add such a
   special case, but I don't see here an easy way to determine it.
 
  Hmm?  I don't see why it's necessary to check if it's the left-most
  condition in a chain.  If the left hand of '&&' is a memread it can
  always be moved to the right side (or the operator transformed into '&'
  which can have the same effect), of course only if the original rhs is
  free of side effects, but then independent of whether the '&&' was part
  of a larger expression.  The memread will possibly be done fewer times
  than originally, but as said, that's okay.
 
 
  Agree. The point is that for the small case I gave, the RHS doesn't
  have side effects at all, so the optimization of changing it to AND
  doesn't violate the C specification. We need to recover something for
  this case, although it did improve some particular benchmarks a lot.
 
  Thanks,
  -Jiangning
 
 
  Ciao,
  Michael.
 
 Hmm, so we can allow merging to AND, if the left-hand-side might trap
 but has no-side-effects and rhs has neither trapping nor side-effects.
 As for the case that left-hand side has side-effects but right-hand
 not, we aren't allowed to do this AND/OR merge.  For example 'if ((f =
 foo ()) != 0 && f < 24)' we aren't allowed to make this
 transformation.
 
 This shouldn't be that hard.  We need to provide to simple_operand_p_2
 an additional argument for checking trapping or not.
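 
 A minimal sketch of that idea (hypothetical signature; simple_operand_p
 and tree_could_trap_p are the existing helpers):
 
 /* Like simple_operand_p, but optionally also reject expressions that
    could trap, e.g. a memory dereference like *i.  (Sketch only.)  */
 static bool
 simple_operand_p_2 (tree exp, bool reject_trapping)
 {
   if (TREE_SIDE_EFFECTS (exp))
     return false;
   if (reject_trapping && tree_could_trap_p (exp))
     return false;
   return simple_operand_p (exp);
 }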
 

Would it be OK if I file a tracker in bugzilla against this?

 Regards,
 Kai






RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-27 Thread Jiangning Liu


 -Original Message-
 From: Michael Matz [mailto:m...@suse.de]
 Sent: Wednesday, October 26, 2011 11:47 PM
 To: Kai Tietz
 Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org;
 Richard Henderson
 Subject: Re: [patch tree-optimization]: Improve handling of
 conditional-branches on targets with high branch costs
 
 Hi,
 
 On Wed, 26 Oct 2011, Kai Tietz wrote:
 
  So you would mean that memory dereferencing shouldn't be considered
 as
  side-effect at all?
 
 No.  I haven't said this at all.  Of course it's a side-effect, but
 we're
 allowed to remove existing ones (under some circumstances).  We're not
 allowed to introduce new ones, which means that this ...
 
 So we would happily cause a crash by code 'if (i && *i != 0)', as
  memory-dereference has for you no side-effect?
 
 ... is not allowed.  But in the original example the memread was on the
 left side, hence occured always, therefore we can move it to the right
 side, even though it might occur less often.
 
  In you special case it might be valid that, if first (and C-fold-
 const
  doesn't know if the side-effect condition is really the first, as it
  might be a sub-sequence of a condition) condition might trap or not,
 to
  combine it.  But branching has to cover the general cases.  If you
 find
  a way to determine that left-hand operand in fold_const's branching
 code
  is really the left-most condition in chain, then we can add such a
  special case, but I don't see here an easy way to determine it.
 
 Hmm?  I don't see why it's necessary to check if it's the left-most
 condition in a chain.  If the left hand of '&&' is a memread it can
 always be moved to the right side (or the operator transformed into '&'
 which can have the same effect), of course only if the original rhs is
 free of side effects, but then independent of whether the '&&' was part
 of a larger expression.
 The memread will possibly be done fewer times than originally, but as
 said, that's okay.
 

Agree. The point is that for the small case I gave, the RHS doesn't have
side effects at all, so the optimization of changing it to AND doesn't
violate the C specification. We need to recover something for this case,
although it did improve some particular benchmarks a lot.

Thanks,
-Jiangning

 
 Ciao,
 Michael.






RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-26 Thread Jiangning Liu


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Michael Matz
 Sent: Tuesday, October 11, 2011 10:45 PM
 To: Kai Tietz
 Cc: Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard
 Henderson
 Subject: Re: [patch tree-optimization]: Improve handling of
 conditional-branches on targets with high branch costs
 
 Hi,
 
 On Tue, 11 Oct 2011, Kai Tietz wrote:
 
    Better make it a separate function that first tests your new
    conditions, and then calls simple_operand_p.
 
  Well, either I make it a new function and call it instead of
  simple_operand_p,
 
 That's what I meant, yes.
 
   @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
ll_arg, rl_arg),
build_int_cst (TREE_TYPE (ll_arg), 0));
   -
   -  if (LOGICAL_OP_NON_SHORT_CIRCUIT)
   - {
   -   if (code != orig_code || lhs != orig_lhs || rhs !=
 orig_rhs)
   - return build2_loc (loc, code, truth_type, lhs, rhs);
   -   return NULL_TREE;
   - }
  
   Why do you remove this hunk?  Shouldn't you instead move the hunk
 you
   added to fold_truth_andor() here.  I realize this needs some TLC to
   fold_truth_andor_1, because right now it early-outs for non-
 comparisons,
   but it seems the better place.  I.e. somehow move the below code
 into the
   above branch, with the associated diddling on fold_truth_andor_1
 that it
   gets called.
 
  This hunk is removed, as it is vain to do here.
 
 There is a fallthrough now, that wasn't there before.  I don't know if
 it's harmless, I just wanted to mention it.
 

Yes, this part introduced different behavior for this small case,

int f(char *i, int j)
{
if (*i && j!=2)
return *i;
else
return j;
}

Before the fix, we have

  D.4710 = *i;
  D.4711 = D.4710 != 0;
  D.4712 = j != 2;
  D.4713 = D.4711 & D.4712;
  if (D.4713 != 0) goto D.4714; else goto D.4715;
  D.4714:
  D.4710 = *i;
  D.4716 = (int) D.4710;
  return D.4716;
  D.4715:
  D.4716 = j;
  return D.4716;

After the fix, we have

  D.4711 = *i;
  if (D.4711 != 0) goto D.4712; else goto D.4710;
  D.4712:
  if (j != 2) goto D.4713; else goto D.4710;
  D.4713:
  D.4711 = *i;
  D.4714 = (int) D.4711;
  return D.4714;
  D.4710:
  D.4714 = j;
  return D.4714;

Does this meet the original expectation? 

I find the code below in function fold_truth_andor makes the difference,

  /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B)   into (A OR B).
 For sequence point consistancy, we need to check for trapping,
 and side-effects.  */
  else if (code == icode && simple_operand_p_2 (arg0)
           && simple_operand_p_2 (arg1))
 return fold_build2_loc (loc, ncode, type, arg0, arg1);

For *i != 0, simple_operand_p (*i) returns false. Originally this was not 
checked by the code, and I don't see that the patch originally wanted to 
cover this. Can this be explained?

I'm not arguing that this patch did something worse; I only want to understand 
the rationale behind it and verify that the patch doesn't hurt anything else. 
On the contrary, I measured that this change accidentally improved some 
benchmarks significantly, for other reasons.

Thanks,
-Jiangning

  Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is
  better done at a single place in fold_truth_andor only.
 
 As fold_truthop is called twice by fold_truth_andor, the latter might
 indeed be the better place.
 
 
 Ciao,
 Michael.





RE: [PATCH] Fix stack red zone bug (PR38644)

2011-10-09 Thread Jiangning Liu


 -Original Message-
 From: Richard Henderson [mailto:r...@redhat.com]
 Sent: Saturday, October 01, 2011 3:05 AM
 To: Jiangning Liu
 Cc: 'Jakub Jelinek'; 'Richard Guenther'; Andrew Pinski; gcc-
 patc...@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On 09/29/2011 06:13 PM, Jiangning Liu wrote:
 
 
  -Original Message-
  From: Jakub Jelinek [mailto:ja...@redhat.com]
  Sent: Thursday, September 29, 2011 6:14 PM
  To: Jiangning Liu
  Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org
  Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
  On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote:
  As far as I know different back-ends are implementing different
  prologue/epilogue in GCC. If one day this part can be refined and
  abstracted
  as well, I would say solving this stack-red-zone problem in shared
  prologue/epilogue code would be a perfect solution, and barrier can
  be
  inserted there.
 
  I'm not saying you are wrong on keeping scheduler using a pure
  barrier
  interface. From engineering point of view, I only feel my proposal
 is
  so far
  so good, because this patch at least solve the problem for all
  targets in a
  quite simple way. Maybe it can be improved in future based on this.
 
  But you don't want to listen about any other alternative, other
  backends are
  happy with being able to put the best kind of barrier at the best
 spot
  in the epilogue and don't need a generic solution which won't
 model
  very
  well the target diversity anyway.
 
  Jakub,
 
  Appreciate for your attention on this issue,
 
  1) Can you clarify who are the others back-ends? Does it cover most
 of the
  back-ends being supported by GCC right now?
 
 Your red-stack barrier issue is *exactly* the same as the frame pointer
 barrier issue, which affects many backends.
 
 That is, if the frame pointer is initialized before the local stack
 frame
 is allocated, then one has to add a barrier such that memory references
 based on the frame pointer are not scheduled before the local stack
 frame
 allocation.
 
 One example of this is in the i386 port, where the prologue looks like
 
    push    %ebp
    mov     %esp, %ebp
    sub     $frame, %esp
 
 The rtl we emit for that subtraction looks like
 
  (define_insn "pro_epilogue_adjust_stack_<mode>_add"
    [(set (match_operand:P 0 "register_operand" "=r,r")
          (plus:P (match_operand:P 1 "register_operand" "0,r")
                  (match_operand:P 2 "nonmemory_operand" "ri,li")))
     (clobber (reg:CC FLAGS_REG))
     (clobber (mem:BLK (scratch)))]
 
 Note the final clobber, which is a memory scheduling barrier.
 
 Other targets use similar tricks.  For instance arm stack_tie.
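
In C terms, the trick can be sketched roughly as follows (an illustration
of the idiom only, not code from any particular port; it assumes the usual
RTL construction APIs of this era):

/* Sketch: attach the wild-card memory clobber to the stack-adjustment
   insn itself, mirroring the i386 pattern above.  SIZE is the frame
   size being deallocated.  */
static void
emit_frame_dealloc_with_barrier (HOST_WIDE_INT size)
{
  rtx adjust = gen_rtx_SET (VOIDmode, stack_pointer_rtx,
                            gen_rtx_PLUS (Pmode, stack_pointer_rtx,
                                          GEN_INT (size)));
  rtx blockage = gen_rtx_CLOBBER (VOIDmode,
                                  gen_rtx_MEM (BLKmode,
                                               gen_rtx_SCRATCH (VOIDmode)));

  /* The PARALLEL makes the adjustment itself a scheduling barrier, so
     frame accesses cannot be moved below it.  */
  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, adjust, blockage)));
}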
 
 Honestly, I've found nothing convincing throughout this thread that
 suggests to me that this problem should be handled generically.
 

Richard H.,

Thanks for your explanation by giving an example in x86. 

The key point is that, if possible, fixing this in the middle end benefits
all ports directly and avoids a bug-fixing burden in the back-ends, rather
than fixing the problem port by port.

Actually, the debate now is whether the memory barrier is properly modeled
throughout GCC rather than in a single component, because my current
understanding is that the scheduler is not the only component using memory
barriers.

Thanks,
-Jiangning

 
 r~






RE: [PATCH] Fix stack red zone bug (PR38644)

2011-10-09 Thread Jiangning Liu


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Friday, September 30, 2011 8:57 PM
 To: Jiangning Liu; Jakub Jelinek; Richard Guenther; Andrew Pinski; gcc-
 patc...@gcc.gnu.org; richard.sandif...@linaro.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On Fri, Sep 30, 2011 at 2:46 PM, Richard Sandiford
 richard.sandif...@linaro.org wrote:
  Jiangning Liu jiangning@arm.com writes:
  You seem to feel strongly about this because it's a wrong-code bug
 that
  is very easy to introduce and often very hard to detect.  And I
  defintely
  sympathise with that.  If we were going to to do it in a target-
  independent
  way, though, I think it would be better to scan patterns like
 epilogue
  and
  automatically introduce barriers before assignments to
  stack_pointer_rtx
  (subject to the kind of hook in your patch).  But I still don't
 think
  that's better than leaving the onus on the backend.  The backend is
  still responsible for much more complicated things like determning
  the correct deallocation and register-restore sequence, and for
  determining the correct CFI sequence.
 
 
  I think middle-end in GCC is actually shared code rather than the
 part
  exactly in the middle. A pass working on RTL can be a middle end
 just
  because the code can be shared for all targets, and some passes can
 even
  work for both GIMPLE and RTL.
 
  Actually some optimizations need to work through shared part
 (middle-end)
  plus target specific part (back-end). You are thinking the
 interface
  between this shared part and target specific part should be
 using
  barrier as a properly model. To some extension I agree with this.
 However,
  it doesn't mean the fix should be in back-end rather than middle end,
  because obviously this problem is a common ABI issue for all targets.
 If we
  can abstract this issue to be a shared part, why shouldn't we do it
 in
  middle end to reduce the onus of back-end? Back-end should handle
 the target
  specific things rather than only the complicated things.
 
  And for avoidance of doubt, the automatic barrier insertion that I
  described would be one way of doing it in target-independent code.
  But...
 
  If a complicated problem can be implemented in a shared code
 manner, we
  still want to put it into middle end rather than back-end. I believe
 those
  optimizations based on SSA form are complicated enough, but they are
 all in
  middle end. This is the logic I'm seeing in GCC.
 
  The situation here is different.  The target-independent rtl code is
  being given a blob of instructions that the backend has generated for
  the epilogue.  There's no fine-tuning beyond that.  E.g. we don't
 have
  separate patterns for restore registers, deallocate stack,
 return:
  we just have one monolithic epilogue pattern.  The target-
 independent
  code has very little control.
 
  In contrast, after the tree optimisers have handed off the initial IL,
  the tree optimisers are more or less in full control.  There are very
  few cases where we generate further trees outside the middle-
 end.  The only
  case I know off-hand is the innards of va_start and va_arg, which can
 be
  generated by the backend.
 
  So let's suppose we had a similar situation there, where we wanted
  va_arg do something special in a certain situation.  If we had the
  same three choices of:
 
   1. use an on-the-side hook to represent the special something
   2. scan the code generated by the backend and automatically
      inject the special something at an appropriate place
   3. require each backend to do it properly from the start
 
  (OK, slightly prejudiced wording :-)) I think we'd still choose 3.
 
  For this particular issue, I don't think that hook interface I'm
  proposing is more complicated than the barrier. Instead, it is
 easier
  for back-end implementer to be aware of the potential issue before
  really solving stack red zone problem, because it is very clearly
  listed in target hook list.
 
  The point for model it in the IL supporters like myself is that we
  have both many backends and many rtl passes.  Putting it in a hook
 keeps
  things simple for the backends, but it means that every rtl pass must
 be
  aware of this on-the-side dependency.  Perhaps sched2 really is the
 only
  pass that needs to look at the hook at present.  But perhaps not.
  E.g. dbr_schedule (not a problem on ARM, I realise) also reorders
  instructions, so maybe it would need to be audited to see whether any
  calls to this hook are needed.  And perhaps we'd add more rtl passes
  later.
 
  The point behind using a barrier is that the rtl passes do not then
 need
  to treat the stack-deallocation dependency as a special case.  They
 can
  just use the normal analysis and get it right.
 
  In other words, we're both arguing for safety here.
 
 Indeed.  It's certainly not only scheduling that can move instructions,
 but RTL PRE, combine, ifcvt all can

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-10-09 Thread Jiangning Liu


 -Original Message-
 From: Richard Sandiford richard.sandif...@linaro.org
 Date: Fri, Sep 30, 2011 at 8:46 PM
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 To: Jiangning Liu jiangning@arm.com
 Cc: Jakub Jelinek ja...@redhat.com, Richard Guenther
 richard.guent...@gmail.com, Andrew Pinski pins...@gmail.com,
 gcc-patches@gcc.gnu.org
 
 
 Jiangning Liu jiangning@arm.com writes:
  You seem to feel strongly about this because it's a wrong-code bug
 that
  is very easy to introduce and often very hard to detect.  And I
  defintely
  sympathise with that.  If we were going to to do it in a target-
  independent
  way, though, I think it would be better to scan patterns like
 epilogue
  and
  automatically introduce barriers before assignments to
  stack_pointer_rtx
  (subject to the kind of hook in your patch).  But I still don't
 think
  that's better than leaving the onus on the backend.  The backend is
  still responsible for much more complicated things like determning
  the correct deallocation and register-restore sequence, and for
  determining the correct CFI sequence.
 
 
  I think middle-end in GCC is actually shared code rather than the
 part
  exactly in the middle. A pass working on RTL can be a middle end just
  because the code can be shared for all targets, and some passes can
 even
  work for both GIMPLE and RTL.
 
  Actually some optimizations need to work through shared part
 (middle-end)
  plus target specific part (back-end). You are thinking the
 interface
  between this shared part and target specific part should be using
  barrier as a properly model. To some extension I agree with this.
 However,
  it doesn't mean the fix should be in back-end rather than middle end,
  because obviously this problem is a common ABI issue for all targets.
 If we
  can abstract this issue to be a shared part, why shouldn't we do it
 in
  middle end to reduce the onus of back-end? Back-end should handle the
 target
  specific things rather than only the complicated things.
 
 And for avoidance of doubt, the automatic barrier insertion that I
 described would be one way of doing it in target-independent code.
 But...
 
  If a complicated problem can be implemented in a shared code manner,
 we
  still want to put it into middle end rather than back-end. I believe
 those
  optimizations based on SSA form are complicated enough, but they are
 all in
  middle end. This is the logic I'm seeing in GCC.
 
 The situation here is different.  The target-independent rtl code is
 being given a blob of instructions that the backend has generated for
 the epilogue.  There's no fine-tuning beyond that.  E.g. we don't have
 separate patterns for restore registers, deallocate stack, return:
 we just have one monolithic epilogue pattern.  The target-independent
 code has very little control.
 
 In contrast, after the tree optimisers have handed off the initial IL,
 the tree optimisers are more or less in full control.  There are very
 few cases where we generate further trees outside the middle-end.  The
 only
 case I know off-hand is the innards of va_start and va_arg, which can
 be
 generated by the backend.
 
 So let's suppose we had a similar situation there, where we wanted
 va_arg do something special in a certain situation.  If we had the
 same three choices of:
 
  1. use an on-the-side hook to represent the special something
  2. scan the code generated by the backend and automatically
     inject the special something at an appropriate place
  3. require each backend to do it properly from the start
 
 (OK, slightly prejudiced wording :-)) I think we'd still choose 3.
 

Richard S.,

Although I implemented va_arg for a commercial compiler a long time ago, I
have forgotten all the details. :-) I'm not sure va_arg is a good example
to compare with this stack red zone case.

  For this particular issue, I don't think that hook interface I'm
  proposing is more complicated than the barrier. Instead, it is easier
  for back-end implementer to be aware of the potential issue before
  really solving stack red zone problem, because it is very clearly
  listed in target hook list.
 
 The point for model it in the IL supporters like myself is that we
 have both many backends and many rtl passes.  Putting it in a hook
 keeps
 things simple for the backends, but it means that every rtl pass must
 be
 aware of this on-the-side dependency.  Perhaps sched2 really is the
 only
 pass that needs to look at the hook at present.  But perhaps not.
 E.g. dbr_schedule (not a problem on ARM, I realise) also reorders
 instructions, so maybe it would need to be audited to see whether any
 calls to this hook are needed.  And perhaps we'd add more rtl passes
 later.

Let me rephrase your justification in my own words.

===

We can't compare adding a new pass with adding a new port, because they are
totally different things. But it implies that with my proposal the burden
may still be added

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-30 Thread Jiangning Liu


 -Original Message-
 From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
 Sent: Friday, September 30, 2011 4:15 PM
 To: Jiangning Liu
 Cc: 'Jakub Jelinek'; 'Richard Guenther'; Andrew Pinski; gcc-
 patc...@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 Jiangning Liu jiangning@arm.com writes:
  -Original Message-
  From: Jakub Jelinek [mailto:ja...@redhat.com]
  Sent: Thursday, September 29, 2011 6:14 PM
  To: Jiangning Liu
  Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org
  Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
  On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote:
   As far as I know different back-ends are implementing different
   prologue/epilogue in GCC. If one day this part can be refined and
  abstracted
   as well, I would say solving this stack-red-zone problem in shared
   prologue/epilogue code would be a perfect solution, and barrier
 can
  be
   inserted there.
  
   I'm not saying you are wrong on keeping scheduler using a pure
  barrier
   interface. From engineering point of view, I only feel my proposal
 is
  so far
   so good, because this patch at least solve the problem for all
  targets in a
   quite simple way. Maybe it can be improved in future based on this.
 
  But you don't want to listen about any other alternative, other
  backends are
  happy with being able to put the best kind of barrier at the best
 spot
  in the epilogue and don't need a generic solution which won't
 model
  very
  well the target diversity anyway.
 
  Jakub,
 
  Appreciate for your attention on this issue,
 
  1) Can you clarify who are the others back-ends? Does it cover most
 of the
  back-ends being supported by GCC right now?
 
 Not answering for Jakub of course, but as a maintainer of a backend, I
 know
 MIPS doesn't have the required barrier at the moment.  But that's a bug.
 
 Like others in this thread, I'm strongly of the opinion that this
 should
 be modelled directly in the IL.  And it's already supposed to be
 modelled
 in the IL.  Target-independent code emits the required barriers in
 cases
 where it rather than the backend patterns are responsible.  E.g.:
 
 emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
 emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
 
 emit_move_insn (hard_frame_pointer_rtx, fp);
 emit_stack_restore (SAVE_NONLOCAL, stack);
 
 from expand_builtin_longjmp() and:
 
   if (sa != 0)
 {
   sa = validize_mem (sa);
   /* These clobbers prevent the scheduler from moving
references to variable arrays below the code
that deletes (pops) the arrays.  */
   emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
   emit_clobber (gen_rtx_MEM (BLKmode, stack_pointer_rtx));
 }
 
 from emit_stack_restore().  Backends that fail to follow suit are IMO
 just buggy.
 
 FWIW, I intend to fix MIPS this weekend.

Richard S.,

I appreciate your attention to this issue and your investigation of the
MIPS target.

I'm really glad you found a potential bug in MIPS through this discussion.
To some extent this proves my earlier hypothesis.

 
 You seem to feel strongly about this because it's a wrong-code bug that
 is very easy to introduce and often very hard to detect.  And I
 defintely
 sympathise with that.  If we were going to to do it in a target-
 independent
 way, though, I think it would be better to scan patterns like epilogue
 and
 automatically introduce barriers before assignments to
 stack_pointer_rtx
 (subject to the kind of hook in your patch).  But I still don't think
 that's better than leaving the onus on the backend.  The backend is
 still responsible for much more complicated things like determning
 the correct deallocation and register-restore sequence, and for
 determining the correct CFI sequence.
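
As a rough sketch of that alternative (my illustration of the
scan-and-insert idea, not an actual implementation), the scan could walk
the expanded epilogue like this:

/* Hypothetical sketch: plant a wild-card memory clobber before every
   insn in the expanded epilogue that sets the stack pointer.  */
static void
add_red_zone_barriers (rtx first)
{
  rtx insn;

  for (insn = first; insn; insn = NEXT_INSN (insn))
    if (INSN_P (insn) && reg_set_p (stack_pointer_rtx, insn))
      emit_insn_before (gen_rtx_CLOBBER (VOIDmode,
                                         gen_rtx_MEM (BLKmode,
                                                      gen_rtx_SCRATCH (VOIDmode))),
                        insn);
}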
 

I think the middle-end in GCC is actually the shared code rather than the
part exactly in the middle. A pass working on RTL can be middle end simply
because its code is shared by all targets, and some passes can even work on
both GIMPLE and RTL.

Actually, some optimizations need to work through the shared part
(middle-end) plus a target-specific part (back-end). You think the
interface between this shared part and the target-specific part should use
the barrier as the proper model. To some extent I agree with this. However,
that doesn't mean the fix should be in the back-end rather than the middle
end, because this problem is obviously a common ABI issue for all targets.
If we can abstract this issue into a shared part, why shouldn't we do it in
the middle end and reduce the onus on the back-end? The back-end should
handle the target-specific things, not merely the complicated things.

If a complicated problem can be implemented in a shared-code manner, we
still want to put it into the middle end rather than the back-end. I
believe the optimizations based on SSA form are complicated enough

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-29 Thread Jiangning Liu


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Thursday, September 29, 2011 5:03 PM
 To: Jiangning Liu
 Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On Thu, Sep 29, 2011 at 5:13 AM, Jiangning Liu jiangning@arm.com
 wrote:
 
 
  -Original Message-
  From: Richard Guenther [mailto:richard.guent...@gmail.com]
  Sent: Wednesday, September 28, 2011 5:56 PM
  To: Jiangning Liu
  Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
  Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
  On Wed, Sep 28, 2011 at 11:40 AM, Jiangning Liu
 jiangning@arm.com
  wrote:
  
  
   -Original Message-
   From: Richard Guenther [mailto:richard.guent...@gmail.com]
   Sent: Wednesday, September 28, 2011 5:20 PM
   To: Jiangning Liu
   Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
   Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
  
   On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu
  jiangning@arm.com
   wrote:
   
   
-Original Message-
From: Richard Guenther [mailto:richard.guent...@gmail.com]
Sent: Wednesday, September 28, 2011 4:39 PM
To: Jiangning Liu
Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
   
On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu
   jiangning@arm.com
wrote:


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, September 27, 2011 3:41 PM
 To: Jiangning Liu
 Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)

 On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
jiangning@arm.com
 wrote:
  Think of it this way.  What the IR says is there is no
  barrier
 between
  those moves.  You either have an implicit barrier (which
 is
   what
you
  are proposing) or you have it explicitly.  I think we
 all
   rather
 have
  more things explicit rather than implicit in the
 IR.  And
  that
has
  been the overall feeling for a few years now.
 
 
  Sorry, I'm afraid I can't agree with you. Instead, I
 think
   using
 barrier to describe this kind of dependence is a kind of
  implicit
 method. Please note that this is not an usual data
 dependence
   issue.
 The stack pointer change doesn't have any dependence with
  memory
access
 at all.

 It is similar to atomic instructions that require being an
 optimization/memory barrier.  Sure it is not a usual data
   dependence
 (otherwise we would handle
 it already), so the targets have to explicitly express the
dependence
 somehow, for which we only have barriers right now.


 Richard,

 Thanks for your explanation. It's explicit to back-end,
 while
  it's
implicit
 to scheduler in middle end, because barrier can decide
  dependence
   in
 scheduler but barrier can be generated from several
 different
scenarios.
 It's unsafe and prone to introduce bug if any one of the
  scenarios
requiring
 generating barriers is being missed in back-end.

 Between middle-end and back-end, we should have interfaces
 that
  is
easy to
 be implemented by back-end. After all, middle-end itself
 can't
consist of a
 compiler, and vice versa. Back-end needs middle-end's help
 to
  make
sure
 back-end is easy to be implemented and reduce the
 possibility
  of
introducing
 bugs.

 Without an explicit hook as I'm proposing, back-end
  implementers
   have
to
 clearly know all scenarios of generating barrier very
 clearly,
because there
 isn't any code tips and comments in middle end telling back-
 end
   the
list of
 all scenarios on generating barriers.

 Yes, barrier is a perfect interface for scheduler in theory.
  But
   from
 engineering point of view, I think it's better to explicitly
   define
an
 interface to describe stack red zone and inform back-end, or
  vice
versa. Not
 like computer, people is easy to make mistake if you don't
 tell
   them.
On
 this bug, the fact is over the years different back-ends
 made
   similar
bugs.

 GCC is really a perfect platform on building new ports, and
 I
  saw
   a
lot of
 new back-ends. The current middle end is unsafe, if port
  doesn't
support
 stack red zone and back-ends doesn't generate barrier for it.
  Why
can't we
 explicitly clarify this in compiler code between middle end
 and
   back
end?
 What if any other back-end (new or old) NOT supporting stack
  red
   zone
 exposing the similar bug again?
   
There are gazillion things you have to explicitly get right in
  your
backends,
so I don't see why exposing proper scheduling barriers should
 be
special

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-29 Thread Jiangning Liu


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Thursday, September 29, 2011 6:14 PM
 To: Jiangning Liu
 Cc: 'Richard Guenther'; Andrew Pinski; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On Thu, Sep 29, 2011 at 06:08:50PM +0800, Jiangning Liu wrote:
  As far as I know different back-ends are implementing different
  prologue/epilogue in GCC. If one day this part can be refined and
 abstracted
  as well, I would say solving this stack-red-zone problem in shared
  prologue/epilogue code would be a perfect solution, and barrier can
 be
  inserted there.
 
  I'm not saying you are wrong on keeping scheduler using a pure
 barrier
  interface. From engineering point of view, I only feel my proposal is
 so far
  so good, because this patch at least solve the problem for all
 targets in a
  quite simple way. Maybe it can be improved in future based on this.
 
 But you don't want to listen about any other alternative, other
 backends are
 happy with being able to put the best kind of barrier at the best spot
 in the epilogue and don't need a generic solution which won't model
 very
 well the target diversity anyway.

Jakub,

Appreciate for your attention on this issue,

1) Can you clarify which back-ends are the others? Do they cover most of
the back-ends currently supported by GCC?
2) You need to give data to prove that the other back-ends are happy with
the current solution. The fact is that over the years a bunch of bugs have
been filed against this problem. WHY? Are you implying that other back-ends
are happy with bugs or potential bugs? That is weird to me. Also, this is
not a question of whether a back-end is able to implement the barrier. If
you are really asking about ability, I would say every back-end is able,
but that doesn't mean able is correct, and it doesn't mean able is the
most reasonable.

Compared with the one I am proposing, I don't see that the current solution
has significant advantages beyond the proper modeling for the scheduler
that you are arguing for. The solution I'm proposing is a safe one, and one
that makes bugs easy to avoid. If the GCC infrastructure can't even
guarantee safe compilation, why should we insist only on proper modeling
for the scheduler?

Hopefully you can reconsider this.

-Jiangning

 
   Jakub






RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Jiangning Liu
  -static inline bool
  +extern bool
 
 static bool
 
   ix86_using_red_zone (void)
   {
     return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
  @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void)
   #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail
   #endif
 

...

 
  +/* Return true if the ABI allows red zone access.  */
  +extern bool
 
 static bool
 
  +rs6000_stack_using_red_zone (void)
  +{
  +   return (DEFAULT_ABI != ABI_V4);
  +}
  +
 

Uros,

Thanks for your review. Accepted, and the new patch is below. No change to
the ChangeLog.

Thanks,
-Jiangning

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ff8c49f..e200974 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2631,7 +2631,7 @@ static const char *const
cpu_names[TARGET_CPU_DEFAULT_max] =
 

 /* Return true if a red-zone is in use.  */
 
-static inline bool
+static bool
 ix86_using_red_zone (void)
 {
   return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
@@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail
 #endif
 
+#undef TARGET_STACK_USING_RED_ZONE
+#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone
+
 #undef TARGET_FUNCTION_VALUE
 #define TARGET_FUNCTION_VALUE ix86_function_value
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1ab57e5..1e64d14 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1537,6 +1537,9 @@ static const struct attribute_spec
rs6000_attribute_table[] =
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
 #endif
 
+#undef TARGET_STACK_USING_RED_ZONE
+#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone
+
 /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
The PowerPC architecture requires only weak consistency among
processors--that is, memory accesses between processors need not be
@@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int
using_mfcr_multiple)
}
 }
 
+/* Return true if the ABI allows red zone access.  */
+static bool
+rs6000_stack_using_red_zone (void)
+{
+   return (DEFAULT_ABI != ABI_V4);
+}
+
 /* Return true if OFFSET from stack pointer can be clobbered by signals.
V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes
below stack pointer not cloberred by signals.  */
@@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int
using_mfcr_multiple)
 static inline bool
 offset_below_red_zone_p (HOST_WIDE_INT offset)
 {
-  return offset < (DEFAULT_ABI == ABI_V4
+  return offset < (!TARGET_STACK_USING_RED_ZONE
   ? 0
   : TARGET_32BIT ? -220 : -288);
 }
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 335c1d1..c848806 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11332,6 +11332,22 @@ to the stack.  Therefore, this hook should return
true in general, but
 false for naked functions.  The default implementation always returns true.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void)
+This hook returns true if the target has a red zone (an area beyond the
+current extent of the stack that cannot be modified by asynchronous events
+on the processor).
+
+If this hook returns false then the compiler mid-end will not move an
+access to memory in the stack frame past a stack adjustment insn.
+
+If this hook returns true then the compiler mid-end will assume that it is
+safe to move an access to memory in the stack frame past a stack adjustment
+insn.  The target back-end must emit scheduling barrier insns when this is
+unsafe.
+
+The default is to return false which is safe and appropriate for most
+targets.
+@end deftypefn
+
 @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR
 On some architectures it can take multiple instructions to synthesize
 a constant.  If there is another constant already in a register that
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6783826..0fa9e10 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -11223,6 +11223,22 @@ to the stack.  Therefore, this hook should return
true in general, but
 false for naked functions.  The default implementation always returns true.
 @end deftypefn
 
+@hook TARGET_STACK_USING_RED_ZONE
+This hook returns true if the target has a red zone (an area beyond the
+current extent of the stack that cannot be modified by asynchronous events
+on the processor).
+
+If this hook returns false then the compiler mid-end will not move an
+access to memory in the stack frame past a stack adjustment insn.
+
+If this hook returns true then the compiler mid-end will assume that it is
+safe to move an access to memory in the stack frame past a stack adjustment
+insn.  The target back-end must emit scheduling barrier insns when this is
+unsafe.
+
+The default is to return false which is safe and appropriate for most
+targets.
+@end deftypefn
+
 @hook TARGET_CONST_ANCHOR
 On some architectures it can take multiple 

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Jiangning Liu
Just realized ChangeLog needs to be changed as well.

ChangeLog:

* config/i386/i386.c (ix86_using_red_zone): Remove inline.
(TARGET_STACK_USING_RED_ZONE): New.
* config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New.
(TARGET_STACK_USING_RED_ZONE): New.
(offset_below_red_zone_p): Change to use new hook 
TARGET_STACK_USING_RED_ZONE.
* doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New.
* doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New.
* sched-deps.c (sched_analyze_1): If the stack pointer is being 
modified and stack red zone is not supported for ports, flush out 
all memory references as they may become invalid if moved across 
the stack adjustment.
* target.def (stack_using_red_zone): New.
* testsuite/gcc.target/arm/stack-red-zone.c: New.

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
 Sent: Wednesday, September 28, 2011 2:24 PM
 To: 'Uros Bizjak'
 Cc: gcc-patches@gcc.gnu.org; j...@suse.cz; geo...@geoffk.org;
 dje@gmail.com; r...@redhat.com; Richard Earnshaw; Matthew Gretton-
 Dann
 Subject: RE: [PATCH] Fix stack red zone bug (PR38644)
 
   -static inline bool
   +extern bool
 
  static bool
 
    ix86_using_red_zone (void)
    {
      return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
   @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void)
    #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail
    #endif
  
 
 ...
 
  
   +/* Return true if the ABI allows red zone access.  */
   +extern bool
 
  static bool
 
   +rs6000_stack_using_red_zone (void)
   +{
   +   return (DEFAULT_ABI != ABI_V4);
   +}
   +
 
 
 Uros,
 
 Thanks for your review. Accept and new patch is as below. No change for
 ChangeLog.
 
 Thanks,
 -Jiangning
 
 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index ff8c49f..e200974 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -2631,7 +2631,7 @@ static const char *const
 cpu_names[TARGET_CPU_DEFAULT_max] =
 
 
  /* Return true if a red-zone is in use.  */
 
 -static inline bool
 +static bool
  ix86_using_red_zone (void)
  {
    return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
 @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void)
  #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail
  #endif
 
 +#undef TARGET_STACK_USING_RED_ZONE
 +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone
 +
  #undef TARGET_FUNCTION_VALUE
  #define TARGET_FUNCTION_VALUE ix86_function_value
 
 diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
 index 1ab57e5..1e64d14 100644
 --- a/gcc/config/rs6000/rs6000.c
 +++ b/gcc/config/rs6000/rs6000.c
 @@ -1537,6 +1537,9 @@ static const struct attribute_spec
 rs6000_attribute_table[] =
  #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
  #endif
 
 +#undef TARGET_STACK_USING_RED_ZONE
 +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone
 +
  /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
 The PowerPC architecture requires only weak consistency among
 processors--that is, memory accesses between processors need not be
 @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int
 using_mfcr_multiple)
   }
  }
 
 +/* Return true if the ABI allows red zone access.  */
 +static bool
 +rs6000_stack_using_red_zone (void)
 +{
 +   return (DEFAULT_ABI != ABI_V4);
 +}
 +
  /* Return true if OFFSET from stack pointer can be clobbered by
 signals.
 V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes
 below stack pointer not cloberred by signals.  */
 @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int
 using_mfcr_multiple)
  static inline bool
  offset_below_red_zone_p (HOST_WIDE_INT offset)
  {
  -  return offset < (DEFAULT_ABI == ABI_V4
  +  return offset < (!TARGET_STACK_USING_RED_ZONE
  ? 0
  : TARGET_32BIT ? -220 : -288);
  }
 diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
 index 335c1d1..c848806 100644
 --- a/gcc/doc/tm.texi
 +++ b/gcc/doc/tm.texi
 @@ -11332,6 +11332,22 @@ to the stack.  Therefore, this hook should
 return
 true in general, but
  false for naked functions.  The default implementation always returns
 true.
  @end deftypefn
 
 +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void)
 +This hook returns true if the target has a red zone (an area beyond
 the
 +current extent of the stack that cannot be modified by asynchronous
 events
 +on the processor).
 +
 +If this hook returns false then the compiler mid-end will not move an
 access
 +to memory in the stack frame past a stack adjustment insn.
 +
 +If this hook returns true then the compiler mid-end will assume that
 it is
 +safe to move an access to memory in the stack frame past a stack
 adjustment
 +insn.  The target back-end must emit scheduling barrier insns when
 this is
 +unsafe.
 +
 +The default

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Jiangning Liu


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Wednesday, September 28, 2011 4:39 PM
 To: Jiangning Liu
 Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu jiangning@arm.com
 wrote:
 
 
  -Original Message-
  From: Richard Guenther [mailto:richard.guent...@gmail.com]
  Sent: Tuesday, September 27, 2011 3:41 PM
  To: Jiangning Liu
  Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
  Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
  On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
 jiangning@arm.com
  wrote:
   Think of it this way.  What the IR says is there is no barrier
  between
   those moves.  You either have an implicit barrier (which is what
 you
   are proposing) or you have it explicitly.  I think we all rather
  have
   more things explicit rather than implicit in the IR.  And that
 has
   been the overall feeling for a few years now.
  
  
   Sorry, I'm afraid I can't agree with you. Instead, I think using
  barrier to describe this kind of dependence is a kind of implicit
  method. Please note that this is not an usual data dependence issue.
  The stack pointer change doesn't have any dependence with memory
 access
  at all.
 
  It is similar to atomic instructions that require being an
  optimization/memory barrier.  Sure it is not a usual data dependence
  (otherwise we would handle
  it already), so the targets have to explicitly express the
 dependence
  somehow, for which we only have barriers right now.
 
 
  Richard,
 
  Thanks for your explanation. It's explicit to back-end, while it's
 implicit
  to scheduler in middle end, because barrier can decide dependence in
  scheduler but barrier can be generated from several different
 scenarios.
  It's unsafe and prone to introduce bug if any one of the scenarios
 requiring
  generating barriers is being missed in back-end.
 
  Between middle-end and back-end, we should have interfaces that is
 easy to
  be implemented by back-end. After all, middle-end itself can't
 consist of a
  compiler, and vice versa. Back-end needs middle-end's help to make
 sure
  back-end is easy to be implemented and reduce the possibility of
 introducing
  bugs.
 
  Without an explicit hook as I'm proposing, back-end implementers have
 to
  clearly know all scenarios of generating barrier very clearly,
 because there
  isn't any code tips and comments in middle end telling back-end the
 list of
  all scenarios on generating barriers.
 
  Yes, barrier is a perfect interface for scheduler in theory. But from
  engineering point of view, I think it's better to explicitly define
 an
  interface to describe stack red zone and inform back-end, or vice
 versa. Not
  like computer, people is easy to make mistake if you don't tell them.
 On
  this bug, the fact is over the years different back-ends made similar
 bugs.
 
  GCC is really a perfect platform on building new ports, and I saw a
 lot of
  new back-ends. The current middle end is unsafe, if port doesn't
 support
  stack red zone and back-ends doesn't generate barrier for it. Why
 can't we
  explicitly clarify this in compiler code between middle end and back
 end?
  What if any other back-end (new or old) NOT supporting stack red zone
  exposing the similar bug again?
 
 There are gazillion things you have to explicitly get right in your
 backends,
 so I don't see why exposing proper scheduling barriers should be
 special,
 and there, why red-zones should be special (as opposed to other
 occasions
 where you need to emit barriers from the backend for the scheduler).
 

Richard,

This is because:

1) The current scheduler is unsafe if a back-end doesn't generate the
barrier for a port that doesn't support a stack red zone at all, as the
sketch below illustrates.
2) Implementing the barrier in the back-end is a burden on every new
back-end implementation for ports that don't support a stack red zone.
3) There are many other ports not reporting bugs around this. I doubt
they are free of this bug.
4) There are over 300 target hooks defined in target.def. I don't think
adding this interface hurts GCC.
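
To make 1) concrete, the failure mode looks roughly like this (a
hand-written illustration, not an actual dump):

/* Illustration only.  On a target without a red zone, suppose the
   scheduler moved the frame load below the deallocation:

       add   sp, sp, #16      ; epilogue frees the frame
       ldr   r0, [sp, #-4]    ; WRONG: the slot now lies below sp
       bx    lr

   An interrupt or signal delivered between the add and the ldr may
   overwrite the slot, so the function returns garbage.  */
int
leaf (void)
{
  volatile int local = 42;   /* lives in the local stack frame */
  return local;              /* this load must stay above the sp adjustment */
}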

BTW, really appreciate your close attention to this specific issue.

Thanks,
-Jiangning

 Richard.
 
  Thanks,
  -Jiangning
 
  Richard.
 
   No matter what solution itself is, the problem itself is a quite a
  common one on ISA level, so we should solve it in middle-end,
 because
  middle-end is shared for all ports.
  
   My proposal avoids problems in future. Any new ports and new back-
 end
  implementations needn't explicitly define this hook in a very safe
 way.
  But if one port wants to unsafely introduce red zone, we can
  explicitly define this hook in back-end. This way, we would avoid
 the
  bug in the earliest time. Do you really want to hide this problem in
  back-end silently? And wait others to report bug and then waste time
 to
  get it fixed again?
  
   The facts I see is over the years

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Jiangning Liu


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Wednesday, September 28, 2011 5:20 PM
 To: Jiangning Liu
 Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu jiangning@arm.com
 wrote:
 
 
  -Original Message-
  From: Richard Guenther [mailto:richard.guent...@gmail.com]
  Sent: Wednesday, September 28, 2011 4:39 PM
  To: Jiangning Liu
  Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
  Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
  On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu
 jiangning@arm.com
  wrote:
  
  
   -Original Message-
   From: Richard Guenther [mailto:richard.guent...@gmail.com]
   Sent: Tuesday, September 27, 2011 3:41 PM
   To: Jiangning Liu
   Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
   Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
  
   On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
  jiangning@arm.com
   wrote:
Think of it this way.  What the IR says is there is no barrier
   between
those moves.  You either have an implicit barrier (which is
 what
  you
are proposing) or you have it explicitly.  I think we all
 rather
   have
more things explicit rather than implicit in the IR.  And that
  has
been the overall feeling for a few years now.
   
   
Sorry, I'm afraid I can't agree with you. Instead, I think
 using
   barrier to describe this kind of dependence is a kind of implicit
   method. Please note that this is not an usual data dependence
 issue.
   The stack pointer change doesn't have any dependence with memory
  access
   at all.
  
   It is similar to atomic instructions that require being an
   optimization/memory barrier.  Sure it is not a usual data
 dependence
   (otherwise we would handle
   it already), so the targets have to explicitly express the
  dependence
   somehow, for which we only have barriers right now.
  
  
   Richard,
  
   Thanks for your explanation. It's explicit to back-end, while it's
  implicit
   to scheduler in middle end, because barrier can decide dependence
 in
   scheduler but barrier can be generated from several different
  scenarios.
   It's unsafe and prone to introduce bug if any one of the scenarios
  requiring
   generating barriers is being missed in back-end.
  
   Between middle-end and back-end, we should have interfaces that is
  easy to
   be implemented by back-end. After all, middle-end itself can't
  consist of a
   compiler, and vice versa. Back-end needs middle-end's help to make
  sure
   back-end is easy to be implemented and reduce the possibility of
  introducing
   bugs.
  
   Without an explicit hook as I'm proposing, back-end implementers
 have
  to
   clearly know all scenarios of generating barrier very clearly,
  because there
   isn't any code tips and comments in middle end telling back-end
 the
  list of
   all scenarios on generating barriers.
  
   Yes, barrier is a perfect interface for scheduler in theory. But
 from
   engineering point of view, I think it's better to explicitly
 define
  an
   interface to describe stack red zone and inform back-end, or vice
  versa. Not
   like computer, people is easy to make mistake if you don't tell
 them.
  On
   this bug, the fact is over the years different back-ends made
 similar
  bugs.
  
   GCC is really a perfect platform on building new ports, and I saw
 a
  lot of
   new back-ends. The current middle end is unsafe, if port doesn't
  support
   stack red zone and back-ends doesn't generate barrier for it. Why
  can't we
   explicitly clarify this in compiler code between middle end and
 back
  end?
   What if any other back-end (new or old) NOT supporting stack red
 zone
   exposing the similar bug again?
 
  There are gazillion things you have to explicitly get right in your
  backends,
  so I don't see why exposing proper scheduling barriers should be
  special,
  and there, why red-zones should be special (as opposed to other
  occasions
  where you need to emit barriers from the backend for the scheduler).
 
 
  Richard,
 
  This is because,
 
  1) Current scheduler is unsafe if back-end doesn't generate barrier
 for a
  port which doesn't support stack red zone at all.
  2) Implementing barrier in back-end is a burden to new back-end
  implementation for ports not supporting stack red zone at all.
  3) There are many other ports not reporting bugs around this. I doubt
 there
  isn't bug for them.
  4) There are over 300 TARGET HOOKS being defined in target.def. I
 don't
  think adding this interface is hurting GCC.
 
 I don't argue that your solution is not acceptable, just your reasoning
 is bogus IMHO. 
 1) is a target bug, 

Why should back-ends handle a thing that doesn't exist at all for them? I
don't see clear logic here. 

 2) huh, I doubt that this is the
 biggest issue
 one faces when implementing a new target, 

I never say

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-28 Thread Jiangning Liu


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Wednesday, September 28, 2011 5:56 PM
 To: Jiangning Liu
 Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On Wed, Sep 28, 2011 at 11:40 AM, Jiangning Liu jiangning@arm.com
 wrote:
 
 
  -Original Message-
  From: Richard Guenther [mailto:richard.guent...@gmail.com]
  Sent: Wednesday, September 28, 2011 5:20 PM
  To: Jiangning Liu
  Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
  Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
  On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu
 jiangning@arm.com
  wrote:
  
  
   -Original Message-
   From: Richard Guenther [mailto:richard.guent...@gmail.com]
   Sent: Wednesday, September 28, 2011 4:39 PM
   To: Jiangning Liu
   Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
   Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
  
   On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu
  jiangning@arm.com
   wrote:
   
   
-Original Message-
From: Richard Guenther [mailto:richard.guent...@gmail.com]
Sent: Tuesday, September 27, 2011 3:41 PM
To: Jiangning Liu
Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
   
On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu
   jiangning@arm.com
wrote:
 Think of it this way.  What the IR says is there is no
 barrier
between
 those moves.  You either have an implicit barrier (which is
  what
   you
 are proposing) or you have it explicitly.  I think we all
  rather
have
 more things explicit rather than implicit in the IR.  And
 that
   has
 been the overall feeling for a few years now.


 Sorry, I'm afraid I can't agree with you. Instead, I think
  using
barrier to describe this kind of dependence is a kind of
 implicit
method. Please note that this is not an usual data dependence
  issue.
The stack pointer change doesn't have any dependence with
 memory
   access
at all.
   
It is similar to atomic instructions that require being an
optimization/memory barrier.  Sure it is not a usual data
  dependence
(otherwise we would handle
it already), so the targets have to explicitly express the
   dependence
somehow, for which we only have barriers right now.
   
   
Richard,
   
Thanks for your explanation. It's explicit to back-end, while
 it's
   implicit
to scheduler in middle end, because barrier can decide
 dependence
  in
scheduler but barrier can be generated from several different
   scenarios.
It's unsafe and prone to introduce bug if any one of the
 scenarios
   requiring
generating barriers is being missed in back-end.
   
Between middle-end and back-end, we should have interfaces that
 is
   easy to
be implemented by back-end. After all, middle-end itself can't
   consist of a
compiler, and vice versa. Back-end needs middle-end's help to
 make
   sure
back-end is easy to be implemented and reduce the possibility
 of
   introducing
bugs.
   
Without an explicit hook as I'm proposing, back-end
 implementers
  have
   to
clearly know all scenarios of generating barrier very clearly,
   because there
isn't any code tips and comments in middle end telling back-end
  the
   list of
all scenarios on generating barriers.
   
Yes, barrier is a perfect interface for scheduler in theory.
 But
  from
engineering point of view, I think it's better to explicitly
  define
   an
interface to describe stack red zone and inform back-end, or
 vice
   versa. Not
like computer, people is easy to make mistake if you don't tell
  them.
   On
this bug, the fact is over the years different back-ends made
  similar
   bugs.
   
GCC is really a perfect platform on building new ports, and I
 saw
  a
   lot of
new back-ends. The current middle end is unsafe, if port
 doesn't
   support
stack red zone and back-ends doesn't generate barrier for it.
 Why
   can't we
explicitly clarify this in compiler code between middle end and
  back
   end?
What if any other back-end (new or old) NOT supporting stack
 red
  zone
exposing the similar bug again?
  
   There are gazillion things you have to explicitly get right in
 your
   backends,
   so I don't see why exposing proper scheduling barriers should be
   special,
   and there, why red-zones should be special (as opposed to other
   occasions
   where you need to emit barriers from the backend for the
 scheduler).
  
  
   Richard,
  
   This is because,
  
   1) Current scheduler is unsafe if back-end doesn't generate
 barrier
  for a
   port which doesn't support stack red zone at all.
   2) Implementing barrier in back-end is a burden to new back-end
   implementation for ports not supporting stack red zone at all.
   3) There are many other ports not reporting bugs around

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-27 Thread Jiangning Liu


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, September 27, 2011 3:41 PM
 To: Jiangning Liu
 Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu jiangning@arm.com
 wrote:
  Think of it this way.  What the IR says is there is no barrier
 between
  those moves.  You either have an implicit barrier (which is what you
  are proposing) or you have it explicitly.  I think we all rather
 have
  more things explicit rather than implicit in the IR.  And that has
  been the overall feeling for a few years now.
 
 
  Sorry, I'm afraid I can't agree with you. Instead, I think using
 barrier to describe this kind of dependence is a kind of implicit
 method. Please note that this is not an usual data dependence issue.
 The stack pointer change doesn't have any dependence with memory access
 at all.
 
 It is similar to atomic instructions that require being an
 optimization/memory barrier.  Sure it is not a usual data dependence
 (otherwise we would handle
 it already), so the targets have to explicitly express the dependence
 somehow, for which we only have barriers right now.
 

Richard,

Thanks for your explanation. It's explicit to the back-end, but implicit to
the scheduler in the middle end: a barrier decides dependence in the
scheduler, yet barriers can be generated from several different scenarios.
It's unsafe and prone to introducing bugs if any one of the scenarios that
requires a barrier is missed in the back-end.

Between the middle-end and the back-end, we should have interfaces that are
easy for the back-end to implement. After all, the middle-end by itself
doesn't constitute a compiler, and vice versa. The back-end needs the
middle-end's help to make back-ends easy to implement and to reduce the
possibility of introducing bugs.

Without an explicit hook like the one I'm proposing, back-end implementers
have to know every barrier-generating scenario very clearly, because there
are no code tips or comments in the middle end telling the back-end the
full list of scenarios that require barriers.

Yes, the barrier is a perfect interface for the scheduler in theory. But
from an engineering point of view, I think it's better to explicitly define
an interface that describes the stack red zone and informs the back-end, or
vice versa. Unlike computers, people easily make mistakes if you don't tell
them. On this bug, the fact is that over the years different back-ends made
similar mistakes.

GCC is really a perfect platform for building new ports, and I have seen a
lot of new back-ends. The current middle end is unsafe if a port doesn't
support a stack red zone and its back-end doesn't generate the barrier for
it. Why can't we explicitly clarify this in the compiler code between the
middle end and the back end? What if another back-end (new or old) that
doesn't support a stack red zone exposes the same bug again?

Thanks,
-Jiangning

 Richard.
 
  No matter what solution itself is, the problem itself is a quite a
 common one on ISA level, so we should solve it in middle-end, because
 middle-end is shared for all ports.
 
  My proposal avoids problems in future. Any new ports and new back-end
 implementations needn't explicitly define this hook in a very safe way.
 But if one port wants to unsafely introduce red zone, we can
 explicitly define this hook in back-end. This way, we would avoid the
 bug in the earliest time. Do you really want to hide this problem in
 back-end silently? And wait others to report bug and then waste time to
 get it fixed again?
 
  The facts I see is over the years, for different ports reported the
 similar issue around this, and somebody tried to fix them. Actually,
 this kind of problem should be fixed in design level. If the first
 people solve this bug can give solution in middle end, we would not
 need to waste time on filing those bugs in bug zilla and fixing them
 around this problem.
 
  Thanks,
  -Jiangning
 
 
 
 
 
 
 
 






RE: [PATCH, testsuite] Add loop unrolling command line options for some test cases

2011-09-26 Thread Jiangning Liu
 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Saturday, September 24, 2011 6:12 PM
 To: Jiangning Liu
 Cc: Mike Stump; gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
 Subject: Re: [PATCH, testsuite] Add loop unrolling command line options
 for some test cases
 
 On Thu, Sep 22, 2011 at 6:22 AM, Jiangning Liu jiangning@arm.com
 wrote:
  Hi Mike,
 
  OK. I will wait 24 more hours. If no objections by then, I will get
 it
  checked into trunk.
 
 I don't think you need -funroll-loops though.
 

The intention of those test cases isn't hurt if -funroll-loops is added,
right?

  Thanks,
  -Jiangning
 
  -Original Message-
  From: Mike Stump [mailto:mikest...@comcast.net]
  Sent: Thursday, September 22, 2011 3:10 AM
  To: Jiangning Liu
  Cc: gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
  Subject: Re: [PATCH, testsuite] Add loop unrolling command line
 options
  for some test cases
 
  On Sep 21, 2011, at 1:22 AM, Jiangning Liu wrote:
   The fix is to explicitly turn on loop unroll and set max-unroll-
 times
  to 8,
   which is larger than the unrolling times being detected in the
 cases.
 
  Sounds reasonable to me.  Ok, though, do watch for any comments by
  people that know more than I.
 
 
 
 
 






[PATCH] Fix stack red zone bug (PR38644)

2011-09-26 Thread Jiangning Liu
This patch fixes PR38644, a 3-year-old bug.

From the discussions on the mailing list and in bugzilla, I think a
middle-end fix is the common view. Although there are still some gaps on
how to fix it in the middle end, this patch at least moves the problem from
the back-end to the middle-end, which makes the GCC infrastructure stronger
than before. Otherwise, every back-end needs to find and fix this problem
by itself.

Since this bug has been pinged many times by customers, I think we can at
least move on with this patch. If necessary, we can follow up with an even
better middle-end solution later on.

The patch is tested on x86 and ARM.

ChangeLog:

* config/i386/i386.c (ix86_stack_using_red_zone): Change inline
to be extern.
(TARGET_STACK_USING_RED_ZONE): New.
* config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New.
(TARGET_STACK_USING_RED_ZONE): New.
(offset_below_red_zone_p): Change to use new hook 
TARGET_STACK_USING_RED_ZONE.
* doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New.
* doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New.
* sched-deps.c (sched_analyze_1): If the stack pointer is being 
modified and stack red zone is not supported for ports, flush out 
all memory references as they may become invalid if moved across 
the stack adjustment.
* target.def (stack_using_red_zone): New.
* testsuite/gcc.target/arm/stack-red-zone.c: New.

Thanks,
-Jiangning
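
For illustration, here is a minimal, self-contained sketch of the rule the
sched-deps.c entry above describes (the actual sched-deps.c hunk is not shown
in the quoted diff below; Insn, modifies_sp and pending_mem are illustrative
stand-ins, not GCC internals): when an insn modifies the stack pointer and
the target reports no red zone, every pending memory reference must gain a
dependence on that insn, so nothing is scheduled across the adjustment.

/* Sketch only: models the flush decision, not the real scheduler.  */
#include <vector>

struct Insn { bool modifies_sp; };

static bool stack_using_red_zone = false;  /* what the new hook reports */

static bool
must_flush_mem_refs (const Insn &insn)
{
  return insn.modifies_sp && !stack_using_red_zone;
}

static void
analyze (const Insn &insn, std::vector<const Insn *> &pending_mem)
{
  if (must_flush_mem_refs (insn))
    /* Models: add a dependence from each pending ref, then drop them.  */
    pending_mem.clear ();
}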

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ff8c49f..ce486da 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2631,8 +2631,8 @@ static const char *const
cpu_names[TARGET_CPU_DEFAULT_max] =
 

 /* Return true if a red-zone is in use.  */
 
-static inline bool
-ix86_using_red_zone (void)
+extern bool
+ix86_stack_using_red_zone (void)
 {
   return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
 }
@@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail
 #endif
 
+#undef TARGET_STACK_USING_RED_ZONE
+#define TARGET_STACK_USING_RED_ZONE ix86_stack_using_red_zone
+
 #undef TARGET_FUNCTION_VALUE
 #define TARGET_FUNCTION_VALUE ix86_function_value
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1ab57e5..411cb09 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1537,6 +1537,9 @@ static const struct attribute_spec
rs6000_attribute_table[] =
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
 #endif
 
+#undef TARGET_STACK_USING_RED_ZONE
+#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone
+
 /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
The PowerPC architecture requires only weak consistency among
processors--that is, memory accesses between processors need not be
@@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int
using_mfcr_multiple)
}
 }
 
+/* Return true if the ABI allows red zone access.  */
+extern bool
+rs6000_stack_using_red_zone (void)
+{
+   return (DEFAULT_ABI != ABI_V4);
+}
+
 /* Return true if OFFSET from stack pointer can be clobbered by signals.
V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes
below stack pointer not cloberred by signals.  */
@@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int
using_mfcr_multiple)
 static inline bool
 offset_below_red_zone_p (HOST_WIDE_INT offset)
 {
-  return offset < (DEFAULT_ABI == ABI_V4
+  return offset < (!TARGET_STACK_USING_RED_ZONE
   ? 0
   : TARGET_32BIT ? -220 : -288);
 }
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 335c1d1..c848806 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11332,6 +11332,22 @@ to the stack.  Therefore, this hook should return
true in general, but
 false for naked functions.  The default implementation always returns true.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void)
+This hook returns true if the target has a red zone (an area beyond the
+current extent of the stack that cannot be modified by asynchronous events
+on the processor).
+
+If this hook returns false then the compiler mid-end will not move an
access
+to memory in the stack frame past a stack adjustment insn.
+
+If this hook returns true then the compiler mid-end will assume that it is
+safe to move an access to memory in the stack frame past a stack adjustment
+insn.  The target back-end must emit scheduling barrier insns when this is
+unsafe.
+
+The default is to return false which is safe and appropriate for most
targets.
+@end deftypefn
+
 @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR
 On some architectures it can take multiple instructions to synthesize
 a constant.  If there is another constant already in a register that
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6783826..0fa9e10 100644
--- a/gcc/doc/tm.texi.in
+++ 

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-26 Thread Jiangning Liu


 -Original Message-
 From: Andrew Pinski [mailto:pins...@gmail.com]
 Sent: Tuesday, September 27, 2011 5:31 AM
 To: Jiangning Liu
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Fix stack red zone bug (PR38644)
 
 On Mon, Sep 26, 2011 at 3:26 AM, Jiangning Liu jiangning@arm.com
 wrote:
  This patch is fix PR38644, a 3-year-old bug.
 
  From the discussions in mail list and bugzilla, I think the middle
 end fix
  is a common view. Although there are stills some gaps on how to fix
 it in
  middle end, I think this patch at least moves the problem from back-
 end to
  middle-end, which makes GCC infrastructure stronger than before.
 Otherwise,
  any back-end needs to find and fix this problem by itself.
 
 I don't see why you think that is the common view that the fix should
 be in the middle-end.  I and a few others think the back-end should be
 where the barrier emitted from.  The middle-end should not have stack
 accesses as being special in anyway when it comes down to scheduling.
 What happens when a new scheduler comes around?  Then this has to be
 fixed again in that new scheduler rather than having the barrier doing
 the correct thing for all schedulers.
 

Hi Andrew,

Thanks for your kindly response!

Sorry, for this bug, I didn't see your valuable comments previously in either 
Bugzilla or the [RFC] I sent out a long time ago on the gcc mailing list. What 
I saw was a bunch of people agreeing that this problem should be fixed in the 
middle end.

For example, 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c48
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c49

My comments,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644#c35

I'd like to repeat my points here.

As I mentioned, it shouldn't be the back-end's responsibility to find this 
problem, i.e. instructions moving over a stack pointer change. ISAs can be 
split into two classes: one class doesn't support a stack red zone, and the 
other class does. If we agree this is a general issue, I think it should be 
solved in the middle end rather than in the back-end.

Yes, the back-end can provide a barrier to explicitly represent the dependence, 
but I think this is a kind of workaround, because this way we are hoping the 
back-end detects this kind of dependence, while the dependence is common to 
every back-end that doesn't support a stack red zone. If we carefully analyze 
the implementations in the x86 and PowerPC back-ends supporting a stack red 
zone, we may find they are doing almost the same things.

In particular, I think retargetability is GCC's most valuable treasure. 
Thinking of implementing a new port, do we want to write new code to find this 
problem, no matter whether the new port supports a stack red zone or not? If 
the answer is YES, it means we need to write code similar to what the x86-64 
and PowerPC back-ends are currently doing. Obviously, that is non-trivial 
work, and I don't think it's good for GCC's retargetability. Why don't we 
abstract the common part in the middle end behind a very clean interface?

Finally, it's OK for me if you say the scheduler can only treat dependence as 
a clean interface. But that point doesn't mean the stack red zone issue must 
be handled in each back-end separately. If we want to keep the scheduler clean 
enough, a simpler solution is to add a pass in the middle end that generates 
barriers before the scheduler and handles the general stack-red-zone issue 
using the interface I'm defining in the patch, but obviously that would be a 
kind of over-design so far.

Thanks,
-Jiangning

 Thanks,
 Andrew Pinski






RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-26 Thread Jiangning Liu
Fix a typo and CC x86/rs6000/arm ports maintainers.

ChangeLog:

* config/i386/i386.c (ix86_stack_using_red_zone): Change inline
to be extern.
(TARGET_STACK_USING_RED_ZONE): New.
* config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New.
(TARGET_STACK_USING_RED_ZONE): New.
(offset_below_red_zone_p): Change to use new hook 
TARGET_STACK_USING_RED_ZONE.
* doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New.
* doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New.
* sched-deps.c (sched_analyze_1): If the stack pointer is being 
modified and stack red zone is not supported for ports, flush out 
all memory references as they may become invalid if moved across 
the stack adjustment.
* target.def (stack_using_red_zone): New.
* testsuite/gcc.target/arm/stack-red-zone.c: New.

Thanks,
-Jiangning

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ff8c49f..1c16391 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2631,7 +2631,7 @@ static const char *const
cpu_names[TARGET_CPU_DEFAULT_max] =
 

 /* Return true if a red-zone is in use.  */
 
-static inline bool
+extern bool
 ix86_using_red_zone (void)
 {
   return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
@@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail
 #endif
 
+#undef TARGET_STACK_USING_RED_ZONE
+#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone
+
 #undef TARGET_FUNCTION_VALUE
 #define TARGET_FUNCTION_VALUE ix86_function_value
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1ab57e5..411cb09 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1537,6 +1537,9 @@ static const struct attribute_spec
rs6000_attribute_table[] =
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
 #endif
 
+#undef TARGET_STACK_USING_RED_ZONE
+#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone
+
 /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
The PowerPC architecture requires only weak consistency among
processors--that is, memory accesses between processors need not be
@@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int
using_mfcr_multiple)
}
 }
 
+/* Return true if the ABI allows red zone access.  */
+extern bool
+rs6000_stack_using_red_zone (void)
+{
+   return (DEFAULT_ABI != ABI_V4);
+}
+
 /* Return true if OFFSET from stack pointer can be clobbered by signals.
V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes
below stack pointer not cloberred by signals.  */
@@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int
using_mfcr_multiple)
 static inline bool
 offset_below_red_zone_p (HOST_WIDE_INT offset)
 {
-  return offset < (DEFAULT_ABI == ABI_V4
+  return offset < (!TARGET_STACK_USING_RED_ZONE
   ? 0
   : TARGET_32BIT ? -220 : -288);
 }
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 335c1d1..c848806 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11332,6 +11332,22 @@ to the stack.  Therefore, this hook should return
true in general, but
 false for naked functions.  The default implementation always returns true.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void)
+This hook returns true if the target has a red zone (an area beyond the
+current extent of the stack that cannot be modified by asynchronous events
+on the processor).
+
+If this hook returns false then the compiler mid-end will not move an
access
+to memory in the stack frame past a stack adjustment insn.
+
+If this hook returns true then the compiler mid-end will assume that it is
+safe to move an access to memory in the stack frame past a stack adjustment
+insn.  The target back-end must emit scheduling barrier insns when this is
+unsafe.
+
+The default is to return false which is safe and appropriate for most
targets.
+@end deftypefn
+
 @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR
 On some architectures it can take multiple instructions to synthesize
 a constant.  If there is another constant already in a register that
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6783826..0fa9e10 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -11223,6 +11223,22 @@ to the stack.  Therefore, this hook should return
true in general, but
 false for naked functions.  The default implementation always returns true.
 @end deftypefn
 
+@hook TARGET_STACK_USING_RED_ZONE
+This hook returns true if the target has a red zone (an area beyond the
+current extent of the stack that cannot be modified by asynchronous events
+on the processor).
+
+If this hook returns false then the compiler mid-end will not move an
access
+to memory in the stack frame past a stack adjustment insn.
+
+If this hook returns true then the compiler mid-end will assume that it 

RE: [PATCH] Fix stack red zone bug (PR38644)

2011-09-26 Thread Jiangning Liu
 Think of it this way.  What the IR says is there is no barrier between
 those moves.  You either have an implicit barrier (which is what you
 are proposing) or you have it explicitly.  I think we all rather have
 more things explicit rather than implicit in the IR.  And that has
 been the overall feeling for a few years now.
 

Sorry, I'm afraid I can't agree with you. Instead, I think using a barrier to 
describe this kind of dependence is itself an implicit method. Please note that 
this is not a usual data dependence issue: the stack pointer change doesn't 
have any data dependence with the memory access at all.

No matter what the solution itself is, the problem is quite a common one at 
the ISA level, so we should solve it in the middle-end, because the middle-end 
is shared by all ports.

My proposal avoids problems in the future. Any new port or new back-end 
implementation needn't explicitly define this hook and is safe by default. But 
if one port wants to introduce a red zone, which is unsafe, it can explicitly 
define this hook in its back-end. This way, we would catch the bug at the 
earliest time. Do you really want to hide this problem silently in the 
back-end, and wait for others to report the bug and then waste time getting it 
fixed again?

The fact I see is that, over the years, different ports have reported similar 
issues around this, and somebody tried to fix them. Actually, this kind of 
problem should be fixed at the design level. If the first people to solve this 
bug had given a solution in the middle end, we would not have needed to waste 
time filing those bugs in Bugzilla and fixing them around this problem.

Thanks,
-Jiangning









[PATCH, testsuite] Add loop unrolling command line options for some test cases

2011-09-21 Thread Jiangning Liu
Hi,

The following test cases check the predictive commoning optimization by 
detecting the loop unrolling times. Originally, the default max-unroll-times 
is 8. If max-unroll-times is specified to be less than the expected unrolling 
times, those cases would fail.

The fix is to explicitly turn on loop unrolling and set max-unroll-times to 8, 
which is larger than the unrolling times being detected in the cases.
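
For context, a hypothetical loop of the shape these predcom tests exercise
(not one of the actual testcases): the value stored to a[i] is reloaded as
a[i - 1] on the next iteration, so predictive commoning keeps it in a scalar
and unrolls the loop to rotate the temporaries -- which is why
max-unroll-times must not be smaller than the unrolling factor the pass wants.

int a[1000], b[1000];

void f (void)
{
  int i;
  for (i = 1; i < 1000; i++)
    a[i] = a[i - 1] + b[i];  /* a[i - 1] was stored one iteration ago.  */
}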

ChangeLog:

2011-09-14  Jiangning Liu  jiangning@arm.com

* gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c: Explicitly turn on 
loop unroll and set max unroll times to 8.
* gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c: Likewise.
* gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c: Likewise.
* gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c: Likewise.
* gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c: Likewise.

Thanks,
-Jiangning

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
b/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
index 16bd5c9..f1e52e5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 void abort (void);
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
index 7275f28..27e53ee 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 void abort (void);
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
b/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
index d500234..5dfe384 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 int a[1000], b[1000];
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
b/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
index 6f06b7f..c29a46a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 /* Test for predictive commoning of expressions, without reassociation.  */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
b/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
index 134fc37..29444ab 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-do run } */
-/* { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 /* Test for predictive commoning of expressions, with reassociation.  */






RE: [PATCH, testsuite] Add loop unrolling command line options for some test cases

2011-09-21 Thread Jiangning Liu
Hi Mike,

OK. I will wait 24 more hours. If no objections by then, I will get it
checked into trunk.

Thanks,
-Jiangning

 -Original Message-
 From: Mike Stump [mailto:mikest...@comcast.net]
 Sent: Thursday, September 22, 2011 3:10 AM
 To: Jiangning Liu
 Cc: gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
 Subject: Re: [PATCH, testsuite] Add loop unrolling command line options
 for some test cases
 
 On Sep 21, 2011, at 1:22 AM, Jiangning Liu wrote:
  The fix is to explicitly turn on loop unroll and set max-unroll-times
 to 8,
  which is larger than the unrolling times being detected in the cases.
 
 Sounds reasonable to me.  Ok, though, do watch for any comments by
 people that know more than I.






[PATCH, testsuite, ARM] Change to expected failure for g++.dg/abi/local1.C on ARM target

2011-09-13 Thread Jiangning Liu
Here comes a patch to change the case g++.dg/abi/local1.C to an expected
failure for the ARM target.

The C++ ABI for the ARM architecture says:



This runs contrary to §2.9.1 of [GC++ABI] which states:

It is intended that two type_info pointers point to equivalent type
descriptions if and only if the pointers are equal. An implementation must
satisfy this constraint, e.g. by using symbol preemption, COMDAT sections,
or other mechanisms.

Fortunately, we can ignore this requirement without violating the C++
standard provided that:

* type_info::operator== and type_info::operator!= compare the strings
returned by type_info::name(), not just the pointers to the RTTI objects and
their names.

* No reliance is placed on the address returned by type_info::name(). (That
is, t1.name() != t2.name() does not imply that t1 != t2).
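
As a sketch of what these two rules imply for an implementation (illustrative
only, not the actual libstdc++ code), equality has to fall back to comparing
the name strings, because on ARM EABI two distinct RTTI objects may describe
the same type:

#include <typeinfo>
#include <cstring>

bool equivalent (const std::type_info &a, const std::type_info &b)
{
  /* Pointer equality alone is not sufficient on this ABI.  */
  return std::strcmp (a.name (), b.name ()) == 0;
}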



The patch is,

diff --git a/gcc/testsuite/g++.dg/abi/local1.C
b/gcc/testsuite/g++.dg/abi/local1.C
index 518193c..7f08a8f 100644
--- a/gcc/testsuite/g++.dg/abi/local1.C
+++ b/gcc/testsuite/g++.dg/abi/local1.C
@@ -1,4 +1,4 @@
-// { dg-do run }
+// { dg-do run { xfail { arm*-*-eabi* } } }
 // { dg-additional-sources local1-a.cc }
 
 #include <typeinfo>

ChangeLog:

2011-09-14  Jiangning Liu  jiangning@arm.com

* g++.dg/abi/local1.C: Change to XFAIL for ARM EABI target.

Thanks,
-Jiangning





[PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case testing tree-ssa-dom

2011-08-26 Thread Jiangning Liu
Test case gcc.dg/tree-ssa/20040204-1.c can pass at -O1 after Richard
Guenther rguent...@suse.de fixed something in tree-ssa-dom. The
link_error call should be optimized away for ARM targets as well.

The patch is:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
index 45e44a1..470b585 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
@@ -33,5 +33,5 @@ void test55 (int x, int y)
    that the && should be emitted (based on BRANCH_COST).  Fix this
    by teaching dom to look through && and register all components
    as true.  */
-/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-* } } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */

gcc/testsuite/ChangeLog:

2011-08-26  Jiangning Liu  jiangning@arm.com

   PR tree-optimization/46021
   * gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*.

Thanks,
-Jiangning





RE: [PATCH, ARM] Generate conditional compares in Thumb2 state

2011-08-17 Thread Jiangning Liu
Attached is the new patch file. Please review!

ChangeLog:

2011-08-18  Jiangning Liu  jiangning@arm.com

* config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well.
(*ior_scc_scc_cmp): Likewise
(*and_scc_scc): Likewise.
(*and_scc_scc_cmp): Likewise.
(*and_scc_scc_nodom): Likewise.
(*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2.

Testsuite Changelog would be:

2011-08-18  Jiangning Liu  jiangning@arm.com

* gcc.target/arm/thumb2-cond-cmp-1.c: New. Make sure conditional 
compare can be generated.
* gcc.target/arm/thumb2-cond-cmp-2.c: Likewise.
* gcc.target/arm/thumb2-cond-cmp-3.c: Likewise.
* gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.

Regression test against cortex-M0/M3/M4 profile with -mthumb option
doesn't show any new failures.

Thanks,
-Jiangning

fix_cond_cmp_5.patch
Description: Binary data


RE: [PATCH, ARM] Generate conditional compares in Thumb2 state

2011-08-11 Thread Jiangning Liu
Ramana,

I only see the following three combinations as meaningful, 

* cmp l, lPy // keep cmp, and length is 2, on thumb
* cmp r, rI  // keep cmp, and length is 4
* cmp r, L   // convert to cmn, and length is 4

According to the ARM ARM, for negative immediates, all encodings for cmp/cmn
are 4 bytes long, i.e.

* CMP: encoding T2 and A1
* CMN: encoding T1 and A1

so we needn't distinguish between Pw and Pv.

 Length is 8 bytes if you have Pw and L

For this case, the length should be 10 for Thumb2 instead. 

Finally, if we want to describe all possibilities in constraints, we would
have the following 9 combinations,

* cmp1 has operands

(l,  l,  l,  r, r, r, r, r, r)
(lPy,lPy,lPy,rI,rI,rI,L, L, L)

* cmp2 has operands

(l,  r, r, l,  r, r, l,  r, r)
(lPy,rI,L, lPy,rI,L, lPy,rI,L)

So the lengths would be like below. (I don't know how to write this in the
attribute section yet.)

if (TARGET_THUMB2) {
  (set_attr "length" "6,8,8,8,10,10,8,10,10")
} else {
  (set_attr "length" "4,6,6,6,8,8,6,8,8")
}

Does it make sense?

Thanks,
-Jiangning

 -Original Message-
 From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
 Sent: Wednesday, August 10, 2011 6:40 PM
 To: Jiangning Liu
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2 state
 
 On 10 August 2011 09:20, Jiangning Liu jiangning@arm.com wrote:
  PING...
 
 
  BTW, in patch fix_cond_cmp_2.patch, the file mode of thumb2.md is
 carelessly
  changed, so please check attached new patch file fix_cond_cmp_3.patch.
 
 
 Please do not top post.
 
 I've been away for the whole of last week and then been away from my
 desk
 for the first 2 days this week and am still catching up with email .
 
  I'm missing a ChangeLog entry for this. Do I assume it is the same?
 
 I've just noticed that the length attribute isn't correct for Thumb2
 state. When I last
 looked at this I must have missed the case where the cmp and the cmn
 are both 32 bit instructions.
 
 The length can vary between 6 and 10 bytes for the Thumb2 variant from
 my reading of the ARM-ARM.
 
 i.e cmp reg, 8bitconstant
  itcond
  cmn reg, 8bitconstant
 
 Length = 6 bytes
 
 or even with
cmp reg, reg
it cond
cmncond reg, reg
 
  All registers betwen r0 and r7 .
 
 Length is 6 bytes if you have this for both l constraints.
  Length is 6 bytes - you should catch this with Pw and Pv respectively.
 Length is 8 bytes if you have Pw and L
 Length is 8 bytes if you have I and Pv
 Length is 10 bytes if you have I and L .
 
 
 Showing you an example with l and Py at this point of time and leaving
 the rest as homework. Yes it will duplicate a few cases but it helps
 getting the lengths absolutely right.
 
  (define_insn "*cmp_ite0"
    [(set (match_operand 6 "dominant_cc_register" "")
    (compare
     (if_then_else:SI
      (match_operator 4 "arm_comparison_operator"
       [(match_operand:SI 0 "s_register_operand" "l,r,r,r,r")
        (match_operand:SI 1 "arm_add_operand" "lPy,rI,L,rI,L")])
      (match_operator:SI 5 "arm_comparison_operator"
       [(match_operand:SI 2 "s_register_operand" "l,r,r,r,r")
        (match_operand:SI 3 "arm_add_operand" "lPy,rI,rI,L,L")])
      (const_int 0))
     (const_int 0)))]
    "TARGET_32BIT"
    "*
    {
      static const char * const cmp1[5][2] =
      {
        {\"cmp\\t%2, %3\",
         \"cmp\\t%0, %1\"},
        {\"cmp\\t%2, %3\",
         \"cmp\\t%0, %1\"},
        {\"cmp\\t%2, %3\",
         \"cmn\\t%0, #%n1\"},
        {\"cmn\\t%2, #%n3\",
         \"cmp\\t%0, %1\"},
        {\"cmn\\t%2, #%n3\",
         \"cmn\\t%0, #%n1\"}
      };
      static const char * const cmp2[5][2] =
      {
        {\"cmp%d5\\t%0, %1\",
         \"cmp%d4\\t%2, %3\"},
        {\"cmp%d5\\t%0, %1\",
         \"cmp%d4\\t%2, %3\"},
        {\"cmn%d5\\t%0, #%n1\",
         \"cmp%d4\\t%2, %3\"},
        {\"cmp%d5\\t%0, %1\",
         \"cmn%d4\\t%2, #%n3\"},
        {\"cmn%d5\\t%0, #%n1\",
         \"cmn%d4\\t%2, #%n3\"}
      };
      static const char * const ite[2] =
      {
        \"it\\t%d5\",
        \"it\\t%d4\"
      };
      int swap =
        comparison_dominates_p (GET_CODE (operands[5]), GET_CODE (operands[4]));

      output_asm_insn (cmp1[which_alternative][swap], operands);
      if (TARGET_THUMB2) {
        output_asm_insn (ite[swap], operands);
      }
      output_asm_insn (cmp2[which_alternative][swap], operands);
      return \"\";
    }"
    [(set_attr "conds" "set")
     (set_attr "arch" "t2,any,any,any,any")
     (set_attr "length" "6,8,8,8,8")]
  )
 
 
 As for the extra problem exposed by this specific case, may we treat
 it as a
 separate fix to decouple it with this one, and I can give follow up
 later
 on? I think it is a general problem not only for the particular
 pattern
 it/op/it/op. But I'm not sure how far we can go to optimize this kind
 of
 problems introduced by IT block.
 
 
 Currently the way in which the Thumb2 backend generates
 conditional instructions and combines them further with other
 IT blocks is by running a state machine at the very end before assembly

RE: [PATCH, ARM] Generate conditional compares in Thumb2 state

2011-08-10 Thread Jiangning Liu
PING...

BTW, in patch fix_cond_cmp_2.patch, the file mode of thumb2.md is carelessly
changed, so please check attached new patch file fix_cond_cmp_3.patch.

Thanks,
-Jiangning

 -Original Message-
 From: Jiangning Liu [mailto:jiangning@arm.com]
 Sent: Monday, August 08, 2011 2:01 PM
 To: 'Ramana Radhakrishnan'
 Cc: gcc-patches@gcc.gnu.org
 Subject: RE: [PATCH, ARM] Generate conditional compares in Thumb2 state
 
  In the attached new patch, arm_arch_thumb2 is changed to TARGET_THUMB2. I
  tried that; with my patch, the command line option -mcpu=armv7-a9 doesn't
  generate an IT instruction any longer, unless the option -mthumb is also
  added.
 
 All of my tests assume command line option -mthumb, while cortex-M0,
 cortex-M3 cortex-M4 are covered by options -mcpu=cortex-m0, -
 march=armv7-m, and -march=armv7e-m respectively.
 
  As for the extra problem exposed by this specific case, may we treat it
  as a separate fix to decouple it from this one, and I can follow up later
  on? I think it is a general problem, not only for the particular pattern
  it/op/it/op. But I'm not sure how far we can go to optimize this kind of
  problem introduced by IT blocks. For this specific case, I see that if
  conversion already generates a conditional move before the combination
  pass. So basically the peephole rules may probably work for most of the
  general scenarios. My initial thought is to go over the rules introducing
  IT blocks and try to figure out all of the combinations where two of
  these kinds of rules can be in sequential order. Maybe we only need to
  handle the most common case like this one. Since I do see a bunch of
  rules that have something to do with this problem, I'd like to look into
  all of them to give a most reasonable solution in a separate fix.
 
 Does it make sense?
 
 Thanks,
 -Jiangning
 
  -Original Message-
  From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
  Sent: Friday, August 05, 2011 9:20 AM
  To: Jiangning Liu
  Cc: gcc-patches@gcc.gnu.org
  Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2
  state
 
  On 3 August 2011 08:48, Jiangning Liu jiangning@arm.com wrote:
   This patch is to generate more conditional compare instructions in
  Thumb2
   state. Given an example like below,
  
   int f(int i, int j)
   {
    if ( (i == '+') || (j == '-') ) {
      return i;
    } else {
      return j;
    }
   }
  
   Without the patch, compiler generates the following codes,
  
          sub     r2, r0, #43
          rsbs    r3, r2, #0
          adc     r3, r3, r2
          cmp     r1, #45
          it      eq
          orreq   r3, r3, #1
          cmp     r3, #0
          it      eq
          moveq   r0, r1
          bx      lr
  
   With the patch, compiler can generate conditional jump like below,
  
          cmp     r0, #43
          it      ne
          cmpne   r1, #45
          it      ne
          movne   r0, r1
          bx      lr
 
 
  Nice improvement but there could be a single it block to handle both
  and thus you could make this even better with
 
  cmp r0, #43
  itt ne
  cmpne r1 ,#45
  movne r0, r1
 
  The way to do this would be to try and split this post-reload
  unfortunately into the cmp instruction and the conditional compare
  with the appropriate instruction length - Then the backend has a
  chance of merging some of this into a single instruction.
  Unfortunately that won't be very straight-forward but that's a
  direction we probably ought to proceed with in this case.
 
  In a number of places:
 
   +   if (arm_arch_thumb2)
 
  Ah instead of this please use if (TARGET_THUMB2) - arm_arch_thumb2 is
  true based on the architecture levels and not necessarily if the user
  wants to generate Thumb code. I don't want an unnecessary IT
  instruction being emitted in the ASM block in ARM state for v7-a and
  above.
 
   Tested against arm-none-eabi target and no regression found.
 
  Presumably for ARM and Thumb2 state ?
 
 
  cheers
  Ramana


fix_cond_cmp_3.patch
Description: Binary data


RE: [PATCH, ARM] Generate conditional compares in Thumb2 state

2011-08-08 Thread Jiangning Liu
In the attached new patch, arm_arch_thumb2 is changed to TARGET_THUMB2. I tried
that; with my patch, the command line option -mcpu=armv7-a9 doesn't generate an
IT instruction any longer, unless the option -mthumb is also added.

All of my tests assume command line option -mthumb, while cortex-M0,
cortex-M3 cortex-M4 are covered by options -mcpu=cortex-m0, -march=armv7-m,
and -march=armv7e-m respectively.

As for the extra problem exposed by this specific case, may we treat it as a
separate fix to decouple it from this one, and I can follow up later on? I
think it is a general problem, not only for the particular pattern
it/op/it/op. But I'm not sure how far we can go to optimize this kind of
problem introduced by IT blocks. For this specific case, I see that if
conversion already generates a conditional move before the combination pass.
So basically the peephole rules may probably work for most of the general
scenarios. My initial thought is to go over the rules introducing IT blocks
and try to figure out all of the combinations where two of these kinds of
rules can be in sequential order. Maybe we only need to handle the most common
case like this one. Since I do see a bunch of rules that have something to do
with this problem, I'd like to look into all of them to give a most reasonable
solution in a separate fix. 

Does it make sense?

Thanks,
-Jiangning

 -Original Message-
 From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
 Sent: Friday, August 05, 2011 9:20 AM
 To: Jiangning Liu
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH, ARM] Generate conditional compares in Thumb2 state
 
 On 3 August 2011 08:48, Jiangning Liu jiangning@arm.com wrote:
  This patch is to generate more conditional compare instructions in
 Thumb2
  state. Given an example like below,
 
  int f(int i, int j)
  {
   if ( (i == '+') || (j == '-') ) {
     return i;
   } else {
     return j;
   }
  }
 
  Without the patch, compiler generates the following codes,
 
         sub     r2, r0, #43
         rsbs    r3, r2, #0
         adc     r3, r3, r2
         cmp     r1, #45
         it      eq
         orreq   r3, r3, #1
         cmp     r3, #0
         it      eq
         moveq   r0, r1
         bx      lr
 
  With the patch, compiler can generate conditional jump like below,
 
         cmp     r0, #43
         it      ne
         cmpne   r1, #45
         it      ne
         movne   r0, r1
         bx      lr
 
 
 Nice improvement but there could be a single it block to handle both
 and thus you
 could make this even better with
 
 cmp r0, #43
 itt ne
 cmpne r1 ,#45
 movne r0, r1
 
 The way to do this would be to try and split this post-reload
 unfortunately into the cmp instruction and the conditional compare
 with the appropriate instruction length - Then the backend has a
 chance of merging some of this into a single instruction.
 Unfortunately that won't be very straight-forward but that's a
 direction we probably ought to proceed with in this case.
 
 In a number of places:
 
  +   if (arm_arch_thumb2)
 
 Ah instead of this please use if (TARGET_THUMB2) - arm_arch_thumb2 is
 true based on the architecture levels and not necessarily if the user
 wants to generate Thumb code. I don't want an unnecessary IT
 instruction being emitted in the ASM block in ARM state for v7-a and
 above.
 
  Tested against arm-none-eabi target and no regression found.
 
 Presumably for ARM and Thumb2 state ?
 
 
 cheers
 Ramana


fix_cond_cmp_2.patch
Description: Binary data


[PATCH, ARM] Generate conditional compares in Thumb2 state

2011-08-03 Thread Jiangning Liu
This patch is to generate more conditional compare instructions in Thumb2
state. Given an example like below,

int f(int i, int j) 
{
  if ( (i == '+') || (j == '-') ) {
return i;
  } else {
return j;
  }
}

Without the patch, compiler generates the following codes,

        sub     r2, r0, #43
        rsbs    r3, r2, #0
        adc     r3, r3, r2
        cmp     r1, #45
        it      eq
        orreq   r3, r3, #1
        cmp     r3, #0
        it      eq
        moveq   r0, r1
        bx      lr

With the patch, compiler can generate conditional jump like below,

        cmp     r0, #43
        it      ne
        cmpne   r1, #45
        it      ne
        movne   r0, r1
        bx      lr

The patch is essentially to insert *it* instruction for the following rules
in arm.md,

* cmp_ite0
* cmp_ite1
* cmp_and
* cmp_ior

Tested against arm-none-eabi target and no regression found.

Source code Changelog would be:

2011-07-29  Jiangning Liu  jiangning@arm.com

* config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well.
(*ior_scc_scc_cmp): Likewise
(*and_scc_scc): Likewise.
(*and_scc_scc_cmp): Likewise.
(*and_scc_scc_nodom): Likewise.
(*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2.

Testsuite Changelog would be:

2011-07-29  Jiangning Liu  jiangning@arm.com

* gcc.target/arm/thumb2-cond-cmp-1.c: New. Make sure conditional 
compare can be generated.
* gcc.target/arm/thumb2-cond-cmp-2.c: Likewise.
* gcc.target/arm/thumb2-cond-cmp-3.c: Likewise.
* gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.

Thanks,
-Jiangning

fix_cond_cmp.patch
Description: Binary data


RE: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Jiangning Liu
The answer is ARM can. However, if you look into bugs PR30282, PR38644 and
PR44199, you may find that, over the years, several different cases in
different ports reported similar failures, covering x86, PowerPC and ARM. You
are right, they were all fixed in back-ends in the past, but we should fix the
bug in a general way to make the GCC infrastructure stronger, rather than
fixing the problem target-by-target and case-by-case! If you further look into
the back-end fixes in x86 and PowerPC, you may find they look quite similar.

Thanks,
-Jiangning

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
 On Behalf Of Jakub Jelinek
 Sent: Monday, August 01, 2011 5:12 PM
 To: Jiangning Liu
 Cc: 'Joern Rennecke'; g...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
 vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
 Radhakrishnan; 'Ramana Radhakrishnan'
 Subject: Re: [RFC] Add middle end hook for stack red zone size
 
 On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
  It's quite necessary to solve the general problem in middle-end rather
than in
 back-end.
 
 That's what we disagree on.  All back-ends but ARM are able to handle it
 right, why can't ARM too?  The ABI rules for stack handling in the
epilogues
 are simply too diverse and complex to be handled easily in the scheduler.
 
   Jakub






RE: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Jiangning Liu
Hi Jakub,

Appreciate for your valuable comments!

I think the SPARC V9 ABI doesn't have a red zone defined, right? So
stack_red_zone_size would be defined as zero by default, and the scheduler
would block moving memory accesses across a stack adjustment no matter what
the offset is. I don't see any risk here. Also, in my patch the function *abs*
is used to avoid the opposite-stack-direction issue you mentioned.

Some people like you insist on the ABI diversity, and actually I agree with
you on this. But part of the ABI definition is general for all targets. The
point here is that memory access beyond the stack red zone should be avoided,
which is the general part of the ABI that the compiler should guarantee. For
this general part, the middle end should take the responsibility.

Thanks,
-Jiangning

 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Monday, August 01, 2011 6:31 PM
 To: Jiangning Liu
 Cc: 'Joern Rennecke'; g...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
 vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
 Radhakrishnan; 'Ramana Radhakrishnan'
 Subject: Re: [RFC] Add middle end hook for stack red zone size
 
 On Mon, Aug 01, 2011 at 06:14:27PM +0800, Jiangning Liu wrote:
  ARM. You are right, they were all fixed in back-ends in the past, but
 we
  should
  fix the bug in a general way to make GCC infrastructure stronger,
 rather
  than fixing the problem target-by-target and case-by-case! If you
 further
  look into the back-end fixes in x86 and PowerPC, you may find they
 looks
  quite similar in back-ends.
 
 
 Red zone is only one difficulty, your patch is e.g. completely ignoring
 existence of biased stack pointers (e.g. SPARC -m64 has them).
 Some targets have stack growing in opposite direction, etc.
 We have really a huge amount of very diverse ABIs and making the
 middle-end
 grok what is an invalid stack access is difficult.
 
   Jakub






RE: [RFC] Add middle end hook for stack red zone size

2011-07-31 Thread Jiangning Liu
Joern,

Thanks for your valuable feedback! This is only an RFC, and I will send out a 
formal patch along with a ChangeLog later on.

Basically, my patch only adds a new dependence in the scheduler, and it only 
blocks some instruction movements, so there is NO RISK to compiler 
correctness. For whatever stack pointer changes you gave in different 
scenarios, the current code base should already work. My patch intends neither 
to replace old dependences, nor to maximize the scheduler capability given the 
existence of a red zone in the stack. It only blocks a memory access from 
moving over a stack pointer adjustment if the distance is beyond the red zone 
size, which is an OS requirement due to the existence of interrupts.

Stack adjustment in the epilogue is a very general use of the stack frame. 
It's quite necessary to solve the general problem in the middle-end rather 
than in the back-end. Also, that old patch you attached solves a data 
dependence between two memory accesses, but a stack pointer change doesn't 
really have a data dependence with a memory access that doesn't use the stack 
pointer, so they are different stories. An alternative solution, without 
adding a blunt scheduling barrier, is to insert an independent pass before the 
scheduler that creates RTL barriers using the same interface 
stack_red_zone_size, but that would really be an over-design if we add a new 
pass only for this *small* functionality.

In my patch, the *abs* of the offset is used, so you are right that it's 
possible to get false positives and be too conservative, but there won't be 
false negatives, because my code only adds new dependences.

Since compilation is done per function, it would be OK if the red zone size 
varies due to different ABIs. Could you please tell me exactly on what system 
supported by GCC the red zone size can be different for incoming and outgoing? 
And also, how does the scheduler guarantee correctness in the current code 
base? Anyway, I don't think my patch will break the original solution.

Thanks,
-Jiangning

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
 On Behalf Of Joern Rennecke
 Sent: Tuesday, July 26, 2011 10:33 AM
 To: Jiangning Liu
 Cc: g...@gcc.gnu.org; gcc-patches@gcc.gnu.org; vmaka...@redhat.com;
 dje@gmail.com; Richard Henderson; Ramana Radhakrishnan; 'Ramana
 Radhakrishnan'
 Subject: RE: [RFC] Add middle end hook for stack red zone size
 
 Quoting Jiangning Liu jiangning@arm.com:
 
  Hi,
 
  One month ago, I sent out this RFC to the *gcc-patches* mailing list, but I
  didn't receive any response yet. So I'm forwarding this mail to the
  *gcc* mailing list. Can anybody here really give feedback to me?
 
 Well, I couldn't approve any patch, but I can point out some issues with your 
 patch.
 
 First, it's missing a ChangeLog, and you don't state how you have tested it.
 And regarding the code in sched_analyze_1, I think you'll get false positives 
 with
 alloca, and false negatives when registers are involved to compute offsets or 
 to
 restore the stack pointer from.
 
 FWIW, I think generally blunt scheduling barriers should be avoided, and 
 instead
 the dependencies made visible to the scheduler.
 E.g., I've been working with another architecture with a redzone, where at 
 -fno-
 omit-frame-pointer, the prologue can put pretend_args into the redzone, then 
 after
 stack adjustment and frame allocation, these arguments are accessed via the 
 frame
 pointer.
 
 With the attached patchlet, alias analysis works for this situation too, so 
 no blunt
 scheduling block is required.
 
 Likewise, with stack adjustments, they should not affect scheduling in 
 general, but
 be considered to clobber the part of the frame that is being exposed to 
 interrupt
 writes either before or after the adjustment.
 At the moment, each port that wants to have such selective scheduling 
 blockages
 has to define a stack_adjust pattern with a memory clobber in a parallel, 
 with a
 memref that shows the high water mark of possible interrupt stack writes.
 Prima facia it would seem convenient if you only had to tell the middle-end 
 about
 the redzone size, and it could figure out the implicit clobbers when the 
 stack is
 changed.  However, when a big stack adjustment is being made, or the stack is
 realigned, or restored from the frame pointer / another register where it was
 saved due to realignment, the adjustment is not so obvious.  I'm not sure if 
 you can
 actually create an robust interface that's simpler to use than putting the 
 right
 memory clobber in the stack adjust pattern.  Note also that the redzone size 
 can
 vary from function to function depending on ABI-altering attributes, in 
 particular
 for interrupt functions, which can also have different incoming and outgoing
 redzone sizes.  Plus, you can have an NMI / reset handler which can use the 
 stack
 like an ordinary address register.





RE: [RFC] Add middle end hook for stack red zone size

2011-07-25 Thread Jiangning Liu
Hi,

One month ago, I sent out this RFC to the *gcc-patches* mailing list, but I 
didn't receive any response yet. So I'm forwarding this mail to the *gcc* 
mailing list. Can anybody here really give feedback to me?

Appreciate your help in advance!

-Jiangning

-Original Message-
From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org] 
Sent: Tuesday, July 19, 2011 6:18 PM
To: Jiangning Liu
Cc: gcc-patches@gcc.gnu.org; vmaka...@redhat.com; dje@gmail.com; Richard 
Henderson; Ramana Radhakrishnan
Subject: Re: [RFC] Add middle end hook for stack red zone size

2011/7/19 Jiangning Liu jiangning@arm.com:

 I see a lot of feedback on other posts, but mine still has ZERO responses
 after the past 3 weeks, so I'm wondering if I made any mistake in my
 process? Who can help me?

It would be worth CC'ing the other relevant target maintainers as well
to get some feedback since the patch touches ARM, x86 and Powerpc.
I've added the maintainers for i386 and PPC to the CC list using the
email addresses from the MAINTAINERS file.

Thanks,
Ramana


 Thanks,
 -Jiangning

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
 On Behalf Of Jiangning Liu
 Sent: Tuesday, July 05, 2011 8:32 AM
 To: gcc-patches@gcc.gnu.org; rgue...@gcc.gnu.org
 Subject: RE: [RFC] Add middle end hook for stack red zone size

 PING...

 I just merged with the latest code base and generated new patch as attached.

 Thanks,
 -Jiangning

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
 Sent: June 28, 2011, 4:38 PM
 To: gcc-patches@gcc.gnu.org
 Subject: [RFC] Add middle end hook for stack red zone size

 This patch is to fix PR38644, which is a bug with long history about
 stack red zone access, and PR30282 is correlated.

 Originally red zone concept is not exposed to middle-end, and back-end
 uses special logic to add extra memory barrier RTL and help the
 correct dependence in middle-end. This way different back-ends must
 handle red zone problem by themselves. For example, X86 target
 introduced function
 ix86_using_red_zone() to judge red zone access, while POWER introduced
 offset_below_red_zone_p() to judge it. Note that they have different
 semantics, but the logic in caller sites of back-end uses them to
 decide whether adding memory barrier RTL or not. If back-end
 incorrectly handles this, bug would be introduced.

 Therefore, the correct method should be middle-end handles red zone
 related things to avoid the burden in different back-ends. To be
 specific for PR38644, this middle-end problem causes incorrect
 behavior for ARM target.
 This patch exposes red zone concept to middle-end by introducing a
 middle-end/back-end hook TARGET_STACK_RED_ZONE_SIZE defined in
 target.def, and by default its value is 0. Back-end may redefine this
 function to provide concrete red zone size according to specific ABI
 requirements.

 In middle end, scheduling dependence is modified by using this hook
 plus checking stack frame pointer adjustment instruction to decide
 whether memory references need to be all flushed out or not. In
 theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, back-end
 would not be required to specially handle this scheduling dependence
 issue by introducing extra memory barrier RTL.

 In back-end, the following changes are made to define the hook,
 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
 ix86_stack_red_zone_size() in i386.c, which is a newly introduced
 function.
 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
 rs6000_stack_red_zone_size() in rs6000.c, which is also a newly
 defined function.
 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
 default_stack_red_zone_size in targhooks.c, and this function returns
 0, which means ARM eabi and others don't support red zone access at all.

 In summary, the relationship between ABI and red zone access is like
 below,

 -----------------------------------------------------------------
 |      ARCH     |  ARM  |      X86      |     POWER    | others |
 |---------------|-------|---------------|--------------|--------|
 |      ABI      | EABI  | MS_64 | other |  AIX  |  V4  |        |
 |---------------|-------|-------|-------|-------|------|--------|
 |    RED ZONE   |  No   |  YES  |  No   |  YES  |  No  |   No   |
 |---------------|-------|-------|-------|-------|------|--------|
 | RED ZONE SIZE |   0   |  128  |   0   |220/288|   0  |   0    |
 -----------------------------------------------------------------

 Thanks,
 -Jiangning










RE: [RFC] Add middle end hook for stack red zone size

2011-07-04 Thread Jiangning Liu
PING...

I just merged with the latest code base and generated new patch as attached.

Thanks,
-Jiangning

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
  Sent: June 28, 2011, 4:38 PM
 To: gcc-patches@gcc.gnu.org
 Subject: [RFC] Add middle end hook for stack red zone size
 
 This patch is to fix PR38644, which is a bug with long history about
 stack red zone access, and PR30282 is correlated.
 
 Originally red zone concept is not exposed to middle-end, and back-end
 uses special logic to add extra memory barrier RTL and help the correct
 dependence in middle-end. This way different back-ends must handle red
 zone problem by themselves. For example, X86 target introduced function
 ix86_using_red_zone() to judge red zone access, while POWER introduced
 offset_below_red_zone_p() to judge it. Note that they have different
 semantics, but the logic in caller sites of back-end uses them to
 decide whether adding memory barrier RTL or not. If back-end
 incorrectly handles this, bug would be introduced.
 
 Therefore, the correct method should be middle-end handles red zone
 related things to avoid the burden in different back-ends. To be
 specific for PR38644, this middle-end problem causes incorrect behavior
 for ARM target.
 This patch exposes red zone concept to middle-end by introducing a
 middle-end/back-end hook TARGET_STACK_RED_ZONE_SIZE defined in
 target.def, and by default its value is 0. Back-end may redefine this
 function to provide concrete red zone size according to specific ABI
 requirements.
 
 In middle end, scheduling dependence is modified by using this hook
 plus checking stack frame pointer adjustment instruction to decide
 whether memory references need to be all flushed out or not. In theory,
 if TARGET_STACK_RED_ZONE_SIZE is defined correctly, back-end would not
 be required to specially handle this scheduling dependence issue by
 introducing extra memory barrier RTL.
 
 In back-end, the following changes are made to define the hook,
 1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
  ix86_stack_red_zone_size() in i386.c, which is a newly introduced
 function.
 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
 rs6000_stack_red_zone_size() in rs6000.c, which is also a newly defined
 function.
 3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
 default_stack_red_zone_size in targhooks.c, and this function returns 0,
 which means ARM eabi and others don't support red zone access at all.
 
 In summary, the relationship between ABI and red zone access is like
 below,
 
  -----------------------------------------------------------------
  |      ARCH     |  ARM  |      X86      |     POWER    | others |
  |---------------|-------|---------------|--------------|--------|
  |      ABI      | EABI  | MS_64 | other |  AIX  |  V4  |        |
  |---------------|-------|-------|-------|-------|------|--------|
  |    RED ZONE   |  No   |  YES  |  No   |  YES  |  No  |   No   |
  |---------------|-------|-------|-------|-------|------|--------|
  | RED ZONE SIZE |   0   |  128  |   0   |220/288|   0  |   0    |
  -----------------------------------------------------------------
 
 Thanks,
 -Jiangning

stack-red-zone-patch-38644-4.patch
Description: Binary data


[RFC] Add middle end hook for stack red zone size

2011-06-28 Thread Jiangning Liu
This patch is to fix PR38644, which is a bug with long history about stack
red zone access, and PR30282 is correlated.

Originally red zone concept is not exposed to middle-end, and back-end uses
special logic to add extra memory barrier RTL and help the correct
dependence in middle-end. This way different back-ends must handle red zone
problem by themselves. For example, X86 target introduced function
ix86_using_red_zone() to judge red zone access, while POWER introduced
offset_below_red_zone_p() to judge it. Note that they have different
semantics, but the logic in caller sites of back-end uses them to decide
whether adding memory barrier RTL or not. If back-end incorrectly handles
this, bug would be introduced. 

Therefore, the correct method should be middle-end handles red zone related
things to avoid the burden in different back-ends. To be specific for
PR38644, this middle-end problem causes incorrect behavior for ARM target.
This patch exposes red zone concept to middle-end by introducing a
middle-end/back-end hook TARGET_STACK_RED_ZONE_SIZE defined in target.def,
and by default its value is 0. Back-end may redefine this function to
provide concrete red zone size according to specific ABI requirements. 

In middle end, scheduling dependence is modified by using this hook plus
checking stack frame pointer adjustment instruction to decide whether memory
references need to be all flushed out or not. In theory, if
TARGET_STACK_RED_ZONE_SIZE is defined correctly, back-end would not be
required to specially handle this scheduling dependence issue by introducing
extra memory barrier RTL.
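
As a toy model of that dependence rule (names are illustrative, not GCC
internals; it only assumes the hook returns the red-zone size in bytes): a
memory access OFFSET bytes below the post-adjustment stack pointer may move
across the adjustment only if it stays inside the red zone.

#include <cstdio>
#include <cstdlib>

static bool
may_move_across_stack_adjust (long offset_below_sp, long red_zone_size)
{
  return std::labs (offset_below_sp) <= red_zone_size;
}

int main ()
{
  /* ARM EABI: red zone size 0, so the motion is always blocked.  */
  std::printf ("ARM EABI: %s\n",
               may_move_across_stack_adjust (8, 0) ? "may move" : "blocked");
  /* x86-64 (non-MS): 128-byte red zone, an 8-byte-deep access may move.  */
  std::printf ("x86-64:   %s\n",
               may_move_across_stack_adjust (8, 128) ? "may move" : "blocked");
  return 0;
}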

In back-end, the following changes are made to define the hook,
1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to be
ix86_stack_red_zone_size() in i386.c, which is a newly introduced function.
2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to be
rs6000_stack_red_zone_size() in rs6000.c, which is also a newly defined
function.
3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to be
default_stack_red_zone_size in targhooks.c, and this function returns 0,
which means ARM eabi and others don't support red zone access at all.
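
The two back-end definitions named in 1) and 2) have roughly the following
shape (a sketch with the target macros stubbed so it is self-contained; the
real definitions are in the attached patch, and the sizes match the table
below):

/* Stubs standing in for the targets' own configuration macros.  */
#define TARGET_RED_ZONE 1
#define TARGET_64BIT_MS_ABI 0
#define TARGET_32BIT 0
enum abi { ABI_AIX, ABI_V4 };
#define DEFAULT_ABI ABI_AIX

static int
ix86_stack_red_zone_size (void)
{
  return (TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI) ? 128 : 0;
}

static int
rs6000_stack_red_zone_size (void)
{
  if (DEFAULT_ABI == ABI_V4)
    return 0;                       /* V4 has no stack cushion.  */
  return TARGET_32BIT ? 220 : 288;  /* AIX ABIs.  */
}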

In summary, the relationship between ABI and red zone access is like below,

-----------------------------------------------------------------
|      ARCH     |  ARM  |      X86      |     POWER    | others |
|---------------|-------|---------------|--------------|--------|
|      ABI      | EABI  | MS_64 | other |  AIX  |  V4  |        |
|---------------|-------|-------|-------|-------|------|--------|
|    RED ZONE   |  No   |  YES  |  No   |  YES  |  No  |   No   |
|---------------|-------|-------|-------|-------|------|--------|
| RED ZONE SIZE |   0   |  128  |   0   |220/288|   0  |   0    |
-----------------------------------------------------------------

Thanks,
-Jiangning

stack-red-zone-patch-38644-3.patch
Description: Binary data