RE: [PATCH,ARM] Define MAX_CONDITIONAL_EXECUTE

2013-06-24 Thread Greta Yorsh
PING...
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00948.html

Thanks,
Greta

 -Original Message-
 From: Greta Yorsh [mailto:greta.yo...@arm.com]
 Sent: 17 June 2013 12:19
 To: GCC Patches
 Cc: Richard Earnshaw; Ramana Radhakrishnan; p...@codesourcery.com;
 ni...@redhat.com
 Subject: [PATCH,ARM] Define MAX_CONDITIONAL_EXECUTE
 
 This patch makes the following changes:
 * Define MAX_CONDITIONAL_EXECUTE in the arm backend using
 max_insns_skipped, which is set based on the current tune.
 * Update max_insns_skipped for the Cortex-A15 tune to be 2 (instead of 5).
 * Use max_insns_skipped in thumb2_final_prescan_insn to decide when to
 combine IT blocks into larger IT blocks. Previously, max_insns_skipped
 was only used in arm_final_prescan_insn to decide when a branch should
 be converted to conditional execution.
 
 No regression on qemu for arm-none-eabi with cortex-a15 arm/thumb mode.
 Bootstrap successful on Cortex-A15.
 
 Performance improvement on Cortex-A15 in both arm and thumb states on both
 Dhrystone and Coremark, and improvement on Spec2000 in thumb state, with
 all benchmarks showing improvements except three benchmarks in CFP2000
 that have slight regressions (189, 183, 178).
 
 gcc/ChangeLog
 
 2013-06-17  Greta Yorsh  greta.yo...@arm.com
 
   * config/arm/arm.h (MAX_CONDITIONAL_EXECUTE): Define macro.
   * config/arm/arm-protos.h (arm_max_conditional_execute): New
   declaration.
   (tune_params): Update comment.
   * config/arm/arm.c (arm_cortex_a15_tune): Set max_cond_insns to
 2.
   (arm_max_conditional_execute): New function.
   (thumb2_final_prescan_insn): Use max_insns_skipped and
   MAX_INSN_PER_IT_BLOCK to compute maximum instructions in a block.





[PATCH,ARM] Define MAX_CONDITIONAL_EXECUTE

2013-06-17 Thread Greta Yorsh
This patch makes the following changes:
* Define MAX_CONDITIONAL_EXECUTE in the arm backend using max_insns_skipped,
which is set based on the current tune.
* Update max_insns_skipped for the Cortex-A15 tune to be 2 (instead of 5).
* Use max_insns_skipped in thumb2_final_prescan_insn to decide when to
combine IT blocks into larger IT blocks. Previously, max_insns_skipped was
only used in arm_final_prescan_insn to decide when a branch should be
converted to conditional execution.
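
For context, the kind of source this affects (an illustrative sketch, not part
of the patch): a small if/else such as the one below is if-converted into
conditionally executed instructions, and MAX_CONDITIONAL_EXECUTE now tells the
generic if-conversion code how many such instructions the current tune
considers worthwhile (2 on Cortex-A15 after this change); the same limit now
also caps how far IT blocks are combined in Thumb-2.

  int
  cond_add (int a, int b, int c)
  {
    if (a > b)        /* small branches like this are if-converted into  */
      c += a;         /* conditionally executed (IT-predicated) insns    */
    else
      c -= b;
    return c;
  }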

No regression on qemu for arm-none-eabi with cortex-a15 arm/thumb mode.
Bootstrap successful on Cortex-A15. 

Performance improvement on Cortex-A15 in both arm and thumb states on both
Dhrystone and Coremark, and improvement on Spec2000 in thumb state, with all
benchmarks showing improvements except three benchmarks in CFP2000 that have
slight regressions (189,183,178).

gcc/ChangeLog

2013-06-17  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.h (MAX_CONDITIONAL_EXECUTE): Define macro.
* config/arm/arm-protos.h (arm_max_conditional_execute): New
declaration.
(tune_params): Update comment.
* config/arm/arm.c (arm_cortex_a15_tune): Set max_cond_insns to 2.
(arm_max_conditional_execute): New function.
(thumb2_final_prescan_insn): Use max_insns_skipped and
MAX_INSN_PER_IT_BLOCK to compute maximum instructions in a block.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c791341..374c364 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -227,6 +227,8 @@ extern const char *arm_mangle_type (const_tree);
 
 extern void arm_order_regs_for_local_alloc (void);
 
+extern int arm_max_conditional_execute ();
+
 /* Vectorizer cost model implementation.  */
 struct cpu_vec_costs {
   const int scalar_stmt_cost;   /* Cost of any scalar operation, excluding
@@ -256,8 +258,7 @@ struct tune_params
   bool (*rtx_costs) (rtx, RTX_CODE, RTX_CODE, int *, bool);
   bool (*sched_adjust_cost) (rtx, rtx, rtx, int *);
   int constant_limit;
-  /* Maximum number of instructions to conditionalise in
- arm_final_prescan_insn.  */
+  /* Maximum number of instructions to conditionalise.  */
   int max_insns_skipped;
   int num_prefetch_slots;
   int l1_cache_size;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 43dfe27..6ca81eb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1054,7 +1057,7 @@ const struct tune_params arm_cortex_a15_tune =
   arm_9e_rtx_costs,
   NULL,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   false,   /* Prefer constant pool.  */
   arm_default_branch_cost,
@@ -9101,6 +9104,12 @@ arm_adjust_cost (rtx insn, rtx link, rtx dep, int cost)
   return cost;
 }
 
+int
+arm_max_conditional_execute (void)
+{
+  return max_insns_skipped;
+}
+
 static int
 arm_default_branch_cost (bool speed_p, bool predictable_p ATTRIBUTE_UNUSED)
 {
@@ -19488,6 +19497,13 @@ thumb2_final_prescan_insn (rtx insn)
   enum arm_cond_code code;
   int n;
   int mask;
+  int max;
+
+  /* Maximum number of conditionally executed instructions in a block
+ is minimum of the two max values: maximum allowed in an IT block
+ and maximum that is beneficial according to the cost model and tune.  */
+  max = (max_insns_skipped < MAX_INSN_PER_IT_BLOCK) ?
+max_insns_skipped : MAX_INSN_PER_IT_BLOCK;
 
   /* Remove the previous insn from the count of insns to be output.  */
   if (arm_condexec_count)
@@ -19530,9 +19546,9 @@ thumb2_final_prescan_insn (rtx insn)
   /* ??? Recognize conditional jumps, and combine them with IT blocks.  */
   if (GET_CODE (body) != COND_EXEC)
break;
-  /* Allow up to 4 conditionally executed instructions in a block.  */
+  /* Maximum number of conditionally executed instructions in a block.  */
   n = get_attr_ce_count (insn);
-  if (arm_condexec_masklen + n > MAX_INSN_PER_IT_BLOCK)
+  if (arm_condexec_masklen + n > max)
break;
 
   predicate = COND_EXEC_TEST (body);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3a49a90..387d271 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -183,6 +183,11 @@ extern arm_cc arm_current_cc;
 
 #define ARM_INVERSE_CONDITION_CODE(X)  ((arm_cc) (((int)X) ^ 1))
 
+/* The maximum number of instructions that is beneficial to
+   conditionally execute. */
+#undef MAX_CONDITIONAL_EXECUTE
+#define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
+
 extern int arm_target_label;
 extern int arm_ccfsm_state;
 extern GTY(()) rtx arm_target_insn;


[PING][PATCH,ARM] Fix PR56732 - backport to gcc 4.8

2013-05-24 Thread Greta Yorsh
This patch (trunk r198547)
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00061.html
fixes an ICE in gcc 4.8:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56732

Ok to backport to 4.8 branch?

Thanks,
Greta

 -Original Message-
 From: Richard Earnshaw
 Sent: 02 May 2013 15:45
 To: Greta Yorsh
 Cc: GCC Patches; Ramana Radhakrishnan; diffg...@gmail.com;
 enrico.sch...@informatik.tu-chemnitz.de; mi...@it.uu.se
 Subject: Re: [PATCH,ARM] Fix PR56732
 
 On 02/05/13 13:52, Greta Yorsh wrote:
  Epilogue in RTL (r188743) generated for naked functions adds simple
 return
  jump insn and causes an ICE, as described here:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56732
 
  There is a missing check of really_return argument in
 arm_expand_epilogue.
  This patch adds the missing check and a new test.
 
  No regression on qemu for arm-none-eabi with cortex-a15 arm/thumb.
  Bootstrap successful on Cortex-A15 and no regression.
 
  Ok for trunk?
 
  Thanks,
  Greta
 
  gcc/ChangeLog
 
  2013-05-02  Greta Yorsh  greta.yo...@arm.com
 
  PR target/56732
  * config/arm/arm.c (arm_expand_epilogue): Check really_return
 before
  generating simple_return for naked functions.
 
  gcc/testsuite/ChangeLog
 
  2013-05-02  Greta Yorsh  greta.yo...@arm.com
 
  PR target/56732
  * gcc.target/arm/pr56732-1.c: New test.
 
 
 OK.
 
 R.





[Patch,Testsuite] Fix failure in gcc.dg/tree-ssa/forwprop-26.c

2013-05-03 Thread Greta Yorsh
This is a new test that fails on arm and probably other targets that
have short enums by default:

FAIL: gcc.dg/tree-ssa/forwprop-26.c (test for excess errors)
Excess errors:
/src/gcc/gcc/testsuite/gcc.dg/tree-ssa/forwprop-26.c:13:22: error:
width of 'code' exceeds its type
gcc.dg/tree-ssa/forwprop-26.c: dump file does not exist

This patch adds the missing -fno-short-enums to dg-options, which fixes
the test failure.
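
For context, a minimal illustration (assumed declarations, not the ones in
forwprop-26.c) of why short enums break the test: with -fshort-enums an enum
with only a few enumerators occupies a single byte, so a 16-bit enum bit-field
is wider than its underlying type and gcc reports the error quoted above.

  enum code_kind { CODE_A, CODE_B, CODE_C };  /* one byte with -fshort-enums */

  struct node
  {
    enum code_kind code : 16;   /* error: width of 'code' exceeds its type */
  };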

Ok for trunk?

Thanks,
Greta

gcc/testsuite/ChangeLog

2013-05-03  Greta Yorsh  greta.yo...@arm.com

 * gcc.dg/tree-ssa/forwprop-26.c: Add -fno-short-enums
 to dg-options.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-26.c 
b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-26.c
index 14821af..108b1bc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-26.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-forwprop1" } */
+/* { dg-options "-O2 -fdump-tree-forwprop1 -fno-short-enums" } */
 
 union tree_node;
 typedef union tree_node *tree;


[PATCH,ARM] Fix PR56732

2013-05-02 Thread Greta Yorsh
The RTL epilogue (introduced in r188743) generated for naked functions adds a
simple_return jump insn and causes an ICE, as described here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56732

There is a missing check of the really_return argument in arm_expand_epilogue.
This patch adds the missing check and a new test.
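
A minimal reproducer has roughly the following shape (the new test below,
gcc.target/arm/pr56732-1.c, adds a second, larger function): for a naked
function the compiler must not emit any epilogue or return of its own, so
arm_expand_epilogue has to honour really_return here.

  extern void bar (void);

  void __attribute__((naked))
  foo (void)
  {
    bar ();   /* no compiler-generated return may follow */
  }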

No regression on qemu for arm-none-eabi with cortex-a15 arm/thumb.
Bootstrap successful on Cortex-A15 and no regression.

Ok for trunk?

Thanks,
Greta

gcc/ChangeLog

2013-05-02  Greta Yorsh  greta.yo...@arm.com

PR target/56732
* config/arm/arm.c (arm_expand_epilogue): Check really_return before
generating simple_return for naked functions.

gcc/testsuite/ChangeLog

2013-05-02  Greta Yorsh  greta.yo...@arm.com

PR target/56732
* gcc.target/arm/pr56732-1.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 464d91c..9d4a453 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24067,7 +24067,8 @@ arm_expand_epilogue (bool really_return)
   if (IS_NAKED (func_type)
   || (IS_VOLATILE (func_type) && TARGET_ABORT_NORETURN))
 {
-  emit_jump_insn (simple_return_rtx);
+  if (really_return)
+emit_jump_insn (simple_return_rtx);
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/pr56732-1.c 
b/gcc/testsuite/gcc.target/arm/pr56732-1.c
new file mode 100644
index 000..ac8b8cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr56732-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target naked_functions } */
+/* { dg-options "-O2 -Wall" } */
+extern void bar();
+
+void __attribute__((__naked__))
+foo(void)
+{
+  bar ();
+}
+
+int __attribute__((naked))
+zoo (int a, int b, int c, int d, int e, int f)
+{
+  bar ();
+  return e;
+}
+/* Verify that __attribute__((naked)) produces a naked function that
+   does not use bx to return. */
+/* { dg-final { scan-assembler-not "\tbx\tlr" } } */


[Patch, ARM][11/n] Split patterns that output multiple assembly instructions - thumb2.md

2013-04-30 Thread Greta Yorsh
This patch continues to clean up patterns that output multiple assembly
instructions. It handles most of the patterns in thumb2.md. 

The following patterns are not split:
  thumb2_movcond, thumb2_cond_move - complex, maybe later.
  tls_load_dot_plus_four - won't split: uses asm_out in output statement.
  thumb2_cbz - won't split: uses pc in length attribute and length in output
statement.
  thumb2_cbnz - likewise.
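
For reference, *thumb2_smaxsi3 (the first pattern converted below) matches the
RTL for a signed maximum; a minimal C shape that can produce it (illustrative
only):

  int
  smax (int a, int b)
  {
    return a > b ? a : b;   /* smax:SI, emitted as cmp + IT + conditional moves */
  }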

No regression on qemu for arm-none-eabi. Bootstrap successful on Cortex-A15.

Ok for trunk?

Thanks,
Greta

2013-04-24  Greta Yorsh  greta.yo...@arm.com

* config/arm/thumb2.md (thumb2_smaxsi3,thumb2_sminsi3): Convert
define_insn to define_insn_and_split.
(thumb32_umaxsi3,thumb2_uminsi3): Likewise.
(thumb2_negdi2,thumb2_abssi2,thumb2_neg_abssi2): Likewise.
(thumb2_mov_scc,thumb2_mov_negscc,thumb2_mov_notscc): Likewise.
(thumb2_movsicc_insn,thumb2_and_scc,thumb2_ior_scc): Likewise.
(thumb2_negscc): Likewise.
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 697350c..92ae8f4 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -64,81 +38,167 @@
(set_attr type alu_shift)]
 )
 
-(define_insn *thumb2_smaxsi3
+(define_insn_and_split *thumb2_smaxsi3
   [(set (match_operand:SI  0 s_register_operand =r,r,r)
(smax:SI (match_operand:SI 1 s_register_operand  0,r,?r)
 (match_operand:SI 2 arm_rhs_operandrI,0,rI)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_THUMB2
-  @
-   cmp\\t%1, %2\;it\\tlt\;movlt\\t%0, %2
-   cmp\\t%1, %2\;it\\tge\;movge\\t%0, %1
-   cmp\\t%1, %2\;ite\\tge\;movge\\t%0, %1\;movlt\\t%0, %2
+  #
+  ; cmp\\t%1, %2\;it\\tlt\;movlt\\t%0, %2
+  ; cmp\\t%1, %2\;it\\tge\;movge\\t%0, %1
+  ; cmp\\t%1, %2\;ite\\tge\;movge\\t%0, %1\;movlt\\t%0, %2
+  TARGET_THUMB2
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+(if_then_else:SI (ge:SI (reg:CC CC_REGNUM) (const_int 0))
+ (match_dup 1)
+ (match_dup 2)))]
+  
   [(set_attr conds clob)
(set_attr length 10,10,14)]
 )
 
-(define_insn *thumb2_sminsi3
+(define_insn_and_split *thumb2_sminsi3
   [(set (match_operand:SI 0 s_register_operand =r,r,r)
(smin:SI (match_operand:SI 1 s_register_operand 0,r,?r)
 (match_operand:SI 2 arm_rhs_operand rI,0,rI)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_THUMB2
-  @
-   cmp\\t%1, %2\;it\\tge\;movge\\t%0, %2
-   cmp\\t%1, %2\;it\\tlt\;movlt\\t%0, %1
-   cmp\\t%1, %2\;ite\\tlt\;movlt\\t%0, %1\;movge\\t%0, %2
+  #
+   ; cmp\\t%1, %2\;it\\tge\;movge\\t%0, %2
+   ; cmp\\t%1, %2\;it\\tlt\;movlt\\t%0, %1
+   ; cmp\\t%1, %2\;ite\\tlt\;movlt\\t%0, %1\;movge\\t%0, %2
+  TARGET_THUMB2
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+(if_then_else:SI (lt:SI (reg:CC CC_REGNUM) (const_int 0))
+ (match_dup 1)
+ (match_dup 2)))]
+  
   [(set_attr conds clob)
(set_attr length 10,10,14)]
 )
 
-(define_insn *thumb32_umaxsi3
+(define_insn_and_split *thumb32_umaxsi3
   [(set (match_operand:SI 0 s_register_operand =r,r,r)
(umax:SI (match_operand:SI 1 s_register_operand 0,r,?r)
 (match_operand:SI 2 arm_rhs_operand rI,0,rI)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_THUMB2
-  @
-   cmp\\t%1, %2\;it\\tcc\;movcc\\t%0, %2
-   cmp\\t%1, %2\;it\\tcs\;movcs\\t%0, %1
-   cmp\\t%1, %2\;ite\\tcs\;movcs\\t%0, %1\;movcc\\t%0, %2
+  #
+   ; cmp\\t%1, %2\;it\\tcc\;movcc\\t%0, %2
+   ; cmp\\t%1, %2\;it\\tcs\;movcs\\t%0, %1
+   ; cmp\\t%1, %2\;ite\\tcs\;movcs\\t%0, %1\;movcc\\t%0, %2
+  TARGET_THUMB2
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+(if_then_else:SI (geu:SI (reg:CC CC_REGNUM) (const_int 0))
+ (match_dup 1)
+ (match_dup 2)))]
+  
   [(set_attr conds clob)
(set_attr length 10,10,14)]
 )
 
-(define_insn *thumb2_uminsi3
+(define_insn_and_split *thumb2_uminsi3
   [(set (match_operand:SI 0 s_register_operand =r,r,r)
(umin:SI (match_operand:SI 1 s_register_operand 0,r,?r)
 (match_operand:SI 2 arm_rhs_operand rI,0,rI)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_THUMB2
-  @
-   cmp\\t%1, %2\;it\\tcs\;movcs\\t%0, %2
-   cmp\\t%1, %2\;it\\tcc\;movcc\\t%0, %1
-   cmp\\t%1, %2\;ite\\tcc\;movcc\\t%0, %1\;movcs\\t%0, %2
+  #
+   ; cmp\\t%1, %2\;it\\tcs\;movcs\\t%0, %2
+   ; cmp\\t%1, %2\;it\\tcc\;movcc\\t%0, %1
+   ; cmp\\t%1, %2\;ite\\tcc\;movcc\\t%0, %1\;movcs\\t%0, %2
+  TARGET_THUMB2
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+(if_then_else:SI (ltu:SI (reg:CC CC_REGNUM) (const_int 0))
+ (match_dup 1)
+ (match_dup 2)))]
+  
   [(set_attr conds clob)
(set_attr length 10,10,14)]
 )
 
 ;; Thumb

[Patch,ARM,Committed] Remove trailing whitespaces in thumb2.md

2013-04-30 Thread Greta Yorsh
Remove trailing whitespaces in thumb2.md. Committed as obvious (trunk
r198464).

Thanks,
Greta

2013-04-30  Greta Yorsh  greta.yo...@arm.com

* config/arm/thumb2.md: Remove trailing whitespaces.
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 697350c..3aa7247 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -50,7 +50,7 @@
(set_attr length 6,10)]
 )
 
-;; Thumb-2 only allows shift by constant on data processing instructions 
+;; Thumb-2 only allows shift by constant on data processing instructions
 (define_insn *thumb_andsi_not_shiftsi_si
   [(set (match_operand:SI 0 s_register_operand =r)
(and:SI (not:SI (match_operator:SI 4 shift_operator
@@ -330,7 +330,7 @@
   [(set_attr conds clob)]
 )
 ;; Don't define thumb2_load_indirect_jump because we can't guarantee label
-;; addresses will have the thumb bit set correctly. 
+;; addresses will have the thumb bit set correctly.
 
 
 (define_insn *thumb2_and_scc
@@ -401,7 +401,7 @@
 
 (define_insn *thumb2_cond_arith
   [(set (match_operand:SI 0 s_register_operand =r,r)
-(match_operator:SI 5 shiftable_operator 
+(match_operator:SI 5 shiftable_operator
 [(match_operator:SI 4 arm_comparison_operator
[(match_operand:SI 2 s_register_operand r,r)
(match_operand:SI 3 arm_rhs_operand rI,rI)])
@@ -864,7 +864,7 @@
   else
 return \cmp\\t%0, #0\;beq\\t%l1\;
   
-  [(set (attr length) 
+  [(set (attr length)
 (if_then_else
(and (ge (minus (match_dup 1) (pc)) (const_int 2))
 (le (minus (match_dup 1) (pc)) (const_int 128))
@@ -887,7 +887,7 @@
   else
 return \cmp\\t%0, #0\;bne\\t%l1\;
   
-  [(set (attr length) 
+  [(set (attr length)
 (if_then_else
(and (ge (minus (match_dup 1) (pc)) (const_int 2))
 (le (minus (match_dup 1) (pc)) (const_int 128))


[PATCH,ARM] Internal memcpy using LDRD/STRD

2013-04-30 Thread Greta Yorsh
This patch makes gcc's internal memcpy emit LDRD/STRD whenever possible,
if the prefer_ldrd_strd field is set in tune_params.

It uses DImode moves in both ARM and Thumb modes.

The generic move_by_pieces implementation cannot be used as is
to generate the same instruction sequence.

To handle cases in which either the source or the destination is not
word-aligned, this patch introduces new patterns for UNSPEC_UNALIGNED
double-word access. After reload, each such pattern is split into two
unaligned single-word accesses. This prevents lower_subreg from splitting
an aligned double-word access that depends on the unaligned access.
It may become unnecessary when the cost model is fixed.

This patch also adjusts existing tests to accept LDRD/STRD or LDM/STM
depending on effective target arm_prefer_ldrd_strd. 

An early version of this patch was posted here:
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00921.html
The new version is simpler because (a) it generates the same RTL for both
Thumb and ARM modes, and (b) load and store blocks are matched, so the
store_partial_word subroutine is no longer needed.
The previous version did not use DImode moves in Thumb mode.
Instead, it relied on LDRD/STRD patterns introduced by patches for
Thumb prolog/epilog using LDRD/STRD. These patterns were not approved, 
because of a potential problem with reload, see here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01807.html 
A slightly modified version of these patterns, approved and committed,
matches only
after reload, whereas the RTL insns for internal memcpy are generated
early on, during expand. There might be missed optimization
opportunities in Thumb mode.
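
As an illustration of the intended effect (an assumed example, not one of the
tests below): a small fixed-size copy like this should now expand to LDRD/STRD
pairs on tunings with prefer_ldrd_strd instead of an LDM/STM sequence, which is
what the adjusted unaligned-memcpy-*.c tests check via the effective target
arm_prefer_ldrd_strd.

  void
  copy16 (char *dst, const char *src)
  {
    __builtin_memcpy (dst, src, 16);   /* expanded through movmemqi */
  }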

No regression on qemu for arm-none-eabi with cpu cortex-a15 arm/thumb.

Bootstrap successful on Cortex-A15.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/

2013-04-30  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (gen_movmem_ldrd_strd): New declaration.
* config/arm/arm.c (next_consecutive_mem): New function.
(gen_movmem_ldrd_strd): Likewise.
* config/arm/arm.md (movmemqi): Update condition and code.
(unaligned_loaddi, unaligned_storedi): New patterns.

gcc/testsuite

2013-04-30  Greta Yorsh  greta.yo...@arm.com

* gcc.target/arm/unaligned-memcpy-2.c: Adjust expected output.
* gcc.target/arm/unaligned-memcpy-3.c: Likewise.
* gcc.target/arm/unaligned-memcpy-4.c: Likewise.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 4274c0d..9e43419 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -119,6 +119,7 @@ extern rtx arm_gen_store_multiple (int *, int, rtx, int, 
rtx, HOST_WIDE_INT *);
 extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
 extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
 extern int arm_gen_movmemqi (rtx *);
+extern bool gen_movmem_ldrd_strd (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
   HOST_WIDE_INT);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 231a27f..a3b9787 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -11836,6 +11836,134 @@ arm_gen_movmemqi (rtx *operands)
   return 1;
 }
 
+/* Helper for gen_movmem_ldrd_strd. Increase the address of memory rtx
+by mode size.  */
+inline static rtx
+next_consecutive_mem (rtx mem)
+{
+  enum machine_mode mode = GET_MODE (mem);
+  HOST_WIDE_INT offset = GET_MODE_SIZE (mode);
+  rtx addr = plus_constant (Pmode, XEXP (mem, 0), offset);
+
+  return adjust_automodify_address (mem, mode, addr, offset);
+}
+
+/* Copy using LDRD/STRD instructions whenever possible.
+   Returns true upon success. */
+bool
+gen_movmem_ldrd_strd (rtx *operands)
+{
+  unsigned HOST_WIDE_INT len;
+  HOST_WIDE_INT align;
+  rtx src, dst, base;
+  rtx reg0;
+  bool src_aligned, dst_aligned;
+  bool src_volatile, dst_volatile;
+
+  gcc_assert (CONST_INT_P (operands[2]));
+  gcc_assert (CONST_INT_P (operands[3]));
+
+  len = UINTVAL (operands[2]);
+  if (len > 64)
+return false;
+
+  /* Maximum alignment we can assume for both src and dst buffers.  */
+  align = INTVAL (operands[3]);
+
+  if ((!unaligned_access) && (len >= 4) && ((align & 3) != 0))
+return false;
+
+  /* Place src and dst addresses in registers
+ and update the corresponding mem rtx.  */
+  dst = operands[0];
+  dst_volatile = MEM_VOLATILE_P (dst);
+  dst_aligned = MEM_ALIGN (dst) >= BITS_PER_WORD;
+  base = copy_to_mode_reg (SImode, XEXP (dst, 0));
+  dst = adjust_automodify_address (dst, VOIDmode, base, 0);
+
+  src = operands[1];
+  src_volatile = MEM_VOLATILE_P (src);
+  src_aligned = MEM_ALIGN (src) >= BITS_PER_WORD;
+  base = copy_to_mode_reg (SImode, XEXP (src, 0));
+  src = adjust_automodify_address (src, VOIDmode, base, 0);
+
+  if (!unaligned_access && !(src_aligned && dst_aligned))
+return false;
+
+  if (src_volatile || dst_volatile

[PATCH, ARM] Remove incscc and decscc patterns from thumb2.md

2013-04-26 Thread Greta Yorsh
This patch removes dead patterns for incscc and decscc from thumb2.md.

It's a cleanup after this patch:
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00955.html
which removed incscc and decscc expanders and the corresponding patterns
from arm.md, but not from thumb2.md.
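
For reference, the deleted patterns matched a register plus or minus the
result of a comparison; a minimal C shape of that form (illustrative only):

  int
  incscc (int x, int a, int b)
  {
    return x + (a == b);   /* register plus a condition */
  }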

No regression on qemu for arm-none-eabi cortex-a15 thumb.

Ok for trunk?

Thanks,
Greta

gcc/

2013-04-05  Greta Yorsh  greta.yo...@arm.com

* config/arm/thumb2.md (thumb2_incscc, thumb2_decscc): Delete.
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 6aa76f6..968cc0c 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -24,32 +24,6 @@
 ;; changes made in armv5t as thumb2.  These are considered part
 ;; the 16-bit Thumb-1 instruction set.
 
-(define_insn *thumb2_incscc
-  [(set (match_operand:SI 0 s_register_operand =r,r)
-(plus:SI (match_operator:SI 2 arm_comparison_operator
-[(match_operand:CC 3 cc_register ) (const_int 0)])
- (match_operand:SI 1 s_register_operand 0,?r)))]
-  TARGET_THUMB2
-  @
-  it\\t%d2\;add%d2\\t%0, %1, #1
-  ite\\t%D2\;mov%D2\\t%0, %1\;add%d2\\t%0, %1, #1
-  [(set_attr conds use)
-   (set_attr length 6,10)]
-)
-
-(define_insn *thumb2_decscc
-  [(set (match_operand:SI0 s_register_operand =r,r)
-(minus:SI (match_operand:SI  1 s_register_operand 0,?r)
- (match_operator:SI 2 arm_comparison_operator
-   [(match_operand   3 cc_register ) (const_int 0)])))]
-  TARGET_THUMB2
-  @
-   it\\t%d2\;sub%d2\\t%0, %1, #1
-   ite\\t%D2\;mov%D2\\t%0, %1\;sub%d2\\t%0, %1, #1
-  [(set_attr conds use)
-   (set_attr length 6,10)]
-)
-
 ;; Thumb-2 only allows shift by constant on data processing instructions 
 (define_insn *thumb_andsi_not_shiftsi_si
   [(set (match_operand:SI 0 s_register_operand =r)


RE: [PATCH, ARM] Fix PR56797

2013-04-23 Thread Greta Yorsh
Ok to backport to gcc4.8?

I'm attaching an updated version - just fixed a spelling error in the
comment. 

Thanks,
Greta

gcc/ChangeLog

PR target/56797
* config/arm/arm.c (load_multiple_sequence): Require SP
as base register for loads if SP is in the register list.

 -Original Message-
 From: Richard Earnshaw
 Sent: 19 April 2013 12:34
 To: Greta Yorsh
 Cc: GCC Patches; raj.k...@gmail.com; Ramana Radhakrishnan
 Subject: Re: [PATCH, ARM] Fix PR56797
 
 On 19/04/13 10:34, Greta Yorsh wrote:
  Fix PR56797
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56797
 
  The problem is that peephole optimizer thinks it can generate an ldm,
 but
  the pattern for ldm no longer matches, because after r188738 it
 requires
  that if one of the destination registers is SP then the base register
 must
  be SP, and it's not SP in the test case.
 
  The test case fails on armv5t but doesn't fail on armv6t2 or armv7-a
 because
  peephole doesn't trigger there (because there is a different epilogue
  sequence). It looks like a latent problem for other architecture or
 CPUs.
 
  This patch adds this condition to the peephole optimizer.
 
  No regression on qemu for arm-none-eabi and fixes the test reported
 in the
  PR. I couldn't minimize the test sufficiently to include it in the
  testsuite.
 
  Ok for trunk?
 
  Thanks,
  Greta
 
  gcc/
 
  2013-04-18  Greta Yorsh  greta.yo...@arm.com
 
  PR target/56797
  * config/arm/arm.c (load_multiple_sequence): Require SP
   as base register for loads if SP is in the register list.
 
 
 OK.
 
 R.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d00849c..60fef78 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10347,6 +10347,13 @@ load_multiple_sequence (rtx *operands, int nops, int 
*regs, int *saved_order,
   || (i != nops - 1 && unsorted_regs[i] == base_reg))
return 0;
 
+  /* Don't allow SP to be loaded unless it is also the base
+ register.  It guarantees that SP is reset correctly when
+ an LDM instruction is interrupted.  Otherwise, we might
+ end up with a corrupt stack.  */
+  if (unsorted_regs[i] == SP_REGNUM && base_reg != SP_REGNUM)
+return 0;
+
  unsorted_offsets[i] = INTVAL (offset);
  if (i == 0 || unsorted_offsets[i] < unsorted_offsets[order[0]])
order[0] = i;


[PATCH, ARM] Fix PR56797

2013-04-19 Thread Greta Yorsh
Fix PR56797
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56797

The problem is that the peephole optimizer thinks it can generate an ldm, but
the pattern for ldm no longer matches, because after r188738 it requires
that if one of the destination registers is SP then the base register must
be SP, and it's not SP in the test case.

The test case fails on armv5t but doesn't fail on armv6t2 or armv7-a because
the peephole doesn't trigger there (there is a different epilogue
sequence). It looks like a latent problem for other architectures or CPUs.

This patch adds this condition to the peephole optimizer.

No regression on qemu for arm-none-eabi and fixes the test reported in the
PR. I couldn't minimize the test sufficiently to include it in the
testsuite. 

Ok for trunk?

Thanks,
Greta

gcc/ 

2013-04-18  Greta Yorsh  greta.yo...@arm.com

PR target/56797
* config/arm/arm.c (load_multiple_sequence): Require SP
as base register for loads if SP is in the register list.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d00849c..60fef78 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10347,6 +10347,13 @@ load_multiple_sequence (rtx *operands, int nops, int 
*regs, int *saved_order,
   || (i != nops - 1 && unsorted_regs[i] == base_reg))
return 0;
 
+  /* Don't allow SP to be loaded unless it is also the base
+ register.  It guarantees that SP is reset correctly when
+ an LDM instruction is interruptted.  Otherwise, we might
+ end up with a corrupt stack.  */
+  if (unsorted_regs[i] == SP_REGNUM && base_reg != SP_REGNUM)
+return 0;
+
  unsorted_offsets[i] = INTVAL (offset);
  if (i == 0 || unsorted_offsets[i] < unsorted_offsets[order[0]])
order[0] = i;


[PATCH, ARM] emit LDRD epilogue instead of a single LDM return

2013-04-17 Thread Greta Yorsh
Currently, an RTL epilogue is not generated for functions that can return
using a single instruction. This patch enables the RTL epilogue for such
functions on targets that can benefit from using a sequence of LDRD
instructions in the epilogue instead of a single LDM instruction.
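
For illustration (an assumed example, not part of the patch): a function like
the one below keeps values in callee-saved registers across calls, so its
epilogue restores several registers. Previously it would return with a single
LDM emitted outside RTL; with this change it gets an RTL epilogue on
prefer_ldrd_strd tunings, so the restores can be paired into LDRD instructions.

  extern int g (int);

  int
  f (int a, int b)
  {
    return g (a) + g (b);   /* needs callee-saved registers across the calls */
  }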

No regression on qemu arm-none-eabi with cortex-a15.

Ok for trunk?

Thanks,
Greta

gcc/

2012-10-19  Greta Yorsh  Greta.Yorsh at arm.com

* config/arm/arm.c (use_return_insn): Return 0 for targets that
can benefit from using a sequence of LDRD instructions in epilogue
instead of a single LDM instruction.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 866385c..bca92af 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2296,6 +2296,10 @@ use_return_insn (int iscond, rtx sibling)
   if (IS_INTERRUPT (func_type) && (frame_pointer_needed || TARGET_THUMB))
 return 0;
 
+  if (TARGET_LDRD && current_tune->prefer_ldrd_strd
+   && !optimize_function_for_size_p (cfun))
+return 0;
+
   offsets = arm_get_frame_offsets ();
   stack_adjust = offsets->outgoing_args - offsets->saved_regs;
 


[PATCH, ARM][10/n] Split scc patterns using cond_exec

2013-04-17 Thread Greta Yorsh
This patch converts define_insn into define_insn_and_split to split
some alternatives of movsicc_insn and some scc patterns that cannot be
expressed using movsicc. The patch emits cond_exec RTL insns.

Ok for trunk?

Thanks,
Greta

gcc/

2013-02-19  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (movsicc_insn): Convert define_insn into
define_insn_and_split.
(and_scc,ior_scc,negscc): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 83b36ca..c2e59ed 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -858,7 +858,7 @@
 
 ;; This is the canonicalization of addsi3_compare0_for_combiner when the
 ;; addend is a constant.
-(define_insn *cmpsi2_addneg
+(define_insn cmpsi2_addneg
   [(set (reg:CC CC_REGNUM)
(compare:CC
 (match_operand:SI 1 s_register_operand r,r)
@@ -1415,7 +1415,7 @@
(set_attr type  simple_alu_imm,*,*)]
 )
 
-(define_insn *subsi3_compare
+(define_insn subsi3_compare
   [(set (reg:CC CC_REGNUM)
(compare:CC (match_operand:SI 1 arm_rhs_operand r,r,I)
(match_operand:SI 2 arm_rhs_operand I,r,r)))
@@ -8619,7 +8619,7 @@
(set_attr type f_selvfp_type)]
 )
 
-(define_insn *movsicc_insn
+(define_insn_and_split *movsicc_insn
   [(set (match_operand:SI 0 s_register_operand =r,r,r,r,r,r,r,r)
(if_then_else:SI
 (match_operator 3 arm_comparison_operator
@@ -8632,10 +8632,45 @@
mvn%D3\\t%0, #%B2
mov%d3\\t%0, %1
mvn%d3\\t%0, #%B1
-   mov%d3\\t%0, %1\;mov%D3\\t%0, %2
-   mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
-   mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
-   mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
+   #
+   #
+   #
+   #
+   ; alt4: mov%d3\\t%0, %1\;mov%D3\\t%0, %2
+   ; alt5: mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
+   ; alt6: mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
+   ; alt7: mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
+   reload_completed
+  [(const_int 0)]
+  {
+enum rtx_code rev_code;
+enum machine_mode mode;
+rtx rev_cond;
+
+emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+  operands[3],
+  gen_rtx_SET (VOIDmode,
+   operands[0],
+   operands[1])));
+
+rev_code = GET_CODE (operands[3]);
+mode = GET_MODE (operands[4]);
+if (mode == CCFPmode || mode == CCFPEmode)
+  rev_code = reverse_condition_maybe_unordered (rev_code);
+else
+  rev_code = reverse_condition (rev_code);
+
+rev_cond = gen_rtx_fmt_ee (rev_code,
+   VOIDmode,
+   operands[4],
+   const0_rtx);
+emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+  rev_cond,
+  gen_rtx_SET (VOIDmode,
+   operands[0],
+   operands[2])));
+DONE;
+  }
   [(set_attr length 4,4,4,4,8,8,8,8)
(set_attr conds use)
(set_attr insn mov,mvn,mov,mvn,mov,mov,mvn,mvn)
@@ -9604,27 +9639,64 @@
(set_attr type alu_shift,alu_shift_reg)])
 
 
-(define_insn *and_scc
+(define_insn_and_split *and_scc
   [(set (match_operand:SI 0 s_register_operand =r)
(and:SI (match_operator:SI 1 arm_comparison_operator
-[(match_operand 3 cc_register ) (const_int 0)])
-   (match_operand:SI 2 s_register_operand r)))]
+[(match_operand 2 cc_register ) (const_int 0)])
+   (match_operand:SI 3 s_register_operand r)))]
   TARGET_ARM
-  mov%D1\\t%0, #0\;and%d1\\t%0, %2, #1
+  #   ; mov%D1\\t%0, #0\;and%d1\\t%0, %3, #1
+   reload_completed
+  [(cond_exec (match_dup 5) (set (match_dup 0) (const_int 0)))
+   (cond_exec (match_dup 4) (set (match_dup 0)
+ (and:SI (match_dup 3) (const_int 1]
+  {
+enum machine_mode mode = GET_MODE (operands[2]);
+enum rtx_code rc = GET_CODE (operands[1]);
+
+/* Note that operands[4] is the same as operands[1],
+   but with VOIDmode as the result. */
+operands[4] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx);
+if (mode == CCFPmode || mode == CCFPEmode)
+  rc = reverse_condition_maybe_unordered (rc);
+else
+  rc = reverse_condition (rc);
+operands[5] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx);
+  }
   [(set_attr conds use)
(set_attr insn mov)
(set_attr length 8)]
 )
 
-(define_insn *ior_scc
+(define_insn_and_split *ior_scc
   [(set (match_operand:SI 0 s_register_operand =r,r)
-   (ior:SI (match_operator:SI 2 arm_comparison_operator
-[(match_operand 3 cc_register ) (const_int 0)])
-   (match_operand:SI 1 s_register_operand 0,?r)))]
+   (ior:SI (match_operator:SI 1 arm_comparison_operator
+[(match_operand 2 cc_register ) (const_int 0)])
+   (match_operand:SI 3 s_register_operand 0,?r)))]
   TARGET_ARM

[PATCH, ARM] Prologue/epilogue using STRD/LDRD in ARM mode

2013-04-15 Thread Greta Yorsh
Generate the prologue/epilogue using STRD/LDRD in ARM mode when the
prefer_ldrd_strd tuning flag is set, as it is for Cortex-A15.

The previous version of this patch was posted for review here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00995.html

The new version includes the following improvements:
(1) For the prologue, it generates STRD whenever possible and otherwise
generates single-word stores, instead of STM. This allows us to use
offset addressing with STRD, instead of writeback on every store used
in the previous version of this patch. Similarly for the epilogue. To
allow epilogue returns by loading directly into PC, a separate stack
update instruction is emitted before the final load into PC.
(2) The previous version of this patch causes an ICE in
arm_emit_strd_push, when gcc is called with -fno-omit-frame-pointer
-mapcs-frame command-line options. It is fixed in the attached patch,
where arm_emit_strd_push is not called when TARGET_APCS_FRAME holds
(epilogue already has a similar condition).
(3) The previous version of the patch generated incorrect return
sequences for interrupt function. This version fixes it by using the
original LDM epilogues for interrupt functions. No need to change the
tests gcc.target/arm/interrupt-*.c.
(4) Takes assert statements out of the loop, addressing a comment made
about a related patch, also relevant here.
(5) Improves dwarf info generation.

No regression on qemu for arm-none-eabi cortex-a15.

Bootstrap successful on A15 TC2.

Spec2k overall slight performance improvement (less than 1%) on Cortex-A15
TC2. 
Out of 26 benchmarks, 4 show regression of 2.5% or less (benchmarks
186,254,255,178). 
Other benchmarks show improvements or no change. 
Size increase overall by 1.4%. 
No clear correlation between performance and size increase.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/

2013-04-15  Greta Yorsh  Greta.Yorsh at arm.com

* config/arm/arm.c (emit_multi_reg_push): New declaration
for an existing function.
(arm_emit_strd_push): New function.
(arm_expand_prologue): Used here.
(arm_emit_ldrd_pop): New function.
(arm_expand_epilogue): Used here.
(arm_get_frame_offsets): Update condition.
(arm_emit_multi_reg_pop): Add a special case for load of a single
register with writeback.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 982487e..833d092 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -173,6 +173,7 @@ static rtx arm_expand_builtin (tree, rtx, rtx, enum 
machine_mode, int);
 static tree arm_builtin_decl (unsigned, bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx emit_set_insn (rtx, rtx);
+static rtx emit_multi_reg_push (unsigned long);
 static int arm_arg_partial_bytes (cumulative_args_t, enum machine_mode,
  tree, bool);
 static rtx arm_function_arg (cumulative_args_t, enum machine_mode,
@@ -16690,6 +16691,148 @@ thumb2_emit_strd_push (unsigned long saved_regs_mask)
   return;
 }
 
+/* STRD in ARM mode requires consecutive registers.  This function emits STRD
+   whenever possible, otherwise it emits single-word stores.  The first store
+   also allocates stack space for all saved registers, using writeback with
+   post-addressing mode.  All other stores use offset addressing.  If no STRD
+   can be emitted, this function emits a sequence of single-word stores,
+   and not an STM as before, because single-word stores provide more freedom
+   scheduling and can be turned into an STM by peephole optimizations.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j, dwarf_index  = 0;
+  int offset = 0;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, mem;
+
+  /* TODO: A more efficient code can be emitted by changing the
+ layout, e.g., first push all pairs that can use STRD to keep the
+ stack aligned, and then push all other registers.  */
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (!(saved_regs_mask & (1 << SP_REGNUM)));
+  gcc_assert (!(saved_regs_mask & (1 << PC_REGNUM)));
+  gcc_assert (num_regs > 0);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* For dwarf info, we generate explicit stack update.  */
+  tmp = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (Pmode, stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  XVECEXP (dwarf, 0, dwarf_index++) = tmp;
+
+  /* Save registers.  */
+  offset = - 4 * num_regs;
+  j = 0;
+  while (j <= LAST_ARM_REGNUM)
+if (saved_regs_mask & (1 << j))
+  {
+if ((j % 2 == 0)
+ && (saved_regs_mask & (1 << (j + 1))))
+  {
+/* Current register and previous register form register pair for
+   which STRD can be generated.  */
+if (offset < 0

[PATCH, ARM][9/n] Split scc patterns using movsicc

2013-04-12 Thread Greta Yorsh
This patch converts define_insn into define_insn_and_split for simple scc
patterns and emits RTL insns that match the movsicc pattern.
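
For reference, the scc patterns touched here set a register from a condition;
minimal C shapes of the three forms (illustrative only):

  int scc     (int a, int b) { return a < b; }      /* *mov_scc:    0 or 1   */
  int negscc  (int a, int b) { return -(a < b); }   /* *mov_negscc: 0 or -1  */
  int notscc  (int a, int b) { return ~(a < b); }   /* *mov_notscc: ~0 or ~1 */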
 
Tested as part of the series for splitting arm.md patterns that output
multiple asm instructions. No regression on qemu with arm-none-eabi and
bootstrap successful.

Ok for trunk?

Thanks,
Greta

gcc/

2013-02-19  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (mov_scc,mov_negscc,mov_notscc): Convert
define_insn into define_insn_and_split and emit movsicc
patterns.

commit f678aaf7cdab589f34b1bf92b3f9fcabd7f29593
Author: Greta greta.yo...@arm.com
Date:   Thu Apr 11 10:54:27 2013 +0100

9

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 073ee6b..4284535 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8259,36 +8259,56 @@
operands[3] = const0_rtx;
 )
 
-(define_insn *mov_scc
+(define_insn_and_split *mov_scc
   [(set (match_operand:SI 0 s_register_operand =r)
(match_operator:SI 1 arm_comparison_operator
 [(match_operand 2 cc_register ) (const_int 0)]))]
   TARGET_ARM
-  mov%D1\\t%0, #0\;mov%d1\\t%0, #1
+  #   ; mov%D1\\t%0, #0\;mov%d1\\t%0, #1
+  TARGET_ARM
+  [(set (match_dup 0)
+(if_then_else:SI (match_dup 1)
+ (const_int 1)
+ (const_int 0)))]
+  
   [(set_attr conds use)
-   (set_attr insn mov)
(set_attr length 8)]
 )
 
-(define_insn *mov_negscc
+(define_insn_and_split *mov_negscc
   [(set (match_operand:SI 0 s_register_operand =r)
(neg:SI (match_operator:SI 1 arm_comparison_operator
 [(match_operand 2 cc_register ) (const_int 0)])))]
   TARGET_ARM
-  mov%D1\\t%0, #0\;mvn%d1\\t%0, #0
+  #   ; mov%D1\\t%0, #0\;mvn%d1\\t%0, #0
+  TARGET_ARM
+  [(set (match_dup 0)
+(if_then_else:SI (match_dup 1)
+ (match_dup 3)
+ (const_int 0)))]
+  {
+operands[3] = GEN_INT (~0);
+  }
   [(set_attr conds use)
-   (set_attr insn mov)
(set_attr length 8)]
 )
 
-(define_insn *mov_notscc
+(define_insn_and_split *mov_notscc
   [(set (match_operand:SI 0 s_register_operand =r)
(not:SI (match_operator:SI 1 arm_comparison_operator
 [(match_operand 2 cc_register ) (const_int 0)])))]
   TARGET_ARM
-  mvn%D1\\t%0, #0\;mvn%d1\\t%0, #1
+  #   ; mvn%D1\\t%0, #0\;mvn%d1\\t%0, #1
+  TARGET_ARM
+  [(set (match_dup 0)
+(if_then_else:SI (match_dup 1)
+ (match_dup 3)
+ (match_dup 4)))]
+  {
+operands[3] = GEN_INT (~1);
+operands[4] = GEN_INT (~0);
+  }
   [(set_attr conds use)
-   (set_attr insn mov)
(set_attr length 8)]
 )
 


[PATCH, ARM][10/n] Split scc patterns using cond_exec

2013-04-12 Thread Greta Yorsh
This patch converts define_insn into define_insn_and_split to split
some alternatives of movsicc_insn and some scc patterns that cannot be
expressed using movsicc. The patch emits cond_exec RTL insns.

Tested as part of the series for splitting arm.md patterns that output
multiple asm instructions.

Ok for trunk?

gcc/

2013-02-19  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (movsicc_insn): Convert define_insn into
define_insn_and_split.
(and_scc,ior_scc,negscc): Likewise.
(cmpsi2_addneg, subsi3_compare): Convert to named patterns.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 701e465..d190a17 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -858,7 +858,7 @@
 
 ;; This is the canonicalization of addsi3_compare0_for_combiner when the
 ;; addend is a constant.
-(define_insn *cmpsi2_addneg
+(define_insn cmpsi2_addneg
   [(set (reg:CC CC_REGNUM)
(compare:CC
 (match_operand:SI 1 s_register_operand r,r)
@@ -1415,7 +1415,7 @@
(set_attr type  simple_alu_imm,*,*)]
 )
 
-(define_insn *subsi3_compare
+(define_insn subsi3_compare
   [(set (reg:CC CC_REGNUM)
(compare:CC (match_operand:SI 1 arm_rhs_operand r,r,I)
(match_operand:SI 2 arm_rhs_operand I,r,r)))
@@ -8620,7 +8620,7 @@
(set_attr type f_selvfp_type)]
 )
 
-(define_insn *movsicc_insn
+(define_insn_and_split *movsicc_insn
   [(set (match_operand:SI 0 s_register_operand =r,r,r,r,r,r,r,r)
(if_then_else:SI
 (match_operator 3 arm_comparison_operator
@@ -8633,10 +8633,45 @@
mvn%D3\\t%0, #%B2
mov%d3\\t%0, %1
mvn%d3\\t%0, #%B1
-   mov%d3\\t%0, %1\;mov%D3\\t%0, %2
-   mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
-   mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
-   mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
+   #
+   #
+   #
+   #
+   ; alt4: mov%d3\\t%0, %1\;mov%D3\\t%0, %2
+   ; alt5: mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
+   ; alt6: mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
+   ; alt7: mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
+   reload_completed
+  [(const_int 0)]
+  {
+enum rtx_code rev_code;
+enum machine_mode mode;
+rtx rev_cond;
+
+emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+  operands[3],
+  gen_rtx_SET (VOIDmode,
+   operands[0],
+   operands[1])));
+
+rev_code = GET_CODE (operands[3]);
+mode = SELECT_CC_MODE (rev_code, operands[4], const0_rtx);
+if (mode == CCFPmode || mode == CCFPEmode)
+  rev_code = reverse_condition_maybe_unordered (rev_code);
+else
+  rev_code = reverse_condition (rev_code);
+
+rev_cond = gen_rtx_fmt_ee (rev_code,
+   VOIDmode,
+   operands[4],
+   const0_rtx);
+emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+  rev_cond,
+  gen_rtx_SET (VOIDmode,
+   operands[0],
+   operands[2])));
+DONE;
+  }
   [(set_attr length 4,4,4,4,8,8,8,8)
(set_attr conds use)
(set_attr insn mov,mvn,mov,mvn,mov,mov,mvn,mvn)
@@ -9605,27 +9640,64 @@
(set_attr type alu_shift,alu_shift_reg)])
 
 
-(define_insn *and_scc
+(define_insn_and_split *and_scc
   [(set (match_operand:SI 0 s_register_operand =r)
(and:SI (match_operator:SI 1 arm_comparison_operator
-[(match_operand 3 cc_register ) (const_int 0)])
-   (match_operand:SI 2 s_register_operand r)))]
+[(match_operand 2 cc_register ) (const_int 0)])
+   (match_operand:SI 3 s_register_operand r)))]
   TARGET_ARM
-  mov%D1\\t%0, #0\;and%d1\\t%0, %2, #1
+  #   ; mov%D1\\t%0, #0\;and%d1\\t%0, %3, #1
+   reload_completed
+  [(cond_exec (match_dup 5) (set (match_dup 0) (const_int 0)))
+   (cond_exec (match_dup 4) (set (match_dup 0)
+ (and:SI (match_dup 3) (const_int 1]
+  {
+enum machine_mode mode = SELECT_CC_MODE (GET_CODE (operands[1]),
+   operands[2],
+   const0_rtx);
+enum rtx_code rc = GET_CODE (operands[1]);
+
+operands[4] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx);
+if (mode == CCFPmode || mode == CCFPEmode)
+  rc = reverse_condition_maybe_unordered (rc);
+else
+  rc = reverse_condition (rc);
+operands[5] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx);
+  }
   [(set_attr conds use)
(set_attr insn mov)
(set_attr length 8)]
 )
 
-(define_insn *ior_scc
+(define_insn_and_split *ior_scc
   [(set (match_operand:SI 0 s_register_operand =r,r)
-   (ior:SI (match_operator:SI 2 arm_comparison_operator
-[(match_operand 3 cc_register ) (const_int 0)])
-   (match_operand:SI 1

[COMMITTED][PATCH,ARM] Cleanup uninitialized variable

2013-04-12 Thread Greta Yorsh
Cleanup to remove warning about uninitialized variable base when compiling
arm.c.

Approved offline by Richard Earnshaw. Committed r197921.

gcc/

2013-04-12  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (gen_operands_ldrd_strd): Initialize base.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index af95ac1..982487e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12694,7 +12694,7 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
 {
   int nops = 2;
   HOST_WIDE_INT offsets[2], offset;
-  rtx base;
+  rtx base = NULL_RTX;
   rtx cur_base, cur_offset, tmp;
   int i, gap;
   HARD_REG_SET regset;


RE: [PATCH, ARM][10/n] Split scc patterns using cond_exec

2013-04-12 Thread Greta Yorsh
Sorry, I've just realized that there is a possible issue with the way
SELECT_CC_MODE is used in a few places in this patch. Working on a fix.

Thanks,
Greta

 -Original Message-
 From: Richard Earnshaw
 Sent: 12 April 2013 14:07
 To: Greta Yorsh
 Cc: GCC Patches; Ramana Radhakrishnan; p...@codesourcery.com;
 ni...@redhat.com
 Subject: Re: [PATCH, ARM][10/n] Split scc patterns using cond_exec
 
 On 12/04/13 12:04, Greta Yorsh wrote:
  This patch converts define_insn into define_insn_and_split to split
  some alternatives of movsicc_insn and some scc patterns that cannot
 be
  expressed using movsicc. The patch emits cond_exec RTL insns.
 
  Tested as part of the series for splitting arm.md patterns that
 output
  multiple asm instructions.
 
  Ok for trunk?
 
  gcc/
 
  2013-02-19  Greta Yorsh  greta.yo...@arm.com
 
   * config/arm/arm.md (movsicc_insn): Convert define_insn into
   define_insn_and_split.
   (and_scc,ior_scc,negscc): Likewise.
   (cmpsi2_addneg, subsi3_compare): Convert to named patterns.
 
 
 OK.
 
 R.






[PING][PATCH,ARM] Peephole individual LDR/STD into LDRD/STRD

2013-03-11 Thread Greta Yorsh
PING:   
http://gcc.gnu.org/ml/gcc-patches/2013-02/msg00604.html

Thanks,
Greta


 -Original Message-
 From: Greta Yorsh [mailto:greta.yo...@arm.com]
 Sent: 13 February 2013 13:36
 To: GCC Patches
 Cc: Ramana Radhakrishnan; Richard Earnshaw; 'p...@codesourcery.com';
 'ni...@redhat.com'
 Subject: [PATCH,ARM] Peephole individual LDR/STD into LDRD/STRD
 
 This patch defines peephole2 patterns that merge two individual LDR
 instructions into LDRD instruction (resp. STR into STRD) whenever
 possible using the following transformations:
 * reorder two memory accesses,
 * rename registers when storing two constants, and
 * reorder target registers of a load when they are used by a
 commutative operation.
 
 In ARM mode only, the pair of registers IP and SP is allowed as
 operands in LDRD/STRD. To handle it, this patch defines a new
 constraint q to be CORE_REGS in ARM mode and GENERAL_REGS (i.e.,
 equivalent to r) otherwise. Note that in ARM mode q is not
 equivalent to rk because of the way constraints are matched. The new
 constraint q is used in place of r for DImode move between register
 and memory.
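 
 For illustration, the kind of code these peephole2 patterns target (assumed
 shapes; the actual tests are peep-ldrd-1.c and peep-strd-1.c from the
 ChangeLog below):
 
   int
   peep_ldrd (int *p)
   {
     return p[0] + p[1];   /* two adjacent word loads: candidate for LDRD */
   }
 
   void
   peep_strd (int *p, int x, int y)
   {
     p[0] = x;             /* two adjacent word stores: candidate for STRD */
     p[1] = y;
   }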
 
 This is a new version of the patch posted for review a long time ago:
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00914.html
 All the dependencies mentioned in the previous patch have already been
 upstreamed.
 Compared to the previous version, the new patch
 * handles both ARM and Thumb modes in the same peephole pattern,
 * does not attempt to generate LDRD/STRD when optimizing for size and
 none of the LDM/STM patterns match (but it would be easy to add),
 * does not include scan-assembly tests specific for cortex-a15 and
 cortex-a9, because they are not stable and highly sensitive to other
 optimizations.
 
 No regression on qemu for arm-none-eabi with cpu cortex-a15.
 
 Bootstrap successful on Cortex-A15 TC2.
 
 Spec2k results:
 Performance: slight improvement in overall scores (less than 1%) in
 both CINT2000 and CFP2000.
 For individual benchmarks, there is a slight variation in performance,
 within less than 1%, which I consider to be just noise.
 Object size: there is a slight reduction in size in all the benchmarks
 - overall 0.2% and at most 0.5% for individual benchmarks.
 Baseline compiler is gcc r194473 from December 2012.
 Compiled in thumb mode with hardfp.
 Run on Cortex-A15 hardware.
 
 Ok for gcc4.9 stage 1?
 
 Thanks,
 Greta
 
 gcc/
 
 2013-02-13  Greta Yorsh  greta.yo...@arm.com
 
 * config/arm/constraints.md (q): New constraint.
 * config/arm/ldrdstrd.md: New file.
 * config/arm/arm.md (ldrdstrd.md) New include.
 (arm_movdi): Use q instead of r constraint
 for double-word memory access.
 (movdf_soft_insn): Likewise.
 * config/arm/vfp.md (movdi_vfp): Likewise.
 * config/arm/t-arm (MD_INCLUDES): Add ldrdstrd.md.
 * config/arm/arm-protos.h (gen_operands_ldrd_strd): New
 declaration.
 * config/arm/arm.c (gen_operands_ldrd_strd): New function.
 (mem_ok_for_ldrd_strd): Likewise.
 (output_move_double): Update assertion.
 
 gcc/testsuite
 
 2013-02-13  Greta Yorsh  greta.yo...@arm.com
 
 * gcc.target/arm/peep-ldrd-1.c: New test.
 * gcc.target/arm/peep-strd-1.c: Likewise.





[PATCH,ARM] Improve extendsidi without neon

2013-02-22 Thread Greta Yorsh
This patch improves the code generated for extensions from SImode to DImode
for core registers when neon is not enabled.

Currently, if neon is enabled, extendsidi for core registers benefits from
the patch described here (r194558 from 17 Dec 2012):
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00984.html

Before r194558, the compiler used to split extendsidi before register
allocation (at split1 pass), which resulted in some cases in an extra
register being used and an extra register move. After r194558, if neon is
enabled, extendsidi is split after reload, generating better code.
Unfortunately, when neon is not enabled, the splitter is still triggered
before reload.

For example, 

$ cat negdi-1.c

signed long long extendsidi_negsi (signed int x)
{
  return -x;
}

$ arm-none-eabi-gcc negdi-1.c -S -O2 -o- -mfpu=vfpv4

extendsidi_negsi:
rsb r3, r0, #0  @ 6 *arm_negsi2 [length = 4]
mov r0, r3  @ 19*arm_movsi_vfp/1[length = 4]
mov r1, r3, asr #31 @ 20*arm_shiftsi3   [length = 4]
bx  lr  @ 25*arm_return [length = 12]

$ arm-none-eabi-gcc negdi-1.c -S -O2 -o- -mfpu=neon

extendsidi_negsi:
rsb r0, r0, #0  @ 6 *arm_negsi2 [length = 4]
mov r1, r0, asr #31 @ 20*arm_shiftsi3   [length = 4]
bx  lr  @ 23*arm_return [length = 12]

This patch changes the condition of splitters for extendsidi to trigger only
after reload.

In addition, this patch fixes a bug in the zero_extend<mode>di2 and
extend<mode>di2 patterns. One of the alternatives in these patterns has a
constraint 'w' which is not valid when neon is disabled, but the patterns
don't have attributes to guard these alternatives by neon availability. This
might cause an ICE when neon is not available. Currently, it seems that the
patterns are only matched by RTL insns that are generated by splitters with
conditions on neon being enabled. However, it may be a latent bug. In any
case, the change in the conditions of these splitters made by this patch
triggers the bug and causes an ICE.

No regression on qemu for arm-none-eabi cortex-a15 arm/thumb neon/vfpv4
softfp/soft.

Bootstrap successful on Cortex-A15.

I haven't added a test case because tests that scan assembly for 'mov' are
very unstable.

Ok for trunk?
Thanks,
Greta

gcc/

2013-02-21  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (split for extendsidi): Update condition.
(zero_extend<mode>di2,extend<mode>di2): Add an alternative.
* config/arm/iterators.md (qhs_extenddi_cstr): Likewise.
(qhs_zextenddi_cstr): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
1cb1515b1fa57c6052b68eb8701616c1b80e7416..
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4491,36 +4491,35 @@ (define_expand truncdfhf2
 ;; Zero and sign extension instructions.
 
 (define_insn zero_extendmodedi2
-  [(set (match_operand:DI 0 s_register_operand =w,r,?r)
+  [(set (match_operand:DI 0 s_register_operand =w,r,?r,w)
 (zero_extend:DI (match_operand:QHSI 1 qhs_zextenddi_op
qhs_zextenddi_cstr)))]
   TARGET_32BIT qhs_zextenddi_cond
   #
-  [(set_attr length 8,4,8)
+  [(set_attr length 8,4,8,8)
+   (set_attr arch neon_nota8,*,*,neon_onlya8)
(set_attr ce_count 2)
(set_attr predicable yes)]
 )
 
 (define_insn extendmodedi2
-  [(set (match_operand:DI 0 s_register_operand =w,r,?r,?r)
+  [(set (match_operand:DI 0 s_register_operand =w,r,?r,?r,w)
 (sign_extend:DI (match_operand:QHSI 1 qhs_extenddi_op
qhs_extenddi_cstr)))]
   TARGET_32BIT qhs_sextenddi_cond
   #
-  [(set_attr length 8,4,8,8)
+  [(set_attr length 8,4,8,8,8)
(set_attr ce_count 2)
(set_attr shift 1)
(set_attr predicable yes)
-   (set_attr arch *,*,a,t)]
+   (set_attr arch neon_nota8,*,a,t,neon_onlya8)]
 )
 
 ;; Splits for all extensions to DImode
 (define_split
   [(set (match_operand:DI 0 s_register_operand )
 (zero_extend:DI (match_operand 1 nonimmediate_operand )))]
-  TARGET_32BIT  (!TARGET_NEON
-   || (reload_completed
-!(IS_VFP_REGNUM (REGNO (operands[0])
+  TARGET_32BIT && reload_completed && !IS_VFP_REGNUM (REGNO (operands[0]))
   [(set (match_dup 0) (match_dup 1))]
 {
   rtx lo_part = gen_lowpart (SImode, operands[0]);
@@ -4546,9 +4545,7 @@ (define_split
 (define_split
   [(set (match_operand:DI 0 s_register_operand )
 (sign_extend:DI (match_operand 1 nonimmediate_operand )))]
-  TARGET_32BIT  (!TARGET_NEON
-   || (reload_completed
-!(IS_VFP_REGNUM (REGNO (operands[0])
+  TARGET_32BIT && reload_completed && !IS_VFP_REGNUM (REGNO (operands[0]))
   [(set (match_dup 0) (ashiftrt:SI (match_dup 1) (const_int 31)))]
 {
   rtx lo_part = gen_lowpart (SImode, operands[0]);
diff --git a/gcc/config/arm/iterators.md b/gcc/config

[PATCH, ARM][0/n] Split patterns that output multiple assembly instruction

2013-02-18 Thread Greta Yorsh
This sequence of patches aims at cleaning up patterns that output multiple
assembly instructions.

The first few patches handle some of the patterns in arm.md. 

[1/n] Add new patterns for subtract with carry.
[2/n] Split subdi patterns.
[3/n] Split patterns andsi_iorsi3_notsi, abs, cmpdi, and negdi.
[4/n] Add negdi_extend patterns.
[5/n] Split shiftdi patterns and add rrx pattern.
[6/n] Split min and max patterns.
[7/n] Add a comment on splitting Thumb1 patterns.

No regression on qemu for arm-none-eabi cortex-a15 arm/thumb.

Bootstrap successful on Cortex-A15.

Ok for gcc 4.9 stage 1?

Thanks,
Greta






[PATCH,ARM][2/n] Split subdi patterns

2013-02-18 Thread Greta Yorsh
Convert define_insn into define_insn_and_split for various subdi patterns
that output multiple assembly instructions.
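
For reference, a sketch of what the split implements (illustrative only):
64-bit subtraction is lowered to a SUBS on the low words, which sets the
carry/borrow flag, followed by an SBC on the high words that consumes it, as
in the first hunk below.

  /* C shape whose RTL *arm_subdi3 matches.  */
  long long
  sub64 (long long a, long long b)
  {
    return a - b;   /* subs on the low halves, then sbc on the high halves */
  }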

2013-02-14  Greta Yorsh  greta.yo...@arm.com

  * config/arm/arm.md (arm_subdi3): Convert define_insn into
  define_insn_and_split.
  (subdi_di_zesidi,subdi_di_sesidi): Likewise.
  (subdi_zesidi_di,subdi_sesidi_di,subdi_zesidi_zesidi): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
c708af4d78df9a92ac1c441138b57f6f18178607..
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1149,13 +1149,27 @@ (define_expand subdi3
   
 )
 
-(define_insn *arm_subdi3
+(define_insn_and_split *arm_subdi3
   [(set (match_operand:DI   0 s_register_operand =r,r,r)
(minus:DI (match_operand:DI 1 s_register_operand 0,r,0)
  (match_operand:DI 2 s_register_operand r,0,0)))
(clobber (reg:CC CC_REGNUM))]
  TARGET_32BIT && !TARGET_NEON
-  subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2
+  #  ; subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+  (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
+   (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
+  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0]
+  {
+operands[3] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[4] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+operands[5] = gen_highpart (SImode, operands[2]);
+operands[2] = gen_lowpart (SImode, operands[2]);
+   }
   [(set_attr conds clob)
(set_attr length 8)]
 )
@@ -1170,55 +1184,113 @@ (define_insn *thumb_subdi3
   [(set_attr length 4)]
 )
 
-(define_insn *subdi_di_zesidi
+(define_insn_and_split *subdi_di_zesidi
   [(set (match_operand:DI   0 s_register_operand =r,r)
(minus:DI (match_operand:DI 1 s_register_operand  0,r)
  (zero_extend:DI
   (match_operand:SI 2 s_register_operand  r,r
(clobber (reg:CC CC_REGNUM))]
   TARGET_32BIT
-  subs\\t%Q0, %Q1, %2\;sbc\\t%R0, %R1, #0
+  #   ; subs\\t%Q0, %Q1, %2\;sbc\\t%R0, %R1, #0
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+  (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
+   (set (match_dup 3) (minus:SI (plus:SI (match_dup 4) (match_dup 5))
+(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0]
+  {
+operands[3] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[4] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+operands[5] = GEN_INT (~0);
+   }
   [(set_attr conds clob)
(set_attr length 8)]
 )
 
-(define_insn *subdi_di_sesidi
+(define_insn_and_split *subdi_di_sesidi
   [(set (match_operand:DI0 s_register_operand =r,r)
(minus:DI (match_operand:DI  1 s_register_operand  0,r)
  (sign_extend:DI
   (match_operand:SI 2 s_register_operand  r,r
(clobber (reg:CC CC_REGNUM))]
   TARGET_32BIT
-  subs\\t%Q0, %Q1, %2\;sbc\\t%R0, %R1, %2, asr #31
+  #   ; subs\\t%Q0, %Q1, %2\;sbc\\t%R0, %R1, %2, asr #31
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+  (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
+   (set (match_dup 3) (minus:SI (minus:SI (match_dup 4)
+ (ashiftrt:SI (match_dup 2)
+  (const_int 31)))
+(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0]
+  {
+operands[3] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[4] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+  }
   [(set_attr conds clob)
(set_attr length 8)]
 )
 
-(define_insn *subdi_zesidi_di
+(define_insn_and_split *subdi_zesidi_di
   [(set (match_operand:DI0 s_register_operand =r,r)
(minus:DI (zero_extend:DI
   (match_operand:SI 2 s_register_operand  r,r))
  (match_operand:DI  1 s_register_operand 0,r)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_ARM
-  rsbs\\t%Q0, %Q1, %2\;rsc\\t%R0, %R1, #0
+  #   ; rsbs\\t%Q0, %Q1, %2\;rsc\\t%R0, %R1, #0
+; is equivalent to:
+; subs\\t%Q0, %2, %Q1\;rsc\\t%R0, %R1, #0
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+  (compare:CC (match_dup 2) (match_dup 1)))
+ (set (match_dup 0) (minus:SI (match_dup 2) (match_dup 1)))])
+   (set (match_dup 3) (minus:SI (minus:SI (const_int

[PATCH,ARM][3/n] Split various patterns

2013-02-18 Thread Greta Yorsh
Convert define_insn into define_insn_and_split for various patterns that
output multiple assembly instructions.

It appears that preparation statements in define_insn_and_split sometimes
are called with which_alternative set to -1 even after reload. Therefore,
preparation statements use conditions on the operands instead of
which_alternative.
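
As a concrete illustration, the preparation statement of the arm_abssi2
split below distinguishes the alternatives by inspecting the operands
directly. A minimal sketch of that shape (illustrative only, not the
complete preparation statement):

  {
    /* Instead of `if (which_alternative == 0)', test the property that
       distinguishes the alternatives -- here, whether the output register
       is tied to the input register.  */
    if (REGNO (operands[0]) == REGNO (operands[1]))
      {
        /* emit the RTL for the tied-register alternative */
      }
    else
      {
        /* emit the RTL for the two-register alternative */
      }
    DONE;
  }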

gcc/

2013-02-14  Greta Yorsh  greta.yo...@arm.com

  * config/arm/arm.md (andsi_iorsi3_notsi): Convert define_insn into
  define_insn_and_split.
  (arm_negdi2,arm_abssi2,arm_neg_abssi2): Likewise.
  (arm_cmpdi_insn,arm_cmpdi_unsigned): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
282d460f928f9f1a58230a4f3f3e8960e3357c1a..
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3276,13 +3276,17 @@ (define_split
   
 )
 
-(define_insn *andsi_iorsi3_notsi
+(define_insn_and_split *andsi_iorsi3_notsi
   [(set (match_operand:SI 0 s_register_operand =r,r,r)
(and:SI (ior:SI (match_operand:SI 1 s_register_operand %0,r,r)
(match_operand:SI 2 arm_rhs_operand rI,0,rI))
(not:SI (match_operand:SI 3 arm_rhs_operand rI,rI,rI]
   TARGET_32BIT
-  orr%?\\t%0, %1, %2\;bic%?\\t%0, %0, %3
+  #   ; orr%?\\t%0, %1, %2\;bic%?\\t%0, %0, %3
+   reload_completed
+  [(set (match_dup 0) (ior:SI (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (and:SI (not:SI (match_dup 3)) (match_dup 0)))]
+  
   [(set_attr length 8)
(set_attr ce_count 2)
(set_attr predicable yes)]
@@ -4350,12 +4354,24 @@ (define_expand negdi2
 
 ;; The constraints here are to prevent a *partial* overlap (where %Q0 == %R1).
 ;; The first alternative allows the common case of a *full* overlap.
-(define_insn *arm_negdi2
+(define_insn_and_split *arm_negdi2
   [(set (match_operand:DI 0 s_register_operand =r,r)
(neg:DI (match_operand:DI 1 s_register_operand  0,r)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_ARM
-  rsbs\\t%Q0, %Q1, #0\;rsc\\t%R0, %R1, #0
+  #   ; rsbs\\t%Q0, %Q1, #0\;rsc\\t%R0, %R1, #0
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+  (compare:CC (const_int 0) (match_dup 1)))
+ (set (match_dup 0) (minus:SI (const_int 0) (match_dup 1)))])
+   (set (match_dup 2) (minus:SI (minus:SI (const_int 0) (match_dup 3))
+(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0]
+  {
+operands[2] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[3] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+  }
   [(set_attr conds clob)
(set_attr length 8)]
 )
@@ -4425,14 +4441,67 @@ (define_expand abssi2
 operands[2] = gen_rtx_REG (CCmode, CC_REGNUM);
 )
 
-(define_insn *arm_abssi2
+(define_insn_and_split *arm_abssi2
   [(set (match_operand:SI 0 s_register_operand =r,r)
(abs:SI (match_operand:SI 1 s_register_operand 0,r)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_ARM
-  @
-   cmp\\t%0, #0\;rsblt\\t%0, %0, #0
-   eor%?\\t%0, %1, %1, asr #31\;sub%?\\t%0, %0, %1, asr #31
+  #
+   reload_completed
+  [(const_int 0)]
+  {
+   /* if (which_alternative == 0) */
+   if (REGNO(operands[0]) == REGNO(operands[1]))
+ {
+  /* Emit the pattern:
+ cmp\\t%0, #0\;rsblt\\t%0, %0, #0
+ [(set (reg:CC CC_REGNUM)
+   (compare:CC (match_dup 0) (const_int 0)))
+  (cond_exec (lt:CC (reg:CC CC_REGNUM) (const_int 0))
+ (set (match_dup 0) (minus:SI (const_int 0) (match_dup 
1]
+  */
+  emit_insn (gen_rtx_SET (VOIDmode,
+  gen_rtx_REG (CCmode, CC_REGNUM),
+  gen_rtx_COMPARE (CCmode, operands[0], 
const0_rtx)));
+  emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+(gen_rtx_LT (SImode,
+ gen_rtx_REG (CCmode, 
CC_REGNUM),
+ const0_rtx)),
+(gen_rtx_SET (VOIDmode,
+  operands[0],
+  (gen_rtx_MINUS (SImode,
+  const0_rtx,
+  
operands[1]));
+  DONE;
+ }
+   else
+ {
+  /* Emit the pattern:
+ alt1: eor%?\\t%0, %1, %1, asr #31\;sub%?\\t%0, %0, %1, asr #31
+ [(set (match_dup 0)
+   (xor:SI (match_dup 1)
+   (ashiftrt:SI (match_dup 1) (const_int 31
+  (set (match_dup 0)
+   (minus:SI (match_dup 0)
+  (ashiftrt:SI (match_dup 1) (const_int 31]
+  */
+  emit_insn (gen_rtx_SET (VOIDmode,
+  operands[0],
+  gen_rtx_XOR (SImode

[PATCH,ARM][4/n] Add negdi_extend patterns

2013-02-18 Thread Greta Yorsh
This patch adds patterns to handle negation of an extended 32-bit value more
efficiently.

For example,

(set (reg:DI r0) (neg:DI (sign_extend:DI (reg:SI r0)))

The compiler currently generates
mov r1, r0, asr #31
rsbs r0, r0, #0
rsc r1, r1, #0
and after the patch it generates:
  rsb r0, r0, #0
  mov r1, r0, asr #31

(set (reg:DI r0) (neg:DI (zero_extend:DI (reg:SI r0)))

The compiler currently generates
mov r1, #0
rsbs r0, r0, #0
rsc r1, r1, #0
and after the patch it generates:
  rsbs r0, r0, #0
  sbc r1, r1, r1

The following examples are not affected by the patch:

(set (reg:DI r0) (sign_extend:DI (neg:SI (reg:SI r0)))
  rsb   r0, r0, #0
  mov r1, r0, asr #31

(set (reg:DI r0) (zero_extend:DI (neg:SI (reg:SI r0)))
  rsb r0, r0, #0
  mov r1, #0

The patch also adds the appropriate test cases.
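
For reference, a minimal sketch of the kind of C source that produces the
RTL forms above (function names are illustrative and not necessarily those
used in the new tests; a 32-bit int and a 64-bit long long are assumed):

/* neg:DI of a sign_extend:DI -- expected to use RSB followed by ASR.  */
signed long long neg_of_sext (signed int x)
{
  return -(signed long long) x;
}

/* neg:DI of a zero_extend:DI -- expected to use RSBS followed by SBC.  */
unsigned long long neg_of_zext (unsigned int x)
{
  return -(unsigned long long) x;
}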

gcc/

2013-01-10  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (negdi_extendsidi): New pattern.
(negdi_zero_extendsidi): Likewise.

gcc/testsuite

2013-01-10  Greta Yorsh  greta.yo...@arm.com

* gcc.target/arm/negdi-1.c: New test.
* gcc.target/arm/negdi-2.c: Likewise.
* gcc.target/arm/negdi-3.c: Likewise.
* gcc.target/arm/negdi-4.c: Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
de57f40c20ad89a71dc9b3b172b9d5666afde9f8..
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4207,6 +4207,72 @@ (define_expand negdf2
   TARGET_32BIT  TARGET_HARD_FLOAT  TARGET_VFP_DOUBLE
   )
 
+;; Negate an extended 32-bit value.
+(define_insn_and_split *negdi_extendsidi
+  [(set (match_operand:DI 0 s_register_operand =r,r,l,l)
+   (neg:DI (sign_extend:DI (match_operand:SI 1 s_register_operand 
0,r,0,l
+   (clobber (reg:CC CC_REGNUM))]
+  TARGET_32BIT
+  # ; rsb\\t%Q0, %1, #0\;asr\\t%R0, %Q0, #31
+   reload_completed
+  [(const_int 0)]
+  {
+ operands[2] = gen_highpart (SImode, operands[0]);
+ operands[0] = gen_lowpart (SImode, operands[0]);
+ rtx tmp = gen_rtx_SET (VOIDmode,
+operands[0],
+gen_rtx_MINUS (SImode,
+   const0_rtx,
+   operands[1]));
+ if (TARGET_ARM)
+   {
+ emit_insn (tmp);
+   }
+ else
+   {
+ /* Set the flags, to emit the short encoding in Thumb2.  */
+ rtx flags = gen_rtx_SET (VOIDmode,
+  gen_rtx_REG (CCmode, CC_REGNUM),
+  gen_rtx_COMPARE (CCmode,
+   const0_rtx,
+   operands[1]));
+ emit_insn (gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (2,
+ flags,
+ tmp)));
+   }
+   emit_insn (gen_rtx_SET (VOIDmode,
+  operands[2],
+  gen_rtx_ASHIFTRT (SImode,
+operands[0],
+GEN_INT (31;
+  }
+  [(set_attr length 8,8,4,4)
+   (set_attr arch a,a,t2,t2)]
+)
+
+(define_insn_and_split *negdi_zero_extendsidi
+  [(set (match_operand:DI 0 s_register_operand =r,r)
+   (neg:DI (zero_extend:DI (match_operand:SI 1 s_register_operand 
0,r
+   (clobber (reg:CC CC_REGNUM))]
+  TARGET_32BIT
+  # ; rsbs\\t%Q0, %1, #0\;sbc\\t%R0,%R0,%R0
+  ;; Don't care what register is input to sbc,
+  ;; since we just need to propagate the carry.
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+   (compare:CC (const_int 0) (match_dup 1)))
+  (set (match_dup 0) (minus:SI (const_int 0) (match_dup 1)))])
+   (set (match_dup 2) (minus:SI (minus:SI (match_dup 2) (match_dup 2))
+(ltu:SI (reg:CC_C CC_REGNUM) (const_int 0]
+  {
+operands[2] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+  }
+  [(set_attr conds clob)
+   (set_attr length 8)]   ;; length in thumb is 4
+)
+
 ;; abssi2 doesn't really clobber the condition codes if a different register
 ;; is being set.  To keep things simple, assume during rtl manipulations that
 ;; it does, but tell the final scan operator the truth.  Similarly for
diff --git a/gcc/testsuite/gcc.target/arm/negdi-1.c 
b/gcc/testsuite/gcc.target/arm/negdi-1.c
index ...7cd80ea3dc397f4c0eee688de5d6b49c685e869f 100644
--- a/gcc/testsuite/gcc.target/arm/negdi-1.c
+++ b/gcc/testsuite/gcc.target/arm/negdi-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options -O2 } */
+
+signed long long extendsidi_negsi (signed int x

[PATCH,ARM][5/n] Split shift di patterns

2013-02-18 Thread Greta Yorsh
Convert define_insn into define_insn_and_split for various DImode shift
operations that output multiple assembly instructions.

This patch also adds a new pattern for RRX using a new UNSPEC. This pattern
matches RTL insns emitted by arm_ashrdi3_1bit and arm_lshrdi3_1bit
splitters. This patch also adds a new pattern shiftsi3_compare.
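
As an illustration, a 64-bit logical shift right by one is the kind of
input that goes through the arm_lshrdi3_1bit splitter; after the split,
the low word is produced by the new rrx pattern (a hypothetical example,
not one of the patch's test cases):

unsigned long long shift_right_one (unsigned long long x)
{
  return x >> 1;   /* movs of the high word, then rrx into the low word */
}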

gcc/

2013-02-14  Greta Yorsh  greta.yo...@arm.com

  * config/arm/arm.md (arm_ashldi3_1bit): Convert define_insn into
  define_insn_and_split.
(arm_ashrdi3_1bit,arm_lshrdi3_1bit): Likewise.
  (shiftsi3_compare): New pattern.
  (rrx): New pattern.
  * config/arm/unspecs.md (UNSPEC_RRX): New.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
cbd0faf636d264dd4e46db9c8a1fe226b431a97e..
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3806,13 +3806,26 @@ (define_expand ashldi3
   
 )
 
-(define_insn arm_ashldi3_1bit
+(define_insn_and_split arm_ashldi3_1bit
   [(set (match_operand:DI0 s_register_operand =r,r)
 (ashift:DI (match_operand:DI 1 s_register_operand 0,r)
(const_int 1)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_32BIT
-  movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1
+  #   ; movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+  (compare:CC (ashift:SI (match_dup 1) (const_int 1))
+   (const_int 0)))
+ (set (match_dup 0) (ashift:SI (match_dup 1) (const_int 1)))])
+   (set (match_dup 2) (plus:SI (plus:SI (match_dup 3) (match_dup 3))
+  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0]
+  {
+operands[2] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[3] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+  }
   [(set_attr conds clob)
(set_attr length 8)]
 )
@@ -3888,18 +3901,43 @@ (define_expand ashrdi3
   
 )
 
-(define_insn arm_ashrdi3_1bit
+(define_insn_and_split arm_ashrdi3_1bit
   [(set (match_operand:DI  0 s_register_operand =r,r)
 (ashiftrt:DI (match_operand:DI 1 s_register_operand 0,r)
  (const_int 1)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_32BIT
-  movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx
+  #   ; movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+   (compare:CC (ashiftrt:SI (match_dup 3) (const_int 1))
+   (const_int 0)))
+  (set (match_dup 2) (ashiftrt:SI (match_dup 3) (const_int 1)))])
+   (set (match_dup 0) (unspec:SI [(match_dup 1)
+  (reg:CC_C CC_REGNUM)]
+ UNSPEC_RRX))]
+  {
+operands[2] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[3] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+  }
   [(set_attr conds clob)
-   (set_attr insn mov)
(set_attr length 8)]
 )
 
+(define_insn *rrx
+  [(set (match_operand:SI 0 s_register_operand =r)
+(unspec:SI [(match_operand:SI 1 s_register_operand r)
+(reg:CC_C CC_REGNUM)]
+   UNSPEC_RRX))]
+  TARGET_32BIT
+  mov\\t%0, %1, rrx
+  [(set_attr conds use)
+   (set_attr insn mov)
+   (set_attr type alu_shift)]
+)
+
 (define_expand ashrsi3
   [(set (match_operand:SI  0 s_register_operand )
(ashiftrt:SI (match_operand:SI 1 s_register_operand )
@@ -3968,15 +4006,28 @@ (define_expand lshrdi3
   
 )
 
-(define_insn arm_lshrdi3_1bit
+(define_insn_and_split arm_lshrdi3_1bit
   [(set (match_operand:DI  0 s_register_operand =r,r)
 (lshiftrt:DI (match_operand:DI 1 s_register_operand 0,r)
  (const_int 1)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_32BIT
-  movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx
+  #   ;  movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx
+   reload_completed
+  [(parallel [(set (reg:CC CC_REGNUM)
+   (compare:CC (lshiftrt:SI (match_dup 3) (const_int 1))
+   (const_int 0)))
+  (set (match_dup 2) (lshiftrt:SI (match_dup 3) (const_int 1)))])
+   (set (match_dup 0) (unspec:SI [(match_dup 1)
+  (reg:CC_C CC_REGNUM)]
+ UNSPEC_RRX))]
+  {
+operands[2] = gen_highpart (SImode, operands[0]);
+operands[0] = gen_lowpart (SImode, operands[0]);
+operands[3] = gen_highpart (SImode, operands[1]);
+operands[1] = gen_lowpart (SImode, operands[1]);
+  }
   [(set_attr conds clob)
-   (set_attr insn mov)
(set_attr length 8)]
 )
 
@@ -4064,6 +4115,23 @@ (define_insn *arm_shiftsi3
  (const_string alu_shift_reg)))]
 )
 
+(define_insn *shiftsi3_compare
+  [(set (reg:CC CC_REGNUM

[PATCH,ARM][6/n] Split min and max patterns

2013-02-18 Thread Greta Yorsh
Convert define_insn into define_insn_and_split for various min and max
patterns that output multiple assembly instructions. Use movsicc to emit
RTL. A separate patch will split movsicc.

gcc/

2013-02-14  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (arm_smax_insn): Convert define_insn into
  define_insn_and_split.
  (arm_smin_insn,arm_umaxsi3,arm_uminsi3): Likewise.

commit 068f9449536fca959fd687ac8b7e0bdae898f8bd
Author: Greta greta.yo...@arm.com
Date:   Fri Feb 15 14:41:48 2013 +

8-split-min-max.v2.patch

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7c04840..5f5af3c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3443,15 +3443,23 @@
   [(set_attr predicable yes)]
 )
 
-(define_insn *arm_smax_insn
+(define_insn_and_split *arm_smax_insn
   [(set (match_operand:SI  0 s_register_operand =r,r)
(smax:SI (match_operand:SI 1 s_register_operand  %0,?r)
 (match_operand:SI 2 arm_rhs_operandrI,rI)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_ARM
-  @
-   cmp\\t%1, %2\;movlt\\t%0, %2
-   cmp\\t%1, %2\;movge\\t%0, %1\;movlt\\t%0, %2
+  #
+   ; cmp\\t%1, %2\;movlt\\t%0, %2
+   ; cmp\\t%1, %2\;movge\\t%0, %1\;movlt\\t%0, %2
+  TARGET_ARM
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+(if_then_else:SI (ge:SI (reg:CC CC_REGNUM) (const_int 0))
+ (match_dup 1)
+ (match_dup 2)))]
+  
   [(set_attr conds clob)
(set_attr length 8,12)]
 )
@@ -3483,15 +3491,23 @@
   [(set_attr predicable yes)]
 )
 
-(define_insn *arm_smin_insn
+(define_insn_and_split *arm_smin_insn
   [(set (match_operand:SI 0 s_register_operand =r,r)
(smin:SI (match_operand:SI 1 s_register_operand %0,?r)
 (match_operand:SI 2 arm_rhs_operand rI,rI)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_ARM
-  @
-   cmp\\t%1, %2\;movge\\t%0, %2
-   cmp\\t%1, %2\;movlt\\t%0, %1\;movge\\t%0, %2
+  #
+; cmp\\t%1, %2\;movge\\t%0, %2
+; cmp\\t%1, %2\;movlt\\t%0, %1\;movge\\t%0, %2
+  TARGET_ARM
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+(if_then_else:SI (lt:SI (reg:CC CC_REGNUM) (const_int 0))
+ (match_dup 1)
+ (match_dup 2)))]
+  
   [(set_attr conds clob)
(set_attr length 8,12)]
 )
@@ -3506,16 +3522,24 @@
   
 )
 
-(define_insn *arm_umaxsi3
+(define_insn_and_split *arm_umaxsi3
   [(set (match_operand:SI 0 s_register_operand =r,r,r)
(umax:SI (match_operand:SI 1 s_register_operand 0,r,?r)
 (match_operand:SI 2 arm_rhs_operand rI,0,rI)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_ARM
-  @
-   cmp\\t%1, %2\;movcc\\t%0, %2
-   cmp\\t%1, %2\;movcs\\t%0, %1
-   cmp\\t%1, %2\;movcs\\t%0, %1\;movcc\\t%0, %2
+  #
+; cmp\\t%1, %2\;movcc\\t%0, %2
+; cmp\\t%1, %2\;movcs\\t%0, %1
+; cmp\\t%1, %2\;movcs\\t%0, %1\;movcc\\t%0, %2
+  TARGET_ARM
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+(if_then_else:SI (geu:SI (reg:CC CC_REGNUM) (const_int 0))
+ (match_dup 1)
+ (match_dup 2)))]
+  
   [(set_attr conds clob)
(set_attr length 8,8,12)]
 )
@@ -3530,16 +3554,24 @@
   
 )
 
-(define_insn *arm_uminsi3
+(define_insn_and_split *arm_uminsi3
   [(set (match_operand:SI 0 s_register_operand =r,r,r)
(umin:SI (match_operand:SI 1 s_register_operand 0,r,?r)
 (match_operand:SI 2 arm_rhs_operand rI,0,rI)))
(clobber (reg:CC CC_REGNUM))]
   TARGET_ARM
-  @
-   cmp\\t%1, %2\;movcs\\t%0, %2
-   cmp\\t%1, %2\;movcc\\t%0, %1
-   cmp\\t%1, %2\;movcc\\t%0, %1\;movcs\\t%0, %2
+  #
+   ; cmp\\t%1, %2\;movcs\\t%0, %2
+   ; cmp\\t%1, %2\;movcc\\t%0, %1
+   ; cmp\\t%1, %2\;movcc\\t%0, %1\;movcs\\t%0, %2
+  TARGET_ARM
+  [(set (reg:CC CC_REGNUM)
+(compare:CC (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+(if_then_else:SI (ltu:SI (reg:CC CC_REGNUM) (const_int 0))
+ (match_dup 1)
+ (match_dup 2)))]
+  
   [(set_attr conds clob)
(set_attr length 8,8,12)]
 )


[PATCH, ARM][7/n] Comment on splitting THUMB1 patterns

2013-02-18 Thread Greta Yorsh
This patch adds a comment explaining why it is difficult to split Thumb1
patterns.

gcc/

2013-02-12  Greta Yorsh  greta.yo...@arm.com

   * config/arm/arm.md: Comment on splitting Thumb1 patterns.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 64888f9..ce98013 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -22,6 +22,25 @@
 
 ;;- See file rtl.def for documentation on define_insn, match_*, et. al.
 
+;; Beware of splitting Thumb1 patterns that output multiple
+;; assembly instructions, in particular instructions such as SBC and
+;; ADC which consume flags.  For example, in the pattern thumb_subdi3
+;; below, the output SUB implicitly sets the flags (assembled to SUBS)
+;; and then the Carry flag is used by SBC to compute the correct
+;; result.  If we split thumb_subdi3 pattern into two separate RTL
+;; insns (using define_insn_and_split), the scheduler might place
+;; other RTL insns between SUB and SBC, possibly modifying the Carry
+;; flag used by SBC.  This might happen because most Thumb1 patterns
+;; for flag-setting instructions do not have explicit RTL for setting
+;; or clobbering the flags.  Instead, they have the attribute conds
+;; with value set or clob.  However, this attribute is not used to
+;; identify dependencies and therefore the scheduler might reorder
+;; these instructions.  Currently, this problem cannot happen because
+;; there are no separate Thumb1 patterns for individual instructions
+;; that consume flags (except conditional execution, which is treated
+;; differently).  In particular there is no Thumb1 armv6-m pattern for
+;; sbc or adc.
+
 
 ;;---
 ;; Constants


[PATCH,ARM] Set attribute predicable

2013-02-14 Thread Greta Yorsh
This patch sets attribute predicable to yes for patterns that handle add
with carry and already use %? in their output statements.

Ok for trunk?
Thanks,
Greta

gcc/

2013-02-14  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (addsi3_carryin_optab): Set attribute
predicable to yes.
(addsi3_carryin_alt2_optab,addsi3_carryin_shift_optab):
Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
1cb1515b1fa57c6052b68eb8701616c1b80e7416..35294dd6560ac63279d95eca6cf774257e06bd93
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -974,7 +974,8 @@ (define_insn *addsi3_carryin_optab
   @
adc%?\\t%0, %1, %2
sbc%?\\t%0, %1, #%B2
-  [(set_attr conds use)]
+  [(set_attr conds use)
+   (set_attr predicable yes)]
 )
 
 (define_insn *addsi3_carryin_alt2_optab
@@ -986,7 +987,8 @@ (define_insn *addsi3_carryin_alt2_opta
   @
adc%?\\t%0, %1, %2
sbc%?\\t%0, %1, #%B2
-  [(set_attr conds use)]
+  [(set_attr conds use)
+   (set_attr predicable yes)]
 )
 
 (define_insn *addsi3_carryin_shift_optab
@@ -1000,6 +1002,7 @@ (define_insn *addsi3_carryin_shift_opt
   TARGET_32BIT
   adc%?\\t%0, %1, %3%S2
   [(set_attr conds use)
+   (set_attr predicable yes)
(set (attr type) (if_then_else (match_operand 4 const_int_operand )
  (const_string alu_shift)
  (const_string alu_shift_reg)))]


[PATCH,ARM] Peephole individual LDR/STD into LDRD/STRD

2013-02-13 Thread Greta Yorsh
This patch defines peephole2 patterns that merge two individual LDR
instructions into LDRD instruction (resp. STR into STRD) whenever possible
using the following transformations:
* reorder two memory accesses,
* rename registers when storing two constants, and
* reorder target registers of a load when they are used by a commutative
operation.
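
For illustration, hypothetical examples (not the actual peep-ldrd-1.c and
peep-strd-1.c sources) of adjacent word accesses that these peephole2
patterns aim to merge:

int load_pair (int *p)
{
  return p[0] + p[1];   /* two adjacent LDRs; candidates for a single LDRD */
}

void store_zero_pair (int *p)
{
  p[0] = 0;             /* two adjacent STRs of constants; candidates for a */
  p[1] = 0;             /* single STRD once suitable registers are chosen   */
}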

In ARM mode only, the pair of registers IP and SP is allowed as operands in
LDRD/STRD. To handle it, this patch defines a new constraint q to be
CORE_REGS in ARM mode and GENERAL_REGS (i.e., equivalent to r) otherwise.
Note that in ARM mode q is not equivalent to rk because of the way
constraints are matched. The new constraint q is used in place of r for
DImode move between register and memory.

This is a new version of the patch posted for review a long time ago:
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00914.html
All the dependencies mentioned in the previous patch have already been
upstreamed.
Compared to the previous version, the new patch
* handles both ARM and Thumb modes in the same peephole pattern,
* does not attempt to generate LDRD/STRD when optimizing for size and none of
the LDM/STM patterns match (but it would be easy to add),
* does not include scan-assembly tests specific for cortex-a15 and
cortex-a9, because they are not stable and highly sensitive to other
optimizations.

No regression on qemu for arm-none-eabi with cpu cortex-a15.

Bootstrap successful on Cortex-A15 TC2.

Spec2k results:
Performance: slight improvement in overall scores (less than 1%) in both
CINT2000 and CFP2000. 
For individual benchmarks, there is a slight variation in performance,
within less than 1%, which I consider to be just noise.
Object size: there is a slight reduction in size in all the benchmarks -
overall 0.2% and at most 0.5% for individual benchmarks.
Baseline compiler is gcc r194473 from December 2012.
Compiled in thumb mode with hardfp.
Run on Cortex-A15 hardware.

Ok for gcc4.9 stage 1?

Thanks,
Greta

gcc/

2013-02-13  Greta Yorsh  greta.yo...@arm.com

* config/arm/constraints.md (q): New constraint.
* config/arm/ldrdstrd.md: New file.
* config/arm/arm.md (ldrdstrd.md) New include.
(arm_movdi): Use q instead of r constraint
for double-word memory access.
(movdf_soft_insn): Likewise.
* config/arm/vfp.md (movdi_vfp): Likewise.
* config/arm/t-arm (MD_INCLUDES): Add ldrdstrd.md.
* config/arm/arm-protos.h (gen_operands_ldrd_strd): New declaration.
* config/arm/arm.c (gen_operands_ldrd_strd): New function.
(mem_ok_for_ldrd_strd): Likewise.
(output_move_double): Update assertion.

gcc/testsuite

2013-02-13  Greta Yorsh  greta.yo...@arm.com

* gcc.target/arm/peep-ldrd-1.c: New test.
* gcc.target/arm/peep-strd-1.c: Likewise.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d942c5b..41199c1 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -118,6 +118,7 @@ extern rtx arm_gen_load_multiple (int *, int, rtx, int, 
rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
 extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
+extern bool gen_operands_ldrd_strd (rtx *, bool, bool, bool);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 84ce56f..423dddc 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12288,6 +12288,277 @@ operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rn, 
HOST_WIDE_INT offset,
   return true;
 }
 
+/* Helper for gen_operands_ldrd_strd.  Returns true iff the memory
+   operand ADDR is an immediate offset from the base register and is
+   not volatile, in which case it sets BASE and OFFSET
+   accordingly.  */
+bool
+mem_ok_for_ldrd_strd (rtx addr, rtx *base, rtx *offset)
+{
+  /* TODO: Handle more general memory operand patterns, such as
+ PRE_DEC and PRE_INC.  */
+
+  /* Convert a subreg of mem into mem itself.  */
+  if (GET_CODE (addr) == SUBREG)
+addr = alter_subreg (addr, true);
+
+  gcc_assert (MEM_P (addr));
+
+  /* Don't modify volatile memory accesses.  */
+  if (MEM_VOLATILE_P (addr))
+return false;
+
+  *offset = const0_rtx;
+
+  addr = XEXP (addr, 0);
+  if (REG_P (addr))
+{
+  *base = addr;
+  return true;
+}
+  else if (GET_CODE (addr) == PLUS || GET_CODE (addr) == MINUS)
+{
+  *base = XEXP (addr, 0);
+  *offset = XEXP (addr, 1);
+  return (REG_P (*base) && CONST_INT_P (*offset));
+}
+
+  return false;
+}
+
+#define SWAP_RTX(x,y) do { rtx tmp = x; x = y; y = tmp; } while (0)
+
+/* Called from a peephole2 to replace two word-size accesses with a
+   single LDRD/STRD instruction.  Returns true iff we can generate

[Patch] Cleanup gcc.target/arm/interrupt-*.c for thumb mode

2013-02-13 Thread Greta Yorsh
The tests gcc.target/arm/interrupt-*.c are for ARM mode only. 
This patch uses effective target arm_nothumb instead of the __thumb__ predefine,
removes unreachable code, and fixes typos.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/testsuite/

2013-02-13  Greta Yorsh  greta.yo...@arm.com

* gcc.target/arm/interrupt-1.c: Fix for thumb mode.
* gcc.target/arm/interrupt-2.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/arm/interrupt-1.c 
b/gcc/testsuite/gcc.target/arm/interrupt-1.c
index 18379de..a384242 100644
--- a/gcc/testsuite/gcc.target/arm/interrupt-1.c
+++ b/gcc/testsuite/gcc.target/arm/interrupt-1.c
@@ -1,10 +1,10 @@
 /* Verify that prologue and epilogue are correct for functions with
__attribute__ ((interrupt)).  */
 /* { dg-do compile } */
-/* { dg-options -O0 } */
+/* { dg-require-effective-target arm_nothumb } */
+/* { dg-options -O0 -marm } */
 
-/* This test is not valid when -mthumb.  We just cheat.  */
-#ifndef __thumb__
+/* This test is not valid when -mthumb.  */
 extern void bar (int);
 extern void foo (void) __attribute__ ((interrupt(IRQ)));
 
@@ -12,12 +12,6 @@ void foo ()
 {
   bar (0);
 }
-#else
-void foo ()
-{
-  asm (stmfd\tsp!, {r0, r1, r2, r3, r4, fp, ip, lr});
-  asm (ldmfd\tsp!, {r0, r1, r2, r3, r4, fp, ip, pc}^);
-}
-#endif
+
 /* { dg-final { scan-assembler stmfd\tsp!, {r0, r1, r2, r3, r4, fp, ip, lr} 
} } */
 /* { dg-final { scan-assembler ldmfd\tsp!, {r0, r1, r2, r3, r4, fp, ip, 
pc}\\^ } } */
diff --git a/gcc/testsuite/gcc.target/arm/interrupt-2.c 
b/gcc/testsuite/gcc.target/arm/interrupt-2.c
index b979bf1..61d3130 100644
--- a/gcc/testsuite/gcc.target/arm/interrupt-2.c
+++ b/gcc/testsuite/gcc.target/arm/interrupt-2.c
@@ -1,26 +1,19 @@
 /* Verify that prologue and epilogue are correct for functions with
__attribute__ ((interrupt)).  */
 /* { dg-do compile } */
-/* { dg-options -O1 } */
+/* { dg-require-effective-target arm_nothumb } */
+/* { dg-options -O1 -marm } */
 
-/* This test is not valid when -mthum.  We just cheat.  */
-#ifndef __thumb__
+/* This test is not valid when -mthumb.  */
 extern void bar (int);
 extern void test (void) __attribute__((__interrupt__));
 
 int foo;
 void test()
 {
-  funcptrs(foo);
+  bar (foo);
   foo = 0;
 }
-#else
-void test ()
-{
-  asm (stmfd\tsp!, {r0, r1, r2, r3, r4, r5, ip, lr});
-  asm (ldmfd\tsp!, {r0, r1, r2, r3, r4, r5, ip, pc}^);
-}
-#endif
 
 /* { dg-final { scan-assembler stmfd\tsp!, {r0, r1, r2, r3, r4, r5, ip, lr} 
} } */
 /* { dg-final { scan-assembler ldmfd\tsp!, {r0, r1, r2, r3, r4, r5, ip, 
pc}\\^ } } */


RE: [PATCH,ARM] remove incscc and decscc patterns

2013-01-25 Thread Greta Yorsh
Ping?

Thanks,
Greta

 -Original Message-
 From: Greta Yorsh [mailto:greta.yo...@arm.com]
 Sent: 18 January 2013 11:44
 To: GCC Patches
 Cc: richard.sandif...@linaro.org; Ramana Radhakrishnan; Richard
 Earnshaw
 Subject: [PATCH,ARM] remove incscc and decscc patterns
 
 Remove incscc and decscc expanders that appear to be dead, along with
 the
 related patterns.
 
 This patch is a follow up on:
 http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01128.html
 
 No regression qemu for arm-none-eabi. Bootstrap successful.
 
 Ok for trunk?
 
 Thanks,
 Greta
 
 2013-01-17  Greta Yorsh  greta.yo...@arm.com
 
   * config/arm/arm.md (incscc,arm_incscc,decscc,arm_decscc):
 Delete.diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index de57f40..80480a0 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1015,28 +1015,6 @@
[(set_attr conds set)]
 )
 
-(define_expand incscc
-  [(set (match_operand:SI 0 s_register_operand =r,r)
-(plus:SI (match_operator:SI 2 arm_comparison_operator
-[(match_operand:CC 3 cc_register ) (const_int 0)])
- (match_operand:SI 1 s_register_operand 0,?r)))]
-  TARGET_32BIT
-  
-)
-
-(define_insn *arm_incscc
-  [(set (match_operand:SI 0 s_register_operand =r,r)
-(plus:SI (match_operator:SI 2 arm_comparison_operator
-[(match_operand:CC 3 cc_register ) (const_int 0)])
- (match_operand:SI 1 s_register_operand 0,?r)))]
-  TARGET_ARM
-  @
-  add%d2\\t%0, %1, #1
-  mov%D2\\t%0, %1\;add%d2\\t%0, %1, #1
-  [(set_attr conds use)
-   (set_attr length 4,8)]
-)
-
 ; transform ((x << y) - 1) to ~(~(x-1) << y)  Where X is a constant.
 (define_split
   [(set (match_operand:SI 0 s_register_operand )
@@ -1267,29 +1245,6 @@
(set_attr type simple_alu_imm,*,*)]
 )
 
-(define_expand decscc
-  [(set (match_operand:SI0 s_register_operand =r,r)
-(minus:SI (match_operand:SI  1 s_register_operand 0,?r)
- (match_operator:SI 2 arm_comparison_operator
-   [(match_operand   3 cc_register ) (const_int 0)])))]
-  TARGET_32BIT
-  
-)
-
-(define_insn *arm_decscc
-  [(set (match_operand:SI0 s_register_operand =r,r)
-(minus:SI (match_operand:SI  1 s_register_operand 0,?r)
- (match_operator:SI 2 arm_comparison_operator
-   [(match_operand   3 cc_register ) (const_int 0)])))]
-  TARGET_ARM
-  @
-   sub%d2\\t%0, %1, #1
-   mov%D2\\t%0, %1\;sub%d2\\t%0, %1, #1
-  [(set_attr conds use)
-   (set_attr length *,8)
-   (set_attr type simple_alu_imm,*)]
-)
-
 (define_expand subsf3
   [(set (match_operand:SF   0 s_register_operand )
(minus:SF (match_operand:SF 1 s_register_operand )


[PATCH,ARM][0/5] Updates to cortex-a7 pipeline description

2013-01-25 Thread Greta Yorsh
This sequence of patches improves Cortex-A7 pipeline description.

[1/5] Add ffmas and ffmad type attribute and use it instead of fmacs and
fmacd (respectively) for fused multiply and accumulate operations.
[2/5] Update pipeline description of vdiv, vsqrt, and various vfp and neon
mac operations.
[3/5] Add bypass to forward the result of a mac operation to the accumulator
of another mac operation.
[4/5] Improve handling of calls.
[5/5] Cleanup - remove unused reservation units.

Patches 1-3 must be applied in the given order. Patches 4,5 are independent.

No regression on qemu for arm-none-eabi with cpu cortex-a7 arm/thumb.

Bootstrap successful.

Ok for trunk?

Thanks,
Greta





[PATCH,ARM][1/5] Add ffmas and ffmad type attribute

2013-01-25 Thread Greta Yorsh
Fused and non-fused multiply-accumulate operations may have different
timing characteristics, for example in Cortex-A7. Currently, the compiler
captures all of these operations using the same type attribute fmac.

This patch adds a new type attribute ffma to separate fused operations
from other fmac operations, for both single and double precision floating
point. The patch also updates existing pipeline descriptions to use both
fmac and ffma whenever fmac was used, so that the generated code remains
unaffected.

A subsequent patch for Cortex-A7 pipeline description takes advantage of the
distinction between fused and other mac operations.
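
For illustration, a hypothetical example of source that reaches the fma
patterns in vfp.md, and therefore now gets type ffmas rather than fmacs
(assuming VFPv4 and hard-float):

float fused_mac (float a, float b, float c)
{
  return __builtin_fmaf (a, b, c);   /* a single fused multiply-accumulate */
}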

gcc/

2013-01-03  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (type): Add ffmas and ffmad to type attribute.
* config/arm/vfp.md (fma,fmsub,fnmsub,fnmadd): Change type
from fmac to ffma.
* config/arm/vfp11.md (vfp_farith): Use ffmas.
(vfp_fmul): Use ffmad.
* config/arm/cortex-r4f.md (cortex_r4_fmacs): Use ffmas.
(cortex_r4_fmacd): Use ffmad.
* config/arm/cortex-m4-fpu.md (cortex_m4_fmacs): Use ffmas.
* config/arm/cortex-a9.md (cortex_a9_fmacs):  Use ffmas.
(cortex_a9_fmacd): Use ffmad.
* config/arm/cortex-a8-neon.md (cortex_a8_vfp_macs): Use ffmas.
(cortex_a8_vfp_macd): Use ffmad.
* config/arm/cortex-a5.md (cortex_a5_fpmacs): Use ffmas.
(cortex_a5_fpmacd): Use ffmad.
* config/arm/cortex-a15-neon.md (cortex_a15_vfp_macs) Use ffmas.
(cortex_a15_vfp_macd): Use ffmad.
* config/arm/arm1020e.md (v10_fmul): Use ffmas and ffmad.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
6b8e9a75fa4ca7f4f09ae34f5b69c1b71044f9d8..
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -284,6 +284,8 @@ (define_attr type
   fmuld,\
   fmacs,\
   fmacd,\
+  ffmas,\
+  ffmad,\
   f_rints,\
   f_rintd,\
   f_flag,\
diff --git a/gcc/config/arm/arm1020e.md b/gcc/config/arm/arm1020e.md
index 
9a41d30573605d845385e79966737f44ee61e168..
 100644
--- a/gcc/config/arm/arm1020e.md
+++ b/gcc/config/arm/arm1020e.md
@@ -284,7 +284,7 @@ (define_insn_reservation v10_cvt 5
 
 (define_insn_reservation v10_fmul 6
  (and (eq_attr vfp10 yes)
-  (eq_attr type fmuls,fmacs,fmuld,fmacd))
+  (eq_attr type fmuls,fmacs,ffmas,fmuld,fmacd,ffmad))
  1020a_e+v10_fmac*2)
 
 (define_insn_reservation v10_fdivs 18
diff --git a/gcc/config/arm/cortex-a15-neon.md 
b/gcc/config/arm/cortex-a15-neon.md
index 
afb67a587526fae8145a8474871cc1d4fbe7e3c9..
 100644
--- a/gcc/config/arm/cortex-a15-neon.md
+++ b/gcc/config/arm/cortex-a15-neon.md
@@ -505,12 +505,12 @@ (define_insn_reservation cortex_a15_vfp
 
 (define_insn_reservation cortex_a15_vfp_macs 6
   (and (eq_attr tune cortexa15)
-   (eq_attr type fmacs))
+   (eq_attr type fmacs,ffmas))
   ca15_issue1,ca15_cx_vfp)
 
 (define_insn_reservation cortex_a15_vfp_macd 11
   (and (eq_attr tune cortexa15)
-   (eq_attr type fmacd))
+   (eq_attr type fmacd,ffmad))
   ca15_issue2,ca15_cx_vfp*2)
 
 (define_insn_reservation cortex_a15_vfp_cvt 6
diff --git a/gcc/config/arm/cortex-a5.md b/gcc/config/arm/cortex-a5.md
index 
2b5abe524a63a6c90fc3da1b255b163d44c1455b..
 100644
--- a/gcc/config/arm/cortex-a5.md
+++ b/gcc/config/arm/cortex-a5.md
@@ -185,7 +185,7 @@ (define_insn_reservation cortex_a5_fpmu
 
 (define_insn_reservation cortex_a5_fpmacs 8
   (and (eq_attr tune cortexa5)
-   (eq_attr type fmacs))
+   (eq_attr type fmacs,ffmas))
   cortex_a5_ex1+cortex_a5_fpmul_pipe, nothing*3, cortex_a5_fpadd_pipe)
 
 ;; Non-multiply instructions can issue in the middle two instructions of a
@@ -201,7 +201,7 @@ (define_insn_reservation cortex_a5_fpmu
 
 (define_insn_reservation cortex_a5_fpmacd 11
   (and (eq_attr tune cortexa5)
-   (eq_attr type fmacd))
+   (eq_attr type fmacd,ffmad))
   cortex_a5_ex1+cortex_a5_fpmul_pipe, cortex_a5_fpmul_pipe*2,\
cortex_a5_ex1+cortex_a5_fpmul_pipe, nothing*3, cortex_a5_fpadd_pipe)
 
diff --git a/gcc/config/arm/cortex-a8-neon.md b/gcc/config/arm/cortex-a8-neon.md
index 
03f52b2df8a99ea709ac42cabcc32de587ac403f..
 100644
--- a/gcc/config/arm/cortex-a8-neon.md
+++ b/gcc/config/arm/cortex-a8-neon.md
@@ -149,12 +149,12 @@ (define_insn_reservation cortex_a8_vfp_
 
 (define_insn_reservation cortex_a8_vfp_macs 21
   (and (eq_attr tune cortexa8)
-   (eq_attr type fmacs))
+   (eq_attr type fmacs,ffmas))
   cortex_a8_vfp,cortex_a8_vfplite*20)
 
 (define_insn_reservation cortex_a8_vfp_macd 26
   (and (eq_attr tune cortexa8)
-   (eq_attr type fmacd))
+   (eq_attr type fmacd,ffmad))
   cortex_a8_vfp,cortex_a8_vfplite*25)
 
 (define_insn_reservation cortex_a8_vfp_divs 37
diff --git a/gcc/config/arm/cortex-a9.md b/gcc/config/arm/cortex-a9.md
index

[PATCH,ARM][2/5] Update cortex-a7 vfp/neon pipeline description

2013-01-25 Thread Greta Yorsh
This patch updates the description of vmul, vdiv, vsqrt, vmla, vmls, vfma,
vfms operations for vfp and neon. It uses ffmas and ffmad type attribute
introduced by the previous patch.

gcc/

2013-01-03  Greta Yorsh  greta.yo...@arm.com

* config/arm/cortex-a7.md (cortex_a7_neon_mul, cortex_a7_neon_mla): New
reservations.
(cortex_a7_fpfmad): New reservation.
(cortex_a7_fpmacs): Use ffmas and update required units.
(cortex_a7_fpmuld): Update required units and latency.
(cortex_a7_fpmacd): Likewise.
(cortex_a7_fdivs, cortex_a7_fdivd): Likewise.
(cortex_a7_neon): Likewise.
(bypass): Update participating units.
diff --git a/gcc/config/arm/cortex-a7.md b/gcc/config/arm/cortex-a7.md
index 74d4ca0..ce70576 100644
--- a/gcc/config/arm/cortex-a7.md
+++ b/gcc/config/arm/cortex-a7.md
@@ -202,6 +202,9 @@
 
 ;; Floating-point arithmetic.
 
+;; Neon integer, neon floating point, and single-precision floating
+;; point instructions of the same type have the same timing
+;; characteristics, but neon instructions cannot dual-issue.
 
 (define_insn_reservation cortex_a7_fpalu 4
   (and (eq_attr tune cortexa7)
@@ -229,18 +232,37 @@
 (eq_attr neon_type none)))
   cortex_a7_ex1+cortex_a7_fpmul_pipe)
 
-;; For single-precision multiply-accumulate, the add (accumulate) is issued
-;; whilst the multiply is in F4.  The multiply result can then be forwarded
-;; from F5 to F1.  The issue unit is only used once (when we first start
-;; processing the instruction), but the usage of the FP add pipeline could
-;; block other instructions attempting to use it simultaneously.  We try to
-;; avoid that using cortex_a7_fpadd_pipe.
+(define_insn_reservation cortex_a7_neon_mul 4
+  (and (eq_attr tune cortexa7)
+   (eq_attr neon_type
+neon_mul_ddd_8_16_qdd_16_8_long_32_16_long,\
+ neon_mul_qqq_8_16_32_ddd_32,\
+ 
neon_mul_qdd_64_32_long_qqd_16_ddd_32_scalar_64_32_long_scalar,\
+ neon_mul_ddd_16_scalar_32_16_long_scalar,\
+ neon_mul_qqd_32_scalar,\
+ neon_fp_vmul_ddd,\
+ neon_fp_vmul_qqd))
+  (cortex_a7_both+cortex_a7_fpmul_pipe)*2)
 
 (define_insn_reservation cortex_a7_fpmacs 8
   (and (eq_attr tune cortexa7)
-   (and (eq_attr type fmacs)
+   (and (eq_attr type fmacs,ffmas)
 (eq_attr neon_type none)))
-  cortex_a7_ex1+cortex_a7_fpmul_pipe, nothing*3, cortex_a7_fpadd_pipe)
+  cortex_a7_ex1+cortex_a7_fpmul_pipe)
+
+(define_insn_reservation cortex_a7_neon_mla 8
+  (and (eq_attr tune cortexa7)
+   (eq_attr neon_type
+neon_mla_ddd_8_16_qdd_16_8_long_32_16_long,\
+ neon_mla_qqq_8_16,\
+ 
neon_mla_ddd_32_qqd_16_ddd_32_scalar_qdd_64_32_long_scalar_qdd_64_32_long,\
+ neon_mla_qqq_32_qqd_32_scalar,\
+ neon_mla_ddd_16_scalar_qdd_32_16_long_scalar,\
+ neon_fp_vmla_ddd,\
+ neon_fp_vmla_qqq,\
+ neon_fp_vmla_ddd_scalar,\
+ neon_fp_vmla_qqq_scalar))
+  cortex_a7_both+cortex_a7_fpmul_pipe)
 
 ;; Non-multiply instructions can issue between two cycles of a
 ;; double-precision multiply. 
@@ -249,15 +271,19 @@
   (and (eq_attr tune cortexa7)
(and (eq_attr type fmuld)
 (eq_attr neon_type none)))
-  cortex_a7_ex1+cortex_a7_fpmul_pipe, cortex_a7_fpmul_pipe*2,\
-   cortex_a7_ex1+cortex_a7_fpmul_pipe)
+  cortex_a7_ex1+cortex_a7_fpmul_pipe, cortex_a7_fpmul_pipe*3)
 
 (define_insn_reservation cortex_a7_fpmacd 11
   (and (eq_attr tune cortexa7)
(and (eq_attr type fmacd)
 (eq_attr neon_type none)))
-  cortex_a7_ex1+cortex_a7_fpmul_pipe, cortex_a7_fpmul_pipe*2,\
-   cortex_a7_ex1+cortex_a7_fpmul_pipe, nothing*3, cortex_a7_fpadd_pipe)
+  cortex_a7_ex1+cortex_a7_fpmul_pipe, cortex_a7_fpmul_pipe*3)
+
+(define_insn_reservation cortex_a7_fpfmad 8
+  (and (eq_attr tune cortexa7)
+   (and (eq_attr type ffmad)
+(eq_attr neon_type none)))
+  cortex_a7_ex1+cortex_a7_fpmul_pipe, cortex_a7_fpmul_pipe*4)
 
 
 ;; Floating-point divide/square root instructions.
@@ -267,13 +293,13 @@
   (and (eq_attr tune cortexa7)
(and (eq_attr type fdivs)
 (eq_attr neon_type none)))
-  cortex_a7_ex1, cortex_a7_fp_div_sqrt * 14)
+  cortex_a7_ex1+cortex_a7_fp_div_sqrt, cortex_a7_fp_div_sqrt * 13)
 
-(define_insn_reservation cortex_a7_fdivd 29
+(define_insn_reservation cortex_a7_fdivd 31
   (and (eq_attr tune cortexa7)
(and (eq_attr type fdivd)
 (eq_attr neon_type none)))
-  cortex_a7_ex1, cortex_a7_fp_div_sqrt * 28)
+  cortex_a7_ex1+cortex_a7_fp_div_sqrt, cortex_a7_fp_div_sqrt * 28

[PATCH,ARM][3/5] New bypass between mac operations in cortex-a7 pipeline description

2013-01-25 Thread Greta Yorsh
Add bypasses to forward the result of one MAC operation to the accumulator
of another MAC operation.

Towards this end, we add a new function arm_mac_accumulator_is_result to be
used as a guard for bypasses. Existing guard
arm_mac_accumulator_is_mul_result requires a multiply operation as the
producer and a multiply-accumulate operation as the consumer. The new guard
allows more general producers and consumers. It allows the consumer to be a
multiply-accumulate or multiply-subtract operation. It allows the producer
to be any SET operation, although only MAC operations are used as producers
in the pipeline description of Cortex-A7.
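
For illustration, a hypothetical example of the dependence the new bypass
targets: the result of one MAC feeds the next MAC only through its
accumulator operand (whether VMLA/VMLS are actually generated depends on
the floating-point options in use):

float mac_chain (float acc, float a, float b, float c, float d)
{
  acc += a * b;   /* first MAC; its result becomes the next accumulator */
  acc += c * d;   /* second MAC; depends on the previous result only
                     through the accumulator */
  return acc;
}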

gcc/

2013-01-03  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (arm_mac_accumulator_is_result): New
declaration.
* config/arm/arm.c (arm_mac_accumulator_is_result): New function.
* config/arm/cortex-a7.md: New bypasses using
arm_mac_accumulator_is_result.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 4c61e35..885ccff 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -102,6 +102,7 @@ extern int arm_early_load_addr_dep (rtx, rtx);
 extern int arm_no_early_alu_shift_dep (rtx, rtx);
 extern int arm_no_early_alu_shift_value_dep (rtx, rtx);
 extern int arm_no_early_mul_dep (rtx, rtx);
+extern int arm_mac_accumulator_is_result (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
 
 extern int tls_mentioned_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 13d745f..39f1eb3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24610,6 +24610,62 @@ arm_cxx_guard_type (void)
   return TARGET_AAPCS_BASED ? integer_type_node : long_long_integer_type_node;
 }
 
+/* Return non-zero iff the consumer (a multiply-accumulate or a
+   multiply-subtract instruction) has an accumulator dependency on the
+   result of the producer and no other dependency on that result.  It
+   does not check if the producer is multiply-accumulate instruction.  */
+int
+arm_mac_accumulator_is_result (rtx producer, rtx consumer)
+{
+  rtx result;
+  rtx op0, op1, acc;
+
+  producer = PATTERN (producer);
+  consumer = PATTERN (consumer);
+
+  if (GET_CODE (producer) == COND_EXEC)
+producer = COND_EXEC_CODE (producer);
+  if (GET_CODE (consumer) == COND_EXEC)
+consumer = COND_EXEC_CODE (consumer);
+
+  if (GET_CODE (producer) != SET)
+return 0;
+
+  result = XEXP (producer, 0);
+
+  if (GET_CODE (consumer) != SET)
+return 0;
+
+  /* Check that the consumer is of the form
+ (set (...) (plus (mult ...) (...)))
+ or
+ (set (...) (minus (...) (mult ...))).  */
+  if (GET_CODE (XEXP (consumer, 1)) == PLUS)
+{
+  if (GET_CODE (XEXP (XEXP (consumer, 1), 0)) != MULT)
+return 0;
+
+  op0 = XEXP (XEXP (XEXP (consumer, 1), 0), 0);
+  op1 = XEXP (XEXP (XEXP (consumer, 1), 0), 1);
+  acc = XEXP (XEXP (consumer, 1), 1);
+}
+  else if (GET_CODE (XEXP (consumer, 1)) == MINUS)
+{
+  if (GET_CODE (XEXP (XEXP (consumer, 1), 1)) != MULT)
+return 0;
+
+  op0 = XEXP (XEXP (XEXP (consumer, 1), 1), 0);
+  op1 = XEXP (XEXP (XEXP (consumer, 1), 1), 1);
+  acc = XEXP (XEXP (consumer, 1), 0);
+}
+  else
+return 0;
+
+  return (reg_overlap_mentioned_p (result, acc)
+	  && !reg_overlap_mentioned_p (result, op0)
+	  && !reg_overlap_mentioned_p (result, op1));
+}
+
 /* Return non-zero if the consumer (a multiply-accumulate instruction)
has an accumulator dependency on the result of the producer (a
multiplication instruction) and no other dependency on that result.  */
diff --git a/gcc/config/arm/cortex-a7.md b/gcc/config/arm/cortex-a7.md
index 930242d..2cef5fd 100644
--- a/gcc/config/arm/cortex-a7.md
+++ b/gcc/config/arm/cortex-a7.md
@@ -137,6 +137,12 @@
 (eq_attr neon_type none)))
   cortex_a7_both)
 
+;; Forward the result of a multiply operation to the accumulator 
+;; of the following multiply and accumulate instruction.
+(define_bypass 1 cortex_a7_mul
+ cortex_a7_mul
+ arm_mac_accumulator_is_result)
+
 ;; The latency depends on the operands, so we use an estimate here.
 (define_insn_reservation cortex_a7_idiv 5
   (and (eq_attr tune cortexa7)
@@ -264,6 +271,10 @@
  neon_fp_vmla_qqq_scalar))
   cortex_a7_both+cortex_a7_fpmul_pipe)
 
+(define_bypass 4 cortex_a7_fpmacs,cortex_a7_neon_mla
+ cortex_a7_fpmacs,cortex_a7_neon_mla
+ arm_mac_accumulator_is_result)
+
 ;; Non-multiply instructions can issue between two cycles of a
 ;; double-precision multiply. 
 
@@ -285,6 +296,10 @@
 (eq_attr neon_type none)))
   cortex_a7_ex1+cortex_a7_fpmul_pipe, cortex_a7_fpmul_pipe*4)
 
+(define_bypass 7 cortex_a7_fpmacd
+ cortex_a7_fpmacd,cortex_a7_fpfmad
+ arm_mac_accumulator_is_result

[PATCH,ARM][5/5] Cleanup in cortex-a7 pipeline description

2013-01-25 Thread Greta Yorsh
In cortex_a7_idiv, the use of cortex_a7_all reservation can be replaced by
cortex_a7_both, because all other reservations require at least one of
cortex_a7_ex1 or cortex_a7_ex2. Then, remove unused reservation units
cortex_a7_neon and cortex_a7_all.

gcc/

2013-01-03  Greta Yorsh  greta.yo...@arm.com

* config/arm/cortex-a7.md (cortex_a7_neon, cortex_a7_all): Remove.
(cortex_a7_idiv): Use cortex_a7_both instead of cortex_a7_all.
diff --git a/gcc/config/arm/cortex-a7.md b/gcc/config/arm/cortex-a7.md
index 8c45cb8..21f84b5 100644
--- a/gcc/config/arm/cortex-a7.md
+++ b/gcc/config/arm/cortex-a7.md
@@ -57,15 +57,6 @@
 
 (define_cpu_unit cortex_a7_fp_div_sqrt cortex_a7)
 
-;; Neon pipeline
-(define_cpu_unit cortex_a7_neon cortex_a7)
-
-(define_reservation cortex_a7_all cortex_a7_both+\
- cortex_a7_fpmul_pipe+\
- cortex_a7_fpadd_pipe+\
- cortex_a7_fp_div_sqrt+\
- cortex_a7_neon)
-
 
 ;; Branches.
 
@@ -151,7 +142,7 @@
 (define_insn_reservation cortex_a7_idiv 5
   (and (eq_attr tune cortexa7)
(eq_attr insn udiv,sdiv))
-  cortex_a7_all*5)
+  cortex_a7_both*5)
 
 
 ;; Load/store instructions.


[PATCH,ARM] remove incscc and decscc patterns

2013-01-18 Thread Greta Yorsh
Remove incscc and decscc expanders that appear to be dead, along with the
related patterns.

This patch is a follow up on:
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01128.html

No regression qemu for arm-none-eabi. Bootstrap successful.

Ok for trunk?

Thanks,
Greta

2013-01-17  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (incscc,arm_incscc,decscc,arm_decscc): Delete.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index de57f40..80480a0 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1015,28 +1015,6 @@
[(set_attr conds set)]
 )
 
-(define_expand incscc
-  [(set (match_operand:SI 0 s_register_operand =r,r)
-(plus:SI (match_operator:SI 2 arm_comparison_operator
-[(match_operand:CC 3 cc_register ) (const_int 0)])
- (match_operand:SI 1 s_register_operand 0,?r)))]
-  TARGET_32BIT
-  
-)
-
-(define_insn *arm_incscc
-  [(set (match_operand:SI 0 s_register_operand =r,r)
-(plus:SI (match_operator:SI 2 arm_comparison_operator
-[(match_operand:CC 3 cc_register ) (const_int 0)])
- (match_operand:SI 1 s_register_operand 0,?r)))]
-  TARGET_ARM
-  @
-  add%d2\\t%0, %1, #1
-  mov%D2\\t%0, %1\;add%d2\\t%0, %1, #1
-  [(set_attr conds use)
-   (set_attr length 4,8)]
-)
-
 ; transform ((x << y) - 1) to ~(~(x-1) << y)  Where X is a constant.
 (define_split
   [(set (match_operand:SI 0 s_register_operand )
@@ -1267,29 +1245,6 @@
(set_attr type simple_alu_imm,*,*)]
 )
 
-(define_expand decscc
-  [(set (match_operand:SI0 s_register_operand =r,r)
-(minus:SI (match_operand:SI  1 s_register_operand 0,?r)
- (match_operator:SI 2 arm_comparison_operator
-   [(match_operand   3 cc_register ) (const_int 0)])))]
-  TARGET_32BIT
-  
-)
-
-(define_insn *arm_decscc
-  [(set (match_operand:SI0 s_register_operand =r,r)
-(minus:SI (match_operand:SI  1 s_register_operand 0,?r)
- (match_operator:SI 2 arm_comparison_operator
-   [(match_operand   3 cc_register ) (const_int 0)])))]
-  TARGET_ARM
-  @
-   sub%d2\\t%0, %1, #1
-   mov%D2\\t%0, %1\;sub%d2\\t%0, %1, #1
-  [(set_attr conds use)
-   (set_attr length *,8)
-   (set_attr type simple_alu_imm,*)]
-)
-
 (define_expand subsf3
   [(set (match_operand:SF   0 s_register_operand )
(minus:SF (match_operand:SF 1 s_register_operand )


RE: [PATCH] Yet another non-prototype builtin issue (PR middle-end/55890)

2013-01-11 Thread Greta Yorsh
Tom, are you going to apply this patch? 

There are similar failures on arm-none-eabi after r195008:

FAIL: gcc.dg/torture/pr55890-3.c -O0 (internal compiler error)
FAIL: gcc.dg/torture/pr55890-3.c -O0 (test for excess errors)
FAIL: gcc.dg/torture/pr55890-3.c -O1 (internal compiler error)
FAIL: gcc.dg/torture/pr55890-3.c -O1 (test for excess errors)

ICE with segfault same backtrace as in your report.

The failures are with -O0 and -O1, but the test passes with -O2.

Your patch fixes these failures. I haven't tested the patch beyond that.

Thanks,
Greta

 -Original Message-
 From: Tom de Vries [mailto:tom_devr...@mentor.com]
 Sent: 09 January 2013 10:11
 To: Richard Biener
 Cc: Jakub Jelinek; gcc-patches@gcc.gnu.org
 Subject: [PATCH] Yet another non-prototype builtin issue (PR middle-
 end/55890)
 
 Richard,
 
 I've build r195008 for target mips64-linux-gnu with low optimization
 level
 ({CFLAGS,CXXFLAGS,BOOT_CFLAGS}='-g -O0'), and noticed failures for
 gcc.dg/torture/pr55890-{1,2,3}.c at -O0 and -O1 (which are not there
 without the
 low optimization level).
 
 The -O1 pr55890-1.c failure looks like this:
 ...
 $ mips64-linux-gnu-gcc gcc/testsuite/gcc.dg/torture/pr55890-1.c
 -fno-diagnostics-show-caret   -O1   -S  -o pr55890-1.s
 gcc/testsuite/gcc.dg/torture/pr55890-1.c: In function 'main':
 gcc/testsuite/gcc.dg/torture/pr55890-1.c:6:11: internal compiler error:
 Segmentation fault
 0x86dad80 crash_signal
 gcc/toplev.c:334
 0x82bec14 expand_call(tree_node*, rtx_def*, int)
 gcc/calls.c:3139
 0x82adae3 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode,
 int)
 gcc-mainline/gcc/builtins.c:6866
 0x83d5484 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
 expand_modifier, rtx_def**)
 gcc-mainline/gcc/expr.c:10141
 0x82d7721 expand_call_stmt
 gcc-mainline/gcc/cfgexpand.c:2115
 0x82d77eb expand_gimple_stmt_1
 gcc/cfgexpand.c:2153
 0x82d7e51 expand_gimple_stmt
 gcc/cfgexpand.c:2305
 0x82d8f76 expand_gimple_basic_block
 gcc/cfgexpand.c:4084
 0x82d9d53 gimple_expand_cfg
 gcc/cfgexpand.c:4603
 Please submit a full bug report,
 with preprocessed source if appropriate.
 Please include the complete backtrace with any bug report.
 See http://gcc.gnu.org/bugs.html for instructions.
 ...
 
 The segv occurs when evaluating GET_MODE (args[arg_nr].reg) here:
 ...
    if (pass == 1 && (return_flags & ERF_RETURNS_ARG))
      {
        int arg_nr = return_flags & ERF_RETURN_ARG_MASK;
        if (PUSH_ARGS_REVERSED)
          arg_nr = num_actuals - arg_nr - 1;
        if (args[arg_nr].reg
            && valreg
            && REG_P (valreg)
            && GET_MODE (args[arg_nr].reg) == GET_MODE (valreg))
          call_fusage
            = gen_rtx_EXPR_LIST (TYPE_MODE (TREE_TYPE (args[arg_nr].tree_value)),
                                 gen_rtx_SET (VOIDmode, valreg, args[arg_nr].reg),
                                 call_fusage);
      }
 ...
 
  The expression (return_flags & ERF_RETURNS_ARG) is true because we're
 calculating return_flags using fndecl == memmove, and for memmove we're
 indeed
 returning the first arg.
 
 So arg_nr evaluates to 0, and we're accessing args[arg_nr].reg. But
 num_actuals
 is 0, so args is the result of alloca (0) and args[arg_nr] contains
 some random
 value, which causes the segv when evaluating GET_MODE
 (args[arg_nr].reg) (unless
 args[arg_nr].reg happens to be NULL, in which case we don't get there).
 
 Attached patch fixes this by testing whether arg_nr is in range before
 using it.
 Using the patch and a cc1 recompile, I'm able to run pr55890-{1,2,3}.c
 successfully.
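
 A sketch, inferred from the description above, of the shape of the added
 range check (the attached patch may differ in detail):

    if (pass == 1 && (return_flags & ERF_RETURNS_ARG))
      {
        int arg_nr = return_flags & ERF_RETURN_ARG_MASK;
        if (PUSH_ARGS_REVERSED)
          arg_nr = num_actuals - arg_nr - 1;
        if (arg_nr >= 0
            && arg_nr < num_actuals   /* new guard: arg_nr must be in range */
            && args[arg_nr].reg
            && valreg
            && REG_P (valreg)
            && GET_MODE (args[arg_nr].reg) == GET_MODE (valreg))
          call_fusage
            = gen_rtx_EXPR_LIST (TYPE_MODE (TREE_TYPE (args[arg_nr].tree_value)),
                                 gen_rtx_SET (VOIDmode, valreg, args[arg_nr].reg),
                                 call_fusage);
      }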
 
 OK for trunk after I've tested this on mips64?
 
 Thanks,
 - Tom
 
 2013-01-09  Tom de Vries  t...@codesourcery.com
 
   PR middle-end/55890
   * calls.c (expand_call): Check if arg_nr is valid.





[PATCH, ARM] Initial pipeline description for Cortex-A7

2012-12-20 Thread Greta Yorsh
Currently, GCC uses generic ARMv7-A tuning for Cortex-A7.
This patch adds an initial pipeline description for Cortex-A7. Details:
* integer/vfp is based on the pipeline description for Cortex-A5,
* models dual issue in limited circumstances using simple_alu_imm and
simple_alu_shift type attribute (introduced by a previous patch),
* basic neon timings.

No regression on qemu for arm-none-eabi target with cpu cortex-a7.

Bootstrap successful on Cortex-A15 (gcc configured with cpu cortex-a7).

Performance evaluation on Cortex-A7 hardware:

Coremark: 
* No change compared to generic tuning even though the generated assembly is
significantly different due to instruction scheduling. 
* Improvement compared to tuning for Cortex-A5: 4% improvement in arm mode
and 9% improvement in thumb mode.
CINT2000:
* compared to generic tuning, overall improvement of 1.9%.
* compared to tuning for Cortex-A5, overall improvement of 1.5%.
* in both cases, all benchmarks improved except 254.gap.
CFP2000:
* compared to generic tuning (which doesn't do much for FP), overall
improvement of 5.5%, all benchmarks improved.
* compared to Cortex-A5 tuning (as pipeline descriptions are nearly
identical), overall no change, but individual benchmarks show mixed results.

Ok for trunk?

Thanks,
Greta

gcc/ChangeLog

2012-12-20  Greta Yorsh  greta.yo...@arm.com

* config/arm/cortex-a7.md: New file.
* config/arm/arm.md: Include cortex-a7.md.
(generic_sched): Don't use generic scheduler for Cortex-A7.
(generic_vfp): Likewise.
* config/arm/t-arm (arm_cpu_table): Likewise.
* config/arm/arm.c: (TARGET_SCHED_REORDER): Use arm_sched_reorder.
(arm_sched_reorder): New function.
(cortexa7_older_only,cortexa7_younger): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 84ce56f..ab6c88b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -132,6 +132,7 @@ static void arm_output_function_prologue (FILE *, 
HOST_WIDE_INT);
 static int arm_comp_type_attributes (const_tree, const_tree);
 static void arm_set_default_type_attributes (tree);
 static int arm_adjust_cost (rtx, rtx, rtx, int);
+static int arm_sched_reorder (FILE *, int, rtx *, int *, int);
 static int optimal_immediate_sequence (enum rtx_code code,
   unsigned HOST_WIDE_INT val,
   struct four_ints *return_sequence);
@@ -366,6 +367,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST arm_adjust_cost
 
+#undef TARGET_SCHED_REORDER
+#define TARGET_SCHED_REORDER arm_sched_reorder
+
 #undef TARGET_REGISTER_MOVE_COST
 #define TARGET_REGISTER_MOVE_COST arm_register_move_cost
 
@@ -8680,6 +8684,164 @@ arm_memory_move_cost (enum machine_mode mode, 
reg_class_t rclass,
 }
 }
 
+
+/* Return true if and only if this insn can dual-issue only as older.  */
+static bool
+cortexa7_older_only (rtx insn)
+{
+  if (recog_memoized (insn) < 0)
+return false;
+
+  if (get_attr_insn (insn) == INSN_MOV)
+return false;
+
+  switch (get_attr_type (insn))
+{
+case TYPE_ALU_REG:
+case TYPE_LOAD_BYTE:
+case TYPE_LOAD1:
+case TYPE_STORE1:
+case TYPE_FFARITHS:
+case TYPE_FADDS:
+case TYPE_FFARITHD:
+case TYPE_FADDD:
+case TYPE_FCPYS:
+case TYPE_F_CVT:
+case TYPE_FCMPS:
+case TYPE_FCMPD:
+case TYPE_FCONSTS:
+case TYPE_FCONSTD:
+case TYPE_FMULS:
+case TYPE_FMACS:
+case TYPE_FMULD:
+case TYPE_FMACD:
+case TYPE_FDIVS:
+case TYPE_FDIVD:
+case TYPE_F_2_R:
+case TYPE_F_FLAG:
+case TYPE_F_LOADS:
+case TYPE_F_STORES:
+  return true;
+default:
+  return false;
+}
+}
+
+/* Return true if and only if this insn can dual-issue as younger.  */
+static bool
+cortexa7_younger (FILE *file, int verbose, rtx insn)
+{
+  if (recog_memoized (insn) < 0)
+    {
+      if (verbose > 5)
+        fprintf (file, ";; not cortexa7_younger %d\n", INSN_UID (insn));
+  return false;
+}
+
+  if (get_attr_insn (insn) == INSN_MOV)
+return true;
+
+  switch (get_attr_type (insn))
+{
+case TYPE_SIMPLE_ALU_IMM:
+case TYPE_SIMPLE_ALU_SHIFT:
+case TYPE_BRANCH:
+  return true;
+default:
+  return false;
+}
+}
+
+
+/* Look for an instruction that can dual issue only as an older
+   instruction, and move it in front of any instructions that can
+   dual-issue as younger, while preserving the relative order of all
+   other instructions in the ready list.  This is a heuristic to help
+   dual-issue in later cycles, by postponing issue of more flexible
+   instructions.  This heuristic may affect dual issue opportunities
+   in the current cycle.  */
+static void
+cortexa7_sched_reorder (FILE *file, int verbose, rtx *ready, int *n_readyp,
+int clock)
+{
+  int i;
+  int first_older_only = -1, first_younger = -1;
+
+  if (verbose > 5)
+fprintf
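
(The function is cut off above by the archive.  As a rough standalone sketch
of the reorder heuristic it implements -- simplified, with index 0 issued
first rather than the actual GCC ready-list representation:)

#include <stdio.h>
#include <string.h>

/* Toy classification: 'O' can dual-issue only as the older insn of a pair,
   'Y' can dual-issue as the younger one, anything else is neither.  */
static int is_older_only (char c) { return c == 'O'; }
static int is_younger (char c)    { return c == 'Y'; }

/* Hoist the first older-only entry that sits behind a younger entry to just
   in front of that younger entry, preserving the relative order of all the
   other entries.  */
static void
reorder (char *ready, int n)
{
  int first_younger = -1, first_older_only = -1, i;

  for (i = 0; i < n; i++)
    {
      if (first_younger < 0 && is_younger (ready[i]))
        first_younger = i;
      if (first_younger >= 0 && is_older_only (ready[i]))
        {
          first_older_only = i;
          break;
        }
    }

  if (first_older_only < 0)
    return;   /* nothing to hoist */

  char insn = ready[first_older_only];
  memmove (ready + first_younger + 1, ready + first_younger,
           (size_t) (first_older_only - first_younger));
  ready[first_younger] = insn;
}

int
main (void)
{
  char ready[] = "YYOXY";            /* X stands for any other insn type */
  reorder (ready, (int) strlen (ready));
  printf ("%s\n", ready);            /* prints OYYXY */
  return 0;
}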

[PATCH,ARM] Define simple_alu_shift value for type attribute

2012-12-17 Thread Greta Yorsh
The attribute type for UXTB, UXTH, SXTB, and SXTH is set to
simple_alu_imm if tuning is for cortex-a7 and alu_shift otherwise (since
r193996). The problem is that attributes should not depend on a specific CPU
or tuning. To eliminate the dependency, this patch introduces a new value
for type attribute, called simple_alu_shift and updates the relevant
insn definitions. This patch also updates all existing pipeline descriptions
to use simple_alu_shift in the same way as alu_shift.

The motivation for this patch is cortex-a7 pipeline description that will be
submitted separately and will handle simple_alu_shift differently from
alu_shift.
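
(For context, the extensions reclassified here are the ones a compiler
typically emits for code like the following; the exact instruction selection
of course depends on the architecture, CPU and options:)

/* Zero/sign extensions that typically map to the UXTB/UXTH/SXTB/SXTH
   instructions now classified as simple_alu_shift.  */
unsigned int zext_byte (unsigned char x)  { return x; }   /* uxtb */
unsigned int zext_half (unsigned short x) { return x; }   /* uxth */
int          sext_byte (signed char x)    { return x; }   /* sxtb */
int          sext_half (short x)          { return x; }   /* sxth */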

No regression on qemu for arm-none-eabi Cortex-A15.

No difference in generated assembly when compiling all of the preprocessed
sources of gcc 4.8 as a test in various configurations: -mcpu=cortex-a15
-march=armv6t2 -marm/-mthumb -O0/-O1/-O2/-O3/-Os.

Ok for trunk?

Thanks,
Greta

2012-12-05  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (type): Add simple_alu_shift to attribute
type.
(core_cycles): Update for simple_alu_shift.
(thumb1_zero_extendhisi2,arm_zero_extendhisi2_v6): Use
simple_alu_shift
instead of a CPU-specific condition for type attribute.
(thumb1_zero_extendqisi2_v6,arm_zero_extendqisi2_v6): Likewise.
(thumb1_extendhisi2,arm_extendhisi2_v6,arm_extendqisi_v6): Likewise.
(thumb1_extendqisi2): Likewise.
* config/arm/thumb2.md (thumb2_extendqisi_v6): Likewise.
(thumb2_zero_extendhisi2_v6,thumb2_zero_extendqisi2_v6) Likewise.
* config/arm/arm1020e.md (alu_shift_op): Use simple_alu_shift.
* config/arm/arm1026ejs.md (alu_shift_op): Likewise.
* config/arm/arm1136jfs.md (11_alu_shift_op): Likewise.
* config/arm/arm926ejs.md (9_alu_op): Likewise.
* config/arm/cortex-a15.md (cortex_a15_alu_shift): Likewise.
* config/arm/cortex-a5.md (cortex_a5_alu_shift): Likewise.
* config/arm/cortex-a8.md (cortex_a8_alu_shift,cortex_a8_mov):
Likewise.
* config/arm/cortex-a9.md (cortex_a9_dp,cortex_a9_dp_shift):
Likewise.
* config/arm/cortex-m4.md (cortex_m4_alu): Likewise.
* config/arm/cortex-r4.md (cortex_r4_alu_shift): Likewise.
* config/arm/fa526.md (526_alu_shift_op): Likewise.
* config/arm/fa606te.md (fa606te_core): Likewise.
* config/arm/fa626te.md (626te_alu_shift_op): Likewise.
* config/arm/fa726te.md (726te_alu_shift_op): Likewise.
* config/arm/fmp626.md (mp626_alu_shift_op): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
331203329e7560864597de3caa401c1d1231a24f..
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -331,6 +331,7 @@ (define_attr insn
 ;   regs or have a shifted source operand
 ;   and does not have an immediate operand. This is
 ;   also the default
+; simple_alu_shift covers UXTH, UXTB, SXTH, SXTB
 ; alu_shiftany data instruction that doesn't hit memory or fp
 ;  regs, but has a source operand shifted by a constant
 ; alu_shift_regany data instruction that doesn't hit memory or fp
@@ -362,6 +363,7 @@ (define_attr insn
 (define_attr type
  simple_alu_imm,\
   alu_reg,\
+  simple_alu_shift,\
   alu_shift,\
   alu_shift_reg,\
   mult,\
@@ -543,7 +545,9 @@ (define_attr write_conflict no,yes
 ; than one on the main cpu execution unit.
 (define_attr core_cycles single,multi
   (if_then_else (eq_attr type
-simple_alu_imm,alu_reg,alu_shift,float,fdivd,fdivs)
+simple_alu_imm,alu_reg,\
+  simple_alu_shift,alu_shift,\
+  float,fdivd,fdivs)
(const_string single)
(const_string multi)))
 
@@ -4712,11 +4716,7 @@ (define_insn *thumb1_zero_extendhisi2
 [(if_then_else (eq_attr is_arch6 yes)
   (const_int 2) (const_int 4))
 (const_int 4)])
-   (set_attr_alternative "type"
-                         [(if_then_else (eq_attr "tune" "cortexa7")
-                                        (const_string "simple_alu_imm")
-                                        (const_string "alu_shift"))
-                          (const_string "load_byte")])]
+   (set_attr "type" "simple_alu_shift, load_byte")]
 )
 
 (define_insn *arm_zero_extendhisi2
@@ -4738,11 +4738,7 @@ (define_insn *arm_zero_extendhisi2_v6
uxth%?\\t%0, %1
ldr%(h%)\\t%0, %1
   [(set_attr "predicable" "yes")
-   (set_attr_alternative "type"
-                         [(if_then_else (eq_attr "tune" "cortexa7")
-                                        (const_string "simple_alu_imm")
-                                        (const_string "alu_shift"))
-                          (const_string "load_byte")])]
+   (set_attr "type" "simple_alu_shift,load_byte")]
 )
 
 (define_insn *arm_zero_extendhisi2addsi
@@ -4812,11 +4808,7 @@ (define_insn

[PATCH,ARM] Subdivide alu into alu_reg and simple_alu_imm

2012-11-29 Thread Greta Yorsh
For attribute named type, subdivide alu into alu_reg and
simple_alu_imm.
Set type attribute as appropriate in define_insn patterns with immediate
operands.
Update pipeline descriptions to use the new values of type attribute.

No regression on qemu arm-none-eabi -mcpu=cortex-a15/cortex-a7. 

Bootstrap successful on Cortex-A15.

No difference in generated assembly when compiling all of preprocessed
sources of gcc 4.8 as a test in various configurations: -mcpu=cortex-a15
-march=armv6t2 -marm/-mthumb -O0/-O1/-O2/-O3/-Os.

The motivation for this patch is cortex-a7 pipeline description, which will
be submitted separately.
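
(Illustration of the split, with assumed instruction selection; the
classification is per define_insn alternative, not per source line:)

/* The immediate form would be classed simple_alu_imm, the register-register
   form alu_reg.  */
int add_imm (int x)        { return x + 42; }   /* e.g. add r0, r0, #42 */
int add_reg (int x, int y) { return x + y;  }   /* e.g. add r0, r0, r1  */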

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/

2012-11-28  Ramana Radhakrishnan ramana.radhakrish...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (type): Subdivide alu into alu_reg and
simple_alu_imm.
  (core_cycles): Use new names.
(arm_addsi3): Set type attribute for patterns involving
simple_alu_imm.
(addsi3_compare0, addsi3_compare0_scratch): Likewise.
(addsi3_compare_op1, addsi3_compare_op2, compare_addsi2_op0):
Likewise.
(compare_addsi2_op1, arm_subsi3_insn, subsi3_compare0): Likewise.
(subsi3_compare, arm_decscc,arm_andsi3_insn): Likewise.
(thumb1_andsi3_insn, andsi3_compare0_scratch): Likewise.
(zeroextractsi_compare0_scratch, iorsi3_insn, iorsi3_compare0):
Likewise.
(iorsi3_compare0_scratch, arm_xorsi3, thumb1_xorsi3_insn): Likewise.
(xorsi3_compare0, xorsi3_compare0_scratch, thumb1_zero_extendhisi2):
Likewise.
(arm_zero_extendhisi2_v6, thumb1_zero_extendqisi2_v): Likewise.
  (arm_zero_extendqisi2_v6, thumb1_extendhisi2, arm_extendqisi_v6):
Likewise.
  (thumb1_extendqisi2, arm_movsi_insn): Likewise.
(movsi_compare0, movhi_insn_arch4, movhi_bytes): Likewise.
(arm_movqi_insn, thumb1_movqi_insn, arm_cmpsi_insn): Likewise.
(movsicc_insn, if_plus_move, if_move_plus): Likewise.
* config/arm/neon.md (neon_movmode/VDX): Likewise.
(neon_movmode/VQXMOV): Likewise.
* config/arm/arm1020e.md (1020alu_op): Likewise.
* config/arm/fmp626.md (mp626_alu_op): Likewise.
* config/arm/fa726te.md (726te_alu_op): Likewise.
* config/arm/fa626te.md (626te_alu_op): Likewise.
* config/arm/fa606te.md (606te_alu_op): Likewise.
* config/arm/fa526.md (526_alu_op): Likewise.
* config/arm/cortex-r4.md (cortex_r4_alu, cortex_r4_mov): Likewise.
* config/arm/cortex-m4.md (cortex_m4_alu): Likewise.
* config/arm/cortex-a9.md (cprtex_a9_dp): Likewise.
* config/arm/cortex-a8.md (cortex_a8_alu, cortex_a8_mov): Likewise.
* config/arm/cortex-a5.md (cortex_a5_alu): Likewise.
* config/arm/cortex-a15.md (cortex_a15_alu): Likewise.
* config/arm/arm926ejs.md (9_alu_op): Likewise.
* config/arm/arm1136jfs.md (11_alu_op): Likewise.
* config/arm/arm1026ejs.md (alu_op): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
7e92b69ad861fe90ed409494d451854f30888462..
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -323,8 +323,14 @@ (define_attr insn
 ; Classification of each insn
 ; Note: vfp.md has different meanings for some of these, and some further
 ; types as well.  See that file for details.
-; alu  any alu  instruction that doesn't hit memory or fp
-;  regs or have a shifted source operand
+; simple_alu_imm  a simple alu instruction that doesn't hit memory or fp
+;   regs or have a shifted source operand and has an immediate
+;   operand. This currently only tracks very basic immediate
+;   alu operations.
+; alu_reg   any alu instruction that doesn't hit memory or fp
+;   regs or have a shifted source operand
+;   and does not have an immediate operand. This is
+;   also the default
 ; alu_shiftany data instruction that doesn't hit memory or fp
 ;  regs, but has a source operand shifted by a constant
 ; alu_shift_regany data instruction that doesn't hit memory or fp
@@ -354,7 +360,8 @@ (define_attr insn
 ;
 
 (define_attr type
- alu,\
+ simple_alu_imm,\
+  alu_reg,\
   alu_shift,\
   alu_shift_reg,\
   mult,\
@@ -398,7 +405,7 @@ (define_attr type
 (eq_attr insn smulxy,smlaxy,smlalxy,smulwy,smlawx,mul,muls,mla,mlas,\
 umull,umulls,umlal,umlals,smull,smulls,smlal,smlals)
 (const_string mult)
-(const_string alu)))
+(const_string alu_reg)))
 
 ; Is this an (integer side) multiply with a 64-bit result?
 (define_attr mul64 no,yes
@@ -536,7 +543,7 @@ (define_attr write_conflict no,yes
 ; than one on the main cpu execution unit.
 (define_attr core_cycles single,multi
   (if_then_else (eq_attr type
-alu,alu_shift,float,fdivd,fdivs)
+simple_alu_imm,alu_reg,alu_shift,float,fdivd

RE: [PATCH,ARM] Subdivide alu into alu_reg and simple_alu_imm

2012-11-29 Thread Greta Yorsh


 -Original Message-
 From: Richard Earnshaw
 Sent: 29 November 2012 10:12
 To: Greta Yorsh
 Cc: GCC Patches; Ramana Radhakrishnan; ni...@redhat.com;
 p...@codesourcery.com
 Subject: Re: [PATCH,ARM] Subdivide alu into alu_reg and simple_alu_imm


   ; ??? Check Thumb-2 split length
   (define_insn_and_split *arm_subsi3_insn
 -  [(set (match_operand:SI   0 s_register_operand
 =r,r,rk,r)
 - (minus:SI (match_operand:SI 1 reg_or_int_operand rI,r,k,?n)
 -   (match_operand:SI 2 reg_or_int_operand r,rI,r, r)))]
 +  [(set (match_operand:SI   0 s_register_operand
 =r,r,r,rk,r)
 + (minus:SI (match_operand:SI 1 reg_or_int_operand rI,r,r,k,?n)
 +   (match_operand:SI 2 reg_or_int_operand r,I,r,r, r)))]
 TARGET_32BIT
 @
  rsb%?\\t%0, %2, %1
  sub%?\\t%0, %1, %2
  sub%?\\t%0, %1, %2
 +   sub%?\\t%0, %1, %2
  #
  (CONST_INT_P (operands[1])
       && !const_ok_for_arm (INTVAL (operands[1])))
 @@ -1270,8 +1295,9 @@ (define_insn_and_split *arm_subsi3_insn
 INTVAL (operands[1]), operands[0], operands[2],
 0);
 DONE;
 
 -  [(set_attr "length" "4,4,4,16")
 -   (set_attr "predicable" "yes")]
 +  [(set_attr "length" "4,4,4,4,16")
 +   (set_attr "predicable" "yes")
 +   (set_attr "type"  "*,simple_alu_imm,*,*,*")]
   )


 There's something wrong here.  MINUS (reg, imm) should be canonicalized
 elsewhere to PLUS (reg, -imm), so the alternative you've split /should/
 never match anything.  On the other hand, you haven't split the first
 alternative (that generates RSB), which is a legitimate use of an
 immediate in MINUS.

The rI constraint on operand 1 in the first alternative is not split because
the RSB instruction it generates cannot dual-issue (even with an immediate
operand) in the second slot on Cortex-A7, so the alternative should not be
marked as simple_alu_imm.
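
(To illustrate the canonicalization point, with assumed instruction
selection:)

/* GCC canonicalizes (minus reg const) into (plus reg -const), so an
   immediate subtrahend is emitted as a plain sub/add of the (negated)
   constant, while an immediate minuend needs RSB.  */
int sub_imm (int x) { return x - 5; }   /* typically sub r0, r0, #5 */
int rsb_imm (int x) { return 5 - x; }   /* typically rsb r0, r0, #5 */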

Thanks,
Greta



 Otherwise, OK.

 R.

 On 29/11/12 09:57, Greta Yorsh wrote:
  For attribute named type, subdivide alu into alu_reg and
  simple_alu_imm.
  Set type attribute as appropriate in define_insn patterns with
 immediate
  operands.
  Update pipeline descriptions to use the new values of type attribute.
 
  No regression on qemu arm-none-eabi -mcpu=cortex-a15/cortex-a7.
 
  Bootstrap successful on Cortex-A15.
 
  No difference in generated assembly when compiling all of
 preprocessed
  sources of gcc 4.8 as a test in various configurations: -mcpu=cortex-
 a15
  -march=armv6t2 -marm/-mthumb -O0/-O1/-O2/-O3/-Os.
 
  The motivation for this patch is cortex-a7 pipeline description,
 which will
  be submitted separately.
 
  Ok for trunk?
 
  Thanks,
  Greta
 
  ChangeLog
 
  gcc/
 
  2012-11-28  Ramana Radhakrishnan ramana.radhakrish...@arm.com
   Greta Yorsh  greta.yo...@arm.com
 
   * config/arm/arm.md (type): Subdivide alu into alu_reg
 and
  simple_alu_imm.
(core_cycles): Use new names.
   (arm_addsi3): Set type attribute for patterns involving
  simple_alu_imm.
   (addsi3_compare0, addsi3_compare0_scratch): Likewise.
   (addsi3_compare_op1, addsi3_compare_op2,
 compare_addsi2_op0):
  Likewise.
   (compare_addsi2_op1, arm_subsi3_insn, subsi3_compare0):
 Likewise.
   (subsi3_compare, arm_decscc,arm_andsi3_insn): Likewise.
   (thumb1_andsi3_insn, andsi3_compare0_scratch): Likewise.
   (zeroextractsi_compare0_scratch, iorsi3_insn,
 iorsi3_compare0):
  Likewise.
   (iorsi3_compare0_scratch, arm_xorsi3, thumb1_xorsi3_insn):
 Likewise.
   (xorsi3_compare0, xorsi3_compare0_scratch,
 thumb1_zero_extendhisi2):
  Likewise.
   (arm_zero_extendhisi2_v6, thumb1_zero_extendqisi2_v):
 Likewise.
(arm_zero_extendqisi2_v6, thumb1_extendhisi2,
 arm_extendqisi_v6):
  Likewise.
(thumb1_extendqisi2, arm_movsi_insn): Likewise.
   (movsi_compare0, movhi_insn_arch4, movhi_bytes): Likewise.
   (arm_movqi_insn, thumb1_movqi_insn, arm_cmpsi_insn):
 Likewise.
   (movsicc_insn, if_plus_move, if_move_plus): Likewise.
   * config/arm/neon.md (neon_movmode/VDX): Likewise.
   (neon_movmode/VQXMOV): Likewise.
   * config/arm/arm1020e.md (1020alu_op): Likewise.
   * config/arm/fmp626.md (mp626_alu_op): Likewise.
   * config/arm/fa726te.md (726te_alu_op): Likewise.
   * config/arm/fa626te.md (626te_alu_op): Likewise.
   * config/arm/fa606te.md (606te_alu_op): Likewise.
   * config/arm/fa526.md (526_alu_op): Likewise.
   * config/arm/cortex-r4.md (cortex_r4_alu, cortex_r4_mov):
 Likewise.
   * config/arm/cortex-m4.md (cortex_m4_alu): Likewise.
   * config/arm/cortex-a9.md (cprtex_a9_dp): Likewise.
   * config/arm/cortex-a8.md (cortex_a8_alu, cortex_a8_mov):
 Likewise.
   * config/arm/cortex-a5.md (cortex_a5_alu): Likewise.
   * config/arm/cortex-a15.md (cortex_a15_alu): Likewise.
   * config/arm/arm926ejs.md (9_alu_op): Likewise

[ARM, PATCH] TARGET_LDRD reject Thumb1 targets

2012-11-21 Thread Greta Yorsh
This patch adjusts the definition of TARGET_LDRD to false on Thumb1 targets,
as suggested here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02048.html

No regression on qemu for arm none-eabi with arch=armv5t/armv7-a
mode=thumb/arm.

Ok for trunk?

Thanks,
Greta

ChangeLog

2012-11-21  Greta Yorsh  greta.yo...@arm.com

   * config/arm/arm.h (TARGET_LDRD): Reject Thumb1 targets.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 5f34f2a..1adcf9f 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -252,7 +252,6 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_BACKTRACE   (leaf_function_p () \
 ? TARGET_TPCS_LEAF_FRAME \
 : TARGET_TPCS_FRAME)
-#define TARGET_LDRD    (arm_arch5e && ARM_DOUBLEWORD_ALIGN)
 #define TARGET_AAPCS_BASED \
 (arm_abi != ARM_ABI_APCS && arm_abi != ARM_ABI_ATPCS)
 
@@ -269,6 +268,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 /* Thumb-1 only.  */
 #define TARGET_THUMB1_ONLY (TARGET_THUMB1 && !arm_arch_notm)
 
+#define TARGET_LDRD    (arm_arch5e && ARM_DOUBLEWORD_ALIGN \
+                        && !TARGET_THUMB1)
+
 /* The following two macros concern the ability to execute coprocessor
instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are currently
only ever tested when we know we are generating for VFP hardware; we need


[PATCH, ARM] Fix offset_ok_for_ldrd_strd in Thumb1

2012-10-23 Thread Greta Yorsh
The function offset_ok_for_ldrd_strd should return false for Thumb1, because
TARGET_LDRD and Thumb1 can both be enabled (for example, in the default
configuration for cortex-m0).

This patch fixes ICE that is caused by gcc r192678 and occurs when building
gcc with newlib for arm-none-eabi cortex-m0.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/


2012-10-23  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (offset_ok_for_ldrd_strd): Return false for
Thumb1.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e9b9463..a94e537 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12209,7 +12209,7 @@ offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
   else if (TARGET_ARM)
 max_offset = 255;
   else
-gcc_unreachable ();
+return false;
 
   return ((offset <= max_offset) && (offset >= -max_offset));
 }


RE: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb mode

2012-10-19 Thread Greta Yorsh
On 18 October 2012 14:41, Richard Earnshaw wrote:
  +/* Checks whether the operands are valid for use in an LDRD/STRD
instruction.
  +   Assumes that RT, RT2, and RTN are REG.  This is guaranteed by the
patterns.
  +   Assumes that the address in the base register RTN is word aligned.
Pattern
  +   guarantees that both memory accesses use the same base register,
  +   the offsets are constants within the range, and the gap between the
offsets is 4.
  +   If preload complete then check that registers are legal.  WBACK
indicates whether
  +   address is updated.  LOAD indicates whether memory access is load or
store.  */
 
 ARM ARM terminology uses Rn for the base reg, so:
 
 s/RTN/RN/

Fixed.

 
  +bool
  +operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rtn, HOST_WIDE_INT offset,
 
 s/rtn/rn/

Fixed.

  +;; Patterns for LDRD/STRD in Thumb2 mode
  +
  +(define_insn "*thumb2_ldrd"
  +  [(set (match_operand:SI 0 "s_register_operand" "=r")
  +        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
  +                         (match_operand:SI 2 "ldrd_strd_offset_operand" "Do"))))
  +   (set (match_operand:SI 3 "s_register_operand" "=r")
  +        (mem:SI (plus:SI (match_dup 1)
  +                         (match_operand:SI 4 "const_int_operand" ""))))]
  +  "TARGET_LDRD && TARGET_THUMB2
  +    && (current_tune->prefer_ldrd_strd
  +        && !optimize_function_for_size_p (cfun))
 
 All these should be gated on reload_completed and not on the tune or 
 size optimization.

Removed the condition !optimize_function_for_size_p (cfun)).

The condition current_tune->prefer_ldrd_strd is needed because the patterns
for LDRD/STRD appear before the patterns for LDM/STM that can match the same
RTL (two registers in the list). The condition reload_completed does not help
with it, because peephole optimizations in ldmstm.md may (after reload) create
new RTL insns that match this pattern.

  diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
  index f330da3..21d1aa8 100644
  --- a/gcc/config/arm/arm.c
  +++ b/gcc/config/arm/arm.c
  @@ -12130,6 +12130,9 @@ offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
{
  HOST_WIDE_INT max_offset;
 
  +  if (!TARGET_LDRD)
  +return false;
  +
 
 This seems to be in the wrong place.  If we don't have ldrd then the 
 question as to what is a valid offset is irrelevant.

Moved this condition to predicates.md and constraints.md.

Other uses of offset_ok_for_ldrd_strd are already guarded by the conditions.

I am attaching a new version of this patch. 

No regression on qemu for arm-none-eabi with cpu cortex-m4 and cortex-a15.

Ok for trunk?

Thank you,
Greta

ChangeLog


gcc/

2012-10-19  Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (offset_ok_for_ldrd_strd): New
declaration.
(operands_ok_ldrd_strd): Likewise.
* config/arm/arm.c (offset_ok_for_ldrd_strd): New function.
(operands_ok_ldrd_strd): Likewise.
* config/arm/arm.md (thumb2_ldrd, thumb2_ldrd_base): New patterns.
(thumb2_ldrd_base_neg): Likewise.
(thumb2_strd, thumb2_strd_base, thumb_strd_base_neg): Likewise.
* predicates.md (ldrd_strd_offset_operand): New predicate.
* config/arm/constraints.md (Do): New constraint.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 010e7fc..bfe96ea 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -116,6 +116,8 @@ extern bool gen_stm_seq (rtx *, int);
 extern bool gen_const_stm_seq (rtx *, int);
 extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
+extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
+extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fc3a508..c60e62f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12185,6 +12185,75 @@ arm_pad_reg_upward (enum machine_mode mode,
   return !BYTES_BIG_ENDIAN;
 }
 
+/* Returns true iff OFFSET is valid for use in an LDRD/STRD instruction,
+   assuming that the address in the base register is word aligned.  */
+bool
+offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
+{
+  HOST_WIDE_INT max_offset;
+
+  /* Offset must be a multiple of 4 in Thumb mode.  */
+  if (TARGET_THUMB2 && ((offset & 3) != 0))
+return false;
+
+  if (TARGET_THUMB2)
+max_offset = 1020;
+  else if (TARGET_ARM)
+max_offset = 255;
+  else
+gcc_unreachable ();
+
+  return ((offset <= max_offset) && (offset >= -max_offset));
+}
+
+/* Checks whether the operands are valid for use in an LDRD/STRD instruction.
+   Assumes that RT, RT2, and RN are REG.  This is guaranteed by the patterns.
+   Assumes that the address

RE: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb mode

2012-10-19 Thread Greta Yorsh
 -Original Message-
 From: Richard Earnshaw
 Sent: 19 October 2012 16:44
 To: Greta Yorsh
 Cc: GCC Patches; Ramana Radhakrishnan; ni...@redhat.com;
 p...@codesourcery.com
 Subject: Re: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb
 mode
 
 On 19/10/12 16:20, Greta Yorsh wrote:
 
  Removed the condition !optimize_function_for_size_p (cfun)).
 
  The condition current_tune-prefer_ldrd_strd is needed because the
  patterns
  for LDRD/STRD appear before the patterns for LDM/STM that can match
 the same
  RTL
  (two register in the list). Condition reload_completed does not
 help with
  it
  because peephole optimizations in ldmstm.md may (after reload) create
 new
  RTL insn
  that match this pattern.
 
 
 The point of the reload_completed is that these patterns have the
 potential to cause some problems if they somehow matched during earlier
 passes and the address base was an eliminable register.
 

Thank you for the explanation. Here is an updated patch.

Regression tests and bootstrap in progress for the entire sequence, after
addressing all other comments as well. 

OK for trunk, if bootstrap successful?

Thanks,
Greta


ChangeLog


gcc/

2012-10-19  Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (offset_ok_for_ldrd_strd): New
declaration.
(operands_ok_ldrd_strd): Likewise.
* config/arm/arm.c (offset_ok_for_ldrd_strd): New function.
(operands_ok_ldrd_strd): Likewise.
* config/arm/arm.md (thumb2_ldrd, thumb2_ldrd_base): New patterns.
(thumb2_ldrd_base_neg): Likewise.
(thumb2_strd, thumb2_strd_base, thumb_strd_base_neg): Likewise.
* predicates.md (ldrd_strd_offset_operand): New predicate.
* config/arm/constraints.md (Do): New constraint.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 010e7fc..bfe96ea 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -116,6 +116,8 @@ extern bool gen_stm_seq (rtx *, int);
 extern bool gen_const_stm_seq (rtx *, int);
 extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
+extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
+extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fc3a508..c60e62f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12185,6 +12185,75 @@ arm_pad_reg_upward (enum machine_mode mode,
   return !BYTES_BIG_ENDIAN;
 }
 
+/* Returns true iff OFFSET is valid for use in an LDRD/STRD instruction,
+   assuming that the address in the base register is word aligned.  */
+bool
+offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
+{
+  HOST_WIDE_INT max_offset;
+
+  /* Offset must be a multiple of 4 in Thumb mode.  */
+  if (TARGET_THUMB2 && ((offset & 3) != 0))
+return false;
+
+  if (TARGET_THUMB2)
+max_offset = 1020;
+  else if (TARGET_ARM)
+max_offset = 255;
+  else
+gcc_unreachable ();
+
+  return ((offset <= max_offset) && (offset >= -max_offset));
+}
+
+/* Checks whether the operands are valid for use in an LDRD/STRD instruction.
+   Assumes that RT, RT2, and RN are REG.  This is guaranteed by the patterns.
+   Assumes that the address in the base register RN is word aligned.  Pattern
+   guarantees that both memory accesses use the same base register,
+   the offsets are constants within the range, and the gap between the
+   offsets is 4.  If preload complete then check that registers are legal.
+   WBACK indicates whether address is updated.  LOAD indicates whether memory
+   access is load or store.  */
+bool
+operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rn, HOST_WIDE_INT offset,
+   bool wback, bool load)
+{
+  unsigned int t, t2, n;
+
+  if (!reload_completed)
+return true;
+
+  if (!offset_ok_for_ldrd_strd (offset))
+return false;
+
+  t = REGNO (rt);
+  t2 = REGNO (rt2);
+  n = REGNO (rn);
+
+  if ((TARGET_THUMB2)
+      && ((wback && (n == t || n == t2))
+          || (t == SP_REGNUM)
+          || (t == PC_REGNUM)
+          || (t2 == SP_REGNUM)
+          || (t2 == PC_REGNUM)
+          || (!load && (n == PC_REGNUM))
+          || (load && (t == t2))
+          /* Triggers Cortex-M3 LDRD errata.  */
+          || (!wback && load && fix_cm3_ldrd && (n == t))))
+    return false;
+
+  if ((TARGET_ARM)
+      && ((wback && (n == t || n == t2))
+          || (t2 == PC_REGNUM)
+          || (t % 2 != 0)   /* First destination register is not even.  */
+          || (t2 != t + 1)
+          /* PC can be used as base register (for offset addressing only),
+             but it is deprecated.  */
+          || (n == PC_REGNUM)))
+    return false;

[PING][PATCH, Testsuite] Add new effective target arm_prefer_ldrd_strd

2012-10-18 Thread Greta Yorsh
Ping! Thanks.

-Original Message-
From: Greta Yorsh [mailto:greta.yo...@arm.com] 
Sent: 10 October 2012 15:28
To: GCC Patches
Cc: Ramana Radhakrishnan; Richard Earnshaw; ni...@redhat.com; 
p...@codesourcery.com; Greta Yorsh; mikest...@comcast.net; 
r...@cebitec.uni-bielefeld.de; jani...@codesourcery.com
Subject: [PATCH, Testsuite] Add new effective target arm_prefer_ldrd_strd

In the testsuite, distinguish between arm targets that prefer LDRD/STRD and
arm targets that prefer LDM/STM. This patch adds a new effective target test
and updates documentation accordingly.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/testsuite/

2012-09-13  Greta Yorsh  greta.yo...@arm.com

* gcc.target/arm/target-supports.exp
(check_effective_target_arm_prefer_ldrd_strd): New procedure.

gcc/

2012-09-13  Greta Yorsh  greta.yo...@arm.com

* doc/sourcebuild.texi: Document new effective target keyword
arm_prefer_ldrd_strd.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 055567b..b80ee02 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1552,6 +1552,11 @@ ARM target generates Thumb-2 code for @code{-mthumb}.
 @item arm_vfp_ok
 ARM target supports @code{-mfpu=vfp -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
+
+@item arm_prefer_ldrd_strd
+ARM target prefers @code{LDRD} and @code{STRD} instructions over
+@code{LDM} and @code{STM} instructions.
+
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 8f793b7..4bf2424 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2462,6 +2462,18 @@ proc check_effective_target_arm_iwmmxt_ok { } {
 }
 }
 
+# Return true if LDRD/STRD instructions are preferred over LDM/STM instructions
+# for an ARM target.
+proc check_effective_target_arm_prefer_ldrd_strd { } {
+if { ![check_effective_target_arm32] } {
+  return 0;
+}
+
+    return [check_no_messages_and_pattern arm_prefer_ldrd_strd "strd\tr" assembly {
+        void foo (int *p) { p[0] = 1; p[1] = 0;}
+    } "-O2 -mthumb"]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {
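
(A test would then gate on the new keyword roughly as follows; the skeleton
is illustrative and not part of the patch, with the directive spelling and
options assumed from standard DejaGnu usage:)

/* { dg-do compile } */
/* { dg-require-effective-target arm_prefer_ldrd_strd } */
/* { dg-options "-O2 -mthumb" } */

void
foo (int *p)
{
  p[0] = 1;
  p[1] = 0;
}

/* { dg-final { scan-assembler "strd" } } */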

[PING][Patch, ARM] cleanup prologue_use pattern

2012-10-17 Thread Greta Yorsh
Ping!

Thanks,
Greta

-Original Message-
From: Greta Yorsh [mailto:greta.yo...@arm.com] 
Sent: 10 October 2012 16:14
To: GCC Patches
Cc: Ramana Radhakrishnan; Richard Earnshaw; ni...@redhat.com;
p...@codesourcery.com
Subject: [Patch, ARM] cleanup prologue_use pattern

The pattern prologue_use is emitted for both prologue and epilogue.
In particular, the assembly comment
@sp needed for prologue
is printed out for both prologue and epilogue.

This patch adds a separate pattern for epilogue_use and replaces
prologue_use with epilogue_use where appropriate.

No regression on qemu for arm-none-eabi.

Ok for trunk?

Thanks,
Greta

2012-09-17  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (UNSPEC_EPILOGUE_USE): New unspec value.
(sibcall_epilogue): Use UNSPEC_EPILOGUE_USE instead of
UNSPEC_PROLOGUE_USE.
(epilogue_use): New define_insn.
(epilogue): Use gen_epilogue_use instead of gen_prologue_use.
* config/arm/arm.c (arm_expand_epilogue): Likewise.
(thumb1_expand_epilogue): Likewise.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd073da..f23c2d0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22581,7 +22581,7 @@ thumb1_expand_epilogue (void)
 
   /* Emit a USE (stack_pointer_rtx), so that
  the stack adjustment will not be deleted.  */
-  emit_insn (gen_prologue_use (stack_pointer_rtx));
+  emit_insn (gen_epilogue_use (stack_pointer_rtx));
 
   if (crtl->profile || !TARGET_SCHED_PROLOG)
 emit_insn (gen_blockage ());
@@ -22805,7 +22805,7 @@ arm_expand_epilogue (bool really_return)
 
   /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is not
  deleted.  */
-  emit_insn (gen_prologue_use (stack_pointer_rtx));
+  emit_insn (gen_epilogue_use (stack_pointer_rtx));
 }
   else
 {
@@ -22823,7 +22823,7 @@ arm_expand_epilogue (bool really_return)
   emit_insn (gen_movsi (stack_pointer_rtx, hard_frame_pointer_rtx));
   /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is not
  deleted.  */
-  emit_insn (gen_prologue_use (stack_pointer_rtx));
+  emit_insn (gen_epilogue_use (stack_pointer_rtx));
 }
 }
   else
@@ -22841,7 +22841,7 @@ arm_expand_epilogue (bool really_return)
  GEN_INT (amount)));
   /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is
  not deleted.  */
-  emit_insn (gen_prologue_use (stack_pointer_rtx));
+  emit_insn (gen_epilogue_use (stack_pointer_rtx));
 }
 }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a60e659..6a910a3 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -81,6 +81,7 @@
 ; instructions setting registers for EH handling
 ; and stack frame generation.  Operand 0 is the
 ; register to use.
+  UNSPEC_EPILOGUE_USE   ; Same for epilogue.
   UNSPEC_CHECK_ARCH ; Set CCs to indicate 26-bit or 32-bit mode.
   UNSPEC_WSHUFH ; Used by the intrinsic form of the iWMMXt WSHUFH 
instruction.
   UNSPEC_WACC   ; Used by the intrinsic form of the iWMMXt WACC 
instruction.
@@ -10610,7 +10611,7 @@
   TARGET_EITHER
   
   if (crtl->calls_eh_return)
-emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, 2)));
+emit_insn (gen_epilogue_use (gen_rtx_REG (Pmode, 2)));
   if (TARGET_THUMB1)
{
  thumb1_expand_epilogue ();
@@ -10644,7 +10645,7 @@
 ;; does not think that it is unused by the sibcall branch that
 ;; will replace the standard function epilogue.
 (define_expand sibcall_epilogue
-   [(parallel [(unspec:SI [(reg:SI LR_REGNUM)] UNSPEC_PROLOGUE_USE)
+   [(parallel [(unspec:SI [(reg:SI LR_REGNUM)] UNSPEC_EPILOGUE_USE)
(unspec_volatile [(return)] VUNSPEC_EPILOGUE)])]
TARGET_32BIT

@@ -11267,6 +11268,12 @@
   [(set_attr length 0)]
 )
 
+(define_insn "epilogue_use"
+  [(unspec:SI [(match_operand:SI 0 "register_operand" "")] UNSPEC_EPILOGUE_USE)]
+  ""
+  "%@ %0 needed for epilogue"
+  [(set_attr "length" "0")]
+)
 
 ;; Patterns for exception handling

RE: [PING][Patch, ARM] cleanup prologue_use pattern

2012-10-17 Thread Greta Yorsh
I am attaching a new version of the patch, addressing Richard's comments.

This patch renames the exiting pattern prologue_use to force_register_use,
because the pattern is used in both prologue and epilogue.

No regression on qemu for arm-none-eabi.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/

2012-10-17  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (UNSPEC_PROLOGUE_USE): Rename this...
(UNSPEC_REGISTER_USE): ... to this.
(prologue_use): Rename this...
  (force_register_use): ... to this and update output assembly.
(epilogue) Rename gen_prologue_use to gen_force_register_use.
* config/arm/arm.c (arm_expand_prologue): Likewise.
(thumb1_expand_epilogue): Likewise.
(arm_expand_epilogue): Likewise.
(arm_expand_epilogue): Likewise.




-Original Message-
From: Richard Earnshaw 
Sent: 17 October 2012 11:14
To: Greta Yorsh
Cc: GCC Patches; Ramana Radhakrishnan; ni...@redhat.com;
p...@codesourcery.com
Subject: Re: [PING][Patch, ARM] cleanup prologue_use pattern

On 17/10/12 11:08, Greta Yorsh wrote:
 Ping!


I've been pondering why this was being asked for.  As far as I can tell 
it's just a naming issue (mention of the epilogue in the prologue).

The right thing to do is to rename the pattern to reflect the dual use 
rather than add additional patterns with identical NOP behaviour.  Can't 
you just rename the existing pattern?  Something like force_register_use?

R.

 Thanks,
 Greta

 -Original Message-
 From: Greta Yorsh [mailto:greta.yo...@arm.com]
 Sent: 10 October 2012 16:14
 To: GCC Patches
 Cc: Ramana Radhakrishnan; Richard Earnshaw; ni...@redhat.com;
 p...@codesourcery.com
 Subject: [Patch, ARM] cleanup prologue_use pattern

 The pattern prologue_use is emitted for both prologue and epilogue.
 In particular, the assembly comment
 @sp needed for prologue
 is printed out for both prologue and epilogue.

 This patch adds a separate pattern for epilogue_use and replaces
 prologue_use with epilogue_use where appropriate.

 No regression on qemu for arm-none-eabi.

 Ok for trunk?

 Thanks,
 Greta

 2012-09-17  Greta Yorsh  greta.yo...@arm.com

  * config/arm/arm.md (UNSPEC_EPILOGUE_USE): New unspec value.
  (sibcall_epilogue): Use UNSPEC_EPILOGUE_USE instead of
  UNSPEC_PROLOGUE_USE.
  (epilogue_use): New define_insn.
  (epilogue): Use gen_epilogue_use instead of gen_prologue_use.
  * config/arm/arm.c (arm_expand_epilogue): Likewise.
  (thumb1_expand_epilogue) Likewise.


 rename-prolog-use.v2.patch.txt


 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index dd073da..f23c2d0 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -22581,7 +22581,7 @@ thumb1_expand_epilogue (void)

 /* Emit a USE (stack_pointer_rtx), so that
the stack adjustment will not be deleted.  */
 -  emit_insn (gen_prologue_use (stack_pointer_rtx));
 +  emit_insn (gen_epilogue_use (stack_pointer_rtx));

  if (crtl->profile || !TARGET_SCHED_PROLOG)
   emit_insn (gen_blockage ());
 @@ -22805,7 +22805,7 @@ arm_expand_epilogue (bool really_return)

 /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment
is not
deleted.  */
 -  emit_insn (gen_prologue_use (stack_pointer_rtx));
 +  emit_insn (gen_epilogue_use (stack_pointer_rtx));
   }
 else
   {
 @@ -22823,7 +22823,7 @@ arm_expand_epilogue (bool really_return)
 emit_insn (gen_movsi (stack_pointer_rtx,
hard_frame_pointer_rtx));
 /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment
is not
deleted.  */
 -  emit_insn (gen_prologue_use (stack_pointer_rtx));
 +  emit_insn (gen_epilogue_use (stack_pointer_rtx));
   }
   }
 else
 @@ -22841,7 +22841,7 @@ arm_expand_epilogue (bool really_return)
GEN_INT (amount)));
 /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment
is
not deleted.  */
 -  emit_insn (gen_prologue_use (stack_pointer_rtx));
 +  emit_insn (gen_epilogue_use (stack_pointer_rtx));
   }
   }

 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index a60e659..6a910a3 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -81,6 +81,7 @@
   ; instructions setting registers for EH handling
   ; and stack frame generation.  Operand 0 is the
   ; register to use.
 +  UNSPEC_EPILOGUE_USE   ; Same for epilogue.
 UNSPEC_CHECK_ARCH ; Set CCs to indicate 26-bit or 32-bit mode.
 UNSPEC_WSHUFH ; Used by the intrinsic form of the iWMMXt
WSHUFH instruction.
 UNSPEC_WACC   ; Used by the intrinsic form of the iWMMXt WACC
instruction.
 @@ -10610,7 +10611,7 @@
 TARGET_EITHER
 
  if (crtl->calls_eh_return)
 -emit_insn

[Patch, testsuite] add effective target pthread to test gcc.dg/pr54782.c

2012-10-10 Thread Greta Yorsh
The test gcc.dg/pr54782.c uses command line option
-ftree-parallelize-loops=2 which implies -pthread and thus the test fails on
targets that do not support pthread, such as arm-none-eabi.

This patch adds effective target check.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/testsuite

2012-10-05  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/pr54782.c: Require effective target pthread.
diff --git a/gcc/testsuite/gcc.dg/pr54782.c b/gcc/testsuite/gcc.dg/pr54782.c
index 2a30754..161b043 100644
--- a/gcc/testsuite/gcc.dg/pr54782.c
+++ b/gcc/testsuite/gcc.dg/pr54782.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target pthread } */
 /* { dg-options -O -ffast-math -ftree-parallelize-loops=2 -g } */
 
 struct S


[PATCH, Testsuite] Add new effective target arm_prefer_ldrd_strd

2012-10-10 Thread Greta Yorsh
In the testsuite, distinguish between arm targets that prefer LDRD/STRD and
arm targets that prefer LDM/STM. This patch adds a new effective target test
and updates documentation accordingly.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/testsuite/

2012-09-13  Greta Yorsh  greta.yo...@arm.com

* gcc.target/arm/target-supports.exp
(check_effective_target_arm_prefer_ldrd_strd): New procedure.

gcc/

2012-09-13  Greta Yorsh  greta.yo...@arm.com

* doc/sourcebuild.texi: Document new effective target keyword
arm_prefer_ldrd_strd.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 055567b..b80ee02 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1552,6 +1552,11 @@ ARM target generates Thumb-2 code for @code{-mthumb}.
 @item arm_vfp_ok
 ARM target supports @code{-mfpu=vfp -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
+
+@item arm_prefer_ldrd_strd
+ARM target prefers @code{LDRD} and @code{STRD} instructions over
+@code{LDM} and @code{STM} instructions.
+
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 8f793b7..4bf2424 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2462,6 +2462,18 @@ proc check_effective_target_arm_iwmmxt_ok { } {
 }
 }
 
+# Return true if LDRD/STRD instructions are preferred over LDM/STM instructions
+# for an ARM target.
+proc check_effective_target_arm_prefer_ldrd_strd { } {
+if { ![check_effective_target_arm32] } {
+  return 0;
+}
+
+    return [check_no_messages_and_pattern arm_prefer_ldrd_strd "strd\tr" assembly {
+        void foo (int *p) { p[0] = 1; p[1] = 0;}
+    } "-O2 -mthumb"]
+}
+
 # Return 1 if this is a PowerPC target with floating-point registers.
 
 proc check_effective_target_powerpc_fprs { } {


[PATCH, ARM][0/3] Prologue/epilogue using STRD/LDRD for ARM mode

2012-10-10 Thread Greta Yorsh
Generate prologue/epilogue using STRD/LDRD in ARM mode, when tuning
prefer_ldrd_strd flag is set, such as in Cortex-A15.

[1/3] Prologue using STRD in ARM mode
[2/3] Epilogue using LDRD in ARM mode
[3/3] Adjust tests gcc.target/arm/interrupt-*.c

Testing and benchmarking:
* No regression on qemu for arm-none-eabi cortex-a15 neon softfp arm.
* Successful bootstrap on Cortex-A15.
* No change in performance for Spec2k on Cortex-A15 hardware.

Ok for trunk?

Thanks,
Greta





[PATCH, ARM][1/3] Prologue using STRD in ARM mode

2012-10-10 Thread Greta Yorsh
Emit prologue using STRD in ARM mode when tune parameter prefer_ldrd_strd is
set.

ChangeLog

gcc/

2012-09-13  Sameera Deshpande  sameera.deshpande at arm.com
Greta Yorsh  Greta.Yorsh at arm.com

* config/arm/arm.c (emit_multi_reg_push): New declaration
for an existing function.
(arm_emit_strd_push): New function.
(arm_expand_prologue): Use here.
(arm_get_frame_offsets): Update condition.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 84f099f..3522da7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -174,6 +174,7 @@ static rtx arm_expand_builtin (tree, rtx, rtx, enum 
machine_mode, int);
 static tree arm_builtin_decl (unsigned, bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx emit_set_insn (rtx, rtx);
+static rtx emit_multi_reg_push (unsigned long);
 static int arm_arg_partial_bytes (cumulative_args_t, enum machine_mode,
  tree, bool);
 static rtx arm_function_arg (cumulative_args_t, enum machine_mode,
@@ -15906,6 +15907,108 @@ thumb2_emit_strd_push (unsigned long saved_regs_mask)
   return;
 }
 
+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until the first consecutive
+   register pair is found.  It then generates a multi-register PUSH for all
+   accumulated registers, and then generates STRD with write-back for the
+   consecutive register pair.  This process is repeated until all the
+   registers are stored on the stack.  A multi-register PUSH takes care of
+   lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Var j iterates over all registers to gather all registers in
+ saved_regs_mask.  Var i is used to count number of registers stored on
+ stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
+ that can be pushed using multi register PUSH before STRD is
+ generated.  */
+  for (i = 0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; j--)
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+        gcc_assert (j != PC_REGNUM);
+        i++;
+
+        if ((j % 2 == 1)
+            && (saved_regs_mask & (1 << (j - 1)))
+            && regs_to_be_pushed_mask)
+          {
+            /* Current register and previous register form register pair for
+               which STRD can be generated.  Hence, emit PUSH for accumulated
+               registers and reset regs_to_be_pushed_mask.  */
+            insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+            regs_to_be_pushed_mask = 0;
+            RTX_FRAME_RELATED_P (insn) = 1;
+            continue;
+          }
+
+        regs_to_be_pushed_mask |= (1 << j);
+
+        if ((j % 2) == 0 && (saved_regs_mask & (1 << (j + 1))))
+  {
+/* We have found 2 consecutive registers, for which STRD can be
+   generated.  Generate pattern to emit STRD as accumulated
+   registers have already been pushed.  */
+tmp = gen_rtx_SET (DImode,
+   gen_frame_mem (DImode,
+  gen_rtx_PRE_DEC (Pmode, 
stack_pointer_rtx)),
+   gen_rtx_REG (DImode, j));
+tmp = emit_insn(tmp);
+RTX_FRAME_RELATED_P (tmp) = 1;
+
+/* Generate dwarf info.  */
+dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
+tmp = gen_rtx_SET (VOIDmode,
+   stack_pointer_rtx,
+   plus_constant (Pmode, stack_pointer_rtx, -8));
+RTX_FRAME_RELATED_P (tmp) = 1;
+XVECEXP (dwarf, 0, 0) = tmp;
+
+tmp = gen_rtx_SET (SImode,
+   gen_frame_mem (SImode, stack_pointer_rtx),
+   gen_rtx_REG (SImode, j));
+tmp1 = gen_rtx_SET (SImode,
+gen_frame_mem (SImode,
+   plus_constant(Pmode,
+ stack_pointer_rtx,
+ 4)),
+gen_rtx_REG (SImode, j+1));
+RTX_FRAME_RELATED_P (tmp) = 1;
+RTX_FRAME_RELATED_P (tmp1) = 1;
+XVECEXP (dwarf, 0, 1) = tmp;
+XVECEXP (dwarf, 0, 2) = tmp1;
+
+insn = emit_insn (par);
+add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+RTX_FRAME_RELATED_P (insn) = 1;
+regs_to_be_pushed_mask = 0
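
(The function is cut off above by the archive.  A minimal standalone sketch
of the grouping it performs -- assumed and simplified, it only prints which
PUSH/STRD groups would be emitted:)

#include <stdio.h>

#define LAST_ARM_REGNUM 15

/* Walk the register mask from the highest register down; accumulate
   registers that cannot form a pair and flush them as one PUSH whenever an
   even/odd consecutive pair is found, which is then handled by STRD,
   mirroring the strategy described in the comment above arm_emit_strd_push.  */
static void
sketch_strd_push (unsigned long saved_regs_mask)
{
  unsigned long pending = 0;   /* registers accumulated for one PUSH */
  int j;

  for (j = LAST_ARM_REGNUM; j >= 0; j--)
    {
      if (!(saved_regs_mask & (1UL << j)))
        continue;

      /* An odd register whose even partner is also saved will become the
         high half of an STRD; flush any accumulated lone registers first.  */
      if ((j % 2) == 1 && (saved_regs_mask & (1UL << (j - 1))) && pending)
        {
          printf ("PUSH  mask 0x%04lx\n", pending);
          pending = 0;
          continue;
        }

      pending |= 1UL << j;

      /* Even register whose odd partner is saved: the pair goes out as STRD
         and anything accumulated so far is already covered.  */
      if ((j % 2) == 0 && (saved_regs_mask & (1UL << (j + 1))))
        {
          printf ("STRD  r%d, r%d\n", j, j + 1);
          pending = 0;
        }
    }

  if (pending)
    printf ("PUSH  mask 0x%04lx\n", pending);
}

int
main (void)
{
  /* r4, r5, r7 and lr (r14): r7+lr go out as one PUSH, r4/r5 as an STRD.  */
  sketch_strd_push ((1UL << 4) | (1UL << 5) | (1UL << 7) | (1UL << 14));
  return 0;
}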

[PATCH, ARM][0/4] Prologue/epilogue using STRD/LDRD in Thumb mode

2012-10-10 Thread Greta Yorsh
Generate prologue/epilogue using STRD/LDRD in Thumb mode, when tuning
prefer_ldrd_strd flag is set, such as in Cortex-A15.

[1/4] New RTL patterns for LDRD/STRD in Thumb mode
[2/4] Prologue using STRD in Thumb mode
[3/4] Epilogue using LDRD in Thumb mode
[4/4] Adjust tests gcc.target/arm/pr40457-*.c

Testing and benchmarking:
* No regression on qemu for arm-none-eabi cortex-a15 neon softfp thumb.
* Successful bootstrap on Cortex-A15.
* 3% performance improvement in a popular embedded benchmark.

Ok for trunk?

Thanks,
Greta





[PATCH, ARM][4/4] Adjust tests gcc.target/arm/pr40457-*.c

2012-10-10 Thread Greta Yorsh
As a result of adding LDRD/STRD patterns in Thumb mode, the compiler
generates LDRD/STRD instead of LDM/STM in some cases. This patch adjusts
existing tests to accept LDRD/STRD in addition to LDM/STM.

ChangeLog

gcc/testsuite

2012-09-13  Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* gcc.target/arm/pr40457-1.c: Adjust expected output.
* gcc.target/arm/pr40457-2.c: Likewise.
* gcc.target/arm/pr40457-3.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/arm/pr40457-1.c 
b/gcc/testsuite/gcc.target/arm/pr40457-1.c
index 815fd38..8895659 100644
--- a/gcc/testsuite/gcc.target/arm/pr40457-1.c
+++ b/gcc/testsuite/gcc.target/arm/pr40457-1.c
@@ -7,4 +7,4 @@ int bar(int* p)
   return x;
 }
 
-/* { dg-final { scan-assembler ldm } } */
+/* { dg-final { scan-assembler ldrd|ldm } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr40457-2.c 
b/gcc/testsuite/gcc.target/arm/pr40457-2.c
index 187f7bf..5079939 100644
--- a/gcc/testsuite/gcc.target/arm/pr40457-2.c
+++ b/gcc/testsuite/gcc.target/arm/pr40457-2.c
@@ -7,4 +7,4 @@ void foo(int* p)
   p[1] = 0;
 }
 
-/* { dg-final { scan-assembler stm } } */
+/* { dg-final { scan-assembler strd|stm } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr40457-3.c 
b/gcc/testsuite/gcc.target/arm/pr40457-3.c
index 9bd5a17..8823a80 100644
--- a/gcc/testsuite/gcc.target/arm/pr40457-3.c
+++ b/gcc/testsuite/gcc.target/arm/pr40457-3.c
@@ -7,4 +7,4 @@ void foo(int* p)
   p[1] = 0;
 }
 
-/* { dg-final { scan-assembler stm } } */
+/* { dg-final { scan-assembler strd|stm } } */


[Patch, ARM] cleanup prologue_use pattern

2012-10-10 Thread Greta Yorsh
The pattern prologue_use is emitted for both prologue and epilogue.
In particular, the assembly comment
@sp needed for prologue
is printed out for both prologue and epilogue.

This patch adds a separate pattern for epilogue_use and replaces
prologue_use with epilogue_use where appropriate.

No regression on qemu for arm-none-eabi.

Ok for trunk?

Thanks,
Greta

2012-09-17  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (UNSPEC_EPILOGUE_USE): New unspec value.
(sibcall_epilogue): Use UNSPEC_EPILOGUE_USE instead of
UNSPEC_PROLOGUE_USE.
(epilogue_use): New define_insn.
(epilogue): Use gen_epilogue_use instead of gen_prologue_use.
* config/arm/arm.c (arm_expand_epilogue): Likewise.
(thumb1_expand_epilogue): Likewise.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd073da..f23c2d0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22581,7 +22581,7 @@ thumb1_expand_epilogue (void)
 
   /* Emit a USE (stack_pointer_rtx), so that
  the stack adjustment will not be deleted.  */
-  emit_insn (gen_prologue_use (stack_pointer_rtx));
+  emit_insn (gen_epilogue_use (stack_pointer_rtx));
 
   if (crtl->profile || !TARGET_SCHED_PROLOG)
 emit_insn (gen_blockage ());
@@ -22805,7 +22805,7 @@ arm_expand_epilogue (bool really_return)
 
   /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is not
  deleted.  */
-  emit_insn (gen_prologue_use (stack_pointer_rtx));
+  emit_insn (gen_epilogue_use (stack_pointer_rtx));
 }
   else
 {
@@ -22823,7 +22823,7 @@ arm_expand_epilogue (bool really_return)
   emit_insn (gen_movsi (stack_pointer_rtx, hard_frame_pointer_rtx));
   /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is not
  deleted.  */
-  emit_insn (gen_prologue_use (stack_pointer_rtx));
+  emit_insn (gen_epilogue_use (stack_pointer_rtx));
 }
 }
   else
@@ -22841,7 +22841,7 @@ arm_expand_epilogue (bool really_return)
  GEN_INT (amount)));
   /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is
  not deleted.  */
-  emit_insn (gen_prologue_use (stack_pointer_rtx));
+  emit_insn (gen_epilogue_use (stack_pointer_rtx));
 }
 }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a60e659..6a910a3 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -81,6 +81,7 @@
 ; instructions setting registers for EH handling
 ; and stack frame generation.  Operand 0 is the
 ; register to use.
+  UNSPEC_EPILOGUE_USE   ; Same for epilogue.
   UNSPEC_CHECK_ARCH ; Set CCs to indicate 26-bit or 32-bit mode.
   UNSPEC_WSHUFH ; Used by the intrinsic form of the iWMMXt WSHUFH 
instruction.
   UNSPEC_WACC   ; Used by the intrinsic form of the iWMMXt WACC 
instruction.
@@ -10610,7 +10611,7 @@
   TARGET_EITHER
   
   if (crtl->calls_eh_return)
-emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, 2)));
+emit_insn (gen_epilogue_use (gen_rtx_REG (Pmode, 2)));
   if (TARGET_THUMB1)
{
  thumb1_expand_epilogue ();
@@ -10644,7 +10645,7 @@
 ;; does not think that it is unused by the sibcall branch that
 ;; will replace the standard function epilogue.
 (define_expand sibcall_epilogue
-   [(parallel [(unspec:SI [(reg:SI LR_REGNUM)] UNSPEC_PROLOGUE_USE)
+   [(parallel [(unspec:SI [(reg:SI LR_REGNUM)] UNSPEC_EPILOGUE_USE)
(unspec_volatile [(return)] VUNSPEC_EPILOGUE)])]
TARGET_32BIT

@@ -11267,6 +11268,12 @@
   [(set_attr length 0)]
 )
 
+(define_insn "epilogue_use"
+  [(unspec:SI [(match_operand:SI 0 "register_operand" "")] UNSPEC_EPILOGUE_USE)]
+  ""
+  "%@ %0 needed for epilogue"
+  [(set_attr "length" "0")]
+)
 
 ;; Patterns for exception handling
 


[PING][Patch, ARM] Cleanup in arm_expand_epilogue

2012-08-31 Thread Greta Yorsh
Ping

http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01026.html


From: Greta Yorsh [greta.yo...@arm.com]
Sent: Friday, July 20, 2012 7:33 PM
To: GCC Patches
Cc: Richard Earnshaw; Ramana Radhakrishnan
Subject: [Patch, ARM] Cleanup in arm_expand_epilogue

The variable floats_from_frame in function arm_expand_epilogue became unused
after removal of FPA support. This patch cleans it up and simplifies the
initialization of num_regs variable.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/

2012-07-20  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (arm_expand_epilogue): Remove unused variable
floats_from_frame.




[PING][Patch,ARM] unwind in epilogue ignore dwarf info

2012-08-31 Thread Greta Yorsh
Ping
http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01025.html


From: Greta Yorsh [greta.yo...@arm.com]
Sent: Friday, July 20, 2012 7:28 PM
To: GCC Patches
Cc: Richard Earnshaw; Ramana Radhakrishnan
Subject: [Patch,ARM] unwind in epilogue ignore dwarf info

The final pass of gcc uses dwarf information to generate unwind tables and
directives (e.g., with command line option -fexceptions). Dwarf information
generated for epilogues should be ignored when generating unwind info,
because the ARM ABI only allows unwind at procedure boundaries. This patch
adds a flag unwind_in_epilogue for it. It wasn't needed in the past, because
there was no dwarf info generated for epilogues, but recent patches for
epilogue generation in RTL added it.

No regression on qemu.

2012-07-20  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (arm_unwind_function_begin_epilogue): New
function.
(unwind_in_epilogue) New static variable.
(arm_unwind_emit) Use unwind_in_epilogue flag.
(arm_asm_emit_except_personality) Clear unwind_in_epilogue flag.




[Patch,ARM] unwind in epilogue ignore dwarf info

2012-07-20 Thread Greta Yorsh
The final pass of gcc uses dwarf information to generate unwind tables and
directives (e.g., with command line option -fexceptions). Dwarf information
generated for epilogues should be ignored when generating unwind info,
because the ARM ABI only allows unwind at procedure boundaries. This patch
adds a flag unwind_in_epilogue for it. It wasn't needed in the past, because
there was no dwarf info generated for epilogues, but recent patches for
epilogue generation in RTL added it.

No regression on qemu.

2012-07-20  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (arm_unwind_function_begin_epilogue): New
function.
(unwind_in_epilogue) New static variable.
(arm_unwind_emit) Use unwind_in_epilogue flag.
(arm_asm_emit_except_personality) Clear unwind_in_epilogue flag.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9f0023f..137980a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -213,6 +213,7 @@ static void arm_unwind_emit (FILE *, rtx);
 static bool arm_output_ttype (rtx);
 static void arm_asm_emit_except_personality (rtx);
 static void arm_asm_init_sections (void);
+static void arm_unwind_function_begin_epilogue (FILE *);
 #endif
 static rtx arm_dwarf_register_span (rtx);
 
@@ -521,6 +522,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_ASM_INIT_SECTIONS
 #define TARGET_ASM_INIT_SECTIONS arm_asm_init_sections
+
+#undef TARGET_ASM_FUNCTION_BEGIN_EPILOGUE
+#define TARGET_ASM_FUNCTION_BEGIN_EPILOGUE arm_unwind_function_begin_epilogue
 #endif /* ARM_UNWIND_INFO */
 
 #undef TARGET_DWARF_REGISTER_SPAN
@@ -811,6 +815,11 @@ unsigned arm_pic_register = INVALID_REGNUM;
the next function.  */
 static int after_arm_reorg = 0;
 
+#if ARM_UNWIND_INFO
+/* Set when epilogue begins and reset at the end of epilogue.  */
+static bool unwind_in_epilogue = false;
+#endif
+
 enum arm_pcs arm_pcs_default;
 
 /* For an explanation of these variables, see final_prescan_insn below.  */
@@ -24813,6 +24822,9 @@ arm_unwind_emit (FILE * asm_out_file, rtx insn)
   rtx note, pat;
   bool handled_one = false;
 
+  if (unwind_in_epilogue)
+return;
+
   if (arm_except_unwind_info (global_options) != UI_TARGET)
 return;
 
@@ -24911,6 +24923,7 @@ arm_output_ttype (rtx x)
 static void
 arm_asm_emit_except_personality (rtx personality)
 {
+  unwind_in_epilogue = false;
   fputs ("\t.personality\t", asm_out_file);
   output_addr_const (asm_out_file, personality);
   fputc ('\n', asm_out_file);
@@ -24924,6 +24937,14 @@ arm_asm_init_sections (void)
   exception_section = get_unnamed_section (0, output_section_asm_op,
    "\t.handlerdata");
 }
+
+/* Implement TARGET_ASM_FUNCTION_BEGIN_EPILOGUE.  */
+static void
+arm_unwind_function_begin_epilogue (FILE *file)
+{
+  unwind_in_epilogue = true;
+}
+
 #endif /* ARM_UNWIND_INFO */
 
 /* Output unwind directives for the start/end of a function.  */


[Patch, ARM] Cleanup in arm_expand_epilogue

2012-07-20 Thread Greta Yorsh
The variable floats_from_frame in function arm_expand_epilogue became unused
after the removal of FPA support. This patch removes it and simplifies the
initialization of the num_regs variable.
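
In other words, the hand-written counting loop is replaced by the existing
bit_count helper; a minimal before/after sketch (bit_count simply returns the
number of set bits in the mask):

  /* Before: count the saved core registers by hand.  */
  for (i = 0; i <= LAST_ARM_REGNUM; i++)
    if (saved_regs_mask & (1 << i))
      num_regs++;

  /* After: one call gives the same count.  */
  num_regs = bit_count (saved_regs_mask);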

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/

2012-07-20  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (arm_expand_epilogue): Remove unused variable
floats_from_frame.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 627b436..659d6b3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23006,8 +23015,7 @@ arm_expand_epilogue (bool really_return)
   int num_regs = 0;
   int i;
   int amount;
-  int floats_from_frame = 0;
   arm_stack_offsets *offsets;
 
   func_type = arm_current_func_type ();
 
@@ -23033,18 +23042,7 @@ arm_expand_epilogue (bool really_return)
   /* Get frame offsets for ARM.  */
   offsets = arm_get_frame_offsets ();
   saved_regs_mask = offsets->saved_regs_mask;
-
-  /* Find offset of floating point register from frame pointer.
-     The initialization is done in this way to take care of frame pointer
-     and static-chain register, if stored.  */
-  floats_from_frame = offsets->saved_args - offsets->frame;
-  /* Compute how many registers saved and how far away the floats will be.  */
-  for (i = 0; i <= LAST_ARM_REGNUM; i++)
-    if (saved_regs_mask & (1 << i))
-      {
-        num_regs++;
-        floats_from_frame += 4;
-      }
+  num_regs = bit_count (saved_regs_mask);
 
   if (frame_pointer_needed)
 {


[Patch, ARM][1/2] Add prefer_ldrd_strd field to tune

2012-07-20 Thread Greta Yorsh
Add a new field to tune_params structure to indicate whether LDRD/STRD
instructions are preferred over POP/PUSH/STM/LDM. Set the new field to false
for all existing tunes. Subsequent patches will set it to true for
Cortex-A15 and use it to determine which instructions to emit, in particular
for prologue and epilogue.
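
A hypothetical consumer of the new field might look like the sketch below;
the TARGET_LDRD check and the surrounding code are illustrative only, and the
real users arrive in the later prologue/epilogue patches:

  /* Sketch: decide how to save/restore a pair of callee-saved registers.  */
  if (current_tune->prefer_ldrd_strd && TARGET_LDRD)
    {
      /* Emit LDRD/STRD for the register pair.  */
    }
  else
    {
      /* Fall back to PUSH/POP or LDM/STM.  */
    }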

Ok for trunk?

Thanks,
Greta

Changelog

gcc/

2012-07-20  Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (prefer_ldrd_strd): New field.
* config/arm/arm.c (arm_slowmul_tune): Initialize the new field.
(arm_fastmul_tune, arm_strongarm_tune): Likewise.
(arm_xscale_tune, arm_9e_tune, arm_v6t2_tune): Likewise.
(arm_cortex_tune, arm_cortex_a5_tune, arm_cortex_a9_tune): Likewise.
(arm_fa726te_tune): Likewise.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index ba5802e..7cd6a7c 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -245,6 +245,8 @@ struct tune_params
   int l1_cache_line_size;
   bool prefer_constant_pool;
   int (*branch_cost) (bool, bool);
+  /* Prefer STRD/LDRD instructions over PUSH/POP/LDM/STM.  */
+  bool prefer_ldrd_strd;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a385e30..3f13a3d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -878,7 +878,8 @@ const struct tune_params arm_slowmul_tune =
   5,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,                                        /* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -889,7 +890,8 @@ const struct tune_params arm_fastmul_tune =
   5,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,                                        /* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -903,7 +905,8 @@ const struct tune_params arm_strongarm_tune =
   3,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,                                        /* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -914,7 +917,8 @@ const struct tune_params arm_xscale_tune =
   3,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,                                        /* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_9e_tune =
@@ -925,7 +929,8 @@ const struct tune_params arm_9e_tune =
   5,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,                                        /* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_v6t2_tune =
@@ -936,7 +941,8 @@ const struct tune_params arm_v6t2_tune =
   5,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   false,   /* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 /* Generic Cortex tuning.  Use more specific tunings if appropriate.  */
@@ -948,7 +954,8 @@ const struct tune_params arm_cortex_tune =
   5,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   false,   /* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
@@ -962,7 +969,8 @@ const struct tune_params arm_cortex_a5_tune =
   1,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   false,   /* Prefer constant pool.  */
-  arm_cortex_a5_branch_cost
+  arm_cortex_a5_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_cortex_a9_tune =
@@ -973,7 +981,8 @@ const struct tune_params

[Patch, ARM][2/2] Create tune for Cortex-A15.

2012-07-20 Thread Greta Yorsh
This patch creates a new tune_param structure for Cortex-A15. The new tune
is identical to the generic cortex tune used for Cortex-A15 before this
patch, except the field prefer_ldrd_strd is set to true. This field will be
used by subsequent patches, in particular for prologue/epilogue.

Ok for trunk?

Thanks,
Greta

Changelog

gcc/

2012-07-20  Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (arm_cortex_a15_tune): New tune.
* config/arm/arm-cores.def (cortex-a15): Use new tune.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 1407771..b8d4ab6 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -129,7 +129,7 @@ ARM_CORE(cortex-a5,    cortexa5,  7A,  FL_LDSCHED, cortex_a5)
 ARM_CORE(cortex-a7,    cortexa7,  7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
 ARM_CORE(cortex-a8,    cortexa8,  7A,  FL_LDSCHED, cortex)
 ARM_CORE(cortex-a9,    cortexa9,  7A,  FL_LDSCHED, cortex_a9)
-ARM_CORE(cortex-a15,   cortexa15, 7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
+ARM_CORE(cortex-a15,   cortexa15, 7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
 ARM_CORE(cortex-r4,    cortexr4,  7R,  FL_LDSCHED, cortex)
 ARM_CORE(cortex-r4f,   cortexr4f, 7R,  FL_LDSCHED, cortex)
 ARM_CORE(cortex-r5,    cortexr5,  7R,  FL_LDSCHED | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3f13a3d..29d1974 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -958,6 +958,18 @@ const struct tune_params arm_cortex_tune =
   false /* Prefer LDRD/STRD.  */
 };
 
+const struct tune_params arm_cortex_a15_tune =
+{
+  arm_9e_rtx_costs,
+  NULL,
+  1,   /* Constant limit.  */
+  5,   /* Max cond insns.  */
+  ARM_PREFETCH_NOT_BENEFICIAL,
+  false,   /* Prefer constant pool.  */
+  arm_default_branch_cost,
+  true  /* Prefer LDRD/STRD.  */
+};
+
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
less appealing.  Set max_insns_skipped to a low value.  */
 


[Patch, ARM] Fix PR53859: ICE on armv7e-m

2012-07-10 Thread Greta Yorsh
New RTL patterns generated for epilogues with RETURN (trunk r188742) are not
recognized by the pattern matching code in arm_early_load_addr_dep, which is
used for insn latency calculation when tuning for cortex-m4. It causes an
ICE when tuning for armv7e-m or cortex-m4:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53859.

The obvious fix is to detect RETURN pattern in arm_early_load_addr_dep. 
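
For reference, the producer insn that trips the old code is a pop-with-return
whose pattern is a PARALLEL with a leading RETURN, roughly of the following
shape (a sketch, not the exact pattern):

  /* Sketch of the offending producer pattern:

       (parallel [(return)
                  (set (reg:SI pc) (mem:SI (post_inc:SI (reg:SI sp))))
                  ...])

     The old code unconditionally took XVECEXP (pat, 0, 0) as the SET, so it
     now has to skip a leading RETURN first, which is what the hunk below
     does.  */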

No regression on qemu.

Ok for trunk?

Thanks,
Greta

ChangeLog

2012-07-10  Greta Yorsh  greta.yo...@arm.com

gcc/
PR target/53859
* config/arm/arm.c (arm_early_load_addr_dep): Handle new
epilogue patterns.

gcc/testsuite

PR target/53859
* gcc.target/arm/pr53859.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a385e30..4a71a14 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24038,7 +24038,12 @@ arm_early_load_addr_dep (rtx producer, rtx consumer)
   if (GET_CODE (addr) == COND_EXEC)
 addr = COND_EXEC_CODE (addr);
   if (GET_CODE (addr) == PARALLEL)
-    addr = XVECEXP (addr, 0, 0);
+    {
+      if (GET_CODE (XVECEXP (addr, 0, 0)) == RETURN)
+        addr = XVECEXP (addr, 0, 1);
+      else
+        addr = XVECEXP (addr, 0, 0);
+    }
   addr = XEXP (addr, 1);
 
   return reg_overlap_mentioned_p (value, addr);
diff --git a/gcc/testsuite/gcc.target/arm/pr53859.c 
b/gcc/testsuite/gcc.target/arm/pr53859.c
new file mode 100644
index 000..e4e9380
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr53859.c
@@ -0,0 +1,11 @@
+/* PR target/53859 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-mcpu=cortex-m4 -mthumb -O2" } */
+
+void bar (int,int,char* ,int);
+
+void foo (char c)
+{
+  bar (1,2,&c,3);
+}


[Patch, Testsuite, ARM] Improve test gcc.target/arm/handler-align.c

2012-06-18 Thread Greta Yorsh
This test checks that the stack pointer is handled correctly in
prologue/epilogue of Cortex-M interrupt handlers. An interrupt handler may
be called when stack is not double-word aligned. The prologue of the
interrupt handler aligns the stack pointer and the epilogue restores the
original stack pointer.

However, in this test, the stack is always double-word aligned when the
handler function foo is called. As a result, the test is not very effective,
for example it passes even if the epilogue does not restore the stack
pointer. This patch forces the stack pointer to be not double-word aligned
on the call to foo.

Tested on qemu -cpu cortex-m3.

Ok for trunk?

Thanks,
Greta

ChangeLog:

gcc/testsuite

2012-06-18  Greta Yorsh  greta.yo...@arm.com

* gcc.target/arm/handler-align.c (main): Force the stack pointer
to be not double-word aligned on the call to the interrupt handler.

diff --git a/gcc/testsuite/gcc.target/arm/handler-align.c b/gcc/testsuite/gcc.target/arm/handler-align.c
index 6c5187b..b0efa58 100644
--- a/gcc/testsuite/gcc.target/arm/handler-align.c
+++ b/gcc/testsuite/gcc.target/arm/handler-align.c
@@ -29,8 +29,15 @@ int main()
/* Check stack pointer before/after calling the interrupt
  * handler. Not equal means that handler doesn't restore
  * stack correctly.  */
save_sp = sp;
-   foo();
+
+    /* The stack is always double-word aligned here.  To test interrupt
+       handling, force the stack to be not double-word aligned.  */
+    asm volatile ("sub\tsp, sp, #4" : : : "memory");
+    foo ();
+    /* Restore the stack.  */
+    asm volatile ("add\t sp, sp, #4" : : : "memory");
+
/* Abort here instead of return non-zero. Due to wrong sp, lr value,
 * returning from main may not work.  */
if (save_sp != sp)


[patch, testsuite] new test for arm epilogue

2012-06-18 Thread Greta Yorsh
This test relies on epilogue generated in RTL to provide register liveness
information that enables peephole optimization.
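
Roughly, the bit test in foo below can be shrunk to a flag-setting shift only
when the peephole can find a dead scratch register, and the RTL epilogue is
what gives the data-flow pass that liveness information. A sketch of the
intended assembly difference (register numbers are illustrative):

  /* Sketch, Thumb-2 at -Os:

       without liveness info:  tst   r3, #4        @ 32-bit encoding
       with liveness info:     lsls  r2, r3, #29   @ 16-bit; moves bit 2 into
                                                   @ N, needs r2 known dead

     which is what the scan-assembler directives in the test check for.  */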

OK for trunk?

Thanks,
Greta


2012-06-18  Joey Ye joey...@arm.com
Greta Yorsh  greta.yo...@arm.com

 * gcc.target/arm/epilog-1.c: New test.
diff --git a/gcc/testsuite/gcc.target/arm/epilog-1.c 
b/gcc/testsuite/gcc.target/arm/epilog-1.c
new file mode 100644
index 000..f97f1eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/epilog-1.c
@@ -0,0 +1,17 @@
+/* Register liveness information from epilogue enables peephole optimization.  */
+/* { dg-do compile } */
+/* { dg-options "-mthumb -Os" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+
+volatile int g_k;
+extern void bar(int, int, int, int);
+
+int foo(int a, int b, int c, int d)
+{
+  if (g_k & 4) c++;
+  bar (a, b, c, d);
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "lsls.*#29" 1 } } */
+/* { dg-final { scan-assembler-not "tst" } } */


RE: [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I)

2012-06-18 Thread Greta Yorsh
Paul, 

I did additional testing of the patches, as you suggested. 

For iwmmxt, no regression on qemu (using -cpu pxa270) for arm-none-eabi
target configured --with-cpu iwmmxt --with-float soft --with-arch iwmmxt
--with-abi iwmmxt --disable-multilib. There is already a test for mmx stack
alignment in gcc.target/arm/mmx-1.c.  I have also tested a few other options
(including -mtpcs-frame and -mtpcs-leaf-frame) on several examples and
haven't found any problems with the patches (at least, not yet :)

Separately, I submitted a couple of testsuite patches related to RTL
epilogue:
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01175.html
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01176.html

FPA support is in the process of being removed from ARM backend trunk:
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00825.html

I hope it addresses your concerns. 

Following Richard's comments, I removed FPA support from RTL epilogue
patches, rebased patches to trunk, and fixed some formatting problems. I'll
go ahead and apply individual patches that have already been approved. 

Thank you,
Greta

 -Original Message-
 From: Paul Brook [mailto:p...@codesourcery.com]
 Sent: 31 May 2012 19:18
 To: Greta Yorsh
 Cc: GCC Patches; jos...@codesourcery.com; Richard Earnshaw;
 sameera.deshpa...@gmail.com; Ramana Radhakrishnan; ni...@redhat.com
 Subject: Re: [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's
 patches, Part I)
 
  Testing:
  * Crossbuild for target arm-none-eabi with cpu cortex-a9 neon softfp
 and
  tested in three configuration: -marm (default), -mthumb, -mapcs-
 frame. No
  regression on qemu.
  * Crossbuild for target arm-none-eabi thumb2 with cpu cortex-m3. No
  regression on qemu.
  * Crossbuild for target arm-none-eabi thumb1 with cpu arm7tdmi and
  arm1136jf-s. No regression on qemu.
  * Crossbuild for target arm-linux-gnueabi with cpu cortex-a9 with
 eglibc
  and used this compiler to build AEL linux kernel. It boots
 successfully. *
  Bootstrap the compiler on cortex-a8 successfully for
  --languages=c,c++,fortran and used this compiler to build gdb. No
  regression with check-gcc and check-gdb.
 
 What other testing have you done?  That's a good number of
 combinations not
 covered by your above list.  In particular:
 - Coverage of old cores looks pretty thin.  In particular ARMv4t has
 different
 interworking requirements.
 - iWMMXT has special alignment requirements.
 - Interrupt functions with special prologue/epilogue.  Both traditional
 ARM
 and Cortex-M3.
 - -mtpcs-frame and -mtpcs-leaf-frame
 
 Some of these options are orthogonal.
 
 As you've proved with -mapcs-frame it's near impossible to get these
 right
 without actually testing them.  I'm not saying you have to do a full
 testrun
 in every combination, but it's worth testing a representative selection
 of
 functions (large and small frame, leaf or not, with and without frame
 pointer,
 uses alloca, etc).  Also worth explicitly clobbering a selection (both
 odd and
 even numbers) of callee saved registers to make sure we get that right.
 Any
 difference in the output should be manually verified (ideally the
 assembly
 output would be identical).
 
  * The patches have not been explicitly tested with any FPA variants
 (which
  are deprecated in 4.7 and expected to become obsolete in 4.8).
 
 I'm not keen on breaking these without actually removing them.
 
 Paul






RE: [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I)

2012-06-01 Thread Greta Yorsh

On 31 May 2012 19:18, Paul Brook wrote:
  Testing:
  * Crossbuild for target arm-none-eabi with cpu cortex-a9 neon softfp
 and
  tested in three configuration: -marm (default), -mthumb, -mapcs-
 frame. No
  regression on qemu.
  * Crossbuild for target arm-none-eabi thumb2 with cpu cortex-m3. No
  regression on qemu.
  * Crossbuild for target arm-none-eabi thumb1 with cpu arm7tdmi and
  arm1136jf-s. No regression on qemu.
  * Crossbuild for target arm-linux-gnueabi with cpu cortex-a9 with
 eglibc
  and used this compiler to build AEL linux kernel. It boots
 successfully. *
  Bootstrap the compiler on cortex-a8 successfully for
  --languages=c,c++,fortran and used this compiler to build gdb. No
  regression with check-gcc and check-gdb.
 
 What other testing have you done?  That's a good number of
 combinations not
 covered by your above list.  In particular:
 - Coverage of old cores looks pretty thin.  In particular ARMv4t has
 different
 interworking requirements.

I ran a full regression test of gcc configured with cpu arm7tdmi on qemu. Is
there another ARMv4t configuration that should be tested?

 - iWMMXT has special alignment requirements.
 - Interrupt functions with special prologue/epilogue.  Both traditional
 ARM
 and Cortex-M3.

A few tests for interrupt functions are included in gcc's regression suite.
Specifically, the test gcc.target/arm/handler-align.c checks that the stack
pointer is handled correctly in prologue/epilogue of Cortex-M interrupt
handlers. I have a patch (not yet posted) to make this test more effective. 

 - -mtpcs-frame and -mtpcs-leaf-frame
 
 Some of these options are orthogonal.
 
 As you've proved with -mapcs-frame it's near impossible to get these
 right
 without actually testing them.  I'm not saying you have to do a full
 testrun
 in every combination, but it's worth testing a representative selection
 of
 functions (large and small frame, leaf or not, with and without frame
 pointer,
 uses alloca, etc).  
 Also worth explicitly clobbering a selection (both
 odd and
 even numbers) of callee saved registers to make sure we get that right.
 Any
 difference in the output should be manually verified (ideally the
 assembly
 output would be identical).

For interrupt-related tests, interworking, and several other tests, I've
compared the assembly outputs before and after the patch (and caught a
couple of bugs this way).
In most cases now, the assembly outputs before and after the patch are
identical. The few differences I have seen are due to successful compiler
optimizations, where we benefit from having generated epilogues in RTL. For
example, replacing sub sp, fp, #0 with mov sp, fp in epilogue. Also,
explicit write to callee-saved registers to restore them in the epilogue allows
the data flow analysis pass to deduce that registers are dead and enables
peephole optimizations that were not possible before. 

 
  * The patches have not been explicitly tested with any FPA variants
 (which
  are deprecated in 4.7 and expected to become obsolete in 4.8).
 
 I'm not keen on breaking these without actually removing them.

Thanks for pointing out additional configurations to test. I will test
-mtpcs-frame and -mtpcs-leaf-frame as you suggested and run regression tests
for iWMMXT. 

Properly testing FPA variants at this point is a lot of work, especially
considering the fact that these variants are obsolete. What minimal
configurations would be sufficient to test?

Thank you,
Greta





[Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I)

2012-05-31 Thread Greta Yorsh
This sequence of patches adds support for epilogue generation in RTL.

This is the first part of Sameera's work on ARM prologue/epilogue. Sameera
Deshpande posted it for review in December 2011, having addressed all
previous comments: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00049.html.
The latest version hasn't been approved yet. Originally, it was split into
two patches:
[1/2]: Thumb2 epilogue in RTL
[2/2]: ARM epilogue in RTL
I rebased Sameera's patches, made small changes in the patterns and fixed
RTL epilogue generated for -mapcs-frame. To make reviewing easier, I split
the patches into smaller steps:
* Reorganization - already committed upstream.
* New insn and expand patterns - main functionality change.
* Cleanup of dead code.

Here is the list of patches:
 1-update-predicate.patch
 2-patterns.patch
 3-patterns-vfp.patch
 4-expand-epilog-apcs-frame.patch
 5-expand-epilog.patch
 6-simple-return.patch
 7-expand-thumb2-return.patch
 8-remove-dead-code.patch

Testing:
* Crossbuild for target arm-none-eabi with cpu cortex-a9 neon softfp and
tested in three configuration: -marm (default), -mthumb, -mapcs-frame. No
regression on qemu.
* Crossbuild for target arm-none-eabi thumb2 with cpu cortex-m3. No
regression on qemu.
* Crossbuild for target arm-none-eabi thumb1 with cpu arm7tdmi and
arm1136jf-s. No regression on qemu.
* Crossbuild for target arm-linux-gnueabi with cpu cortex-a9 with eglibc and
used this compiler to build AEL linux kernel. It boots successfully.
* Bootstrap the compiler on cortex-a8 successfully for
--languages=c,c++,fortran and used this compiler to build gdb. No regression
with check-gcc and check-gdb.

Notes:
* The patches are to be applied in the above order.
* The patches are not intended to be used individually.
* The patches have been tested only as a sequence.
* The patches have been tested on gcc from 20 March 2012 (fsf trunk r185582
of gcc-4.8 stage 1). The patches apply cleanly to current trunk (r188056).
* The patches have not been explicitly tested with any FPA variants (which
are deprecated in 4.7 and expected to become obsolete in 4.8).

Ok for trunk?

Thank you,
Greta





[Patch, ARM][1/8] Epilogue in RTL: update ldm_stm_operation_p

2012-05-31 Thread Greta Yorsh
This patch updates ldm_stm_operation_p to check, for loads, that if SP is in
the register list then the base register is also SP. This guarantees that SP
is reset correctly when an LDM instruction is interrupted; otherwise, we might
end up with a corrupt stack.
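
In assembly terms the tightened predicate behaves roughly as follows (the
register choices are illustrative):

  /* Sketch of what the new check accepts and rejects for loads:

       ldmfd  sp, {r4, fp, sp, pc}    accepted: SP is in the list, but the
                                      base register is also SP.
       ldmia  r1, {r2, sp}            rejected: SP is loaded from a non-SP
                                      base, so an interrupted LDM could
                                      leave a corrupt stack pointer.  */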

ChangeLog:

gcc

2012-05-31  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (ldm_stm_operation_p): Require SP
as base register for loads if SP is in the register list.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e3290e2..4717725 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10247,6 +10247,12 @@ ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode,
   if (!REG_P (addr))
     return false;
 
+  /* Don't allow SP to be loaded unless it is also the base register.  It
+     guarantees that SP is reset correctly when an LDM instruction
+     is interrupted.  Otherwise, we might end up with a corrupt stack.  */
+  if (load && (REGNO (reg) == SP_REGNUM) && (REGNO (addr) != SP_REGNUM))
+    return false;
+
   for (; i < count; i++)
     {
       elt = XVECEXP (op, 0, i);
@@ -10270,6 +10276,10 @@ ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode,
           || (consecutive
               && (REGNO (reg) !=
                   (unsigned int) (first_regno + regs_per_val * (i - base))))
+          /* Don't allow SP to be loaded unless it is also the base register.
+             It guarantees that SP is reset correctly when an LDM instruction
+             is interrupted.  Otherwise, we might end up with a corrupt stack.  */
+          || (load && (REGNO (reg) == SP_REGNUM) && (REGNO (addr) != SP_REGNUM))
           || !MEM_P (mem)
           || GET_MODE (mem) != mode
           || ((GET_CODE (XEXP (mem, 0)) != PLUS


[Patch, ARM][2/8] Epilogue in RTL: new patterns for int regs

2012-05-31 Thread Greta Yorsh
This patch adds new define_insn patterns for epilogue with integer
registers.

The patterns can handle pop multiple with writeback and return (loading into
PC directly).
To handle return, the patterns use a new special predicate
pop_multiple_return, that uses ldm_stm_operation_p function from a previous
patch. To output assembly, the patterns use a new function
arm_output_multireg_pop.

This patch also adds a new function arm_emit_multi_reg_pop
that emits RTL that matches the new pop patterns for integer registers.
This is a helper function for epilogue expansion. It is used by a later
patch.
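
As a rough illustration, the assembly produced by arm_output_multireg_pop
falls into a few shapes (a sketch; register names are illustrative, and
conditional variants add the usual condition code to the mnemonic):

  /* Sketch of the output forms:

       base is SP, with writeback (unified asm):   pop    {r4, r5, pc}
       base is SP, no writeback (APCS frame):      ldmfd  sp, {r4, fp, sp, pc}
       other base register, with writeback:        ldmia  r7!, {r4, r5, lr}

     Interrupt handlers that return via PC additionally get the "^" suffix.  */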

ChangeLog:

gcc

2012-05-31  Ian Bolton  ian.bol...@arm.com
Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (load_multiple_with_writeback): New define_insn.
(load_multiple, pop_multiple_with_writeback_and_return): Likewise.
(pop_multiple_with_return, ldr_with_return): Likewise.
* config/arm/predicates.md (pop_multiple_return): New special
predicate.
* config/arm/arm-protos.h (arm_output_multireg_pop): New declaration.
* config/arm/arm.c (arm_output_multireg_pop): New function.
(arm_emit_multi_reg_pop): New function.
(ldm_stm_operation_p): Check SP in the register list.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 53c2aef..7b25e37 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -156,6 +156,7 @@ extern int arm_emit_vector_const (FILE *, rtx);
 extern void arm_emit_fp16_const (rtx c);
 extern const char * arm_output_load_gr (rtx *);
 extern const char *vfp_output_fstmd (rtx *);
+extern void arm_output_multireg_pop (rtx *, bool, rtx, bool, bool);
 extern void arm_set_return_address (rtx, rtx);
 extern int arm_eliminable_register (rtx);
 extern const char *arm_output_shift(rtx *, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4717725..9093801 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13815,6 +13815,84 @@ vfp_output_fldmd (FILE * stream, unsigned int base, 
int reg, int count)
 }
 
 
+/* OPERANDS[0] is the entire list of insns that constitute pop,
+   OPERANDS[1] is the base register, RETURN_PC is true iff return insn
+   is in the list, UPDATE is true iff the list contains explicit
+   update of base register.
+ */
+void
+arm_output_multireg_pop (rtx *operands, bool return_pc, rtx cond, bool reverse,
+ bool update)
+{
+  int i;
+  char pattern[100];
+  int offset;
+  const char *conditional;
+  int num_saves = XVECLEN (operands[0], 0);
+  unsigned int regno;
+  unsigned int regno_base = REGNO (operands[1]);
+
+  offset = 0;
+  offset += update ? 1 : 0;
+  offset += return_pc ? 1 : 0;
+
+  /* Is the base register in the list?  */
+  for (i = offset; i < num_saves; i++)
+    {
+      regno = REGNO (XEXP (XVECEXP (operands[0], 0, i), 0));
+      /* If SP is in the list, then the base register must be SP.  */
+      gcc_assert ((regno != SP_REGNUM) || (regno_base == SP_REGNUM));
+      /* If base register is in the list, there must be no explicit update.  */
+      if (regno == regno_base)
+        gcc_assert (!update);
+    }
+
+  conditional = reverse ? "%?%D0" : "%?%d0";
+  if ((regno_base == SP_REGNUM) && TARGET_UNIFIED_ASM)
+    {
+      /* Output pop (not stmfd) because it has a shorter encoding.  */
+      gcc_assert (update);
+      sprintf (pattern, "pop%s\t{", conditional);
+    }
+  else
+    {
+      /* Output ldmfd when the base register is SP, otherwise output ldmia.
+         It's just a convention, their semantics are identical.  */
+      if (regno_base == SP_REGNUM)
+        sprintf (pattern, "ldm%sfd\t", conditional);
+      else if (TARGET_UNIFIED_ASM)
+        sprintf (pattern, "ldmia%s\t", conditional);
+      else
+        sprintf (pattern, "ldm%sia\t", conditional);
+
+      strcat (pattern, reg_names[regno_base]);
+      if (update)
+        strcat (pattern, "!, {");
+      else
+        strcat (pattern, ", {");
+    }
+
+  /* Output the first destination register.  */
+  strcat (pattern,
+          reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, offset), 0))]);
+
+  /* Output the rest of the destination registers.  */
+  for (i = offset + 1; i < num_saves; i++)
+    {
+      strcat (pattern, ", ");
+      strcat (pattern,
+              reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, i), 0))]);
+    }
+
+  strcat (pattern, "}");
+
+  if (IS_INTERRUPT (arm_current_func_type ()) && return_pc)
+    strcat (pattern, "^");
+
+  output_asm_insn (pattern, &cond);
+}
+
+
 /* Output the assembly for a store multiple.  */
 
 const char *
@@ -16461,6 +16539,85 @@ emit_multi_reg_push (unsigned long mask)
   return par;
 }
 
+/* Generate and emit an insn pattern that we will recognize as a pop_multi.
+   SAVED_REGS_MASK shows which registers need to be restored.
+
+   Unfortunately, since this insn does not reflect very well the actual
+   semantics of the operation, we need

[Patch, ARM][3/8] Epilogue in RTL: new patterns for vfp regs

2012-05-31 Thread Greta Yorsh
New define_insn pattern for an epilogue with floating point registers (DFmode)
and a new function that emits RTL matching this pattern. The function is a
helper for epilogue expansion. It is used by a later patch.
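
A sketch of what the new pattern and helper amount to (the register range is
illustrative):

  /* Sketch: restoring eight callee-saved double registers becomes

       fldmfdd  sp!, {d8-d15}

     and the RTL emitted by arm_emit_vfp_multi_reg_pop is a PARALLEL of one
     base-register update plus one (set (reg:DF dN) (mem:DF ...)) per
     register, annotated with REG_CFA_RESTORE notes for the unwinder.  */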

ChangeLog:

gcc

2012-05-31  Ian Bolton  ian.bol...@arm.com
Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (vfp_pop_multiple_with_writeback): New
define_insn.
* config/arm/predicates.md (pop_multiple_fp): New special predicate.
* config/arm/arm.c (arm_emit_vfp_multi_reg_pop): New function.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9093801..491ffea 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16618,6 +16618,76 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
   REG_NOTES (par) = dwarf;
 }
 
+/* Generate and emit an insn pattern that we will recognize as a pop_multi
+   of NUM_REGS consecutive VFP regs, starting at FIRST_REG.
+
+   Unfortunately, since this insn does not reflect very well the actual
+   semantics of the operation, we need to annotate the insn for the benefit
+   of DWARF2 frame unwind information.  */
+static void
+arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
+{
+  int i, j;
+  rtx par;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg;
+
+  gcc_assert (num_regs && num_regs <= 32);
+
+  /* Workaround ARM10 VFPr1 bug.  */
+  if (num_regs == 2 && !arm_arch6)
+{
+  if (first_reg == 15)
+first_reg--;
+
+  num_regs++;
+}
+
+  /* We can emit at most 16 D-registers in a single pop_multi instruction, and
+ there could be up to 32 D-registers to restore.
+ If there are more than 16 D-registers, make two recursive calls,
+ each of which emits one pop_multi instruction.  */
+  if (num_regs > 16)
+{
+  arm_emit_vfp_multi_reg_pop (first_reg, 16, base_reg);
+  arm_emit_vfp_multi_reg_pop (first_reg + 16, num_regs - 16, base_reg);
+  return;
+}
+
+  /* The parallel needs to hold num_regs SETs
+ and one SET for the stack update.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* Increment the stack pointer, based on there being
+ num_regs 8-byte registers to restore.  */
+  tmp = gen_rtx_SET (VOIDmode,
+ base_reg,
+ plus_constant (base_reg, 8 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  XVECEXP (par, 0, 0) = tmp;
+
+  /* Now show every reg that will be restored, using a SET for each.  */
+  for (j = 0, i = first_reg; j < num_regs; i += 2)
+{
+  reg = gen_rtx_REG (DFmode, i);
+
+  tmp = gen_rtx_SET (VOIDmode,
+ reg,
+ gen_frame_mem
+ (DFmode,
+  plus_constant (base_reg, 8 * j)));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  XVECEXP (par, 0, j + 1) = tmp;
+
+  dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+  j++;
+}
+
+  par = emit_insn (par);
+  REG_NOTES (par) = dwarf;
+}
+
 /* Calculate the size of the return value that is passed in registers.  */
 static unsigned
 arm_size_return_regs (void)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 862ccf4..98387fa 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11042,6 +11042,41 @@
   [(set_attr type load1)
(set_attr predicable yes)]
 )
+;; Pop for floating point registers (as used in epilogue RTL)
+(define_insn "*vfp_pop_multiple_with_writeback"
+  [(match_parallel 0 "pop_multiple_fp"
+    [(set (match_operand:SI 1 "s_register_operand" "+rk")
+          (plus:SI (match_dup 1)
+                   (match_operand:SI 2 "const_int_operand" "I")))
+     (set (match_operand:DF 3 "arm_hard_register_operand" "")
+          (mem:DF (match_dup 1)))])]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP"
+  "*
+  {
+    int num_regs = XVECLEN (operands[0], 0);
+    char pattern[100];
+    rtx op_list[2];
+    strcpy (pattern, \"fldmfdd\\t\");
+    strcat (pattern, reg_names[REGNO (SET_DEST (XVECEXP (operands[0], 0, 0)))]);
+    strcat (pattern, \"!, {\");
+    op_list[0] = XEXP (XVECEXP (operands[0], 0, 1), 0);
+    strcat (pattern, \"%P0\");
+    if ((num_regs - 1) > 1)
+      {
+        strcat (pattern, \"-%P1\");
+        op_list [1] = XEXP (XVECEXP (operands[0], 0, num_regs - 1), 0);
+      }
+
+    strcat (pattern, \"}\");
+    output_asm_insn (pattern, op_list);
+    return \"\";
+  }
+  "
+  [(set_attr "type" "load4")
+   (set_attr "conds" "unconditional")
+   (set_attr "predicable" "no")]
+)
+
 ;; Special patterns for dealing with the constant pool
 
 (define_insn align_4
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 24dd4ea..92114bd 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -401,6 +401,14 @@
  /*return_pc=*/true);
 })
 
+(define_special_predicate "pop_multiple_fp"
+  (match_code "parallel")
+{
+ return ldm_stm_operation_p (op, /*load=*/true, DFmode

[Patch, ARM][4/8] Epilogue in RTL: expand epilogue for apcs frame

2012-05-31 Thread Greta Yorsh
Helper function for epilogue expansion. Emit RTL for APCS frame epilogue
(when -mapcs-frame command line option is specified).
This function is used by a later patch.

For APCS frame epilogue, the compiler currently generates LDM with SP as
both the base register
and one of the destination registers. For example:

@ APCS_FRAME epilogue
ldmfd   sp, {r4, fp, sp, pc}

@ non-APCS_FRAME epilogue
ldmfd sp!, {r4, fp, pc}

The use of SP in LDM register list is deprecated, but this patch does not
address the problem.

To generate the epilogue for an APCS frame in RTL, this patch adds a new
alternative to the arm_addsi3 insn (ARM mode only) to generate sub sp, fp,
#imm. Previously, there was no pattern that could generate sub with SP as the
destination register and a register other than SP as the operand register.
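
With the new alternative in place, the epilogue expander can materialise that
stack restore directly; a minimal sketch of the kind of RTL it can now emit
(the offset is illustrative and comes from arm_get_frame_offsets in the real
code):

  /* Sketch: restore SP from the frame pointer before the final LDM.  */
  emit_insn (gen_addsi3 (stack_pointer_rtx,
                         hard_frame_pointer_rtx,
                         GEN_INT (-12)));   /* assembles as: sub sp, fp, #12 */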


ChangeLog:

gcc

2012-05-31  Ian Bolton  ian.bol...@arm.com
Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.c (arm_expand_epilogue_apcs_frame): New function.
* config/arm/arm.md (arm_addsi3): Add an alternative.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 491ffea..d6b4c2e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22896,6 +22896,232 @@ thumb1_expand_epilogue (void)
 emit_use (gen_rtx_REG (SImode, LR_REGNUM));
 }
 
+/* Epilogue code for APCS frame.  */
+static void
+arm_expand_epilogue_apcs_frame (bool really_return)
+{
+  unsigned long func_type;
+  unsigned long saved_regs_mask;
+  int num_regs = 0;
+  int i;
+  int floats_from_frame = 0;
+  arm_stack_offsets *offsets;
+
+  gcc_assert (TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM);
+  func_type = arm_current_func_type ();
+
+  /* Get frame offsets for ARM.  */
+  offsets = arm_get_frame_offsets ();
+  saved_regs_mask = offsets->saved_regs_mask;
+
+  /* Find the offset of the floating-point save area in the frame.  */
+  floats_from_frame = offsets->saved_args - offsets->frame;
+
+  /* Compute how many core registers saved and how far away the floats are.  */
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      {
+        num_regs++;
+        floats_from_frame += 4;
+      }
+
+  if (TARGET_HARD_FLOAT && TARGET_VFP)
+{
+  int start_reg;
+
+  /* The offset is from IP_REGNUM.  */
+  int saved_size = arm_get_vfp_saved_size ();
+      if (saved_size > 0)
+{
+  floats_from_frame += saved_size;
+  emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
+ hard_frame_pointer_rtx,
+ GEN_INT (-floats_from_frame)));
+}
+
+  /* Generate VFP register multi-pop.  */
+  start_reg = FIRST_VFP_REGNUM;
+
+      for (i = FIRST_VFP_REGNUM; i < LAST_VFP_REGNUM; i += 2)
+        /* Look for a case where a reg does not need restoring.  */
+        if ((!df_regs_ever_live_p (i) || call_used_regs[i])
+            && (!df_regs_ever_live_p (i + 1)
+                || call_used_regs[i + 1]))
+  {
+if (start_reg != i)
+  arm_emit_vfp_multi_reg_pop (start_reg,
+  (i - start_reg) / 2,
+  gen_rtx_REG (SImode,
+   IP_REGNUM));
+start_reg = i + 2;
+  }
+
+  /* Restore the remaining regs that we have discovered (or possibly
+ even all of them, if the conditional in the for loop never
+ fired).  */
+  if (start_reg != i)
+arm_emit_vfp_multi_reg_pop (start_reg,
+(i - start_reg) / 2,
+gen_rtx_REG (SImode, IP_REGNUM));
+}
+  else if (TARGET_FPA_EMU2)
+    {
+      for (i = LAST_FPA_REGNUM; i >= FIRST_FPA_REGNUM; i--)
+        if (df_regs_ever_live_p (i) && !call_used_regs[i])
+  {
+rtx addr;
+rtx insn;
+floats_from_frame += 12;
+addr = gen_rtx_MEM (XFmode,
+gen_rtx_PLUS (SImode,
+  hard_frame_pointer_rtx,
+  GEN_INT (- floats_from_frame)));
+set_mem_alias_set (addr, get_frame_alias_set ());
+insn = emit_insn (gen_rtx_SET (XFmode,
+   gen_rtx_REG (XFmode, i),
+   addr));
+REG_NOTES (insn) = alloc_reg_note (REG_CFA_RESTORE,
+   gen_rtx_REG (XFmode, i),
+   NULL_RTX);
+  }
+}
+  else
+{
+  int idx = 0;
+  rtx load_seq[4];
+  rtx dwarf = NULL_RTX;
+  rtx par;
+  rtx frame_mem;
+
+      for (i = LAST_FPA_REGNUM; i >= FIRST_FPA_REGNUM; i--)
+{
+  /* We can't unstack more than four registers at once.  */
+  if (idx == 4

[Patch, ARM][5/8] Epilogue in RTL: expand

2012-05-31 Thread Greta Yorsh
The main function for epilogue RTL generation, used by expand epilogue
patterns.

ChangeLog:

gcc

2012-05-31  Ian Bolton  ian.bol...@arm.com
Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (arm_expand_epilogue): New declaration.
* config/arm/arm.c (arm_expand_epilogue): New function.
* config/arm/arm.md (epilogue): Update condition and code.
(sibcall_epilogue): Likewise.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 7b25e37..f61feef 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -30,6 +30,7 @@ extern void arm_load_pic_register (unsigned long);
 extern int arm_volatile_func (void);
 extern const char *arm_output_epilogue (rtx);
 extern void arm_expand_prologue (void);
+extern void arm_expand_epilogue (bool);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
 extern void thumb2_asm_output_opcode (FILE *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d6b4c2e..c8642e2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23122,6 +23122,326 @@ arm_expand_epilogue_apcs_frame (bool really_return)
   emit_jump_insn (simple_return_rtx);
 }
 
+/* Generate RTL to represent ARM epilogue.  Really_return is true if the
+   function is not a sibcall.  */
+void
+arm_expand_epilogue (bool really_return)
+{
+  unsigned long func_type;
+  unsigned long saved_regs_mask;
+  int num_regs = 0;
+  int i;
+  int amount;
+  int floats_from_frame = 0;
+  arm_stack_offsets *offsets;
+
+  func_type = arm_current_func_type ();
+
+  /* Naked functions don't have epilogue.  Hence, generate return pattern, and
+     let output_return_instruction take care of instruction emission if any.  */
+  if (IS_NAKED (func_type)
+      || (IS_VOLATILE (func_type) && TARGET_ABORT_NORETURN))
+{
+  emit_jump_insn (simple_return_rtx);
+  return;
+}
+
+  /* If we are throwing an exception, then we really must be doing a
+ return, so we can't tail-call.  */
+  gcc_assert (!crtl->calls_eh_return || really_return);
+
+  if (TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
+{
+  arm_expand_epilogue_apcs_frame (really_return);
+  return;
+}
+
+  /* Get frame offsets for ARM.  */
+  offsets = arm_get_frame_offsets ();
+  saved_regs_mask = offsets->saved_regs_mask;
+
+  /* Find offset of floating point register from frame pointer.
+     The initialization is done in this way to take care of frame pointer
+     and static-chain register, if stored.  */
+  floats_from_frame = offsets->saved_args - offsets->frame;
+  /* Compute how many registers saved and how far away the floats will be.  */
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      {
+        num_regs++;
+        floats_from_frame += 4;
+      }
+
+  if (frame_pointer_needed)
+{
+  /* Restore stack pointer if necessary.  */
+  if (TARGET_ARM)
+{
+  /* In ARM mode, frame pointer points to first saved register.
+ Restore stack pointer to last saved register.  */
+          amount = offsets->frame - offsets->saved_regs;
+
+  /* Force out any pending memory operations that reference stacked 
data
+ before stack de-allocation occurs.  */
+  emit_insn (gen_blockage ());
+  emit_insn (gen_addsi3 (stack_pointer_rtx,
+ hard_frame_pointer_rtx,
+ GEN_INT (amount)));
+
+  /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is not
+ deleted.  */
+  emit_insn (gen_prologue_use (stack_pointer_rtx));
+}
+  else
+{
+  /* In Thumb-2 mode, the frame pointer points to the last saved
+ register.  */
+          amount = offsets->locals_base - offsets->saved_regs;
+  if (amount)
+emit_insn (gen_addsi3 (hard_frame_pointer_rtx,
+   hard_frame_pointer_rtx,
+   GEN_INT (amount)));
+
+  /* Force out any pending memory operations that reference stacked 
data
+ before stack de-allocation occurs.  */
+  emit_insn (gen_blockage ());
+  emit_insn (gen_movsi (stack_pointer_rtx, hard_frame_pointer_rtx));
+  /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is not
+ deleted.  */
+  emit_insn (gen_prologue_use (stack_pointer_rtx));
+}
+}
+  else
+{
+  /* Pop off outgoing args and local frame to adjust stack pointer to
+ last saved register.  */
+      amount = offsets->outgoing_args - offsets->saved_regs;
+  if (amount)
+{
+  /* Force out any pending memory operations that reference stacked 
data
+ before stack de-allocation occurs.  */
+  emit_insn (gen_blockage

[Patch, ARM][6/8] Epilogue in RTL: simple return

2012-05-31 Thread Greta Yorsh
Add a new parameter to the function output_return_instruction to handle
simple cases of return when no epilogue needs to be printed out.
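
Callers that have already emitted the epilogue in RTL pass the new argument
as true so that only the return instruction itself is printed; the two call
forms, as used in the md patterns below:

  /* Existing patterns: print any remaining epilogue, then the return.  */
  output_return_instruction (operands[0], TRUE, FALSE, FALSE);

  /* New simple_return patterns: the epilogue was already emitted in RTL.  */
  output_return_instruction (const_true_rtx, TRUE, FALSE, TRUE);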

ChangeLog:

gcc

2012-05-31  Ian Bolton  ian.bol...@arm.com
Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (output_return_instruction): New
parameter.
* config/arm/arm.c (output_return_instruction): New parameter.
* config/arm/arm.md (arm_simple_return): New pattern.
(arm_return, cond_return, cond_return_inverted): Add new arguments.
* config/arm/thumb2.md (thumb2_return): Update condition and code.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f61feef..01cd794 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -148,7 +148,7 @@ extern int arm_address_offset_is_imm (rtx);
 extern const char *output_add_immediate (rtx *);
 extern const char *arithmetic_instr (rtx, int);
 extern void output_ascii_pseudo_op (FILE *, const unsigned char *, int);
-extern const char *output_return_instruction (rtx, int, int);
+extern const char *output_return_instruction (rtx, int, int, int);
 extern void arm_poke_function_name (FILE *, const char *);
 extern void arm_final_prescan_insn (rtx);
 extern int arm_debugger_arg_offset (int, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c8642e2..e7a74e0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15592,9 +15592,11 @@ arm_get_vfp_saved_size (void)
 
 
 /* Generate a function exit sequence.  If REALLY_RETURN is false, then do
-   everything bar the final return instruction.  */
+   everything bar the final return instruction.  If simple_return is true,
+   then do not output epilogue, because it has already been emitted in RTL.  */
 const char *
-output_return_instruction (rtx operand, int really_return, int reverse)
+output_return_instruction (rtx operand, int really_return, int reverse,
+   int simple_return)
 {
   char conditional[10];
   char instr[100];
@@ -15637,7 +15639,7 @@ output_return_instruction (rtx operand, int 
really_return, int reverse)
   offsets = arm_get_frame_offsets ();
   live_regs_mask = offsets-saved_regs_mask;
 
-  if (live_regs_mask)
+  if (!simple_return && live_regs_mask)
 {
   const char * return_reg;
 
@@ -15765,7 +15767,7 @@ output_return_instruction (rtx operand, int 
really_return, int reverse)
{
  /* The return has already been handled
 by loading the LR into the PC.  */
- really_return = 0;
+          return "";
}
 }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index d1c1894..867dcbe 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8597,7 +8597,7 @@
 arm_ccfsm_state += 2;
 return \\;
   }
-return output_return_instruction (const_true_rtx, TRUE, FALSE);
+return output_return_instruction (const_true_rtx, TRUE, FALSE, FALSE);
   }
   [(set_attr type load1)
(set_attr length 12)
@@ -8618,7 +8618,7 @@
 arm_ccfsm_state += 2;
 return \\;
   }
-return output_return_instruction (operands[0], TRUE, FALSE);
+return output_return_instruction (operands[0], TRUE, FALSE, FALSE);
   }
   [(set_attr conds use)
(set_attr length 12)
@@ -8639,13 +8639,30 @@
 arm_ccfsm_state += 2;
 return \\;
   }
-return output_return_instruction (operands[0], TRUE, TRUE);
+return output_return_instruction (operands[0], TRUE, TRUE, FALSE);
   }
   [(set_attr conds use)
(set_attr length 12)
(set_attr type load1)]
 )
 
+(define_insn "*arm_simple_return"
+  [(simple_return)]
+  "TARGET_ARM"
+  "*
+  {
+    if (arm_ccfsm_state == 2)
+      {
+        arm_ccfsm_state += 2;
+        return \"\";
+      }
+    return output_return_instruction (const_true_rtx, TRUE, FALSE, TRUE);
+  }"
+  [(set_attr "type" "branch")
+   (set_attr "length" "4")
+   (set_attr "predicable" "yes")]
+)
+
 ;; Generate a sequence of instructions to determine if the processor is
 ;; in 26-bit or 32-bit mode, and return the appropriate return address
 ;; mask.
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 39a2138..b7a8423 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -635,17 +635,12 @@
(set_attr length 20)]
 )
 
-;; Note: this is not predicable, to avoid issues with linker-generated
-;; interworking stubs.
 (define_insn "*thumb2_return"
-  [(return)]
-  "TARGET_THUMB2 && USE_RETURN_INSN (FALSE)"
-  "*
-  {
-    return output_return_instruction (const_true_rtx, TRUE, FALSE);
-  }"
-  [(set_attr "type" "load1")
-   (set_attr "length" "12")]
+  [(simple_return)]
+  "TARGET_THUMB2"
+  "* return output_return_instruction (const_true_rtx, TRUE, FALSE, TRUE);"
+  [(set_attr "type" "branch")
+   (set_attr "length" "4")]
 )
 
 (define_insn_and_split thumb2_eh_return


[Patch, ARM][7/8] Epilogue in RTL: expand thumb2 return

2012-05-31 Thread Greta Yorsh
Generate RTL for return in Thumb2 mode. Used by expand of return insn.

ChangeLog:

gcc

2012-05-31  Ian Bolton  ian.bol...@arm.com
Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (thumb2_expand_return): New declaration.
* config/arm/arm.c (thumb2_expand_return): New function.
* config/arm/arm.md (return): Update condition and code.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 01cd794..2fef0f2 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -31,6 +31,7 @@ extern int arm_volatile_func (void);
 extern const char *arm_output_epilogue (rtx);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
+extern void thumb2_expand_return (void);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
 extern void thumb2_asm_output_opcode (FILE *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e7a74e0..8bc6dcc 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22841,6 +22841,52 @@ thumb1_expand_prologue (void)
 cfun-machine-lr_save_eliminated = 0;
 }
 
+/* Generate pattern *pop_multiple_with_stack_update_and_return if single
+   POP instruction can be generated.  LR should be replaced by PC.  All
+   the checks required are already done by  USE_RETURN_INSN ().  Hence,
+   all we really need to check here is if single register is to be
+   returned, or multiple register return.  */
+void
+thumb2_expand_return (void)
+{
+  int i, num_regs;
+  unsigned long saved_regs_mask;
+  arm_stack_offsets *offsets;
+
+  offsets = arm_get_frame_offsets ();
+  saved_regs_mask = offsets->saved_regs_mask;
+
+  for (i = 0, num_regs = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  if (saved_regs_mask)
+{
+  if (num_regs == 1)
+{
+  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+  rtx reg = gen_rtx_REG (SImode, PC_REGNUM);
+  rtx addr = gen_rtx_MEM (SImode,
+  gen_rtx_POST_INC (SImode,
+stack_pointer_rtx));
+  set_mem_alias_set (addr, get_frame_alias_set ());
+  XVECEXP (par, 0, 0) = ret_rtx;
+  XVECEXP (par, 0, 1) = gen_rtx_SET (SImode, reg, addr);
+  RTX_FRAME_RELATED_P (XVECEXP (par, 0, 1)) = 1;
+  emit_jump_insn (par);
+}
+  else
+{
+          saved_regs_mask &= ~(1 << LR_REGNUM);
+          saved_regs_mask |= (1 << PC_REGNUM);
+  arm_emit_multi_reg_pop (saved_regs_mask);
+}
+}
+  else
+{
+  emit_jump_insn (simple_return_rtx);
+}
+}
 
 void
 thumb1_expand_epilogue (void)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 867dcbe..387ca15 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8583,8 +8583,20 @@
 
 (define_expand "return"
   [(return)]
-  "TARGET_32BIT && USE_RETURN_INSN (FALSE)"
-  "")
+  "(TARGET_ARM || (TARGET_THUMB2
+                   && ARM_FUNC_TYPE (arm_current_func_type ()) == ARM_FT_NORMAL
+                   && !IS_STACKALIGN (arm_current_func_type ())))
+    && USE_RETURN_INSN (FALSE)"
+  "
+  {
+    if (TARGET_THUMB2)
+      {
+        thumb2_expand_return ();
+        DONE;
+      }
+  }
+  "
+)
 
 ;; Often the return insn will be the same as loading from memory, so set attr
 (define_insn *arm_return


[Patch, ARM][8/8] Epilogue in RTL: remove dead code

2012-05-31 Thread Greta Yorsh
As a result of the previous changes, epilogue_insns pattern can only be
generated in Thumb1. After removing other cases in define_insn for
epilogue_insns, the function arm_output_epilogue becomes dead code and can
be eliminated, along with all its helper functions.


ChangeLog:

gcc

2012-05-31  Ian Bolton  ian.bol...@arm.com
Sameera Deshpande  sameera.deshpa...@arm.com
Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-protos.h (arm_output_epilogue): Remove.
* config/arm/arm.c (print_multi_reg): Remove.
(vfp_output_fldmd): Likewise.
(arm_output_epilogue): Likewise.
* config/arm/arm.md (epilogue_insns): Update condition and code.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 34de513..b97773b 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,6 @@ extern int use_return_insn (int, rtx);
 extern enum reg_class arm_regno_class (int);
 extern void arm_load_pic_register (unsigned long);
 extern int arm_volatile_func (void);
-extern const char *arm_output_epilogue (rtx);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
 extern void thumb2_expand_return (void);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 903517d..712e38f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13886,86 +13886,6 @@ fp_const_from_val (REAL_VALUE_TYPE *r)
   gcc_unreachable ();
 }
 
-/* Output the operands of a LDM/STM instruction to STREAM.
-   MASK is the ARM register set mask of which only bits 0-15 are important.
-   REG is the base register, either the frame pointer or the stack pointer,
-   INSTR is the possibly suffixed load or store instruction.
-   RFE is nonzero if the instruction should also copy spsr to cpsr.  */
-
-static void
-print_multi_reg (FILE *stream, const char *instr, unsigned reg,
-unsigned long mask, int rfe)
-{
-  unsigned i;
-  bool not_first = FALSE;
-
-  gcc_assert (!rfe || (mask  (1  PC_REGNUM)));
-  fputc ('\t', stream);
-  asm_fprintf (stream, instr, reg);
-  fputc ('{', stream);
-
-  for (i = 0; i = LAST_ARM_REGNUM; i++)
-if (mask  (1  i))
-  {
-   if (not_first)
- fprintf (stream, , );
-
-   asm_fprintf (stream, %r, i);
-   not_first = TRUE;
-  }
-
-  if (rfe)
-fprintf (stream, }^\n);
-  else
-fprintf (stream, }\n);
-}
-
-
-/* Output a FLDMD instruction to STREAM.
-   BASE if the register containing the address.
-   REG and COUNT specify the register range.
-   Extra registers may be added to avoid hardware bugs.
-
-   We output FLDMD even for ARMv5 VFP implementations.  Although
-   FLDMD is technically not supported until ARMv6, it is believed
-   that all VFP implementations support its use in this context.  */
-
-static void
-vfp_output_fldmd (FILE * stream, unsigned int base, int reg, int count)
-{
-  int i;
-
-  /* Workaround ARM10 VFPr1 bug.  */
-  if (count == 2  !arm_arch6)
-{
-  if (reg == 15)
-   reg--;
-  count++;
-}
-
-  /* FLDMD may not load more than 16 doubleword registers at a time. Split the
- load into multiple parts if we have to handle more than 16 registers.  */
-  if (count  16)
-{
-  vfp_output_fldmd (stream, base, reg, 16);
-  vfp_output_fldmd (stream, base, reg + 16, count - 16);
-  return;
-}
-
-  fputc ('\t', stream);
-  asm_fprintf (stream, fldmfdd\t%r!, {, base);
-
-  for (i = reg; i  reg + count; i++)
-{
-  if (i  reg)
-   fputs (, , stream);
-  asm_fprintf (stream, d%d, i);
-}
-  fputs (}\n, stream);
-
-}
-
-
 /* OPERANDS[0] is the entire list of insns that constitute pop,
OPERANDS[1] is the base register, RETURN_PC is true iff return insn
is in the list, UPDATE is true iff the list contains explicit
@@ -16061,451 +15981,6 @@ arm_output_function_prologue (FILE *f, HOST_WIDE_INT 
frame_size)
 
 }
 
-const char *
-arm_output_epilogue (rtx sibling)
-{
-  int reg;
-  unsigned long saved_regs_mask;
-  unsigned long func_type;
-  /* Floats_offset is the offset from the virtual frame.  In an APCS
- frame that is $fp + 4 for a non-variadic function.  */
-  int floats_offset = 0;
-  rtx operands[3];
-  FILE * f = asm_out_file;
-  unsigned int lrm_count = 0;
-  int really_return = (sibling == NULL);
-  int start_reg;
-  arm_stack_offsets *offsets;
-
-  /* If we have already generated the return instruction
- then it is futile to generate anything else.  */
-  if (use_return_insn (FALSE, sibling) 
-  (cfun-machine-return_used_this_function != 0))
-return ;
-
-  func_type = arm_current_func_type ();
-
-  if (IS_NAKED (func_type))
-/* Naked functions don't have epilogues.  */
-return ;
-
-  if (IS_VOLATILE (func_type)  TARGET_ABORT_NORETURN)
-{
-  rtx op;
-
-  /* A volatile function should never return.  Call abort.  */
-  op = gen_rtx_SYMBOL_REF (Pmode, NEED_PLT_RELOC ? abort(PLT) : abort);
-  assemble_external_libcall (op

RE: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-05-30 Thread Greta Yorsh
I'm attaching an updated version of the patch, addressing the comments from
http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01615.html

This patch adds arm32 to targets that support vect_char_mult. In addition,
the test is updated to prevent vectorization of the initialization loop. The
expected number of vectorized loops is adjusted accordingly. 
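
The initialization loop is kept scalar with the usual empty volatile asm
barrier; a minimal sketch of the idiom as used in the updated test:

  for (i = 0; i < N; i++)
    {
      input[i] = i;
      output[i] = 0;
      __asm__ volatile ("");  /* an asm the vectorizer cannot analyse keeps this loop scalar */
    }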

No regression with check-gcc on qemu for arm-none-eabi cortex-a9 neon softfp
arm/thumb.

OK for trunk?

Thanks,
Greta

ChangeLog

gcc/testsuite

2012-05-30  Greta Yorsh  Greta.Yorsh at arm.com

* gcc.dg/vect/slp-perm-8.c (main): Prevent vectorization
of the initialization loop.
(dg-final): Adjust the expected number of vectorized loops
depending on vect_char_mult target selector.
* lib/target-supports.exp (check_effective_target_vect_char_mult):
Add arm32 to targets.


 -Original Message-
 From: Richard Earnshaw [mailto:rearn...@arm.com]
 Sent: 25 April 2012 17:30
 To: Richard Guenther
 Cc: Greta Yorsh; gcc-patches@gcc.gnu.org; mikest...@comcast.net;
 r...@cebitec.uni-bielefeld.de
 Subject: Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-
 perm-8.c
 
 On 25/04/12 15:31, Richard Guenther wrote:
  On Wed, Apr 25, 2012 at 4:27 PM, Greta Yorsh greta.yo...@arm.com
 wrote:
  Richard Guenther wrote:
  On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com
  wrote:
  Richard Guenther wrote:
  On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh
 greta.yo...@arm.com
  wrote:
  The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with
 neon
  enabled:
  FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect
  vectorized
  1
  loops 2
 
  The test expects 2 loops to be vectorized, while gcc
 successfully
  vectorizes
  3 loops in this test using neon on arm. This patch adjusts the
  expected
  output. Fixed test passes on qemu for arm and powerpc.
 
  OK for trunk?
 
  I think the proper fix is to instead of
 
for (i = 0; i  N; i++)
  {
input[i] = i;
output[i] = 0;
if (input[i]  256)
  abort ();
  }
 
  use
 
for (i = 0; i  N; i++)
  {
input[i] = i;
output[i] = 0;
__asm__ volatile ();
  }
 
  to prevent vectorization of initialization loops.
 
  Actually, it looks like both arm and powerpc vectorize this
  initialization loop (line 31), because the control flow is hoisted
  outside the loop by previous optimizations. In addition, arm with
 neon
  vectorizes the second loop (line 39), but powerpc does not:
 
  39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 *
 9;
 
  If this is the expected behaviour for powerpc, then the patch I
  proposed is still needed to fix the test failure on arm. Also,
 there
  would be no need to disable vectorization of the initialization
 loop,
  right?
 
  Ah, I thought that was what changed.  Btw, the if () abort () tries
 to
  disable
  vectorization but does not succeed in doing so.
 
  Richard.
 
  Here is an updated patch. It prevents vectorization of the
 initialization
  loop, as Richard suggested, and updates the expected number of
 vectorized
  loops accordingly. This patch assumes that the second loop in main
 (line 39)
  should only be vectorized on arm with neon.  The test passes for arm
 and
  powerpc.
 
  OK for trunk?
 
  If arm cannot handle 9 * i then the approrpiate condition would be
  vect_int_mult, not arm_neon_ok.
 
 
 The issue is that arm has (well, should be marked has having)
 vect_char_mult.  The difference in count of vectorized loops is based
 on
 that.
 
 R.
 
  Ok with that change.
 
  Richard.
 
  Thank you,
  Greta
 
  gcc/testsuite/ChangeLog
 
  2012-04-25  Greta Yorsh  greta.yo...@arm.com
 
 * gcc.dg/vect/slp-perm-8.c (main): Prevent
 vectorization of initialization loop.
 (dg-final): Adjust the expected number of
 vectorized loops.
 
 
 
 
 
 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c 
b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
index d211ef9..c4854d5 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
@@ -32,8 +32,7 @@ int main (int argc, const char* argv[])
 {
   input[i] = i;
   output[i] = 0;
-      if (input[i] > 256)
-	abort ();
+      __asm__ volatile ("");
 }
 
   for (i = 0; i < N / 3; i++)
@@ -52,7 +51,8 @@ int main (int argc, const char* argv[])
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_perm_byte } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_perm_byte && vect_char_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_perm_byte && {! vect_char_mult } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm_byte } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target

RE: [PATCH 12/13] Adjust relevant test cases wrt -ftrack-macro-expansion=[0|2]

2012-05-02 Thread Greta Yorsh
There are a couple more test that need adjusting:
gcc.dg/fixed-point/operator-bitwise.c
gcc.dg/fixed-point/composite-type.c

These tests fail on arm-none-eabi. 
Below is a patch that fixes them.

Thanks,
Greta

gcc/testsuite

2012-05-02  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/fixed-point/composite-type.c (dg-options): Add
option -ftrack-macro-expansion=0.
* gcc.dg/fixed-point/operator-bitwise.c (dg-options): Add
option -ftrack-macro-expansion=0.

diff --git a/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
b/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
index 5ae1198..026bdaf 100644
--- a/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
+++ b/gcc/testsuite/gcc.dg/fixed-point/composite-type.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-std=gnu99 -O -Wall -Wno-unused" } */
+/* { dg-options "-std=gnu99 -O -Wall -Wno-unused -ftrack-macro-expansion=0" } */
 
 /* C99 6.2.7: Compatible type and composite type.  */
 
diff --git a/gcc/testsuite/gcc.dg/fixed-point/operator-bitwise.c
b/gcc/testsuite/gcc.dg/fixed-point/operator-bitwise.c
index 31aecf5..6ba817d 100644
--- a/gcc/testsuite/gcc.dg/fixed-point/operator-bitwise.c
+++ b/gcc/testsuite/gcc.dg/fixed-point/operator-bitwise.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-std=gnu99" } */
+/* { dg-options "-std=gnu99 -ftrack-macro-expansion=0" } */
 
 /* C99 6.5.10: Bitwise AND operator.
C99 6.5.11: Bitwise exclusive OR operator.


 -Original Message-
 From: Mike Stump [mailto:mikest...@comcast.net]
 Sent: 30 April 2012 17:09
 To: Dodji Seketeli
 Cc: Gabriel Dos Reis; GCC Patches; Tom Tromey; Jason Merrill; Paolo
 Carlini; Benjamin De Kosnik
 Subject: Re: [PATCH 12/13] Adjust relevant test cases wrt -ftrack-
 macro-expansion=[0|2]
 
 On Apr 29, 2012, at 10:38 AM, Dodji Seketeli wrote:
  While bootstrapping the tree again, it appeared that an output
  regression of the objc test objc.dg/foreach-7.m flew below my radar.
 
  This looks fairly obvious to me, but I am CC-ing Mike Stump, just in
  case.
 
 That's fine.






[Patch, testsuite] missing -ftrack-macro-expansion=0 option in gcc.dg/builtin-stringop-chk-1.c

2012-05-02 Thread Greta Yorsh
The test gcc.dg/builtin-stringop-chk-1.c fails on arm-none-eabi because the
command line option -ftrack-macro-expansion=0 is missing. 

This command-line option has recently been added to dg-options directive in
this test, but for arm targets the first dg-options directive in the test is
overwritten by a second dg-options that does not contain
-ftrack-macro-expansion=0.

This patch replaces the second dg-options directive with
dg-additional-options. Fixed test passes on qemu. 
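
For clarity, a small sketch of the directive behaviour (illustrative only,
the flags below mirror the patch): a second dg-options line replaces the
whole option set for the matching target, whereas dg-additional-options
appends to it, so the global options are kept.

  /* { dg-do compile } */
  /* { dg-options "-O2 -std=gnu99 -ftrack-macro-expansion=0" } */
  /* A second dg-options for arm*-*-* would discard all of the options above
     on arm targets; dg-additional-options appends instead, so
     -ftrack-macro-expansion=0 survives.  */
  /* { dg-additional-options "-mstructure-size-boundary=8" { target arm*-*-* } } */

  int dummy;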

OK for trunk?

Thanks,
Greta

gcc/testsuite

2012-05-02  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/builtin-stringop-chk-1.c (dg-options): Replace
dg-options for target arm with dg-additional-options.


diff --git a/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
b/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
index beecab6..5cec6b3 100644
--- a/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
@@ -2,7 +2,7 @@
are emitted properly.  */
 /* { dg-do compile } */
 /* { dg-options "-O2 -std=gnu99 -ftrack-macro-expansion=0" } */
-/* { dg-options "-mstructure-size-boundary=8 -O2 -std=gnu99" { target arm*-*-* } } */
+/* { dg-additional-options "-mstructure-size-boundary=8" { target arm*-*-* } } */
 
 extern void abort (void);






Add myself to write-after-approval section of MAINTAINERS file

2012-05-01 Thread Greta Yorsh
I have just committed the patch below to add myself to the
write-after-approval section of the MAINTAINERS file.

Thanks,

Greta

ChangeLog:

2012-05-01  Greta Yorsh  greta.yo...@arm.com

* MAINTAINERS (Write After Approval): Add myself.


Index: MAINTAINERS
===
--- MAINTAINERS (revision 187012)
+++ MAINTAINERS (working copy)
@@ -530,6 +530,7 @@
 Canqun Yang					can...@nudt.edu.cn
 Jeffrey Yasskin					jyass...@google.com
 Joey Ye						joey...@arm.com
+Greta Yorsh					greta.yo...@arm.com
 David Yuste					david.yu...@gmail.com
 Kirill Yukhin					kirill.yuk...@gmail.com
 Kenneth Zadeck					zad...@naturalbridge.com





RE: [Patch, ARM] rename thumb_unexpanded_epilogue to thumb1_unexpanded_epilogue

2012-05-01 Thread Greta Yorsh
Ping!
http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01485.html

Thanks,
Greta

 -Original Message-
 From: Greta Yorsh [mailto:greta.yo...@arm.com]
 Sent: 24 April 2012 17:41
 To: gcc-patches@gcc.gnu.org
 Cc: p...@codesourcery.com; Ramana Radhakrishnan; Richard Earnshaw;
 ni...@redhat.com
 Subject: [Patch, ARM] rename thumb_unexpanded_epilogue to
 thumb1_unexpanded_epilogue
 
 Rename thumb_unexpanded_epilogue to thumb1_unexpanded_epilogue.
 
 In preparation for epilogue generation in RTL and anyway it's the right
 name
 for this function.
 
 Ok for trunk?
 
 Thanks,
 Greta
 
 gcc/ChangeLog
 
 2012-04-24  Ian Bolton  ian.bolton at arm.com
 Sameera Deshpande  sameera.deshpande at arm.com
 Greta Yorsh  greta.yorsh at arm.com
 
 * config/arm/arm-protos.h (thumb_unexpanded_epilogue): Rename
 to...
 (thumb1_unexpanded_epilogue): ...this.
 * config/arm/arm.c (thumb_unexpanded_epilogue): Rename to...
 (thumb1_unexpanded_epilogue): ...this.
 * config/arm/arm.md (thumb_unexpanded_epilogue): Rename to...
 (thumb1_unexpanded_epilogue): ...this.





[PING][Patch, Testsuite] fix failure in test gcc.dg/pr52283.c

2012-04-25 Thread Greta Yorsh
PING! Here is the original post:
http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01235.html

This patch fixes the failure in gcc.dg/pr52283.c by adding the missing
dg-warning and dg-options.

OK for trunk?

Thanks,
Greta

gcc/testsuite/ChangeLog

2012-04-20  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/pr52283.c: Add missing dg-warning and dg-options.


diff --git a/gcc/testsuite/gcc.dg/pr52283.c b/gcc/testsuite/gcc.dg/pr52283.c
index 33785a5..070e71a 100644
--- a/gcc/testsuite/gcc.dg/pr52283.c
+++ b/gcc/testsuite/gcc.dg/pr52283.c
@@ -1,6 +1,7 @@
 /* Test for case labels not integer constant expressions but folding
to integer constants (used in Linux kernel).  */
 /* { dg-do compile } */
+/* { dg-options "-pedantic" } */
 
 extern unsigned int u;
 
@@ -9,7 +10,7 @@ b (int c)
 {
   switch (c)
 {
-    case (int) (2 | ((4 > 8) ? 8 : u)):
+    case (int) (2 | ((4 > 8) ? 8 : u)): /* { dg-warning "case label is not an integer constant expression" } */
   ;
 }
 }





[Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Greta Yorsh
The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon enabled:
FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect vectorized 1
loops 2

The test expects 2 loops to be vectorized, while gcc successfully vectorizes
3 loops in this test using neon on arm. This patch adjusts the expected
output. Fixed test passes on qemu for arm and powerpc.

OK for trunk?

Thanks,
Greta

gcc/testsuite/ChangeLog

2012-04-23  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/vect/slp-perm-8.c (dg-final): Adjust expected number
of vectorized loops for arm with neon.
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c 
b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
index d211ef9..beaa96c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
@@ -52,7 +52,9 @@ int main (int argc, const char* argv[])
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_perm_byte } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_perm_byte && arm_neon_ok } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_perm_byte && arm_neon_ok } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_perm_byte && {! arm_neon_ok } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm_byte } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */


RE: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Greta Yorsh
Richard Guenther wrote:
 On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com
 wrote:
  Richard Guenther wrote:
  On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh greta.yo...@arm.com
  wrote:
   The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with neon
  enabled:
   FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect
 vectorized
  1
   loops 2
  
   The test expects 2 loops to be vectorized, while gcc successfully
  vectorizes
   3 loops in this test using neon on arm. This patch adjusts the
  expected
   output. Fixed test passes on qemu for arm and powerpc.
  
   OK for trunk?
 
  I think the proper fix is to instead of
 
    for (i = 0; i < N; i++)
      {
        input[i] = i;
        output[i] = 0;
        if (input[i] > 256)
          abort ();
      }
 
  use
 
    for (i = 0; i < N; i++)
      {
        input[i] = i;
        output[i] = 0;
        __asm__ volatile ("");
      }
 
  to prevent vectorization of initialization loops.
 
  Actually, it looks like both arm and powerpc vectorize this
 initialization loop (line 31), because the control flow is hoisted
 outside the loop by previous optimizations. In addition, arm with neon
 vectorizes the second loop (line 39), but powerpc does not:
 
  39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 * 9;
 
  If this is the expected behaviour for powerpc, then the patch I
 proposed is still needed to fix the test failure on arm. Also, there
 would be no need to disable vectorization of the initialization loop,
 right?
 
 Ah, I thought that was what changed.  Btw, the if () abort () tries to
 disable
 vectorization but does not succeed in doing so.
 
 Richard.

Here is an updated patch. It prevents vectorization of the initialization
loop, as Richard suggested, and updates the expected number of vectorized
loops accordingly. This patch assumes that the second loop in main (line 39)
should only be vectorized on arm with neon.  The test passes for arm and
powerpc.

OK for trunk?

Thank you,
Greta

gcc/testsuite/ChangeLog

2012-04-25  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/vect/slp-perm-8.c (main): Prevent
vectorization of initialization loop. 
(dg-final): Adjust the expected number of 
vectorized loops.




diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c 
b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
index d211ef9..aaa6cbb 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
@@ -32,8 +32,7 @@ int main (int argc, const char* argv[])
 {
   input[i] = i;
   output[i] = 0;
-      if (input[i] > 256)
-	abort ();
+      __asm__ volatile ("");
 }
 
   for (i = 0; i < N / 3; i++)
@@ -52,7 +51,8 @@ int main (int argc, const char* argv[])
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_perm_byte } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_perm_byte && arm_neon_ok } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_perm_byte && {! arm_neon_ok } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm_byte } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 


RE: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-perm-8.c

2012-04-25 Thread Greta Yorsh


 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: 25 April 2012 15:32
 To: Greta Yorsh
 Cc: gcc-patches@gcc.gnu.org; mikest...@comcast.net; r...@cebitec.uni-
 bielefeld.de; Richard Earnshaw
 Subject: Re: [Patch, testsuite] fix failure in test gcc.dg/vect/slp-
 perm-8.c
 
 On Wed, Apr 25, 2012 at 4:27 PM, Greta Yorsh greta.yo...@arm.com
 wrote:
  Richard Guenther wrote:
  On Wed, Apr 25, 2012 at 3:34 PM, Greta Yorsh greta.yo...@arm.com
  wrote:
   Richard Guenther wrote:
   On Wed, Apr 25, 2012 at 1:51 PM, Greta Yorsh
 greta.yo...@arm.com
   wrote:
The test gcc.dg/vect/slp-perm-8.c fails on arm-none-eabi with
 neon
   enabled:
FAIL: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect
  vectorized
   1
loops 2
   
The test expects 2 loops to be vectorized, while gcc
 successfully
   vectorizes
3 loops in this test using neon on arm. This patch adjusts the
   expected
output. Fixed test passes on qemu for arm and powerpc.
   
OK for trunk?
  
   I think the proper fix is to instead of
  
     for (i = 0; i < N; i++)
       {
         input[i] = i;
         output[i] = 0;
         if (input[i] > 256)
           abort ();
       }
  
   use
  
     for (i = 0; i < N; i++)
       {
         input[i] = i;
         output[i] = 0;
         __asm__ volatile ("");
       }
  
   to prevent vectorization of initialization loops.
  
   Actually, it looks like both arm and powerpc vectorize this
  initialization loop (line 31), because the control flow is hoisted
  outside the loop by previous optimizations. In addition, arm with
 neon
  vectorizes the second loop (line 39), but powerpc does not:
  
   39: not vectorized: relevant stmt not supported: D.2163_8 = i_40 *
 9;
  
   If this is the expected behaviour for powerpc, then the patch I
  proposed is still needed to fix the test failure on arm. Also, there
  would be no need to disable vectorization of the initialization
 loop,
  right?
 
  Ah, I thought that was what changed.  Btw, the if () abort () tries
 to
  disable
  vectorization but does not succeed in doing so.
 
  Richard.
 
  Here is an updated patch. It prevents vectorization of the
 initialization
  loop, as Richard suggested, and updates the expected number of
 vectorized
  loops accordingly. This patch assumes that the second loop in main
 (line 39)
  should only be vectorized on arm with neon.  The test passes for arm
 and
  powerpc.
 
  OK for trunk?
 
 If arm cannot handle 9 * i then the approrpiate condition would be
 vect_int_mult, not arm_neon_ok.

It's the other way around: arm can handle this multiplication, but powerpc
does not handle it for some reason. 

Thank you,
Greta



 
 Ok with that change.
 
 Richard.
 
  Thank you,
  Greta
 
  gcc/testsuite/ChangeLog
 
  2012-04-25  Greta Yorsh  greta.yo...@arm.com
 
         * gcc.dg/vect/slp-perm-8.c (main): Prevent
         vectorization of initialization loop.
         (dg-final): Adjust the expected number of
         vectorized loops.
 
 
 
 






[Patch, ARM][0/2] Prepartion for Epilogue in RTL

2012-04-24 Thread Greta Yorsh
The following patches perform code reorganization in preparation for
epilogue generation in RTL.

[1/2] move the code of the special predicates load_multiple_operation and
store_multiple_operation into a separate function ldm_stm_operation_p
[2/2] generalize ldm_stm_operation_p 

No regression on qemu for arm-none-eabi neon softfp arm/thumb.

Ok for trunk?

Thanks,
Greta





[Patch, ARM][1/2] add ldm_stm_operation_p

2012-04-24 Thread Greta Yorsh
Move the code of the special predicates load_multiple_operation and
store_multiple_operation into a separate function. No change in
functionality. 

gcc/ChangeLog

2012-04-24  Ian Bolton  ian.bolton at arm.com
Sameera Deshpande  sameera.deshpande at arm.com
Greta Yorsh  greta.yorsh at arm.com

* config/arm/arm-protos.h (ldm_stm_operation_p): New declaration.
* config/arm/arm.c (ldm_stm_operation_p): New function.
* config/arm/predicates.md (load_multiple_operation): Update
predicate.
(store_multiple_operation): Likewise.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 900d09a..7da0e90 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -62,6 +62,7 @@ extern bool arm_legitimize_reload_address (rtx *, enum 
machine_mode, int, int,
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
int);
 extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
+extern bool ldm_stm_operation_p (rtx, bool);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e5779ce..74f4abf 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10138,6 +10138,150 @@ adjacent_mem_locations (rtx a, rtx b)
   return 0;
 }
 
+/* Return true if OP is a valid load or store multiple operation.  LOAD is true
+   for load operations, false for store operations.
+   The pattern we are trying to match for load is:
+     [(SET (R_d0) (MEM (PLUS (addr) (offset))))
+      (SET (R_d1) (MEM (PLUS (addr) (offset + <reg_increment>))))
+       :
+       :
+      (SET (R_dn) (MEM (PLUS (addr) (offset + n * <reg_increment>))))
+     ]
+     where
+     1.  If offset is 0, first insn should be (SET (R_d0) (MEM (src_addr))).
+     2.  REGNO (R_d0) < REGNO (R_d1) < ... < REGNO (R_dn).
+ 3.  If consecutive is TRUE, then for kth register being loaded,
+ REGNO (R_dk) = REGNO (R_d0) + k.
+   The pattern for store is similar.  */
+bool
+ldm_stm_operation_p (rtx op, bool load)
+{
+  HOST_WIDE_INT count = XVECLEN (op, 0);
+  rtx reg, mem, addr;
+  unsigned regno;
+  HOST_WIDE_INT i = 1, base = 0, offset = 0;
+  rtx elt;
+  bool addr_reg_in_reglist = false;
+  bool update = false;
+  int reg_increment;
+  int offset_adj;
+
+  reg_increment = 4;
+  offset_adj = 0;
+
+  if (count <= 1
+      || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
+      || (load && !REG_P (SET_DEST (XVECEXP (op, 0, offset_adj)))))
+    return false;
+
+  /* Check if this is a write-back.  */
+  elt = XVECEXP (op, 0, offset_adj);
+  if (GET_CODE (SET_SRC (elt)) == PLUS)
+{
+  i++;
+  base = 1;
+  update = true;
+
+  /* The offset adjustment must be the number of registers being
+ popped times the size of a single register.  */
+  if (!REG_P (SET_DEST (elt))
+  || !REG_P (XEXP (SET_SRC (elt), 0))
+  || (REGNO (SET_DEST (elt)) != REGNO (XEXP (SET_SRC (elt), 0)))
+  || !CONST_INT_P (XEXP (SET_SRC (elt), 1))
+  || INTVAL (XEXP (SET_SRC (elt), 1)) !=
+ ((count - 1 - offset_adj) * reg_increment))
+return false;
+}
+
+  i = i + offset_adj;
+  base = base + offset_adj;
+  /* Perform a quick check so we don't blow up below.  */
+  if (count <= i)
+    return false;
+
+  elt = XVECEXP (op, 0, i - 1);
+  if (GET_CODE (elt) != SET)
+return false;
+
+  if (load)
+{
+  reg = SET_DEST (elt);
+  mem = SET_SRC (elt);
+}
+  else
+{
+  reg = SET_SRC (elt);
+  mem = SET_DEST (elt);
+}
+
+  if (!REG_P (reg) || !MEM_P (mem))
+return false;
+
+  regno = REGNO (reg);
+  addr = XEXP (mem, 0);
+  if (GET_CODE (addr) == PLUS)
+{
+  if (!CONST_INT_P (XEXP (addr, 1)))
+   return false;
+
+  offset = INTVAL (XEXP (addr, 1));
+  addr = XEXP (addr, 0);
+}
+
+  if (!REG_P (addr))
+return false;
+
+  for (; i < count; i++)
+{
+  elt = XVECEXP (op, 0, i);
+  if (GET_CODE (elt) != SET)
+return false;
+
+  if (load)
+{
+  reg = SET_DEST (elt);
+  mem = SET_SRC (elt);
+}
+  else
+{
+  reg = SET_SRC (elt);
+  mem = SET_DEST (elt);
+}
+
+      if (!REG_P (reg)
+          || GET_MODE (reg) != SImode
+          || REGNO (reg) <= regno
+          || !MEM_P (mem)
+          || GET_MODE (mem) != SImode
+          || ((GET_CODE (XEXP (mem, 0)) != PLUS
+               || !rtx_equal_p (XEXP (XEXP (mem, 0), 0), addr)
+               || !CONST_INT_P (XEXP (XEXP (mem, 0), 1))
+               || (INTVAL (XEXP (XEXP (mem, 0), 1)) !=
+                   offset + (i - base) * reg_increment))
+              && (!REG_P (XEXP (mem, 0))
+                  || offset + (i - base) * reg_increment != 0)))
+        return false;
+
+  regno = REGNO (reg);
+  if (regno

[Patch, ARM][2/2] generalize ldm_stm_operation_p

2012-04-24 Thread Greta Yorsh
Generalize ldm_stm_operation_p with additional parameters that will be used
by epilogue patterns:
  * machine mode to support both SImode and DFmode registers
  * flag to request consecutive registers in the register list
  * flag to indicate whether PC in the register list

gcc/ChangeLog

2012-04-24  Ian Bolton  ian.bolton at arm.com
Sameera Deshpande  sameera.deshpande at arm.com
Greta Yorsh  greta.yorsh at arm.com

* config/arm/arm-protos.h (ldm_stm_operation_p): New parameters.
* config/arm/arm.c (ldm_stm_operation_p): New parameters.
* config/arm/predicates.md (load_multiple_operation): Add arguments.
(store_multiple_operation): Likewise.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 753e109..efb5b9f 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -62,7 +62,8 @@ extern bool arm_legitimize_reload_address (rtx *, enum 
machine_mode, int, int,
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
int);
 extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
-extern bool ldm_stm_operation_p (rtx, bool);
+extern bool ldm_stm_operation_p (rtx, bool, enum machine_mode mode,
+ bool, bool);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4216d05..5477de2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10139,7 +10139,9 @@ adjacent_mem_locations (rtx a, rtx b)
 }
 
 /* Return true if OP is a valid load or store multiple operation.  LOAD is true
-   for load operations, false for store operations.
+   for load operations, false for store operations.  CONSECUTIVE is true
+   if the register numbers in the operation must be consecutive in the register
+   bank. RETURN_PC is true if value is to be loaded in PC.
The pattern we are trying to match for load is:
     [(SET (R_d0) (MEM (PLUS (addr) (offset))))
      (SET (R_d1) (MEM (PLUS (addr) (offset + <reg_increment>))))
@@ -10154,20 +10156,31 @@ adjacent_mem_locations (rtx a, rtx b)
  REGNO (R_dk) = REGNO (R_d0) + k.
The pattern for store is similar.  */
 bool
-ldm_stm_operation_p (rtx op, bool load)
+ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode,
+ bool consecutive, bool return_pc)
 {
   HOST_WIDE_INT count = XVECLEN (op, 0);
   rtx reg, mem, addr;
   unsigned regno;
+  unsigned first_regno;
   HOST_WIDE_INT i = 1, base = 0, offset = 0;
   rtx elt;
   bool addr_reg_in_reglist = false;
   bool update = false;
   int reg_increment;
   int offset_adj;
+  int regs_per_val;
 
-  reg_increment = 4;
-  offset_adj = 0;
+  /* If not in SImode, then registers must be consecutive
+ (e.g., VLDM instructions for DFmode).  */
+  gcc_assert ((mode == SImode) || consecutive);
+  /* Setting return_pc for stores is illegal.  */
+  gcc_assert (!return_pc || load);
+
+  /* Set up the increments and the regs per val based on the mode.  */
+  reg_increment = GET_MODE_SIZE (mode);
+  regs_per_val = reg_increment / 4;
+  offset_adj = return_pc ? 1 : 0;
 
   if (count <= 1
   || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
@@ -10195,9 +10208,11 @@ ldm_stm_operation_p (rtx op, bool load)
 
   i = i + offset_adj;
   base = base + offset_adj;
-  /* Perform a quick check so we don't blow up below.  */
-  if (count <= i)
-    return false;
+  /* Perform a quick check so we don't blow up below. If only one reg is loaded,
+     success depends on the type: VLDM can do just one reg,
+     LDM must do at least two.  */
+  if ((count <= i) && (mode == SImode))
+      return false;
 
   elt = XVECEXP (op, 0, i - 1);
   if (GET_CODE (elt) != SET)
@@ -10218,6 +10233,7 @@ ldm_stm_operation_p (rtx op, bool load)
 return false;
 
   regno = REGNO (reg);
+  first_regno = regno;
   addr = XEXP (mem, 0);
   if (GET_CODE (addr) == PLUS)
 {
@@ -10249,10 +10265,13 @@ ldm_stm_operation_p (rtx op, bool load)
 }
 
       if (!REG_P (reg)
-          || GET_MODE (reg) != SImode
+          || GET_MODE (reg) != mode
           || REGNO (reg) <= regno
+          || (consecutive
+              && (REGNO (reg) !=
+                  (unsigned int) (first_regno + regs_per_val * (i - base))))
   || !MEM_P (mem)
-  || GET_MODE (mem) != SImode
+  || GET_MODE (mem) != mode
   || ((GET_CODE (XEXP (mem, 0)) != PLUS
   || !rtx_equal_p (XEXP (XEXP (mem, 0), 0), addr)
   || !CONST_INT_P (XEXP (XEXP (mem, 0), 1))
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 20a64ec..428f9e0 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -380,13 +380,17 @@
 (define_special_predicate "load_multiple_operation"
   (match_code "parallel")
 {
- return ldm_stm_operation_p (op

[Patch, ARM] rename thumb_unexpanded_epilogue to thumb1_unexpanded_epilogue

2012-04-24 Thread Greta Yorsh
Rename thumb_unexpanded_epilogue to thumb1_unexpanded_epilogue.

In preparation for epilogue generation in RTL and anyway it's the right name
for this function.

Ok for trunk?

Thanks,
Greta

gcc/ChangeLog

2012-04-24  Ian Bolton  ian.bolton at arm.com
Sameera Deshpande  sameera.deshpande at arm.com
Greta Yorsh  greta.yorsh at arm.com

* config/arm/arm-protos.h (thumb_unexpanded_epilogue): Rename to...
(thumb1_unexpanded_epilogue): ...this.
* config/arm/arm.c (thumb_unexpanded_epilogue): Rename to...
(thumb1_unexpanded_epilogue): ...this.
* config/arm/arm.md (thumb_unexpanded_epilogue): Rename to...
(thumb1_unexpanded_epilogue): ...this.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 21a89aa..0c4bb96 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -178,7 +178,7 @@ extern int arm_float_words_big_endian (void);
 
 /* Thumb functions.  */
 extern void arm_init_expanders (void);
-extern const char *thumb_unexpanded_epilogue (void);
+extern const char *thumb1_unexpanded_epilogue (void);
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5d9cbc5..0b7f3c8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22032,7 +22032,7 @@ thumb1_extra_regs_pushed (arm_stack_offsets *offsets, 
bool for_prologue)
 
 /* The bits which aren't usefully expanded as rtl.  */
 const char *
-thumb_unexpanded_epilogue (void)
+thumb1_unexpanded_epilogue (void)
 {
   arm_stack_offsets *offsets;
   int regno;
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4f6d965..ed33c9b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10673,7 +10673,7 @@
   if (TARGET_32BIT)
 return arm_output_epilogue (NULL);
   else /* TARGET_THUMB1 */
-return thumb_unexpanded_epilogue ();
+return thumb1_unexpanded_epilogue ();
   
   ; Length is absolute worst case
   [(set_attr "length" "44")


RE: PING: [PATCH] Fix PRs c/52283/37985

2012-04-20 Thread Greta Yorsh
Here is a patch to fix the failing test gcc.dg/pr52283.c. 
Adding the missing dg-warning and dg-options.

OK?


gcc/testsuite/ChangeLog

2012-04-20  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/pr52283.c: Add missing dg-warning and dg-options.


diff --git a/gcc/testsuite/gcc.dg/pr52283.c b/gcc/testsuite/gcc.dg/pr52283.c
index 33785a5..070e71a 100644
--- a/gcc/testsuite/gcc.dg/pr52283.c
+++ b/gcc/testsuite/gcc.dg/pr52283.c
@@ -1,6 +1,7 @@
 /* Test for case labels not integer constant expressions but folding
to integer constants (used in Linux kernel).  */
 /* { dg-do compile } */
+/* { dg-options "-pedantic" } */
 
 extern unsigned int u;
 
@@ -9,7 +10,7 @@ b (int c)
 {
   switch (c)
 {
-    case (int) (2 | ((4 > 8) ? 8 : u)):
+    case (int) (2 | ((4 > 8) ? 8 : u)): /* { dg-warning "case label is not an integer constant expression" } */
   ;
 }
 }

 -Original Message-
 From: H.J. Lu [mailto:hjl.to...@gmail.com]
 Sent: 19 April 2012 15:32
 To: Manuel López-Ibáñez
 Cc: Christian Bruel; Richard Guenther; gcc-patches@gcc.gnu.org; Joseph
 S. Myers; Jason Merrill
 Subject: Re: PING: [PATCH] Fix PRs c/52283/37985
 
 On Thu, Apr 19, 2012 at 3:17 AM, Manuel López-Ibáñez
 lopeziba...@gmail.com wrote:
  On 19 April 2012 11:11, Christian Bruel christian.br...@st.com
 wrote:
 
 
  On 04/18/2012 11:51 AM, Richard Guenther wrote:
  On Wed, Apr 18, 2012 at 11:06 AM, Manuel López-Ibáñez
  lopeziba...@gmail.com wrote:
  On 18 April 2012 10:29, Christian Bruel christian.br...@st.com
 wrote:
 
  Is it OK for trunk, bootstrapped and regtested on x86
 
  I think Joseph Myers is on vacation, and there are no other C FE
  reviewers, but since this is c-common and convert.c, perhaps Jason
  and/or Richard can review it?
 
  The patch is ok if you put the PR52283 properly into a separate
 testcase,
  not by amending gcc.dg/case-const-2.c.
 
 
  Thanks, done at rev #186586. with this change.
 
  Great!
 
  Just a minor nit, for future patches. There is the unwritten rule of
  adding the Changelogs to the commit log, like follows:
 
  2012-04-19  Christian Bruel  christian.br...@st.com
                   Manuel López-Ibáñez  m...@gcc.gnu.org
 
        PR c/52283
        PR c/37985
        * stmt.c (warn_if_unused_value): Skip NOP_EXPR.
        * convert.c (convert_to_integer): Don't set TREE_NO_WARNING.
  testsuite/
        * gcc.dg/pr52283.c: New test.
        * gcc.dg/pr37985.c: New test.
 
 
 
 gcc.dg/pr52283.c failed on Linux/x86:
 
 FAIL: gcc.dg/pr52283.c (test for excess errors)
 
 --
 H.J.





RE: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-29 Thread Greta Yorsh
I'm attaching a new version of the patch. Fixed all comments and retested.
No regression on qemu --with-cpu cortex-a9.

Thank you,

Greta

gcc/ChangeLog

2012-02-29  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-ldmstm.ml (write_ldm_commutative_peephole): Improved
conditions of peepholes generating LDM followed by a commutative operator.
* config/arm/ldmstm.md: Regenerated.


 -Original Message-
 From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
 Sent: 29 February 2012 00:41
 To: Greta Yorsh
 Cc: gcc-patches@gcc.gnu.org; ram...@gcc.gnu.org; p...@codesourcery.com;
 ni...@redhat.com
 Subject: Re: [PATCH,ARM] Improve peepholes for LDM with commutative
 operators
 
 [Sorry about the duplicate mail. My mailer seems to have eaten up the
 original reply I sent. ]
 
 
 On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote:
  Is it OK for GCC 4.7 Stage 4 ?
 
 This is stage4 - I'd like to hear what the RM's think. Technically
 it's fixing a regression and is low risk to me.
 
 In any case there are a couple of changes that I'd like done
 as explained below.
 
 
  Thank you,
 
  Greta
 
  gcc/ChangeLog
 
  2012-02-28  Greta Yorsh  greta.yo...@arm.com
 
  * config/arm/arm-ldmstm.ml: Improved conditions of peepholes that
  generate LDM followed by a commutative operator.
  * config/arm/ldmstm.md: Regenerated.
 
 Can you mention which 2 peepholes are changed in some way.
 
  diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-
 ldmstm.ml
  index 221edd2..5f5a5e0 100644
  --- a/gcc/config/arm/arm-ldmstm.ml
  +++ b/gcc/config/arm/arm-ldmstm.ml
  @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb =
   Printf.printf %s  (match_operand:SI %d
 \s_register_operand\ \\)]))\n indent (nregs * 2 + 3);
   Printf.printf %s   (clobber (reg:CC CC_REGNUM))])]\n indent
 end;
  -  Printf.printf   \(((operands[%d] == operands[0]  operands[%d]
 == operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3);
  -  Printf.printf  || (operands[%d] == operands[0] 
 operands[%d] == operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2);
  -  Printf.printf  peep2_reg_dead_p (%d, operands[0]) 
 peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1);
  +  Printf.printf   \(((rtx_equal_p (operands[%d], operands[0]) 
 rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2
 + 3);
  +  Printf.printf  || (rtx_equal_p (operands[%d], operands[0]) 
 rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2
 + 2);
  +  Printf.printf  (peep2_reg_dead_p (%d, operands[0]) ||
 rtx_equal_p (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2);
  +  Printf.printf  (peep2_reg_dead_p (%d, operands[1]) ||
 rtx_equal_p (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2);
 begin
   if thumb then
 Printf.printf   [(set (match_dup %d) (match_op_dup %d
 [(match_dup %d) (match_dup %d)]))]\n
  diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
  index 5db4a32..5db3d57 100644
  --- a/gcc/config/arm/ldmstm.md
  +++ b/gcc/config/arm/ldmstm.md
  @@ -1160,9 +1160,10 @@
   [(match_operand:SI 6 s_register_operand )
(match_operand:SI 7 s_register_operand )]))
 (clobber (reg:CC CC_REGNUM))])]
  -  (((operands[6] == operands[0]  operands[7] == operands[1])
  - || (operands[7] == operands[0]  operands[6] == operands[1]))
  - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3,
 operands[1]))
  +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p
 (operands[7], operands[1]))
  + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p
 (operands[6], operands[1])))
  + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p
 (operands[0], operands[4]))
  + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p
 (operands[1], operands[4])))
 
 Line > 80 characters.
 
 
 [(parallel
   [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup
 7)]))
(clobber (reg:CC CC_REGNUM))])]
  @@ -1180,9 +1181,10 @@
   (match_operator:SI 5 commutative_binary_operator
[(match_operand:SI 6 s_register_operand )
 (match_operand:SI 7 s_register_operand )]))]
  -  (((operands[6] == operands[0]  operands[7] == operands[1])
  - || (operands[7] == operands[0]  operands[6] == operands[1]))
  - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3,
 operands[1]))
  +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p
 (operands[7], operands[1]))
  + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p
 (operands[6], operands[1])))
 
 Again line > 80 characters.
 
 Instead of rtx_equal_p, check that the REGNOs are equal.
 That will be cheaper: we know these are register_operands.
 
 For bonus points you use peep2_regno_dead_p with REGNO (operands[n])
 instead of peep2_reg_dead_p. If we are accessing REGNO might as well
 reuse it :).
 
 regards
 Ramana
diff --git a/gcc/config/arm/arm-ldmstm.ml

[PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-28 Thread Greta Yorsh
This patch improves existing peephole optimizations that merge individual
LDRs into LDM, in the case that the order of registers in LDR instructions
is not ascending, but the loaded values can be reordered because their uses
commute. 

There are two changes:
* use rtx_equal_p to compare operands (instead of plain ==)
* identify more cases of dead registers in the pattern.

For example, the following sequence
LDR r1, [r2]
LDR r0, [r2, #4]
ADD r0, r0, r1
can be transformed into
LDRD r0, r1, [r2]
ADD r0, r0, r1
when r1 is dead after ADD. Such optimization opportunities are missed by the
existing peephole conditions, because r0 is not dead after ADD. This patch
enables such transformations.
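
As a concrete illustration, this is the shape of source code the improved
peepholes target (a hypothetical example, not taken from the benchmarks):
two adjacent loads whose results feed a commutative operation, where only
one of the loaded registers is dead afterwards.

  /* At -O2 for an ARM target, the two loads below can now be merged even
     though r0, the result register, stays live after the ADD; reordering
     the loaded values is safe because addition commutes.  */
  int
  sum_pair (const int *p)
  {
    return p[0] + p[1];
  }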

No regression on qemu for --target=arm-none-eabi --with-cpu=cortex-a15 and
a9.

This patch was originally submitted as part of a sequence of patches
improving LDRD/STRD/LDM/STM generation:
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00920.html
but it is independent and it fixes a failures in one of the regression
tests:
PASS: gcc.target/arm/pr40457-1.c scan-assembler ldm

Is it OK for GCC 4.7 Stage 4 ?

Thank you,

Greta

gcc/ChangeLog

2012-02-28  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-ldmstm.ml: Improved conditions of peepholes that
generate LDM followed by a commutative operator.
* config/arm/ldmstm.md: Regenerated.
diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml
index 221edd2..5f5a5e0 100644
--- a/gcc/config/arm/arm-ldmstm.ml
+++ b/gcc/config/arm/arm-ldmstm.ml
@@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb =
 	Printf.printf "%s     (match_operand:SI %d \"s_register_operand\" \"\")]))\n" indent (nregs * 2 + 3);
 	Printf.printf "%s   (clobber (reg:CC CC_REGNUM))])]\n" indent
   end;
-  Printf.printf "  \"(((operands[%d] == operands[0] && operands[%d] == operands[1])\n" (nregs * 2 + 2) (nregs * 2 + 3);
-  Printf.printf "     || (operands[%d] == operands[0] && operands[%d] == operands[1]))\n" (nregs * 2 + 3) (nregs * 2 + 2);
-  Printf.printf "    && peep2_reg_dead_p (%d, operands[0]) && peep2_reg_dead_p (%d, operands[1]))\"\n" (nregs + 1) (nregs + 1);
+  Printf.printf "  \"(((rtx_equal_p (operands[%d], operands[0]) && rtx_equal_p (operands[%d], operands[1]))\n" (nregs * 2 + 2) (nregs * 2 + 3);
+  Printf.printf "     || (rtx_equal_p (operands[%d], operands[0]) && rtx_equal_p (operands[%d], operands[1])))\n" (nregs * 2 + 3) (nregs * 2 + 2);
+  Printf.printf "    && (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p (operands[0], operands[%d]))\n" (nregs + 1) (nregs * 2);
+  Printf.printf "    && (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p (operands[1], operands[%d])))\"\n" (nregs + 1) (nregs * 2);
   begin
 if thumb then
   Printf.printf   [(set (match_dup %d) (match_op_dup %d [(match_dup %d) 
(match_dup %d)]))]\n
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index 5db4a32..5db3d57 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -1160,9 +1160,10 @@
     [(match_operand:SI 6 "s_register_operand" "")
      (match_operand:SI 7 "s_register_operand" "")]))
       (clobber (reg:CC CC_REGNUM))])]
-  "(((operands[6] == operands[0] && operands[7] == operands[1])
-     || (operands[7] == operands[0] && operands[6] == operands[1]))
-    && peep2_reg_dead_p (3, operands[0]) && peep2_reg_dead_p (3, operands[1]))"
+  "(((rtx_equal_p (operands[6], operands[0]) && rtx_equal_p (operands[7], operands[1]))
+     || (rtx_equal_p (operands[7], operands[0]) && rtx_equal_p (operands[6], operands[1])))
+    && (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], operands[4]))
+    && (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], operands[4])))"
   [(parallel
 [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))
  (clobber (reg:CC CC_REGNUM))])]
@@ -1180,9 +1181,10 @@
   (match_operator:SI 5 "commutative_binary_operator"
    [(match_operand:SI 6 "s_register_operand" "")
     (match_operand:SI 7 "s_register_operand" "")]))]
-  "(((operands[6] == operands[0] && operands[7] == operands[1])
-     || (operands[7] == operands[0] && operands[6] == operands[1]))
-    && peep2_reg_dead_p (3, operands[0]) && peep2_reg_dead_p (3, operands[1]))"
+  "(((rtx_equal_p (operands[6], operands[0]) && rtx_equal_p (operands[7], operands[1]))
+     || (rtx_equal_p (operands[7], operands[0]) && rtx_equal_p (operands[6], operands[1])))
+    && (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], operands[4]))
+    && (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], operands[4])))"
   [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))]
 {
   if (!gen_ldm_seq (operands, 2, true))


RE: [libitm] Link with -litm and -pthread

2012-02-15 Thread Greta Yorsh
This patch causes all tm tests to fail on the arm-none-eabi target, which
doesn't support the -pthread command line option:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52259


 -Original Message-
 From: Eric Botcazou [mailto:ebotca...@adacore.com]
 Sent: 11 February 2012 19:24
 To: Jack Howarth
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [libitm] Link with -litm and -pthread
 
 I missed the regeneration of libitm/configure the first time. The
  p2.diff with the regenerated libitm/configure passes make check in
 libitm
  now on x86_64-apple-darwin11...
 
 Great, thanks for the testing.
 
 --
 Eric Botcazou






[patch, testsuite] require target lto in gcc.dg/tm/lto-1.c

2012-01-25 Thread Greta Yorsh
The new test gcc.dg/tm/lto-1.c fails in configurations where lto support is
disabled. 

This patch adds the missing target requirement to the test:
/* { dg-require-effective-target lto } */

Checked on qemu for arm-none-eabi configured with and without --disable-lto.
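
For reference, this is how such a guard typically sits at the top of a test
(an illustrative sketch only, not the actual contents of gcc.dg/tm/lto-1.c):

  /* { dg-do link } */
  /* { dg-require-effective-target lto } */
  /* { dg-options "-flto -fgnu-tm" } */

  /* With the dg-require-effective-target directive, toolchains configured
     with --disable-lto report the test as UNSUPPORTED instead of FAIL.  */
  int
  main (void)
  {
    return 0;
  }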

gcc/testsuite/ChangeLog

2012-01-25  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/tm/lto-1.c: Require lto support in target.


test-lto.patch


RE: [patch, arm, testsuite] fix regression in test di-longlong64-sync-withldrexd.c

2012-01-25 Thread Greta Yorsh
On 25 January 2012, at 18:14, Mike Stump wrote:
 On Jan 25, 2012, at 7:35 AM, Greta Yorsh wrote:
  The test gcc.target/arm/di-longlong64-sync-withldrexd.c fails on
  arm-none-eabi target, because gcc generates 48 LDREXD and 48 STREXD
  instructions instead of the expected 46.
 
  FAIL: gcc.target/arm/di-longlong64-sync-withldrexd.c scan-assembler-
 times
  tldrexd 46
  FAIL: gcc.target/arm/di-longlong64-sync-withldrexd.c scan-assembler-
 times
  tstrexd 46
 
  The regression PASS-FAIL was introduced for target arm-none-eabi in
 the
  first week of November 2011.
 
 Gosh, that seems like a long time to notice and adjust the testcase for
 a port that seems popular enough and has a nice representation around
 here.  :-(
 
 * gcc.target/arm/di-longlong64-sync-withldrexd.c:
 
 Please say slightly more than this.

I'm sorry. Here is the full entry:

   * gcc.target/arm/di-longlong64-sync-withldrexd.c: Accept
 new code generated for __sync_lock_release.
 
 Also, if you mean to ask for a review, please include an Ok?  in there.
 
 I'll assume you meant to ask for a review.  
 Ok.  If you aren't
 absolutely sure about the codegen or have an doubts, please ping an arm
 person for review of the codegen.

Copying an arm person as well as the author of r18, which introduced the
change in __sync_lock_release. 

Thank you,
Greta





[patch, testsuite] fix test gcc.dg/pr50908-2.c

2012-01-24 Thread Greta Yorsh
The test gcc.dg/pr50908-2.c fails on arm and other targets where short enums
are the default.

arm-none-eabi-gcc
/work/local-checkouts/main/gcc-fsf/gcc/testsuite/gcc.dg/pr50908-2.c
/work/local-checkouts/main/gcc-fsf/gcc/testsuite/gcc.dg/pr50908-2.c:39:8:
error: width of 'code' exceeds its type

The compile error has nothing to do with the intended functionality of this
test. The error is due to the use of a bitfield in the test: enum rtx_code
code:16.

This patch adds the missing compiler option -fno-short-enums to the test.
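
For reference, a reduced sketch of the failure mode (the enum and struct
below are made up, not the actual test source):

  /* With -fshort-enums, the default on arm-none-eabi, an enum with only a
     few enumerators may be laid out as a single byte, so a 16-bit bitfield
     of that enum type is wider than the type itself and gcc reports
     "width of 'code' exceeds its type".  With -fno-short-enums the enum
     gets its full int width and the declaration is accepted.  */
  enum rtx_code { CODE_A, CODE_B, CODE_C };

  struct rtx_def
  {
    enum rtx_code code : 16;
  };

  struct rtx_def r;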

gcc/testsuite/ChangeLog

2012-01-24  Greta Yorsh  greta.yo...@arm.com

* gcc.dg/pr50908-2.c (dg-options): Add -fno-short-enums.

test-no-short-enums-v2.patch


RE: [PATCH, ARM, testsuite] fix test c-c++-common/tm/omp.c

2012-01-20 Thread Greta Yorsh
PING!
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01813.html


 -Original Message-
 From: Greta Yorsh [mailto:greta.yo...@arm.com]
 Sent: 17 November 2011 10:36
 To: gcc-patches@gcc.gnu.org
 Cc: 'ni...@redhat.com'; Richard Earnshaw; 'p...@codesourcery.com';
 'al...@redhat.com'
 Subject: [PATCH, ARM, testsuite] fix test c-c++-common/tm/omp.c
 
 The testcase c-c++-common/tm/omp.c fails on arm-none-eabi:
 
 Executing: arm-none-eabi-gcc /work/local-checkouts/gcc-
 fsf/gcc/testsuite/c-c++-common/tm/omp.c -fgnu-tm -fopenmp -S  -o omp.s
 arm-none-eabi-gcc: error: unrecognized command line option '-pthread'
 FAIL: c-c++-common/tm/omp.c
 
 The attached patch adds the appropriate directive to the test:
 /* { dg-require-effective-target pthread } */
 
 
 -- Greta
 
 
 gcc/testsuite/ChangeLog
 
 2011-11-16  Greta Yorsh  greta.yo...@arm.com
 
   * c-c++-common/tm/omp.c: Require target with pthread support.diff --git a/gcc/testsuite/c-c++-common/tm/omp.c 
b/gcc/testsuite/c-c++-common/tm/omp.c
index b9fcc76..b664a6f 100644
--- a/gcc/testsuite/c-c++-common/tm/omp.c
+++ b/gcc/testsuite/c-c++-common/tm/omp.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-fgnu-tm -fopenmp" } */
+/* { dg-require-effective-target pthread } */
 
 __attribute__ ((transaction_pure))
 unsigned long rdtsc();


[PATCH, ARM][0/6] LDRD/STRD generation - introduction

2011-11-07 Thread Greta Yorsh
The following sequence of patches enables generation of LDRD/STRD
instructions for Cortex-A15 with -O2 and for all Cortex-A CPUs with -Os when
profitable. This almost always improves code size and is expected to improve
performance on Cortex-A15. 
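
As a rough illustration of the kind of access the series targets (a
hypothetical example, not taken from the benchmarks), a simple 64-bit copy
can be emitted as one LDRD plus one STRD instead of two LDR/STR pairs:

  /* Compile for Cortex-A15 at -O2 (or any Cortex-A at -Os) to see the
     paired load/store; the exact code generated of course depends on the
     target options.  */
  void
  copy64 (long long *dst, const long long *src)
  {
    *dst = *src;
  }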

[0/6] LDRD/STRD generation - introduction (this email)
[1/6] Merge LDR/STR into LDRD/STRD with -O2 in Thumb (1-ldrdstrd.patch)
[2/6] Merge LDR/STR into LDRD/STRD with -O2 in ARM (2-output-double.patch)
[3/6] Merge LDR/STR into LDRD/STRD with -Os (3-size.patch)
[4/6] Improve peepholes for generating LDM with commutative operators
(4-ldm-commute.patch)
[5/6] Generate LDRD/STRD for internal memcpy (5-internal-memcpy.patch)
[6/6] Tests for LDRD/STRD/LDM/STM generation for Cortex-A9/A15
(6-cortexa-tests.patch, creatests.py)

The patches are to be applied in the given order.

Testing and benchmarking is in progress:
* Passed all tests in check-gcc without regressions on qemu for target
arm-none-eabi built with newlib and various configurations of
A15/A9/Thumb/ARM (see ^note).
* Successful cross-build of arm-none-linux-gnueabi with eglibc and
A15/A9/Thumb/ARM
* Successful bootstrap on Cortex-A9 Tegra Linux Ubuntu board (A9 Thumb/ARM).
* Cross-built Spec2k using arm-none-linux-gnueabi compiler in several
configurations of A9/A15/Thumb/ARM.
* Ran CINT Spec2K on Cortex-A9 VE2 Linux board. No regression in runtime
between the files built with the trunk and the patched versions of the
compiler (A9 Thumb/ARM). There are some LDRD/STRD instructions in the
patched version.
* A brief look at CSiBE benchmark: the results show small improvement in
code size with a small increase in compilation time for most benchmarks.

I am working on more benchmark results and their detailed analysis.

-- Greta

(^note) The patch has accidentally fixed or masked a failure in a regression
test of vector shuffle:
FAIL-PASS: gcc.dg/torture/vshuf-v8sf.c  -O2  execution test (an abort
statement is executed)
The test fails when the compiler is configured with cortex-a15 thumb neon
and softfp, gcc trunk r180197. After the patch is applied, the test passes.







[PATCH,ARM][3/6] Merge LDR/STR into LDRD/STRD with -Os

2011-11-07 Thread Greta Yorsh
This patch enables generation of LDRD/STRD for configurations in which
LDM/STM would be preferable (e.g., Cortex-A15 with -Os or Cortex-A9 with
-O2), but cannot be generated. There are several situations in which LDM/STM
cannot be generated, but LDRD/STRD can be generated, for example memory
addressing with large offset or order of registers. This patch handles both
ARM and Thumb modes.

The files ldrdstrd0.md and ldrdstrd1.md are automatically generated from
ldrdstrd.md.in by simple substitution. The generated files are included from
arm.md, before and after ldmstm.md. The patterns in both files are the same.
The only difference is a flag in the pattern conditions. 

This duplication is far from ideal, but I don't know how to avoid it,
because the order in which the patterns are listed is essential.

Note that this patch does not include the file ldrdstrd0.md, which has
already been introduced by patch no. 1 in this sequence (to make the review
of the patterns easier). 

gcc/ChangeLog

2011-10-28  Greta Yorsh  greta.yo...@arm.com

  * config/arm/t-arm: Update the build system to generate 
ldrdstrd0.md  and ldrdstrd1.md from ldrdstrd.md.in.
* config/arm/arm.md: Include ldrdstrd1.md after ldmstm.md.
  * config/arm/ldrdstrd.md.in: New file, template.
  * config/arm/ldrdstrd1.md: New file, automatically generated 
from ldrdstrd.md.
  
contrib/ChangeLog

2011-11-02  Greta Yorsh  greta.yo...@arm.com

* gcc_update (files_and_dependencies): Add ldrdstrd.md dependencies.

3-size.patch


[PATCH,ARM][4/6] Improve peepholes for generating LDM with commutative operators

2011-11-07 Thread Greta Yorsh
This patch improves existing peephole optimizations.
These peephole optimizations merge individual LDRs into LDM in the case that
the order of registers in LDR instructions is not ascending, but the loaded
values can be reordered because the use of the loaded values is
commutative.

There are two changes:
* use rtx_equal_p to compare operands (instead of plain ==)
* identify more cases of dead registers in the pattern.

For example, the following sequence
LDR r1, [r2]
LDR r0, [r2, #4]
ADD r0, r0, r1
can be transformed into
LDRD r0, r1, [r2]
ADD r0, r0, r1
when r1 is dead after ADD. Such optimization opportunities are missed by the
existing peephole conditions, because r0 is not dead after ADD. This patch
enables such transformations.

This patch is independent from other patches in this sequence, but it was
tested only as part of the sequence.

gcc/ChangeLog

2011-10-28  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-ldmstm.ml: Improved conditions of peepholes that
generate LDM followed by a commutative operator.
* config/arm/ldmstm.md: Regenerated.


4-ldm-commute.patch

