Re: [PATCH, SMS] Prevent the creation of reg-moves for definitions with MODE_CC

2012-01-01 Thread Revital Eres
Hello,

 Yes, thanks for catching this. Shouldn't we prevent creating such
 regmoves for (the other case of) intra-loop anti-deps as well?

Right! Sorry for missing that. I added an additional check in
create_ddg_dep_from_intra_loop_link.
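
As a rough illustration of the guard this adds, here is a minimal standalone sketch of the decision "may this intra-loop anti-dependence be considered for reg-moves?". All names below (toy_edge_query, toy_anti_dep_regmove_candidate_p and the enums) are hypothetical stand-ins rather than GCC's own types; the real check lives in create_ddg_dep_from_intra_loop_link and works on ddg nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dependence kinds used in ddg.c.  */
enum toy_dep_type { TOY_TRUE_DEP, TOY_OUTPUT_DEP, TOY_ANTI_DEP };
enum toy_dep_data { TOY_REG_DEP, TOY_MEM_DEP };

/* Everything the guard looks at, flattened into plain booleans.  */
struct toy_edge_query
{
  bool allow_regmoves;      /* -fmodulo-sched-allow-regmoves given */
  enum toy_dep_type type;
  enum toy_dep_data data;
  bool dest_defines_cc;     /* destination insn defines a MODE_CC register */
  bool autoinc_var_used;    /* auto-increment variable is involved */
};

/* Return true if the edge described by Q is a register anti-dependence
   that may still be considered for reg-moves.  */
bool
toy_anti_dep_regmove_candidate_p (const struct toy_edge_query *q)
{
  assert (q != NULL);
  return q->allow_regmoves
         && q->type == TOY_ANTI_DEP
         && q->data == TOY_REG_DEP
         && !q->dest_defines_cc
         && !q->autoinc_var_used;
}
```

With this shape, a condition-code definition simply falls through to the ordinary anti-dependence handling, so no reg-move is ever requested for it.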

Also, thanks to Bernhard Rosenkraenzer for opening PR 879725 for that
ICE in Linaro GCC DB.

Currently re-testing the patch on ppc64-redhat-linux and will commit
it once testing completes, if that's OK.

Thanks,
Revital

gcc/
* ddg.c (def_has_ccmode_p): New function.
(add_cross_iteration_register_deps,
create_ddg_dep_from_intra_loop_link): Call it.

testsuite/
* gcc.dg/sms-11.c: New file.
Index: ddg.c
===
--- ddg.c   (revision 182479)
+++ ddg.c   (working copy)
@@ -166,6 +166,24 @@ autoinc_var_is_used_p (rtx def_insn, rtx
   return false;
 }
 
+/* Return true if one of the definitions in INSN has MODE_CC.  Otherwise
+   return false.  */
+static bool
+def_has_ccmode_p (rtx insn)
+{
+  df_ref *def;
+
+  for (def = DF_INSN_DEFS (insn); *def; def++)
+{
+  enum machine_mode mode = GET_MODE (DF_REF_REG (*def));
+
+  if (GET_MODE_CLASS (mode) == MODE_CC)
+   return true;
+}
+
+  return false;
+}
+
 /* Computes the dependence parameters (latency, distance etc.), creates
a ddg_edge and adds it to the given DDG.  */
 static void
@@ -202,6 +220,7 @@ create_ddg_dep_from_intra_loop_link (ddg
  whose register has multiple defs in the loop.  */
   if (flag_modulo_sched_allow_regmoves
       && (t == ANTI_DEP && dt == REG_DEP)
+      && !def_has_ccmode_p (dest_node->insn)
       && !autoinc_var_is_used_p (dest_node->insn, src_node->insn))
 {
   rtx set;
@@ -335,7 +354,8 @@ add_cross_iteration_register_deps (ddg_p
   if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
   || !flag_modulo_sched_allow_regmoves
  || JUMP_P (use_node->insn)
-  || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn))
+  || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn)
+ || def_has_ccmode_p (DF_REF_INSN (last_def)))
 create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
 REG_DEP, 1);
 
Index: testsuite/gcc.dg/sms-11.c
===
--- testsuite/gcc.dg/sms-11.c   (revision 0)
+++ testsuite/gcc.dg/sms-11.c   (revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fdump-rtl-sms" } */
+
+extern void abort (void);
+
+float out[4][4] = { 6, 6, 7, 5, 6, 7, 5, 5, 6, 4, 4, 4, 6, 2, 3, 4 };
+
+void
+invert (void)
+{
+  int i, j, k = 0, swap;
+  float tmp[4][4] = { 5, 6, 7, 5, 6, 7, 5, 5, 4, 4, 4, 4, 3, 2, 3, 4 };
+
+  for (i = 0; i < 4; i++)
+{
+  for (j = i + 1; j < 4; j++)
+   if (tmp[j][i] > tmp[i][i])
+ swap = j;
+
+  if (swap != i)
+   tmp[i][k] = tmp[swap][k];
+}
+
+  for (i = 0; i < 4; i++)
+for (j = 0; j < 4; j++)
+  if (tmp[i][j] != out[i][j])
+   abort ();
+}
+
+int
+main ()
+{
+  invert ();
+  return 0;
+}
+
+/* { dg-final { cleanup-rtl-dump sms } } */


Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-01-01 Thread Revital Eres
Hello,

Thanks for the comments! I incorporated them in the attached patch.

Currently bootstrapping and testing with the other patch in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Thanks again,
Revital

2012-01-01  Richard Sandiford  richard.sandif...@linaro.org
Revital Eres  revital.e...@linaro.org

* loop-invariant.c (get_regno_pressure_class): Move function to...
* ira.c: Here.
* common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
New flags.
* doc/invoke.texi (fmodulo-sched-reg-pressure,
-fmodulo-sched-verbose): Document the flags.
* ira.h (get_regno_pressure_class,
reset_pseudo_classes_defined_p): Declare.
* ira-costs.c (reset_pseudo_classes_defined_p): New function.
* Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
(modulo-sched-pressure.o): New.
* modulo-sched.c (ira.h, modulo-sched.h): New includes.
(partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
struct ps_reg_move_info, struct partial_schedule): Move to
modulo-sched.h.
(ps_rtl_insn, ps_reg_move): Remove static.
(apply_reg_moves): Remove static and call df_insn_rescan only
if PS is final.
(undo_reg_moves): New function.
(sms_schedule): Call register pressure estimation.
* modulo-sched.h: New file.
* modulo-sched-pressure.c: New file.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 182766)
+++ doc/invoke.texi (working copy)
@@ -374,6 +374,7 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure -fmodulo-sched-verbose=@var{n} @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6476,6 +6477,16 @@ deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Do not apply @option{-fmodulo-sched} to loops if the result would lead
+to register spilling within the loop.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
+@item -fmodulo-sched-verbose=@var{n}
+@opindex fmodulo-sched-verbose
+Set up how verbose dump file for the SMS will be.  
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: modulo-sched.h
===
--- modulo-sched.h  (revision 0)
+++ modulo-sched.h  (revision 0)
@@ -0,0 +1,120 @@
+/* Swing Modulo Scheduling implementation.
+   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 
+   Free Software Foundation, Inc.
+   Contributed by Revital Eres revital.e...@linaro.org 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */
+
+#ifndef GCC_SMS_H
+#define GCC_SMS_H
+
+#include "ddg.h"
+
+extern HARD_REG_SET eliminable_regset;
+
+typedef struct partial_schedule *partial_schedule_ptr;
+
+typedef struct ps_insn *ps_insn_ptr;
+
+/* A single instruction in the partial schedule.  */
+struct ps_insn
+{
+  /* Identifies the instruction to be scheduled.  Values smaller than
+ the ddg's num_nodes refer directly to ddg nodes.  A value of
+ X + num_nodes refers to register move X.  */
+  int id;
+
+  /* The (absolute) cycle in which the PS instruction is scheduled.
+ Same as SCHED_TIME (node).  */
+  int cycle;
+
+  /* The next/prev PS_INSN in the same row.  */
+  ps_insn_ptr next_in_row,
+ prev_in_row;
+
+};
+
+/* Information about a register move that has been added to a partial
+   schedule.  */
+struct ps_reg_move_info
+{
+  /* The source of the move is defined by the ps_insn with id DEF.
+ The destination is used by the ps_insns with the ids in USES.  */
+  int def;
+  sbitmap uses;
+
+  /* The original form of USES' instructions used OLD_REG, but they
+ should now use NEW_REG.  */
+  rtx old_reg;
+  rtx new_reg;
+
+  /* The number of consecutive stages

Patches ping

2011-12-26 Thread Revital Eres
Hello,

[PATCH, SMS] Prevent the creation of reg-moves for definitions with MODE_CC
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01459.html

[PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01330.html

Thanks,
Revital


[PATCH, SMS] Prevent the creation of reg-moves for non allocatable definitions (re-submission)

2011-12-21 Thread Revital Eres
Hello,

Following Richard's comment
(http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01469.html), attached is
a new version of the patch to prevent reg-moves for non-allocatable definitions.

Currently bootstrapping and testing on ppc64-redhat-linux, enabling SMS on
loops with SC 1.

OK for 4.7 once testing completes?

Thanks,
Revital

Changelog:

gcc/
* ddg.c (def_non_allocatable_p): New function.
(add_cross_iteration_register_deps): Call it.

testsuite/
 * gcc.dg/sms-11.c: New file.
Index: ddg.c
===
--- ddg.c   (revision 182479)
+++ ddg.c   (working copy)
@@ -263,6 +263,23 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_n
 add_edge_to_ddg (g, e);
 }
 
+/* Return true if one of the definitions in INSN is not allocatable.
+   Otherwise return false.  */
+static bool
+def_non_allocatable_p (rtx insn)
+{
+  df_ref *def;
+
+  for (def = DF_INSN_DEFS (insn); *def; def++)
+{
+  enum machine_mode mode = GET_MODE (DF_REF_REG (*def));
+
+  if (!have_regs_of_mode[mode])
+   return true;
+}
+
+  return false;
+}
 
 /* Given a downwards exposed register def LAST_DEF (which is the last
definition of that register in the bb), add inter-loop true dependences
@@ -335,7 +352,8 @@ add_cross_iteration_register_deps (ddg_p
   if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
   || !flag_modulo_sched_allow_regmoves
  || JUMP_P (use_node->insn)
-  || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn))
+  || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn)
+ || def_non_allocatable_p (DF_REF_INSN (last_def)))
 create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
 REG_DEP, 1);
 
Index: testsuite/gcc.dg/sms-11.c
===
--- testsuite/gcc.dg/sms-11.c   (revision 0)
+++ testsuite/gcc.dg/sms-11.c   (revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fdump-rtl-sms" } */
+
+extern void abort (void);
+
+float out[4][4] = { 6, 6, 7, 5, 6, 7, 5, 5, 6, 4, 4, 4, 6, 2, 3, 4 };
+
+void
+invert (void)
+{
+  int i, j, k = 0, swap;
+  float tmp[4][4] = { 5, 6, 7, 5, 6, 7, 5, 5, 4, 4, 4, 4, 3, 2, 3, 4 };
+
+  for (i = 0; i < 4; i++)
+{
+  for (j = i + 1; j < 4; j++)
+   if (tmp[j][i] > tmp[i][i])
+ swap = j;
+
+  if (swap != i)
+   tmp[i][k] = tmp[swap][k];
+}
+
+  for (i = 0; i < 4; i++)
+for (j = 0; j < 4; j++)
+  if (tmp[i][j] != out[i][j])
+   abort ();
+}
+
+int
+main ()
+{
+  invert ();
+  return 0;
+}
+
+/* { dg-final { cleanup-rtl-dump sms } } */


[PATCH, SMS] Prevent the creation of reg-moves for definitions with MODE_CC

2011-12-20 Thread Revital Eres
Hello,

The attached testcase causes an ICE when compiled with
-fmodulo-sched-allow-regmoves on ARM, due to reg-moves being created for a
definition whose mode is in class MODE_CC.

The following is a snippet from the ddg of the definition and use of vfpcc
which triggers the creation of the reg-move:

Node num: 1
(insn 151 77 152 6 (set (reg:CCFP 127 vfpcc)
(compare:CCFP (reg:SF 202 [ MEM[base: D.5306_32, offset: 0B] ])
(reg:SF 183 [ D.5284 ]))) test_new.c:8 694 {*cmpsf_vfp}
 (expr_list:REG_DEAD (reg:SF 202 [ MEM[base: D.5306_32, offset: 0B] ])
(nil)))
OUT ARCS:  [151 -(T,4,0)-> 152]
IN ARCS:  [77 -(T,3,0)-> 151]
Node num: 2
(insn 152 151 120 6 (set (reg:CCFP 24 cc)
(reg:CCFP 127 vfpcc)) test_new.c:8 689 {*movcc_vfp}
 (expr_list:REG_DEAD (reg:CCFP 127 vfpcc)
(nil)))
OUT ARCS:  [152 -(O,0,0)-> 144]  [152 -(T,0,0)-> 120]
IN ARCS:  [145 -(A,0,1)-> 152]  [151 -(T,4,0)-> 152]

The attached patch prevents the creation of reg-moves for definitions
with MODE_CC and thus solves this ICE.
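
To make the shape of the new predicate concrete outside of GCC, here is a minimal standalone sketch of the same scan: walk a NULL-terminated array of definitions and report whether any of them has a condition-code mode class. The types toy_mode_class and toy_def, and the function toy_any_def_has_cc_p, are hypothetical stand-ins for GCC's mode classes and df_ref; they are assumptions made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's mode classes; not the real enum.  */
enum toy_mode_class { TOY_CLASS_INT, TOY_CLASS_FLOAT, TOY_CLASS_CC };

/* Hypothetical stand-in for one register definition of an insn.  */
struct toy_def
{
  int regno;
  enum toy_mode_class mclass;
};

/* Return true if any definition in the NULL-terminated array DEFS has
   a condition-code mode class.  Otherwise return false.  */
bool
toy_any_def_has_cc_p (struct toy_def *const *defs)
{
  assert (defs != NULL);
  for (; *defs != NULL; defs++)
    if ((*defs)->mclass == TOY_CLASS_CC)
      return true;
  return false;
}
```

In the ddg dump above, insn 151 defines vfpcc in CCFP mode, so a check of this kind would answer true for it and the anti-dependence would be kept instead of being replaced by a reg-move.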

Currently bootstrapping and testing on ppc64-redhat-linux, enabling SMS on
loops with SC 1.

OK for 4.7 once testing completes?

Thanks,
Revital

Changelog:

gcc/
* ddg.c (def_has_ccmode_p): New function.
(add_cross_iteration_register_deps): Call it.

testsuite/
 * gcc.dg/sms-11.c: New file.
Index: ddg.c
===
--- ddg.c   (revision 182482)
+++ ddg.c   (working copy)
@@ -263,6 +263,23 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_n
 add_edge_to_ddg (g, e);
 }
 
+/* Return true if one of the definitions in INSN has MODE_CC.  Otherwise
+   return false.  */
+static bool
+def_has_ccmode_p (rtx insn)
+{
+  df_ref *def;
+
+  for (def = DF_INSN_DEFS (insn); *def; def++)
+{
+  enum machine_mode mode = GET_MODE (DF_REF_REG (*def));
+
+  if (GET_MODE_CLASS (mode) == MODE_CC)
+   return true;
+}
+
+  return false;
+}
 
 /* Given a downwards exposed register def LAST_DEF (which is the last
definition of that register in the bb), add inter-loop true dependences
@@ -335,7 +352,8 @@ add_cross_iteration_register_deps (ddg_p
   if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
   || !flag_modulo_sched_allow_regmoves
  || JUMP_P (use_node->insn)
-  || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn))
+  || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn)
+ || def_has_ccmode_p (DF_REF_INSN (last_def)))
 create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
 REG_DEP, 1);
 
Index: testsuite/gcc.dg/sms-11.c
===
--- testsuite/gcc.dg/sms-11.c   (revision 0)
+++ testsuite/gcc.dg/sms-11.c   (revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fdump-rtl-sms" } */
+
+extern void abort (void);
+
+float out[4][4] = { 6, 6, 7, 5, 6, 7, 5, 5, 6, 4, 4, 4, 6, 2, 3, 4 };
+
+void
+invert (void)
+{
+  int i, j, k = 0, swap;
+  float tmp[4][4] = { 5, 6, 7, 5, 6, 7, 5, 5, 4, 4, 4, 4, 3, 2, 3, 4 };
+
+  for (i = 0; i < 4; i++)
+{
+  for (j = i + 1; j < 4; j++)
+   if (tmp[j][i] > tmp[i][i])
+ swap = j;
+
+  if (swap != i)
+   tmp[i][k] = tmp[swap][k];
+}
+
+  for (i = 0; i < 4; i++)
+for (j = 0; j < 4; j++)
+  if (tmp[i][j] != out[i][j])
+   abort ();
+}
+
+int
+main ()
+{
+  invert ();
+  return 0;
+}
+
+/* { dg-final { cleanup-rtl-dump sms } } */


Re: [PATCH, SMS] Prevent the creation of reg-moves for definitions with MODE_CC

2011-12-20 Thread Revital Eres
Hello,

 FWIW, an alternative might be to test have_regs_of_mode[(int) mode].
 That says whether there are any allocatable (non-fixed) registers
 of the given mode.

Thanks, I'll prepare a new version of the patch using have_regs_of_mode.
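
For comparison with the mode-class test, a table-driven check along the lines Richard suggests could look roughly like the sketch below. It is standalone and uses made-up data: toy_mode, toy_have_regs_of_mode and toy_regmove_allowed_p are hypothetical names, and the table contents are assumptions chosen for the example, not values taken from any target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical modes, for illustration only.  */
enum toy_mode { TOY_SI, TOY_SF, TOY_CCFP, TOY_NUM_MODES };

/* Toy analogue of a per-mode table: an entry is true when at least one
   allocatable (non-fixed) register can hold a value of that mode.
   The values here are invented for the example.  */
static const bool toy_have_regs_of_mode[TOY_NUM_MODES] = {
  [TOY_SI] = true,
  [TOY_SF] = true,
  [TOY_CCFP] = false
};

/* Return true if a register move may be created for a value of MODE,
   i.e. if some allocatable register could receive the copy.  */
bool
toy_regmove_allowed_p (enum toy_mode mode)
{
  assert ((int) mode >= 0 && (int) mode < TOY_NUM_MODES);
  return toy_have_regs_of_mode[mode];
}
```

The appeal of this form is that it asks the question that actually matters for a reg-move (is there a register to move into?) rather than singling out one mode class.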

Revital


 Richard


[PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2011-12-17 Thread Revital Eres
Hello,

The attached patch is a resubmission following comments made by Ayal
and Richard.

Bootstrapped and tested with the other patches in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Comments are welcome.

Thanks,
Revital


   2011-12-18  Richard Sandiford  richard.sandif...@linaro.org
Revital Eres  revital.e...@linaro.org

* loop-invariant.c (get_regno_pressure_class): Move function to...
* ira.c: Here.
* common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
New flags.
* doc/invoke.texi (fmodulo-sched-reg-pressure,
-fmodulo-sched-verbose): Document the flags.
* ira.h (get_regno_pressure_class,
reset_pseudo_classes_defined_p): Declare.
* ira-costs.c (reset_pseudo_classes_defined_p): New function.
* Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
(modulo-sched-pressure.o): New.
* modulo-sched.c (ira.h, modulo-sched.h): New includes.
(partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
struct ps_reg_move_info, struct partial_schedule): Move to
modulo-sched.h.
(ps_rtl_insn, ps_reg_move): Remove static.
(apply_reg_moves): Remove static and call df_insn_rescan only
if PS is final.
(undo_reg_moves): New function.
(sms_schedule): Call register pressure estimation.
* modulo-sched.h: New file.
* modulo-sched-pressure.c: New file.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 182357)
+++ doc/invoke.texi (working copy)
@@ -373,6 +373,7 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure -fmodulo-sched-verbose=@var{n} @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6474,6 +6475,16 @@ deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Do not apply @option{-fmodulo-sched} to loops if the result would lead
+to register spilling within the loop.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
+@item -fmodulo-sched-verbose=@var{n}
+@opindex fmodulo-sched-verbose
+Set up how verbose dump file for the SMS will be.  
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: modulo-sched.h
===
--- modulo-sched.h  (revision 0)
+++ modulo-sched.h  (revision 0)
@@ -0,0 +1,120 @@
+/* Swing Modulo Scheduling implementation.
+   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 
+   Free Software Foundation, Inc.
+   Contributed by Revital Eres revital.e...@linaro.org 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */
+
+#ifndef GCC_SMS_H
+#define GCC_SMS_H
+
+#include "ddg.h"
+
+extern HARD_REG_SET eliminable_regset;
+
+typedef struct partial_schedule *partial_schedule_ptr;
+
+typedef struct ps_insn *ps_insn_ptr;
+
+/* A single instruction in the partial schedule.  */
+struct ps_insn
+{
+  /* Identifies the instruction to be scheduled.  Values smaller than
+ the ddg's num_nodes refer directly to ddg nodes.  A value of
+ X + num_nodes refers to register move X.  */
+  int id;
+
+  /* The (absolute) cycle in which the PS instruction is scheduled.
+ Same as SCHED_TIME (node).  */
+  int cycle;
+
+  /* The next/prev PS_INSN in the same row.  */
+  ps_insn_ptr next_in_row,
+ prev_in_row;
+
+};
+
+/* Information about a register move that has been added to a partial
+   schedule.  */
+struct ps_reg_move_info
+{
+  /* The source of the move is defined by the ps_insn with id DEF.
+ The destination is used by the ps_insns with the ids in USES.  */
+  int def;
+  sbitmap uses;
+
+  /* The original form of USES' instructions used OLD_REG, but they
+ should now use NEW_REG.  */
+  rtx old_reg;
+  rtx new_reg;
+
+  /* The number

Re: [PATCH SMS 1/2, RFC] Support traversing PS in reverse order

2011-12-17 Thread Revital Eres
Hello,

 This patch supports the estimation of register pressure in SMS.
 Although GCC is in stage 3, I would appreciate comments on it.
 Thanks to Richard and Ayal for discussing the implementation and for their
 insights.

 This part of the patch enables iterating over the partial schedule in
 reverse order (from the last instruction to the first).

 Bootstrapped and tested with the other patches in the series on
 ppc64-redhat-linux, enabling SMS on loops with SC 1.

 Comments are welcome.


 This looks fine. Please rename rows_reverse to rows_last as discussed,
 and simplify the bit that tracks last_in_row in ps_insn_find_column().
 Thanks,
 Ayal.


Thanks, attached is the new version of the patch following your comments.

Bootstrapped and tested with the other patches in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.
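
The simplified tail update on removal can be illustrated with a small standalone sketch. The types toy_cell and toy_line and the function toy_line_remove are hypothetical; they only mimic a row of ps_insns with first/last pointers. The point is that when the removed element is the tail, the new tail is just its predecessor, which is already NULL when the row becomes empty, so no extra branch is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified element of a row.  */
struct toy_cell
{
  struct toy_cell *next;
  struct toy_cell *prev;
};

/* Hypothetical row with both ends tracked.  */
struct toy_line
{
  struct toy_cell *first;  /* plays the role of rows[row] */
  struct toy_cell *last;   /* plays the role of rows_last[row] */
};

/* Unlink C from L, keeping both end pointers consistent.  */
void
toy_line_remove (struct toy_line *l, struct toy_cell *c)
{
  assert (l != NULL && c != NULL);

  /* C is the tail: its predecessor (possibly NULL) becomes the tail.  */
  if (!c->next)
    l->last = c->prev;
  else
    c->next->prev = c->prev;

  /* C is the head: its successor (possibly NULL) becomes the head.  */
  if (!c->prev)
    l->first = c->next;
  else
    c->prev->next = c->next;

  c->next = c->prev = NULL;
}
```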

Revital

* modulo-sched.c (rows_last): New field in struct partial_schedule.
(create_partial_schedule, free_partial_schedule,
ps_insert_empty_row, ps_insn_advance_column, remove_node_from_ps,
reset_partial_schedule, rotate_partial_schedule,
verify_partial_schedule): Update the new field.
(ps_insn_find_column): Likewise and remove last_in_row.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 181763)
+++ modulo-sched.c  (working copy)
@@ -177,6 +177,10 @@ struct partial_schedule
   /* rows[i] points to linked list of insns scheduled in row i (0<=i<ii).  */
   ps_insn_ptr *rows;
 
+  /* rows_last[i] points to the last insn in the linked list pointed
+ by rows[i].  */
+  ps_insn_ptr *rows_last;
+  
   /* All the moves added for this partial schedule.  Index X has
  a ps_insn id of X + g->num_nodes.  */
   VEC (ps_reg_move_info, heap) *reg_moves;
@@ -2272,6 +2276,7 @@ ps_insert_empty_row (partial_schedule_pt
 {
   ps_insn_ptr crr_insn;
   ps_insn_ptr *rows_new;
+  ps_insn_ptr *rows_last_new;
   int ii = ps->ii;
   int new_ii = ii + 1;
   int row;
@@ -2290,10 +2295,12 @@ ps_insert_empty_row (partial_schedule_pt
   rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
 
   rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
+  rows_last_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
   rows_length_new = (int *) xcalloc (new_ii, sizeof (int));
   for (row = 0; row < split_row; row++)
 {
   rows_new[row] = ps->rows[row];
+  rows_last_new[row] = ps->rows_last[row];
   rows_length_new[row] = ps->rows_length[row];
   ps->rows[row] = NULL;
   for (crr_insn = rows_new[row];
@@ -2315,6 +2322,7 @@ ps_insert_empty_row (partial_schedule_pt
   for (row = split_row; row < ii; row++)
 {
   rows_new[row + 1] = ps->rows[row];
+  rows_last_new[row + 1] = ps->rows_last[row];
   rows_length_new[row + 1] = ps->rows_length[row];
   ps->rows[row] = NULL;
   for (crr_insn = rows_new[row + 1];
@@ -2337,6 +2345,8 @@ ps_insert_empty_row (partial_schedule_pt
 + (SMODULO (ps->max_cycle, ii) >= split_row ? 1 : 0);
   free (ps->rows);
   ps->rows = rows_new;
+  free (ps->rows_last);
+  ps->rows_last = rows_last_new;
   free (ps->rows_length);
   ps->rows_length = rows_length_new;
   ps->ii = new_ii;
@@ -2428,6 +2438,9 @@ verify_partial_schedule (partial_schedul
 popcount (sched_nodes) == number of insns in ps.  */
  gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
  gcc_assert (SCHED_TIME (u) <= ps->max_cycle);
+ if (ps->rows_length[row] == length)
+   gcc_assert (ps->rows_last[row] == crr_insn);
+
 }
   
   gcc_assert (ps->rows_length[row] == length);
@@ -2837,6 +2850,7 @@ create_partial_schedule (int ii, ddg_ptr
 {
   partial_schedule_ptr ps = XNEW (struct partial_schedule);
   ps->rows = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
+  ps->rows_last = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
   ps->rows_length = (int *) xcalloc (ii, sizeof (int));
   ps->reg_moves = NULL;
   ps->ii = ii;
@@ -2885,6 +2899,7 @@ free_partial_schedule (partial_schedule_
 
   free_ps_insns (ps);
   free (ps->rows);
+  free (ps->rows_last);
   free (ps->rows_length);
   free (ps);
 }
@@ -2903,6 +2918,9 @@ reset_partial_schedule (partial_schedule
   ps->rows = (ps_insn_ptr *) xrealloc (ps->rows, new_ii
 * sizeof (ps_insn_ptr));
   memset (ps->rows, 0, new_ii * sizeof (ps_insn_ptr));
+  ps->rows_last = (ps_insn_ptr *) xrealloc (ps->rows_last, new_ii
+  * sizeof (ps_insn_ptr));
+  memset (ps->rows_last, 0, new_ii * sizeof (ps_insn_ptr));
   ps->rows_length = (int *) xrealloc (ps->rows_length, new_ii * sizeof (int));
   memset (ps->rows_length, 0, new_ii * sizeof (int));
   ps->ii = new_ii;
@@ -2960,6 +2978,10 @@ remove_node_from_ps (partial_schedule_pt
   gcc_assert (ps && ps_i);
   
   row = SMODULO (ps_i->cycle, ps->ii);
+
+  if (! ps_i->next_in_row)
+ps->rows_last[row] = ps_i->prev_in_row;
+  
   if (! 

[PATCH, SMS] Add missing free operation in mark_loop_unsched

2011-12-11 Thread Revital Eres
Hello,

The patch below adds a missing free operation in mark_loop_unsched.
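
The ownership pattern behind the fix can be sketched standalone as follows: a helper returns a freshly allocated array of pointers, and the caller has to release the array (not the pointed-to blocks) once it is done. The names toy_block, toy_get_body and toy_mark_unsched are hypothetical and only mirror the structure of the loop in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical basic block with a flags word.  */
struct toy_block
{
  unsigned flags;
};

#define TOY_DISABLE_SCHEDULE 0x1u

/* Hypothetical helper: return a malloc'ed array of pointers to the N
   blocks in BLOCKS.  The caller owns the array, not the blocks.  */
static struct toy_block **
toy_get_body (struct toy_block *blocks, size_t n)
{
  struct toy_block **body = malloc (n * sizeof *body);
  size_t i;

  if (body == NULL)
    abort ();
  for (i = 0; i < n; i++)
    body[i] = &blocks[i];
  return body;
}

/* Mark all N blocks as not schedulable, then free the temporary array;
   omitting the free would leak it on every call.  */
void
toy_mark_unsched (struct toy_block *blocks, size_t n)
{
  struct toy_block **body;
  size_t i;

  assert (blocks != NULL && n > 0);
  body = toy_get_body (blocks, n);
  for (i = 0; i < n; i++)
    body[i]->flags |= TOY_DISABLE_SCHEDULE;
  free (body);
}
```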

Tested (bootstrap and regtest) ppc64-redhat-linux.

OK for 3.7?

Thanks,
Revital


Changelog:

* modulo-sched.c (mark_loop_unsched): Free bbs.

Index: modulo-sched.c
===
--- modulo-sched.c  (revision 182198)
+++ modulo-sched.c  (working copy)
@@ -1204,6 +1204,8 @@ mark_loop_unsched (struct loop *loop)

   for (i = 0; i < loop->num_nodes; i++)
 bbs[i]->flags |= BB_DISABLE_SCHEDULE;
+
+  free (bbs);
 }

 /* Return true if all the BBs of the loop are empty except the


Re: [PATCH, SMS] Add missing free operation in mark_loop_unsched

2011-12-11 Thread Revital Eres
Hello,

 OK for 3.7?

Sorry, I meant GCC 4.7.0...

Thanks,
Revital


[PING][PR testsuite/47013] Fix SMS testsuite failures

2011-12-05 Thread Revital Eres
Hello,

Ping:  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02444.html

Thanks,
Revital


[PR testsuite/47013] Fix SMS testsuite failures (re-submission)

2011-11-26 Thread Revital Eres
Hello,

Attached is a new version of the patch.

Thanks to Dominique Dhumieres for testing on
powerpc-apple-darwin9.
Tested on ppc64-redhat-linux with both -m32 and -m64, and on SPU.

OK for mainline?

Thanks,
Revital

testsuite/Changelog

PR rtl-optimization/47013
* gcc.dg/sms-2.c: Change scan-tree-dump-times and the code itself
to preserve the function.
* gcc.dg/sms-6.c: Add --param sms-min-sc=1. Add dg-options for
powerpc*-*-*.  Avoid superfluous spaces in dg-final.
* gcc.dg/sms-3.c: Add --param sms-min-sc=1 and
-fmodulo-sched-allow-regmoves flags.
* gcc.dg/sms-7.c: Likewise. Remove dg-final for powerpc*-*-*
and avoid superfluous spaces in dg-final for spu-*-*.
* gcc.dg/sms-4.c: Add dg-options for powerpc*-*-*.
* gcc.dg/sms-8.c: Add --param sms-min-sc=1.  Add dg-options and
change scan-rtl-dump-times for powerpc*-*-*.
* gcc.dg/sms-5.c: Add --param sms-min-sc=1 flag, remove
powerpc*-*-* from dg-final and avoid superfluous spaces in
dg-final.
* gcc.dg/sms-9.c: Remove -fno-auto-inc-dec.
Index: testsuite/gcc.dg/sms-2.c
===
--- testsuite/gcc.dg/sms-2.c(revision 181698)
+++ testsuite/gcc.dg/sms-2.c(working copy)
@@ -4,12 +4,11 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fmodulo-sched -fdump-rtl-sms" } */
 
-
+int th, h, em, nlwm, nlwS, nlw, sy;
 void
 fun (nb)
  int nb;
 {
-  int th, h, em, nlwm, nlwS, nlw, sy;
 
   while (nb--)
 while (h--)
@@ -33,5 +32,5 @@ fun (nb)
   }
 }
 
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target spu-*-* powerpc*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "SMS loop many exits" 1 "sms" { target spu-*-* powerpc*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
Index: testsuite/gcc.dg/sms-6.c
===
--- testsuite/gcc.dg/sms-6.c(revision 181698)
+++ testsuite/gcc.dg/sms-6.c(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fmodulo-sched -fdump-rtl-sms" } */
+/* { dg-options "-O2 -fmodulo-sched -fdump-rtl-sms --param sms-min-sc=1" } */
+/* { dg-options "-O2 -fmodulo-sched -fdump-rtl-sms --param sms-min-sc=1 -fmodulo-sched-allow-regmoves" { target powerpc*-*-* } } */
 
 extern void abort (void);
 
@@ -43,7 +44,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms"  { target spu-*-* } } } */
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 3  "sms" { target powerpc*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target spu-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "SMS succeeded" 3 "sms" { target powerpc*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
 
Index: testsuite/gcc.dg/sms-3.c
===
--- testsuite/gcc.dg/sms-3.c(revision 181698)
+++ testsuite/gcc.dg/sms-3.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fmodulo-sched -funroll-loops -fdump-rtl-sms" } */
+/* { dg-options "-O2 -fmodulo-sched -funroll-loops -fdump-rtl-sms --param sms-min-sc=1 -fmodulo-sched-allow-regmoves" } */
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-7.c
===
--- testsuite/gcc.dg/sms-7.c(revision 181698)
+++ testsuite/gcc.dg/sms-7.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms" } */
+/* { dg-options "-O3 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms -fmodulo-sched-allow-regmoves --param sms-min-sc=1" } */
 
 extern void abort (void);
 
@@ -44,7 +44,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms"  { target spu-*-* } } } */
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 3  "sms" { target powerpc*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target spu-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
 
Index: testsuite/gcc.dg/sms-4.c
===
--- testsuite/gcc.dg/sms-4.c(revision 181698)
+++ testsuite/gcc.dg/sms-4.c(working copy)
@@ -1,6 +1,7 @@
 /* Inspired from sbitmap_a_or_b_and_c_cg function in sbitmap.c.  */
 /* { dg-do run } */
 /* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fdump-rtl-sms" } */
+/* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fdump-rtl-sms --param sms-min-sc=1" { target powerpc*-*-* } } */
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-8.c
===
--- testsuite/gcc.dg/sms-8.c(revision 181698)
+++ testsuite/gcc.dg/sms-8.c(working copy)
@@ -3,7 +3,8 @@
 that was not fixed by reg-moves.  */
 
  /* { dg-do run } */
- /* { dg-options -O2 -fmodulo-sched 

[PATCH SMS 1/2, RFC] Support traversing PS in reverse order

2011-11-20 Thread Revital Eres
Hello,

This patch supports the estimation of register pressure in SMS.
Although GCC is in stage 3, I would appreciate comments on it.
Thanks to Richard and Ayal for discussing the implementation and for their insights.

This part of the patch enables iterating over the partial schedule in
reverse order (from the last instruction to the first).
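
The idea of keeping a per-row tail pointer so that a row can be walked backwards can be shown with a small standalone sketch. The types toy_insn and toy_row and both functions are hypothetical simplifications of struct ps_insn and of one row of the partial schedule; they are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction in a row.  */
struct toy_insn
{
  int id;
  struct toy_insn *next_in_row;
  struct toy_insn *prev_in_row;
};

/* Hypothetical row: head pointer plus the new tail pointer.  */
struct toy_row
{
  struct toy_insn *first;
  struct toy_insn *last;
};

/* Append INSN at the end of ROW, keeping both ends up to date.  */
void
toy_row_append (struct toy_row *row, struct toy_insn *insn)
{
  assert (row != NULL && insn != NULL);
  insn->next_in_row = NULL;
  insn->prev_in_row = row->last;
  if (row->last)
    row->last->next_in_row = insn;
  else
    row->first = insn;
  row->last = insn;
}

/* Store the ids of ROW from last to first into OUT (capacity MAX) and
   return how many were stored.  */
size_t
toy_row_ids_reversed (const struct toy_row *row, int *out, size_t max)
{
  const struct toy_insn *p;
  size_t n = 0;

  assert (row != NULL && out != NULL);
  for (p = row->last; p != NULL && n < max; p = p->prev_in_row)
    out[n++] = p->id;
  return n;
}
```

Without the tail pointer, a reverse walk would first have to follow next_in_row to the end of the row; with it, the walk starts at the last instruction directly.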

Bootstrapped and tested with the other patches in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Comments are welcome.

Thanks,
Revital

Changelog:
* modulo-sched.c (rows_reverse): New field in struct partial_schedule.
(create_partial_schedule, free_partial_schedule,
ps_insert_empty_row, ps_insn_advance_column,
ps_insn_find_column, remove_node_from_ps, reset_partial_schedule,
rotate_partial_schedule, verify_partial_schedule): Update the
new field.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 181149)
+++ modulo-sched.c  (working copy)
@@ -177,6 +177,10 @@ struct partial_schedule
   /* rows[i] points to linked list of insns scheduled in row i (0<=i<ii).  */
   ps_insn_ptr *rows;
 
+  /* rows_reverse[i] points to the last insn in the linked list pointed
+ by rows[i].  */
+  ps_insn_ptr *rows_reverse;
+  
   /* All the moves added for this partial schedule.  Index X has
  a ps_insn id of X + g->num_nodes.  */
   VEC (ps_reg_move_info, heap) *reg_moves;
@@ -2272,6 +2276,7 @@ ps_insert_empty_row (partial_schedule_pt
 {
   ps_insn_ptr crr_insn;
   ps_insn_ptr *rows_new;
+  ps_insn_ptr *rows_reverse_new;
   int ii = ps->ii;
   int new_ii = ii + 1;
   int row;
@@ -2290,10 +2295,12 @@ ps_insert_empty_row (partial_schedule_pt
   rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
 
   rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
+  rows_reverse_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
   rows_length_new = (int *) xcalloc (new_ii, sizeof (int));
   for (row = 0; row < split_row; row++)
 {
   rows_new[row] = ps->rows[row];
+  rows_reverse_new[row] = ps->rows_reverse[row];
   rows_length_new[row] = ps->rows_length[row];
   ps->rows[row] = NULL;
   for (crr_insn = rows_new[row];
@@ -2315,6 +2322,7 @@ ps_insert_empty_row (partial_schedule_pt
   for (row = split_row; row < ii; row++)
 {
   rows_new[row + 1] = ps->rows[row];
+  rows_reverse_new[row + 1] = ps->rows_reverse[row];
   rows_length_new[row + 1] = ps->rows_length[row];
   ps->rows[row] = NULL;
   for (crr_insn = rows_new[row + 1];
@@ -2337,6 +2345,8 @@ ps_insert_empty_row (partial_schedule_pt
 + (SMODULO (ps->max_cycle, ii) >= split_row ? 1 : 0);
   free (ps->rows);
   ps->rows = rows_new;
+  free (ps->rows_reverse);
+  ps->rows_reverse = rows_reverse_new;
   free (ps->rows_length);
   ps->rows_length = rows_length_new;
   ps->ii = new_ii;
@@ -2428,6 +2438,9 @@ verify_partial_schedule (partial_schedul
 popcount (sched_nodes) == number of insns in ps.  */
  gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
  gcc_assert (SCHED_TIME (u) <= ps->max_cycle);
+ if (ps->rows_length[row] == length)
+   gcc_assert (ps->rows_reverse[row] == crr_insn);
+
}
   
   gcc_assert (ps->rows_length[row] == length);
@@ -2837,6 +2850,7 @@ create_partial_schedule (int ii, ddg_ptr
 {
   partial_schedule_ptr ps = XNEW (struct partial_schedule);
   ps->rows = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
+  ps->rows_reverse = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
   ps->rows_length = (int *) xcalloc (ii, sizeof (int));
   ps->reg_moves = NULL;
   ps->ii = ii;
@@ -2885,6 +2899,7 @@ free_partial_schedule (partial_schedule_
 
   free_ps_insns (ps);
   free (ps->rows);
+  free (ps->rows_reverse);
   free (ps->rows_length);
   free (ps);
 }
@@ -2903,6 +2918,9 @@ reset_partial_schedule (partial_schedule
   ps->rows = (ps_insn_ptr *) xrealloc (ps->rows, new_ii
 * sizeof (ps_insn_ptr));
   memset (ps->rows, 0, new_ii * sizeof (ps_insn_ptr));
+  ps->rows_reverse = (ps_insn_ptr *) xrealloc (ps->rows_reverse, new_ii
+  * sizeof (ps_insn_ptr));
+  memset (ps->rows_reverse, 0, new_ii * sizeof (ps_insn_ptr));
   ps->rows_length = (int *) xrealloc (ps->rows_length, new_ii * sizeof (int));
   memset (ps->rows_length, 0, new_ii * sizeof (int));
   ps->ii = new_ii;
@@ -2960,6 +2978,15 @@ remove_node_from_ps (partial_schedule_pt
   gcc_assert (ps && ps_i);
   
   row = SMODULO (ps_i->cycle, ps->ii);
+
+  if (! ps_i->next_in_row)
+{
+  if (ps_i->prev_in_row)
+   ps->rows_reverse[row] = ps_i->prev_in_row;
+  else
+   ps->rows_reverse[row] = NULL;
+}
+  
   if (! ps_i->prev_in_row)
 {
   gcc_assert (ps_i == ps->rows[row]);
@@ -3048,6 +3075,8 @@ ps_insn_find_column (partial_schedule_pt
}
   else
ps->rows[row] = ps_i;
+
+  ps->rows_reverse[row] = ps_i;
   return 
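
(Illustration, not part of the patch.) The bookkeeping that rows_reverse adds can be sketched outside GCC as a per-row head pointer plus tail pointer on a doubly linked list. The types `struct node` and `struct sched` and the helpers `row_append`, `row_remove` and `row_verify` are hypothetical stand-ins, not the names used in modulo-sched.c, and appending at the tail is a simplification of what ps_insn_find_column does:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ps_insn and partial_schedule; only the
   fields needed for the illustration are kept.  */
struct node
{
  int id;
  struct node *next_in_row;
  struct node *prev_in_row;
};

struct sched
{
  int ii;
  struct node **rows;          /* rows[i]: first node of row i.  */
  struct node **rows_reverse;  /* rows_reverse[i]: last node of row i.  */
};

/* Append N at the end of ROW in constant time using the tail pointer.  */
static void
row_append (struct sched *s, int row, struct node *n)
{
  n->next_in_row = NULL;
  n->prev_in_row = s->rows_reverse[row];
  if (s->rows_reverse[row])
    s->rows_reverse[row]->next_in_row = n;
  else
    s->rows[row] = n;
  s->rows_reverse[row] = n;
}

/* Unlink N from ROW, keeping both the head and the tail pointer valid.  */
static void
row_remove (struct sched *s, int row, struct node *n)
{
  if (n->prev_in_row)
    n->prev_in_row->next_in_row = n->next_in_row;
  else
    s->rows[row] = n->next_in_row;

  if (n->next_in_row)
    n->next_in_row->prev_in_row = n->prev_in_row;
  else
    s->rows_reverse[row] = n->prev_in_row;  /* NULL once the row is empty.  */

  n->next_in_row = n->prev_in_row = NULL;
}

/* Check that the tail pointer agrees with a full walk of the row.  */
static void
row_verify (const struct sched *s, int row)
{
  struct node *last = NULL;
  struct node *p;

  for (p = s->rows[row]; p; p = p->next_in_row)
    last = p;
  assert (s->rows_reverse[row] == last);
}
```

The removal case is the one the patch handles in remove_node_from_ps: when the removed node has no successor, the tail moves to its predecessor, or becomes NULL if the row is now empty.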

[PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule

2011-11-20 Thread Revital Eres
Hello,

The attached patch adds register pressure estimation of the partial schedule.

Tested and bootstrapped with the other patches in the series on
ppc64-redhat-linux,
enabling SMS on loops with SC 1.

Comments are welcome.

Thanks,
Revital

Changelog:
* loop-invariant.c (get_regno_pressure_class): Move function to...
* ira.c (get_regno_pressure_class): Here.
* common.opt (fmodulo-sched-reg-pressure): New flag.
* doc/invoke.texi (fmodulo-sched-reg-pressure): Document it.
* ira.h (get_regno_pressure_class): Declare.
* rtl.h (set_reg_allocno_class): Declare.
* reginfo.c (set_reg_allocno_class): New function.
* Makefile.in (modulo-sched.o): Include ira.h.
* modulo-sched.c (ira.h): New include.
(rtl_insn_ps, undo_reg_moves, mark_def_regs, mark_reg_use,
mark_reg_use_1, insn_exists_in_epilog_p, calc_lr_out_regs,
change_pressure, update_reg_moves_pressure_info,
initiate_reg_pressure_info, mark_regno_live, mark_reg_birth_1,
mark_reg_birth, mark_regno_death, mark_ref_regs,
calc_insn_reg_pressure_info, calc_reg_pressure, free_loop_data,
free_reg_pressure_info, ps_reg_pressure_p): New functions.
(apply_reg_moves): Add parameter.
(curr_regs_live, curr_reg_pressure, curr_loop): New
data-structures.
(loop_data): New struct.
(LOOP_DATA): New definition.
(sms_schedule): Use register pressure estimation.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 181149)
+++ doc/invoke.texi (working copy)
@@ -373,6 +373,7 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg 
@gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6457,6 +6458,11 @@ deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Perform SMS based modulo scheduling with register pressure estimation.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: loop-invariant.c
===
--- loop-invariant.c(revision 181149)
+++ loop-invariant.c(working copy)
@@ -1619,34 +1619,6 @@ static rtx regs_set[(FIRST_PSEUDO_REGIST
 /* Number of regs stored in the previous array.  */
 static int n_regs_set;
 
-/* Return pressure class and number of needed hard registers (through
-   *NREGS) of register REGNO.  */
-static enum reg_class
-get_regno_pressure_class (int regno, int *nregs)
-{
-  if (regno >= FIRST_PSEUDO_REGISTER)
-{
-  enum reg_class pressure_class;
-
-  pressure_class = reg_allocno_class (regno);
-  pressure_class = ira_pressure_class_translate[pressure_class];
-  *nregs
-   = ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
-  return pressure_class;
-}
-  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
-   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
-{
-  *nregs = 1;
-  return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
-}
-  else
-{
-  *nregs = 0;
-  return NO_REGS;
-}
-}
-
 /* Increase (if INCR_P) or decrease current register pressure for
register REGNO.  */
 static void
Index: common.opt
===
--- common.opt  (revision 181149)
+++ common.opt  (working copy)
@@ -1457,6 +1457,10 @@ fmodulo-sched-allow-regmoves
 Common Report Var(flag_modulo_sched_allow_regmoves)
 Perform SMS based modulo scheduling with register moves allowed
 
+fmodulo-sched-reg-pressure
+Common Report Var(flag_modulo_sched_reg_pressure)
+Perform SMS based modulo scheduling with register pressure estimation.
+
 fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1) Optimization
 Move loop invariant computations out of loops
Index: ira.c
===
--- ira.c   (revision 181149)
+++ ira.c   (working copy)
@@ -3784,6 +3784,34 @@ ira (FILE *f)
   timevar_pop (TV_IRA);
 }
 
+/* Return pressure class and number of needed hard registers (through
+   *NREGS) of register REGNO.  */
+enum reg_class
+get_regno_pressure_class (int regno, int *nregs)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+{
+  enum reg_class pressure_class;
+
+  pressure_class = reg_allocno_class (regno);
+   

Re: [PATCH, SMS] Fix marking of SMSed loops as BB_DISABLE_SCHEDULE

2011-10-28 Thread Revital Eres
Hello,

 Tested and bootstrapped on all languages except java (PR50879) on
 ppc64-redhat-linux, enabling SMS on loops with SC 1.

 OK for mainline?


 OK, seems reasonable.

 Please fix typo (in original comment):
 -            scheduling passes doesn't touch it.  */
 +            scheduling passes don't touch it.  */

I realize that I forgot to guard the marking in the epilogue and
prologue with -fresched-modulo-sched, sorry about that. I am testing
the attached patch and will commit it after testing completes if there
are no further comments.

Thanks,
Revital
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 180557)
+++ modulo-sched.c  (working copy)
@@ -1173,6 +1173,8 @@ generate_prolog_epilog (partial_schedule
   /* Put the prolog on the entry edge.  */
   e = loop_preheader_edge (loop);
   split_edge_and_insert (e, get_insns ());
+  if (!flag_resched_modulo_sched)
+e->dest->flags |= BB_DISABLE_SCHEDULE;
 
   end_sequence ();
 
@@ -1186,9 +1188,24 @@ generate_prolog_epilog (partial_schedule
   gcc_assert (single_exit (loop));
   e = single_exit (loop);
   split_edge_and_insert (e, get_insns ());
+  if (!flag_resched_modulo_sched)
+e->dest->flags |= BB_DISABLE_SCHEDULE;
+
   end_sequence ();
 }
 
+/* Mark LOOP as software pipelined so the later
+   scheduling passes don't touch it.  */
+static void
+mark_loop_unsched (struct loop *loop)
+{
+  unsigned i;
+  basic_block *bbs = get_loop_body (loop);
+
+  for (i = 0; i < loop->num_nodes; i++)
+bbs[i]->flags |= BB_DISABLE_SCHEDULE;
+}
+
 /* Return true if all the BBs of the loop are empty except the
loop header.  */
 static bool
@@ -1714,9 +1731,10 @@ sms_schedule (void)
  permute_partial_schedule (ps, g->closing_branch->first_note);
 
   /* Mark this loop as software pipelined so the later
-scheduling passes doesn't touch it.  */
+scheduling passes don't touch it.  */
  if (! flag_resched_modulo_sched)
-   g->bb->flags |= BB_DISABLE_SCHEDULE;
+   mark_loop_unsched (loop);
+ 
  /* The life-info is not valid any more.  */
  df_set_bb_dirty (g->bb);
 


[PATCH, SMS] Fix marking of SMSed loops as BB_DISABLE_SCHEDULE

2011-10-27 Thread Revital Eres
Hello,

The attached patch fixes the current marking of SMS loops to prevent
further scheduling as follows: it marks *all* the loop's bbs with
BB_DISABLE_SCHEDULE, which prevents them from being scheduled later.
(With the current implementation, if the loop has a non-empty latch then
it will be considered for scheduling based on
sched_is_disabled_for_current_region_p () in sched-rgn.c.)
It also marks the epilogue and prologue as BB_DISABLE_SCHEDULE, which
my experiments showed to be good for performance, as
scheduling those regions after SMS increased register pressure in some
cases.
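
To make the intent concrete, here is a minimal standalone sketch of the marking, assuming hypothetical types (`struct block`, `struct loop_blocks`), a hypothetical flag bit `BLK_DISABLE_SCHEDULE`, and a `resched_allowed` parameter that stands in for -fresched-modulo-sched. It is not the GCC code, only the shape of it:

```c
#include <stddef.h>

#define BLK_DISABLE_SCHEDULE 0x1u   /* Hypothetical flag bit.  */

struct block
{
  unsigned flags;
};

/* Hypothetical view of a pipelined loop: every block of the body
   (header and latch included) plus the generated prolog and epilog.  */
struct loop_blocks
{
  struct block **body;
  size_t num_nodes;
  struct block *prolog;   /* May be NULL.  */
  struct block *epilog;   /* May be NULL.  */
};

/* Mark every block of the loop, and the prolog and epilog if present,
   so that a later scheduling pass skips them.  Nothing is marked when
   rescheduling after SMS has been requested.  */
static void
mark_blocks_unsched (struct loop_blocks *l, int resched_allowed)
{
  size_t i;

  if (resched_allowed)
    return;

  for (i = 0; i < l->num_nodes; i++)
    l->body[i]->flags |= BLK_DISABLE_SCHEDULE;

  if (l->prolog)
    l->prolog->flags |= BLK_DISABLE_SCHEDULE;
  if (l->epilog)
    l->epilog->flags |= BLK_DISABLE_SCHEDULE;
}
```

Marking only the block that holds the kernel would leave a non-empty latch unmarked, which is the case the patch is meant to close.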

Tested and bootstrapped on all languages except java (PR50879) on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

OK for mainline?

Thanks,
Revital


Changelog:

* modulo-sched.c (generate_prolog_epilog): Mark prolog and epilog
as BB_DISABLE_SCHEDULE.
(mark_loop_unsched): New function.
(sms_schedule): Call it.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 180557)
+++ modulo-sched.c  (working copy)
@@ -1173,6 +1173,7 @@ generate_prolog_epilog (partial_schedule
   /* Put the prolog on the entry edge.  */
   e = loop_preheader_edge (loop);
   split_edge_and_insert (e, get_insns ());
+  e->dest->flags |= BB_DISABLE_SCHEDULE;
 
   end_sequence ();
 
@@ -1186,9 +1187,23 @@ generate_prolog_epilog (partial_schedule
   gcc_assert (single_exit (loop));
   e = single_exit (loop);
   split_edge_and_insert (e, get_insns ());
+  e->dest->flags |= BB_DISABLE_SCHEDULE;
+
   end_sequence ();
 }
 
+/* Mark LOOP as software pipelined so the later
+   scheduling passes doesn't touch it.  */
+static void
+mark_loop_unsched (struct loop *loop)
+{
+  unsigned i;
+  basic_block *bbs = get_loop_body (loop);
+
+  for (i = 0; i < loop->num_nodes; i++)
+bbs[i]->flags |= BB_DISABLE_SCHEDULE;
+}
+
 /* Return true if all the BBs of the loop are empty except the
loop header.  */
 static bool
@@ -1716,7 +1731,8 @@ sms_schedule (void)
   /* Mark this loop as software pipelined so the later
 scheduling passes doesn't touch it.  */
  if (! flag_resched_modulo_sched)
-   g->bb->flags |= BB_DISABLE_SCHEDULE;
+   mark_loop_unsched (loop);
+ 
  /* The life-info is not valid any more.  */
  df_set_bb_dirty (g->bb);
 


Re: [PATCH, SMS 1/2] Avoid generating redundant reg-moves

2011-09-30 Thread Revital Eres
Hello,

 This
 +  /* Skip instructions that do not set a register.  */
 +  if (set && !REG_P (SET_DEST (set)))
 +    continue;
 is ok. Can you also prevent !set insns from having reg_moves? (To be updated
 once auto_inc insns will be supported, if they'll deserve reg_moves too.)

I added a check to verify that no reg-moves are created for !set instructions.

Currently re-testing on ppc64-redhat-linux (bootstrap and regtest) and
arm-linux-gnueabi (bootstrap c).

OK to commit once testing completes?

Thanks,
Revital

gcc/
* modulo-sched.c (generate_reg_moves): Skip instructions that
do not set a register and verify no regmoves are created for
!single_set instructions.


testsuite/
 * gcc.dg/sms-10.c: New file.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 179138)
+++ modulo-sched.c  (working copy)
@@ -476,7 +476,12 @@ generate_reg_moves (partial_schedule_ptr
   sbitmap *uses_of_defs;
   rtx last_reg_move;
   rtx prev_reg, old_reg;
-
+  rtx set = single_set (u->insn);
+  
+  /* Skip instructions that do not set a register.  */
+  if ((set && !REG_P (SET_DEST (set))))
+continue;
+ 
   /* Compute the number of reg_moves needed for u, by looking at life
 ranges started at u (excluding self-loops).  */
   for (e = u->out; e; e = e->next_out)
@@ -493,6 +498,16 @@ generate_reg_moves (partial_schedule_ptr
 && SCHED_COLUMN (e->dest) < SCHED_COLUMN (e->src))
  nreg_moves4e--;
 
+if (nreg_moves4e >= 1)
+ {
+   /* !single_set instructions are not supported yet and
+  thus we do not expect to encounter them in the loop
+  except for the doloop part.  For the latter case
+  we assume no regmoves are generated as the doloop
+  instructions are tied to the branch with an edge.  */
+   gcc_assert (set);
+ }
+   
nreg_moves = MAX (nreg_moves, nreg_moves4e);
  }
 
 /* { dg-do run } */
 /* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fdump-rtl-sms" } */


typedef __SIZE_TYPE__ size_t;
extern void *malloc (size_t);
extern void free (void *);
extern void abort (void);

struct regstat_n_sets_and_refs_t
{
  int sets;
  int refs;
};

struct regstat_n_sets_and_refs_t *regstat_n_sets_and_refs;

struct df_reg_info
{
  unsigned int n_refs;
};

struct df_d
{
  struct df_reg_info **def_regs;
  struct df_reg_info **use_regs;
};
struct df_d *df;

static inline int
REG_N_SETS (int regno)
{
  return regstat_n_sets_and_refs[regno].sets;
}

__attribute__ ((noinline))
 int max_reg_num (void)
{
  return 100;
}

__attribute__ ((noinline))
 void regstat_init_n_sets_and_refs (void)
{
  unsigned int i;
  unsigned int max_regno = max_reg_num ();

  for (i = 0; i < max_regno; i++)
    {
      (regstat_n_sets_and_refs[i].sets = (df->def_regs[(i)]->n_refs));
      (regstat_n_sets_and_refs[i].refs =
       (df->use_regs[(i)]->n_refs) + REG_N_SETS (i));
    }
}

int a_sets[100] =
  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
  21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
  40, 41, 42,
  43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
  62, 63, 64,
  65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
  84, 85, 86,
  87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99
};

int a_refs[100] =
  { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38,
  40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76,
  78, 80, 82,
  84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116,
  118, 120,
  122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150,
  152, 154, 156,
  158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186,
  188, 190, 192,
  194, 196, 198
};

int
main ()
{
  struct df_reg_info *b[100], *c[100];
  struct df_d df1;
  size_t s = sizeof (struct df_reg_info);
  struct regstat_n_sets_and_refs_t a[100];

  df = &df1;
  regstat_n_sets_and_refs = a;
  int i;

  for (i = 0; i < 100; i++)
    {
      b[i] = (struct df_reg_info *) malloc (s);
      b[i]->n_refs = i;
      c[i] = (struct df_reg_info *) malloc (s);
      c[i]->n_refs = i;
    }

  df1.def_regs = b;
  df1.use_regs = c;
  regstat_init_n_sets_and_refs ();

  for (i = 0; i < 100; i++)
    if ((a[i].sets != a_sets[i]) || (a[i].refs != a_refs[i]))
      abort ();

  for (i = 0; i < 100; i++)
    {
      free (b[i]);
      free (c[i]);
    }

  return 0;
}

/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target powerpc*-*-* } } } */
/* { dg-final { cleanup-rtl-dump sms } } */


Re: [PATCH, SMS 2/2] Support instructions with REG_INC_NOTE (second try)

2011-09-30 Thread Revital Eres
Hello,

 OK, with the following comments:

 Make sure reg_moves are generated for the correct (result, not addr)
 register, in generate_reg_moves().

 been -> being (multiple appearances).

 Add a note that autoinc_var_is_used_p (rtx def_insn, rtx use_insn)
 doesn't need to consider the specific address register; no reg_moves
 will be allowed for any life range defined by def_insn and used by
 use_insn, if use_insn uses an address register auto-inc'ed by
 def_insn.
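
(Illustration only.) The rule in the note above can be sketched without GCC's RTL types. Everything here is a hypothetical stand-in: `struct note` plays the role of a REG_NOTES entry, `NOTE_INC` the role of REG_INC, and the `used_regs` array replaces what reg_referenced_p would compute from the insn pattern:

```c
#include <stdbool.h>
#include <stddef.h>

enum note_kind { NOTE_DEAD, NOTE_INC };

/* Hypothetical register note: a kind, a register number and a link
   to the next note of the same insn.  */
struct note
{
  enum note_kind kind;
  int regno;
  struct note *next;
};

/* Hypothetical insn: its notes plus the registers its pattern reads.  */
struct insn
{
  struct note *notes;
  const int *used_regs;
  size_t n_used;
};

/* Return true if INSN reads register REGNO.  */
static bool
insn_uses_reg (const struct insn *insn, int regno)
{
  size_t i;

  for (i = 0; i < insn->n_used; i++)
    if (insn->used_regs[i] == regno)
      return true;
  return false;
}

/* Return true if DEF auto-increments or auto-decrements a register
   that USE reads.  Which register it is does not matter: a single
   match is enough to forbid reg-moves between the two insns.  */
static bool
autoinc_reg_is_used (const struct insn *def, const struct insn *use)
{
  const struct note *n;

  for (n = def->notes; n; n = n->next)
    if (n->kind == NOTE_INC && insn_uses_reg (use, n->regno))
      return true;
  return false;
}
```

The predicate is deliberately coarse, as the note says: it answers for the pair of insns, not for one particular life range.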

Attached is the version of the patch which addresses your comments.

Currently re-testing on ppc64-redhat-linux (bootstrap and regtest) and
arm-linux-gnueabi (bootstrap c).

I'll commit it once testing completes if there are no further changes required.

Thanks,
Revital

* ddg.c (autoinc_var_is_used_p): New function.
(create_ddg_dep_from_intra_loop_link,
add_cross_iteration_register_deps): Call it.
* ddg.h (autoinc_var_is_used_p): Declare.
* modulo-sched.c (generate_reg_moves): Call autoinc_var_is_used_p.
(sms_schedule): Handle instructions with REG_INC.
Index: ddg.c
===
--- ddg.c   (revision 179138)
+++ ddg.c   (working copy)
@@ -145,6 +145,27 @@ mem_access_insn_p (rtx insn)
   return rtx_mem_access_p (PATTERN (insn));
 }
 
+/* Return true if DEF_INSN contains address being auto-inc or auto-dec
+   which is used in USE_INSN.  Otherwise return false.  The result is
+   being used to decide whether to remove the edge between def_insn and
+   use_insn when -fmodulo-sched-allow-regmoves is set.  This function
+   doesn't need to consider the specific address register; no reg_moves
+   will be allowed for any life range defined by def_insn and used
+   by use_insn, if use_insn uses an address register auto-inc'ed by
+   def_insn.  */
+bool
+autoinc_var_is_used_p (rtx def_insn, rtx use_insn)
+{
+  rtx note;
+
+  for (note = REG_NOTES (def_insn); note; note = XEXP (note, 1))
+if (REG_NOTE_KIND (note) == REG_INC
+&& reg_referenced_p (XEXP (note, 0), PATTERN (use_insn)))
+  return true;
+
+  return false;
+}
+
 /* Computes the dependence parameters (latency, distance etc.), creates
a ddg_edge and adds it to the given DDG.  */
 static void
@@ -173,10 +194,15 @@ create_ddg_dep_from_intra_loop_link (ddg
  compensate for that by generating reg-moves based on the life-range
  analysis.  The anti-deps that will be deleted are the ones which
  have true-deps edges in the opposite direction (in other words
- the kernel has only one def of the relevant register).  TODO:
- support the removal of all anti-deps edges, i.e. including those
+ the kernel has only one def of the relevant register).
+ If the address that is being auto-inc or auto-dec in DEST_NODE
+ is used in SRC_NODE then do not remove the edge to make sure
+ reg-moves will not be created for this address.  
+ TODO: support the removal of all anti-deps edges, i.e. including those
  whose register has multiple defs in the loop.  */
-  if (flag_modulo_sched_allow_regmoves && (t == ANTI_DEP && dt == REG_DEP))
+  if (flag_modulo_sched_allow_regmoves
+   && (t == ANTI_DEP && dt == REG_DEP)
+   && !autoinc_var_is_used_p (dest_node->insn, src_node->insn))
 {
   rtx set;
 
@@ -302,10 +328,14 @@ add_cross_iteration_register_deps (ddg_p
  gcc_assert (first_def_node);
 
  /* Always create the edge if the use node is a branch in
-order to prevent the creation of reg-moves.  */
+order to prevent the creation of reg-moves.  
+If the address that is being auto-inc or auto-dec in LAST_DEF
+is used in USE_INSN then do not remove the edge to make sure
+reg-moves will not be created for that address.  */
   if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
   || !flag_modulo_sched_allow_regmoves
- || JUMP_P (use_node->insn))
+ || JUMP_P (use_node->insn)
+  || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn))
 create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
 REG_DEP, 1);
 
Index: ddg.h
===
--- ddg.h   (revision 179138)
+++ ddg.h   (working copy)
@@ -186,4 +186,6 @@ void free_ddg_all_sccs (ddg_all_sccs_ptr
 int find_nodes_on_paths (sbitmap result, ddg_ptr, sbitmap from, sbitmap to);
 int longest_simple_path (ddg_ptr, int from, int to, sbitmap via);
 
+bool autoinc_var_is_used_p (rtx, rtx);
+
 #endif /* GCC_DDG_H */
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 179138)
+++ modulo-sched.c  (working copy)
@@ -506,6 +506,10 @@ generate_reg_moves (partial_schedule_ptr
   we assume no regmoves are generated as the doloop
   instructions are tied to the branch with an edge. 

Re: [PATCH, SMS 2/2] Support instructions with REG_INC_NOTE (second try)

2011-09-27 Thread Revital Eres
Hello,

 ok, so if we have an auto-inc'ing insn which defines (auto-inc's) an
 addr register and another (say, result) register, we want to allow the
 result register to have life ranges in excess of ii (by eliminating
 anti-dep edges of distance 1 from uses to def, and generating
 reg_moves if/where needed), but avoid having such life ranges of addr
 (by retaining such anti-dep edges). Right?

Yes.

 Are these all the edges? We have only one True dependence edge from
 insn 1 to insn 2, but insn 1 is setting two registers both used by
 insn 2 (regardless of what we decide to do with Anti-deps). As for
 Anti-deps of distance 1, we have only one going back from insn 2 to
 insn 1, perhaps corresponding to addr, allowing reg_moves for def1(?).
 But, it won't help def1, because this other Anti-dep will force them
 to be scheduled w/o reg_moves.

Please ignore the edges in the previous example. It indeed was a mistake,
sorry about the confusion.  Here are two examples taken from bootstrap
on PPC of how the address is used; with the current patch applied and
running with -fmodulo-sched-allow-regmoves:

Node num: 2
(insn 3681 3678 3682 500 (set (reg:QI 2914 [ MEM[base: D.9586_4130,
offset: 0B] ])
(mem:QI (pre_dec:DI (reg:DI 1644 [ ivtmp.687 ])) [0 MEM[base:
D.9586_4130, offset: 0B]+0 S1 A8])) ../../gcc/libiberty/regex.c:4259
358 {*movqi_internal}
 (expr_list:REG_INC (reg:DI 1644 [ ivtmp.687 ])
(nil)))
OUT ARCS:  [3681 -(T,2,1)-> 3681]  [3681 -(T,2,0)-> 3682]
IN ARCS:  [3682 -(A,0,1)-> 3681]  [3681 -(T,2,1)-> 3681]  [3682
-(A,0,1)-> 3681]  [3682 -(T,2,1)-> 3681]
Node num: 3
(insn 3682 3681 3683 500 (set (mem:QI (plus:DI (reg:DI 1644 [ ivtmp.687 ])
(const_int 3 [0x3])) [0 MEM[base: D.9586_4130, offset:
3B]+0 S1 A8])
(reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ]))
../../gcc/libiberty/regex.c:4259 358 {*movqi_internal}
 (expr_list:REG_DEAD (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ])
(nil)))
OUT ARCS:  [3682 -(A,0,1)-> 3681]  [3682 -(A,0,1)-> 3681]  [3682
-(O,0,0)-> 7263]  [3682 -(A,0,0)-> 3683]  [3682 -(T,2,1)-> 3681]
IN ARCS:  [3681 -(T,2,0)-> 3682]

Another example of usage is as follows (the address register is not
used in MEM):

Node num: 0
(insn 1419 1415 1423 9 (set (mem/f:DI (pre_inc:DI (reg:DI 1882 [
ivtmp.1636 ])) [3 MEM[base: D.10911_2945, offset: 0B]+0 S8 A64])
(reg/f:DI 3923)) ../../gcc/libiberty/regex.c:5788 378
{*movdi_internal64}
 (expr_list:REG_INC (reg:DI 1882 [ ivtmp.1636 ])
(nil)))
OUT ARCS:  [1419 -(T,2,1)-> 1419]  [1419 -(O,0,0)-> 5932]  [1419
-(O,0,0)-> 1449]  [1419 -(T,2,1)-> 1434]  [1419 -(T,2,0)-> 1434]
[1419 -(T,2,0)-> 1432]  [1419 -(O,0,0)-> 1431]  [1419 -(O,0,0)-> 1427]
 [1419 -(O,0,0)-> 1423]
IN ARCS:  [1419 -(T,2,1)-> 1419]  [1432 -(A,0,1)-> 1419]  [1449
-(O,0,1)-> 1419]  [1434 -(A,0,1)-> 1419]  [1431 -(O,0,1)-> 1419]
[1427 -(O,0,1)-> 1419]  [1423 -(O,0,1)-> 1419]
Node num: 4
(insn 1432 1431 1433 9 (set (reg:DI 2632)
(plus:DI (reg/v/f:DI 1058 [ reg_info ])
(reg:DI 1882 [ ivtmp.1636 ])))
../../gcc/libiberty/regex.c:5543 79 {*adddi3_internal1}
 (nil))
OUT ARCS:  [1432 -(A,0,1)-> 1419]  [1432 -(T,1,0)-> 1433]
IN ARCS:  [1419 -(T,2,0)-> 1432]

 OK for mainline?


 OK, with the following comments:

Thanks, will address the comments and re-submit.

 In other words, one would expect to see two Anti-dep edges from insn 2
 to insn 1, right?

Yes, that's indeed the case in the first example above.

Thanks,
Revital


Re: [PATCH, SMS 1/2] Avoid generating redundant reg-moves

2011-09-27 Thread Revital Eres
Hello,

 This
 +  /* Skip instructions that do not set a register.  */
 +  if (set && !REG_P (SET_DEST (set)))
 +    continue;
 is ok. Can you also prevent !set insns from having reg_moves? (To be updated
 once auto_inc insns will be supported, if they'll deserve reg_moves too.)

Do you mean leaving any anti-dep edges to !set instructions similar to
what is done for auto_inc addresses in part 2 of this patch?

Thanks,
Revital


Re: [PATCH, SMS 2/2] Support instructions with REG_INC_NOTE (second try)

2011-09-27 Thread Revital Eres
Hello,

 Ok, this does have two anti-dep edges. But still, only a single true
 dependence(?) ... can you see why?

The intra edge [3681 -(T,2,0)-> 3682] was created by haifa-sched and I guess
that because both of the expected true-dep edges (one for the target
and one for the address) are identical only one is created.  The rest
of the inter edges were created in ddg.c where we do not check for
multiple identical edges.

Thanks,
Revital




 Thanks,
 Ayal.






[PATCH, SMS 1/2] Avoid generating redundant reg-moves

2011-09-25 Thread Revital Eres
Hello,

The attached patch contains a fix to the generate_reg_moves
function. Currently we can generate reg-moves for stores which are later
eliminated.  This happens when we have a mem dependency with distance 1
and as a result the number of regmoves is at least 1 based on the
following calculation taken from generate_reg_moves ():

if (e->distance == 1)
   nreg_moves4e = (SCHED_TIME (e->dest) - SCHED_TIME (e->src) + ii) / ii;
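
As a quick arithmetic check of that claim, the distance-1 formula can be isolated in a small helper. The function name `reg_moves_distance1` and its plain-integer parameters are hypothetical; only the expression itself is taken from the snippet above:

```c
/* Number of reg-moves the distance-1 formula yields for an edge whose
   source is scheduled at SRC_TIME and destination at DEST_TIME, with
   initiation interval II.  Plain C integer division, as in the
   expression quoted above.  */
static int
reg_moves_distance1 (int src_time, int dest_time, int ii)
{
  return (dest_time - src_time + ii) / ii;
}
```

With ii = 4, a store at cycle 2 followed by a dependent load at cycle 3 gives (3 - 2 + 4) / 4 = 1, and even two insns scheduled in the same cycle give (0 + 4) / 4 = 1. So whenever the destination is not scheduled before the source, at least one reg-move is requested, whether or not the source sets a register.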

This is an example of register move generated in such cases:

 reg_move = (insn 152 119 75 4 (set (reg:SI 231)
(mem:SI (pre_modify:DI (reg:DI 215)
(plus:DI (reg:DI 215)
(reg:DI 171 [ ivtmp.42 ]))) [3 MEM[base:
pretmp.27_65, index: ivtmp.42_9, offset: 0B]+0 S4 A32])) -1
 (nil))

When not handling REG_INC instructions this was not a problem as these
reg-moves were removed by dead code elimination.
For example:

insn 1) mem[x]  = ...
insn 2) .. = mem[y]

When the reg-move reg1 = mem[x] was generated, mem[x] was not used in
insn 2 and thus reg1 could be eliminated.
But with REG_INC this is different because the reg-move instruction
remains and leads to bad code generation.
The attached testcase captures this case.

Tested and bootstrapped with patch 2 on ppc64-redhat-linux
enabling SMS on loops with SC 1. On arm-linux-gnueabi bootstrap c on
top of the set of patches that support do-loop pattern
(http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html) which solves
the bootstrap failure on ARM with SMS flags.

OK for mainline?

Thanks,
Revital

gcc/
* modulo-sched.c (generate_reg_moves): Skip instructions that
do not set a register.


testsuite/
 * gcc.dg/sms-10.c: New file.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 179138)
+++ modulo-sched.c  (working copy)
@@ -476,7 +476,12 @@ generate_reg_moves (partial_schedule_ptr
   sbitmap *uses_of_defs;
   rtx last_reg_move;
   rtx prev_reg, old_reg;
-
+  rtx set = single_set (u->insn);
+  
+  /* Skip instructions that do not set a register.  */
+  if (set && !REG_P (SET_DEST (set)))
+continue;
+  
   /* Compute the number of reg_moves needed for u, by looking at life
 ranges started at u (excluding self-loops).  */
   for (e = u->out; e; e = e->next_out)
 /* { dg-do run } */
 /* { dg-options "-O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fdump-rtl-sms" } */


typedef __SIZE_TYPE__ size_t;
extern void *malloc (size_t);
extern void free (void *);
extern void abort (void);

struct regstat_n_sets_and_refs_t
{
  int sets;
  int refs;
};

struct regstat_n_sets_and_refs_t *regstat_n_sets_and_refs;

struct df_reg_info
{
  unsigned int n_refs;
};

struct df_d
{
  struct df_reg_info **def_regs;
  struct df_reg_info **use_regs;
};
struct df_d *df;

static inline int
REG_N_SETS (int regno)
{
  return regstat_n_sets_and_refs[regno].sets;
}

__attribute__ ((noinline))
 int max_reg_num (void)
{
  return 100;
}

__attribute__ ((noinline))
 void regstat_init_n_sets_and_refs (void)
{
  unsigned int i;
  unsigned int max_regno = max_reg_num ();

  for (i = 0; i < max_regno; i++)
    {
      (regstat_n_sets_and_refs[i].sets = (df->def_regs[(i)]->n_refs));
      (regstat_n_sets_and_refs[i].refs =
       (df->use_regs[(i)]->n_refs) + REG_N_SETS (i));
    }
}

int a_sets[100] =
  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
  21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
  40, 41, 42,
  43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
  62, 63, 64,
  65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
  84, 85, 86,
  87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99
};

int a_refs[100] =
  { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38,
  40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76,
  78, 80, 82,
  84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116,
  118, 120,
  122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150,
  152, 154, 156,
  158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186,
  188, 190, 192,
  194, 196, 198
};

int
main ()
{
  struct df_reg_info *b[100], *c[100];
  struct df_d df1;
  size_t s = sizeof (struct df_reg_info);
  struct regstat_n_sets_and_refs_t a[100];

  df = &df1;
  regstat_n_sets_and_refs = a;
  int i;

  for (i = 0; i < 100; i++)
    {
      b[i] = (struct df_reg_info *) malloc (s);
      b[i]->n_refs = i;
      c[i] = (struct df_reg_info *) malloc (s);
      c[i]->n_refs = i;
    }

  df1.def_regs = b;
  df1.use_regs = c;
  regstat_init_n_sets_and_refs ();

  for (i = 0; i < 100; i++)
    if ((a[i].sets != a_sets[i]) || (a[i].refs != a_refs[i]))
      abort ();

  for (i = 0; i < 100; i++)
    {
      free (b[i]);
      free (c[i]);
    }

  return 0;
}

/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target 

[PATCH, SMS 2/2] Support instructions with REG_INC_NOTE (second try)

2011-09-25 Thread Revital Eres
Hello,

This patch extends the implementation to support instructions with
REG_INC notes.
It addresses the comments from the previous submission:
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01299.html.

By the way, regarding your previous question about how an address
register that is being auto-incremented can be used: apparently it
can be used as follows:

insn 1) def1 = MEM [ pre_dec (addr) ]
out edges: [1 -(T,2,1)-> 1]  [1 -(T,2,0)-> 2]
in edges: [1 -(T,2,1)-> 1]  [2 -(T,2,1)-> 1]  [2 -(A,0,1)-> 1]

insn 2) MEM [ addr + 3 ] = def1
out edges:  [2 -(T,2,1)-> 1]  [2 -(A,0,1)-> 1]
in edges: [1 -(T,2,0)-> 2]

Reg-moves were not created for the address when testing on ppc.
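
The check this patch introduces can be sketched in plain C as follows.
This is only a toy model under stated assumptions: toy_insn, uses_reg
and keep_anti_dep_edge are hypothetical names that do not exist in GCC,
and the real function walks the REG_INC notes of the insn instead.

```c
#include <stdbool.h>

#define TOY_MAX_USES 4
#define TOY_NO_REG (-1)

/* Hypothetical, simplified stand-in for an RTL insn.  */
struct toy_insn
{
  int autoinc_reg;              /* Register auto-inc/dec'ed, or TOY_NO_REG.  */
  int uses[TOY_MAX_USES];       /* Registers referenced by the insn.  */
  int n_uses;
};

/* Return true if INSN references register REG.  */
static bool
uses_reg (const struct toy_insn *insn, int reg)
{
  int i;

  for (i = 0; i < insn->n_uses; i++)
    if (insn->uses[i] == reg)
      return true;
  return false;
}

/* Return true if the anti-dep edge between USE and DEF must be kept,
   i.e. DEF auto-increments an address register that USE references.  */
static bool
keep_anti_dep_edge (const struct toy_insn *def, const struct toy_insn *use)
{
  return def->autoinc_reg != TOY_NO_REG && uses_reg (use, def->autoinc_reg);
}
```

In this model the edge is kept for the pair (insn 1, insn 2) above,
because insn 2 references the address that insn 1 pre-decrements.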

Tested and bootstrapped together with patch 1 on ppc64-redhat-linux,
enabling SMS on loops with SC 1. On arm-linux-gnueabi, bootstrapped
c on top of the set of patches that support the do-loop pattern
(http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html), which solves
the bootstrap failure on ARM with SMS flags.

OK for mainline?

Thanks,
Revital


* ddg.c (autoinc_var_is_used_p): New function.
(create_ddg_dep_from_intra_loop_link,
add_cross_iteration_register_deps): Call it.
* modulo-sched.c (sms_schedule): Handle instructions with REG_INC.
Index: ddg.c
===
--- ddg.c   (revision 179138)
+++ ddg.c   (working copy)
@@ -145,6 +145,23 @@ mem_access_insn_p (rtx insn)
   return rtx_mem_access_p (PATTERN (insn));
 }
 
+/* Return true if DEF_INSN contains address being auto-inc or auto-dec
+   which is used in USE_INSN.  Otherwise return false.  The result is
+   being used to decide whether to remove the edge between def_insn and
+   use_insn when -fmodulo-sched-allow-regmoves is set.  */
+static bool
+autoinc_var_is_used_p (rtx def_insn, rtx use_insn)
+{
+  rtx note;
+
+  for (note = REG_NOTES (def_insn); note; note = XEXP (note, 1))
+    if (REG_NOTE_KIND (note) == REG_INC
+        && reg_referenced_p (XEXP (note, 0), PATTERN (use_insn)))
+      return true;
+
+  return false;
+}
+
 /* Computes the dependence parameters (latency, distance etc.), creates
a ddg_edge and adds it to the given DDG.  */
 static void
@@ -173,10 +190,15 @@ create_ddg_dep_from_intra_loop_link (ddg
  compensate for that by generating reg-moves based on the life-range
  analysis.  The anti-deps that will be deleted are the ones which
  have true-deps edges in the opposite direction (in other words
- the kernel has only one def of the relevant register).  TODO:
- support the removal of all anti-deps edges, i.e. including those
+ the kernel has only one def of the relevant register).
+ If the address that is being auto-inc or auto-dec in DEST_NODE
+ is used in SRC_NODE then do not remove the edge to make sure
+ reg-moves will not be created for this address.
+ TODO: support the removal of all anti-deps edges, i.e. including those
  whose register has multiple defs in the loop.  */
-  if (flag_modulo_sched_allow_regmoves && (t == ANTI_DEP && dt == REG_DEP))
+  if (flag_modulo_sched_allow_regmoves
+      && (t == ANTI_DEP && dt == REG_DEP)
+      && !autoinc_var_is_used_p (dest_node->insn, src_node->insn))
 {
   rtx set;
 
@@ -302,10 +324,14 @@ add_cross_iteration_register_deps (ddg_p
  gcc_assert (first_def_node);
 
  /* Always create the edge if the use node is a branch in
-order to prevent the creation of reg-moves.  */
+order to prevent the creation of reg-moves.  
+If the address that is being auto-inc or auto-dec in LAST_DEF
+is used in USE_INSN then do not remove the edge to make sure
+reg-moves will not be created for that address.  */
   if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
   || !flag_modulo_sched_allow_regmoves
- || JUMP_P (use_node->insn))
+ || JUMP_P (use_node->insn)
+  || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn))
 create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
 REG_DEP, 1);
 
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 179138)
+++ modulo-sched.c  (working copy)
@@ -1266,12 +1271,10 @@ sms_schedule (void)
continue;
   }
 
-  /* Don't handle BBs with calls or barriers or auto-increment insns 
-(to avoid creating invalid reg-moves for the auto-increment insns),
+  /* Don't handle BBs with calls or barriers
 or !single_set with the exception of instructions that include
 count_reg---these instructions are part of the control part
 that do-loop recognizes.
- ??? Should handle auto-increment insns.
  ??? Should handle insns defining subregs.  */
  for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn))
   {
@@ -1282,7 +1285,6 @@ sms_schedule (void)
 || (NONDEBUG_INSN_P (insn) 

Re: [PATCH, SMS] Minor misc. fixes

2011-09-12 Thread Revital Eres
Hello,

> OK.
> While we're at it, an alternative would be to have
> remove_node_from_ps() assert its own (parameters and) return value.
> That is, replace "if (c) return false" by "assert (!c)" and have it
> return void if successful. There's not much you can do if it returns
> false. That would check its other invocation too.

OK, that indeed seems reasonable.
The attached patch implements it.
I will commit it after re-testing completes if there are no objections.
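
The suggested shape (assert the preconditions, return void) can be
sketched on a toy doubly linked row as follows.  The names toy_node,
toy_row and toy_remove_node are hypothetical, and plain assert stands
in for gcc_assert; unlike the real function, the sketch does not free
the node.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node
{
  struct toy_node *next_in_row, *prev_in_row;
};

struct toy_row
{
  struct toy_node *head;
  int length;
};

/* Unlink NODE from ROW.  Misuse trips an assertion instead of being
   reported through a return value.  */
static void
toy_remove_node (struct toy_row *row, struct toy_node *node)
{
  assert (row && node);

  if (!node->prev_in_row)
    {
      /* A node without a predecessor must be the head of the row.  */
      assert (node == row->head);
      row->head = node->next_in_row;
      if (row->head)
        row->head->prev_in_row = NULL;
    }
  else
    {
      node->prev_in_row->next_in_row = node->next_in_row;
      if (node->next_in_row)
        node->next_in_row->prev_in_row = node->prev_in_row;
    }

  node->next_in_row = node->prev_in_row = NULL;
  row->length -= 1;
}
```

Callers then no longer need to test a result; every invocation is
checked inside the function itself.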

Thanks,
Revital

Changelog:

   modulo-sched.c (remove_node_from_ps): Return void instead of bool.
(optimize_sc): Adjust call to remove_node_from_ps.
(sms_schedule): Add print info.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 178755)
+++ modulo-sched.c  (working copy)
@@ -211,7 +211,7 @@ static int get_sched_window (partial_sch
 static bool try_scheduling_node_in_cycle (partial_schedule_ptr, ddg_node_ptr,
  int, int, sbitmap, int *, sbitmap,
  sbitmap);
-static bool remove_node_from_ps (partial_schedule_ptr, ps_insn_ptr);
+static void remove_node_from_ps (partial_schedule_ptr, ps_insn_ptr);
 
 #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
 #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
@@ -834,8 +834,7 @@ optimize_sc (partial_schedule_ptr ps, dd
 if (next_ps_i->node->cuid == g->closing_branch->cuid)
   break;
 
-  gcc_assert (next_ps_i);
-  gcc_assert (remove_node_from_ps (ps, next_ps_i));
+  remove_node_from_ps (ps, next_ps_i);
   success =
 try_scheduling_node_in_cycle (ps, g->closing_branch,
   g->closing_branch->cuid, c,
@@ -1485,8 +1484,8 @@ sms_schedule (void)
   if (dump_file)
 {
  fprintf (dump_file,
-  "SMS succeeded %d %d (with ii, sc)\n", ps->ii,
-  stage_count);
+  "%s:%d SMS succeeded %d %d (with ii, sc)\n",
+  insn_file (tail), insn_line (tail), ps->ii, stage_count);
  print_partial_schedule (ps, dump_file);
}
  
@@ -2719,22 +2718,18 @@ create_ps_insn (ddg_node_ptr node, int c
 }
 
 
-/* Removes the given PS_INSN from the partial schedule.  Returns false if the
-   node is not found in the partial schedule, else returns true.  */
-static bool
+/* Removes the given PS_INSN from the partial schedule.  */  
+static void 
 remove_node_from_ps (partial_schedule_ptr ps, ps_insn_ptr ps_i)
 {
   int row;
 
-  if (!ps || !ps_i)
-return false;
-
+  gcc_assert (ps && ps_i);
+
   row = SMODULO (ps_i->cycle, ps->ii);
   if (! ps_i->prev_in_row)
 {
-  if (ps_i != ps->rows[row])
-   return false;
-
+  gcc_assert (ps_i == ps->rows[row]);
   ps->rows[row] = ps_i->next_in_row;
   if (ps->rows[row])
 ps->rows[row]->prev_in_row = NULL;
@@ -2748,7 +2743,7 @@ remove_node_from_ps (partial_schedule_pt

   ps->rows_length[row] -= 1;
   free (ps_i);
-  return true;
+  return;
 }
 
 /* Unlike what literature describes for modulo scheduling (which focuses


[PATCH, SMS] Minor misc. fixes

2011-09-08 Thread Revital Eres
Hello,

The attached patch contains minor fixes.

Currently testing and bootstrapping on ppc64-redhat-linux, enabling SMS
on loops with SC 1.

OK for mainline once testing completes?

Thanks,
Revital


Changelog

* modulo-sched.c (optimize_sc): Call remove_node_from_ps outside
of gcc_assert.
(sms_schedule): Add print info.
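
A plausible motivation for the first ChangeLog item (an assumption; the
mail does not state it) is that an assertion macro may be configured so
that its argument is never evaluated, in which case a call with side
effects placed inside it is silently skipped.  The sketch below shows
the effect with a hypothetical TOY_ASSERT macro; it is an illustration
only and does not describe how gcc_assert is defined in any particular
configuration.

```c
#include <stdbool.h>

/* Hypothetical assertion macro that, like a disabled assert, never
   evaluates its argument.  */
#define TOY_ASSERT(expr) ((void) (0 && (expr)))

static int removed_count;

/* Pretend removal with a side effect: bumps a counter.  */
static bool
toy_remove (void)
{
  removed_count++;
  return true;
}

/* Fragile pattern: the removal is skipped when the assert is disabled.  */
static void
remove_inside_assert (void)
{
  TOY_ASSERT (toy_remove ());
}

/* Robust pattern: always perform the removal, then check the result.  */
static void
remove_outside_assert (void)
{
  bool removed_p = toy_remove ();

  TOY_ASSERT (removed_p);
}
```

With this macro, remove_inside_assert leaves removed_count untouched,
while remove_outside_assert increments it.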
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 178632)
+++ modulo-sched.c  (working copy)
@@ -773,7 +773,7 @@ optimize_sc (partial_schedule_ptr ps, dd
   if (get_sched_window (ps, g->closing_branch, sched_nodes, ii, &start,
 &step, &end) == 0)
 {
-  bool success;
+  bool success, remove_branch_p;
   ps_insn_ptr next_ps_i;
   int branch_cycle = SCHED_TIME (g->closing_branch);
   int row = SMODULO (branch_cycle, ps->ii);
@@ -835,7 +835,8 @@ optimize_sc (partial_schedule_ptr ps, dd
  break;
 
   gcc_assert (next_ps_i);
-  gcc_assert (remove_node_from_ps (ps, next_ps_i));
+  remove_branch_p = remove_node_from_ps (ps, next_ps_i);
+  gcc_assert (remove_branch_p);
   success =
 try_scheduling_node_in_cycle (ps, g->closing_branch,
   g->closing_branch->cuid, c,
@@ -1485,8 +1486,8 @@ sms_schedule (void)
   if (dump_file)
 {
  fprintf (dump_file,
-  "SMS succeeded %d %d (with ii, sc)\n", ps->ii,
-  stage_count);
+  "%s:%d SMS succeeded %d %d (with ii, sc)\n",
+  insn_file (tail), insn_line (tail), ps->ii, stage_count);
  print_partial_schedule (ps, dump_file);
}
  


Re: [PATCH, SMS] Support instructions with REG_INC_NOTE (re-submisson)

2011-08-17 Thread Revital Eres
Hello,

On 16 August 2011 03:32, Ayal Zaks ayal.z...@gmail.com wrote:
> Ok, so this extends the infrastructure to support insns which set an
> arbitrary number of registers, but currently specifically handles only
> REG_INC situations (which set two registers). I'm not against
> {0,1,infinity}, but wonder if this case really deserves the
> complexity: post/pre-inc/decrementing load insns may need regmoves for
> the register loaded, due to the latency of the load and desire to
> schedule associated uses farther than ii cycles away (as do regular
> loads), but do they also need regmoves for the address register being
> post/pre-inc/decremented? Its latency should not be long, and it's
> often feeding only itself so regmoves are not needed/won't help. If
> not, perhaps a simpler solution is to allow REG_INC insns but disallow
> their address register from being regmove'd, dedicating the single
> regmove info for the value loaded.
>
> Are there actually cases where you need the address register to regmove?

Bootstrap on PowerPC did not reveal such cases so I'll try to
implement a simpler solution as you suggested.

Thanks,
Revital


Patches ping

2011-07-20 Thread Revital Eres
Hello,

[PATCH, SMS 3/4] Optimize stage count
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html

[PATCH, SMS 4/4] Misc. fixes
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html

[PATCH, SMS] Fix calculation of issue_rate
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html

Thanks,
Revital


Re: [PATCH, SMS] Fix violation of memory dependence

2011-06-15 Thread Revital Eres
Hello,

> better do
>   else if (!mem_read_insn_p (to->insn))
>
> +       create_ddg_dep_no_link (g, from, to, ANTI_DEP, MEM_DEP, 0);
> +    }

Done. Committed to -r175090.

Thanks,
Revital


[PATCH, SMS] Fix violation of memory dependence

2011-06-13 Thread Revital Eres
Hello,

The attached patch fixes a violation of memory dependencies. The
problematic scenario happens when the -fmodulo-sched-allow-regmoves
flag is set and certain anti-dep edges are not created.

For example, consider the following three instructions and the edges
between them.  When -fmodulo-sched-allow-regmoves is set, the edge
(63 - Anti, 0 -> 64) is not created (probably due to transitivity).

insn 63)  r168 = MEM[176]
Out edges: (63 - Anti, 0 -> 64)
In edges: (64 - True, 1 -> 63), (68 - True, 1 -> 63)

insn 64)  176 = 176 + 4
Out edges: (64 - True, 1 -> 63), (64 - True, 0 -> 68)
In edges: (63 - Anti, 0 -> 64)

insn 68)  MEM[176 - 4] = 193
Out edges: (68 - True, 1 -> 63)
In edges: (64 - True, 0 -> 68)

This anti-dep edge lies on the path from one memory instruction to
another (from 63 to 68), so removing it caused a violation of the
memory dependencies: insn 63 was scheduled after insn 68.

This patch adds intra edges between every two memory instructions in
this case.  It fixes the recent bootstrap failure on ARM (with SMS flags).
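
Why the reordering is harmful can be seen in a small, self-contained
model of the three insns.  This is a sketch under simplifying
assumptions: memory is an int array, the address advances by one
element instead of 4 bytes, and both function names are hypothetical.

```c
/* Toy model of insns 63, 64 and 68: MEM is the memory array and
   ADDR the index held in the address register.  */

/* Original order: load, bump the address, then store.  */
static int
load_before_store (int *mem, int addr, int val)
{
  int loaded = mem[addr];       /* insn 63: load through the address  */
  addr += 1;                    /* insn 64: advance the address  */
  mem[addr - 1] = val;          /* insn 68: store to the old location  */
  return loaded;
}

/* Invalid order: the store is scheduled ahead of the load.  */
static int
store_before_load (int *mem, int addr, int val)
{
  int loaded;

  mem[addr] = val;              /* insn 68 moved up  */
  loaded = mem[addr];           /* insn 63 now reads the stored value  */
  return loaded;
}
```

Both variants leave the same value in memory, but the second returns
the newly stored value where the first returns the old one, which is
the kind of violation the added intra-loop memory edges rule out.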

The patch was tested as follows:
On ppc64-redhat-linux: regtested, and bootstrapped with SMS flags,
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi: bootstrapped c language with SMS flags,
enabling SMS also on loops with stage count 1;
currently regression testing on c,c++.

OK for mainline once regtest on arm-linux-gnueabi completes?

Thanks,
Revital

Changelog:

gcc/
* ddg.c (add_intra_loop_mem_dep): New function.
(build_intra_loop_deps): Call it.

testsuite/
* gcc.dg/sms-9.c: New file.
Index: ddg.c
===
--- ddg.c   (revision 174906)
+++ ddg.c   (working copy)
@@ -390,6 +390,36 @@ insns_may_alias_p (rtx insn1, rtx insn2)
 PATTERN (insn2));
 }
 
+/* Given two nodes, analyze their RTL insns and add intra-loop mem deps
+   to ddg G.  */
+static void
+add_intra_loop_mem_dep (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to)
+{
+
+  if (!insns_may_alias_p (from->insn, to->insn))
+    /* Do not create edge if memory references have disjoint alias sets.  */
+    return;
+
+  if (mem_write_insn_p (from->insn))
+    {
+      if (mem_read_insn_p (to->insn))
+        create_ddg_dep_no_link (g, from, to,
+                                DEBUG_INSN_P (to->insn)
+                                ? ANTI_DEP : TRUE_DEP, MEM_DEP, 0);
+      else if (from->cuid != to->cuid)
+        create_ddg_dep_no_link (g, from, to,
+                                DEBUG_INSN_P (to->insn)
+                                ? ANTI_DEP : OUTPUT_DEP, MEM_DEP, 0);
+    }
+  else
+    {
+      if (mem_read_insn_p (to->insn))
+        return;
+      else if (from->cuid != to->cuid)
+        create_ddg_dep_no_link (g, from, to, ANTI_DEP, MEM_DEP, 0);
+    }
+}
+
 /* Given two nodes, analyze their RTL insns and add inter-loop mem deps
to ddg G.  */
 static void
@@ -477,10 +507,22 @@ build_intra_loop_deps (ddg_ptr g)
  if (DEBUG_INSN_P (j_node->insn))
continue;
  if (mem_access_insn_p (j_node->insn))
-   /* Don't bother calculating inter-loop dep if an intra-loop dep
-  already exists.  */
+   {
+ /* Don't bother calculating inter-loop dep if an intra-loop dep
+already exists.  */
  if (! TEST_BIT (dest_node->successors, j))
add_inter_loop_mem_dep (g, dest_node, j_node);
+ /* If -fmodulo-sched-allow-regmoves
+is set certain anti-dep edges are not created.
+It might be that these anti-dep edges are on the
+path from one memory instruction to another such that
+removing these edges could cause a violation of the
+memory dependencies.  Thus we add intra edges between
+every two memory instructions in this case.  */
+ if (flag_modulo_sched_allow_regmoves
+ && !TEST_BIT (dest_node->predecessors, j))
+   add_intra_loop_mem_dep (g, j_node, dest_node);
+   }
 }
 }
 }
Index: testsuite/gcc.dg/sms-9.c
===
--- testsuite/gcc.dg/sms-9.c(revision 0)
+++ testsuite/gcc.dg/sms-9.c(revision 0)
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fmodulo-sched -fno-auto-inc-dec -O2 -fmodulo-sched-allow-regmoves" } */
+
+#include <stdlib.h>
+#include <stdarg.h>
+
+struct df_ref_info
+{
+  unsigned int *begin;
+  unsigned int *count;
+};
+
+extern void *memset (void *s, int c, __SIZE_TYPE__ n);
+
+
+__attribute__ ((noinline))
+int
+df_reorganize_refs_by_reg_by_insn (struct df_ref_info *ref_info,
+  int num, unsigned int start)
+{
+  unsigned int m = num;
+  unsigned int offset = 77;
+  unsigned int r;
+
+  for (r = start; r < m; r++)
+   

Re: [PATCH, SMS 1/4] Fix calculation of row_rest_count

2011-05-30 Thread Revital Eres
Hello,

> Please add the following:
> o A clarification that rows_length is used only (as an optimization) to
> back off quickly from trying to schedule a node in a full row; that is, to
> avoid running through futile DFA state transitions.
> o An assert that ps->rows_length[i] equals the number of nodes in
> ps->rows[i] (e.g., in verify_partial_schedule(); and then recheck...).

OK, I'm now testing a patch with these additions.

Thanks,
Revital


Re: [PATCH, SMS 2/4] Move the creation of anti-dep edge

2011-05-30 Thread Revital Eres
Hello,


> OK, this makes sense. Just to re-confirm, the exact same edges are created
> in both cases, right?

No, sorry for not being clearer on this. The previous implementation
actually creates redundant anti-dep edges.  This happens when the
register used in jump_insn is defined more than once in the bb.  In
that case the previous implementation created an edge between jump_insn
and its def.  This edge is redundant, because an anti-dep edge from
jump_insn to the first definition in the bb is also created (in
add_cross_iteration_register_deps), and that edge already prevents the
creation of reg-moves when -fmodulo-sched-allow-regmoves is set, which
is why we created this edge in the first place.  Here is the full
explanation:

In add_cross_iteration_register_deps, inter-iteration anti-dep edges
are created between instructions that have an intra-loop true-dep edge
in the opposite direction (def insn - True dep -> use insn).
When -fmodulo-sched-allow-regmoves is set, certain inter-iteration
anti-dep edges are not created.  These edges are omitted when there is
only one definition in the bb of the register defined in the def insn.
The previous implementation added an anti-dep edge between jump_insn
and the def insn even when the register has more than one definition in
the bb, although add_cross_iteration_register_deps does not skip the
creation of anti-dep edges in this case (the edge is created from
jump_insn to the first_def insn).  The new patch takes a different
approach: instead of creating an additional anti-dep edge, it does not
skip the creation of anti-dep edges when the use insn is jump_insn.
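
The resulting condition can be summarized as a small predicate.  This
is a sketch only: the function and its boolean parameters are
hypothetical and merely mirror the three cases described above.

```c
#include <stdbool.h>

/* Return true if the cross-iteration anti-dep edge from the use to
   the first definition should be created.  */
static bool
create_cross_iteration_anti_dep_p (bool single_def_p, bool allow_regmoves_p,
                                   bool use_is_jump_p)
{
  /* More than one definition in the block: always create the edge.  */
  if (!single_def_p)
    return true;

  /* Reg-moves are not allowed, so the edge cannot be dropped.  */
  if (!allow_regmoves_p)
    return true;

  /* A branch must never need a reg-move, so keep the edge for it.  */
  return use_is_jump_p;
}
```

The edge is dropped only for a non-branch use of a register with a
single definition when reg-moves are allowed.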

Thanks,
Revital


Patches ping

2011-05-26 Thread Revital Eres
Hello,

[PATCH, SMS 1/4] Fix calculation of row_rest_count
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01339.html

[PATCH, SMS 2/4] Move the creation of anti-dep edge
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01340.html

[PATCH, SMS 3/4] Optimize stage count
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html

[PATCH, SMS 4/4] Misc. fixes
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html

[PATCH, SMS] Fix calculation of issue_rate
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html

Thanks,
Revital


[PATCH, SMS] Fix calculation of issue_rate

2011-05-19 Thread Revital Eres
Hello,

The issue rate is currently set in SMS by calling the
targetm.sched.issue_rate function if it is defined.  For rs6000 the
issue_rate is 1 if !reload_completed && !flag_sched_pressure.
To bypass that, SMS sets reload_completed to 1 before calling
targetm.sched.issue_rate and restores its original value after the call.
The problem is that the issue rate is changed back to 1 by the
following chain of calls, which occurs right after issue_rate is set
from targetm.sched.issue_rate ():

sms_schedule -> haifa_sched_init -> sched_init () ->
targetm.sched.issue_rate () (in haifa-sched.c:3474)

This time, when targetm.sched.issue_rate is called, the issue_rate is
set to 1 because reload_completed holds its original value (zero).
The attached patch tries to fix that.
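
The ordering problem can be reproduced in a toy model.  All names
below are hypothetical stand-ins and not the real GCC definitions, and
the rate 4 is an arbitrary example value.

```c
static int toy_reload_completed;        /* Models reload_completed.  */
static int toy_issue_rate;              /* Models issue_rate.  */

/* Models a target hook that reports 1 before reload.  */
static int
toy_target_issue_rate (void)
{
  return toy_reload_completed ? 4 : 1;
}

/* Models scheduler initialization, which queries the hook again.  */
static void
toy_sched_init (void)
{
  toy_issue_rate = toy_target_issue_rate ();
}

/* Old behaviour: the flag is restored before the scheduler is set up.  */
static int
init_restoring_early (void)
{
  int saved = toy_reload_completed;

  toy_reload_completed = 1;
  toy_issue_rate = toy_target_issue_rate ();
  toy_reload_completed = saved;
  toy_sched_init ();            /* Overwrites the rate with 1.  */
  return toy_issue_rate;
}

/* Patched behaviour: the flag stays set until initialization is done.  */
static int
init_restoring_late (void)
{
  int saved = toy_reload_completed;

  toy_reload_completed = 1;
  toy_issue_rate = toy_target_issue_rate ();
  toy_sched_init ();
  toy_reload_completed = saved;
  return toy_issue_rate;
}
```

Starting from toy_reload_completed == 0, the first variant ends with a
rate of 1 and the second keeps the rate reported after reload, while
both restore the flag.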

Tested (bootstrap and regtest) on ppc64-redhat-linux.

OK for mainline?

Thanks,
Revital


Changelog:

* modulo-sched.c (sms_schedule): Fix stage_count calculation.



Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173786)
+++ modulo-sched.c  (working copy)
@@ -924,6 +924,7 @@ sms_schedule (void)
   basic_block condition_bb = NULL;
   edge latch_edge;
   gcov_type trip_count = 0;
+  int temp;

   loop_optimizer_init (LOOPS_HAVE_PREHEADERS
   | LOOPS_HAVE_RECORDED_EXITS);
@@ -933,22 +934,19 @@ sms_schedule (void)
   return;  /* There are no loops to schedule.  */
 }

+  temp = reload_completed;
+  reload_completed = 1;
   /* Initialize issue_rate.  */
   if (targetm.sched.issue_rate)
-{
-  int temp = reload_completed;
-
-  reload_completed = 1;
-  issue_rate = targetm.sched.issue_rate ();
-  reload_completed = temp;
-}
+issue_rate = targetm.sched.issue_rate ();
   else
 issue_rate = 1;
-
+
   /* Initialize the scheduler.  */
   setup_sched_infos ();
   haifa_sched_init ();
-
+  reload_completed = temp;
+
   /* Allocate memory to hold the DDG array one entry for each loop.
  We use loop-num as index into this array.  */
   g_arr = XCNEWVEC (ddg_ptr, number_of_loops ());


Re: [PATCH, SMS] Fix calculation of issue_rate

2011-05-19 Thread Revital Eres
Hello,


> * modulo-sched.c (sms_schedule): Fix stage_count calculation.


Sorry, I just noticed that the changelog entry is wrong; here is
a correction:

* modulo-sched.c (sms_schedule): Fix issue_rate calculation.

Thanks,
Revital


[PATCH, SMS 1/4] Fix calculation of row_rest_count

2011-05-18 Thread Revital Eres
Hello,

The calculation of the number of instructions in a row is currently
done by updating the row_rest_count field in struct ps_insn on the fly
while creating a new instruction.  It is used to make sure we do not
exceed the issue_rate.
This calculation assumes the instruction is inserted at the beginning
of a row and thus does not take into account the cases where it must
follow other instructions.  Also, it is not properly updated when an
instruction is removed.
To avoid the overhead of maintaining the row_rest_count count in every
instruction in each row, as is currently done, this patch maintains one
count per row which holds the number of instructions in the row.
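
A minimal sketch of the per-row bookkeeping, assuming a fixed number
of rows: toy_ps, TOY_II and both helper functions are hypothetical,
and the real partial schedule also keeps the linked list of insns for
each row.

```c
#include <stdbool.h>

#define TOY_II 4                /* Number of rows (hypothetical II).  */

/* Toy partial schedule that only tracks per-row instruction counts.  */
struct toy_ps
{
  int rows_length[TOY_II];
};

/* Try to account for one more insn in ROW.  Back off at once if the
   row already holds ISSUE_RATE insns.  */
static bool
toy_add_to_row (struct toy_ps *ps, int row, int issue_rate)
{
  if (ps->rows_length[row] >= issue_rate)
    return false;
  ps->rows_length[row] += 1;
  return true;
}

/* Account for the removal of one insn from ROW.  */
static void
toy_remove_from_row (struct toy_ps *ps, int row)
{
  ps->rows_length[row] -= 1;
}
```

Because the count belongs to the row, it stays correct regardless of
where in the row an insn is inserted or removed.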

The patch was tested together with the rest of the patches in this series.
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi regtested on c,c++. Bootstrap c language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

* modulo-sched.c (struct ps_insn): Remove row_rest_count field.
(struct partial_schedule): Add rows_length field.
(ps_insert_empty_row): Handle rows_length.
(create_partial_schedule): Likewise.
(free_partial_schedule): Likewise.
(reset_partial_schedule): Likewise.
(create_ps_insn): Remove rest_count argument.
(remove_node_from_ps): Update rows_length.
(add_node_to_ps): Update rows_length and call create_ps_insn
without passing row_rest_count.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173814)
+++ modulo-sched.c  (working copy)
@@ -134,8 +135,6 @@ struct ps_insn
   ps_insn_ptr next_in_row,
  prev_in_row;
 
-  /* The number of nodes in the same row that come after this node.  */
-  int row_rest_count;
 };
 
 /* Holds the partial schedule as an array of II rows.  Each entry of the
@@ -149,6 +148,9 @@ struct partial_schedule
   /* rows[i] points to linked list of insns scheduled in row i (0<=i<ii).  */
   ps_insn_ptr *rows;
 
+  /*  rows_length[i] holds the number of instructions in the row.  */
+  int *rows_length;
+
   /* The earliest absolute cycle of an insn in the partial schedule.  */
   int min_cycle;
 
@@ -1908,6 +2140,7 @@ ps_insert_empty_row (partial_schedule_pt
   int ii = ps->ii;
   int new_ii = ii + 1;
   int row;
+  int *rows_length_new;
 
   verify_partial_schedule (ps, sched_nodes);
 
@@ -1922,6 +2155,7 @@ ps_insert_empty_row (partial_schedule_pt
   rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
 
   rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
+  rows_length_new = (int *) xcalloc (new_ii, sizeof (int));
   for (row = 0; row < split_row; row++)
 {
   rows_new[row] = ps->rows[row];
@@ -1966,6 +2200,8 @@ ps_insert_empty_row (partial_schedule_pt
 + (SMODULO (ps->max_cycle, ii) >= split_row ? 1 : 0);
   free (ps->rows);
   ps->rows = rows_new;
+  free (ps->rows_length);
+  ps->rows_length = rows_length_new;
   ps->ii = new_ii;
   gcc_assert (ps->min_cycle >= 0);
 
@@ -2456,6 +2692,7 @@ create_partial_schedule (int ii, ddg_ptr
 {
   partial_schedule_ptr ps = XNEW (struct partial_schedule);
   ps->rows = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
+  ps->rows_length = (int *) xcalloc (ii, sizeof (int));
   ps->ii = ii;
   ps->history = history;
   ps->min_cycle = INT_MAX;
@@ -2494,6 +2731,7 @@ free_partial_schedule (partial_schedule_
 return;
   free_ps_insns (ps);
   free (ps->rows);
+  free (ps->rows_length);
   free (ps);
 }
 
@@ -2511,6 +2749,8 @@ reset_partial_schedule (partial_schedule
   ps->rows = (ps_insn_ptr *) xrealloc (ps->rows, new_ii
 * sizeof (ps_insn_ptr));
   memset (ps->rows, 0, new_ii * sizeof (ps_insn_ptr));
+  ps->rows_length = (int *) xrealloc (ps->rows_length, new_ii * sizeof (int));
+  memset (ps->rows_length, 0, new_ii * sizeof (int));
   ps->ii = new_ii;
   ps->min_cycle = INT_MAX;
   ps->max_cycle = INT_MIN;
@@ -2539,14 +2784,13 @@ print_partial_schedule (partial_schedule
 
 /* Creates an object of PS_INSN and initializes it to the given parameters.  */
 static ps_insn_ptr
-create_ps_insn (ddg_node_ptr node, int rest_count, int cycle)
+create_ps_insn (ddg_node_ptr node, int cycle)
 {
   ps_insn_ptr ps_i = XNEW (struct ps_insn);
 
   ps_i->node = node;
   ps_i->next_in_row = NULL;
   ps_i->prev_in_row = NULL;
-  ps_i->row_rest_count = rest_count;
   ps_i->cycle = cycle;
 
   return ps_i;
@@ -2579,6 +2823,8 @@ remove_node_from_ps (partial_schedule_pt
   if (ps_i->next_in_row)
 ps_i->next_in_row->prev_in_row = ps_i->prev_in_row;
 }
+
+  ps->rows_length[row] -= 1;
   free (ps_i);
   return true;
 }
@@ -2735,17 +2981,12 @@ add_node_to_ps (partial_schedule_ptr ps,
sbitmap must_precede, sbitmap must_follow)
 {
   ps_insn_ptr ps_i;
-  int rest_count = 1;
   int row = SMODULO (cycle, ps->ii);
 
-  

[PATCH, SMS 2/4] Move the creation of anti-dep edge

2011-05-18 Thread Revital Eres
Hello,

The attached patch moves the creation of the anti-dep edge from a
branch to its def from create_ddg_dep_from_intra_loop_link () to
add_cross_iteration_register_deps (), because the edge has
distance 1 and thus belongs in the latter function.
The edge was added to avoid creating reg-moves.

The patch was tested together with the rest of the patches in this series.
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi regtested on c,c++. Bootstrap c language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

* ddg.c (create_ddg_dep_from_intra_loop_link): Remove the creation
of anti-dep edge from a branch.
(add_cross_iteration_register_deps): Create anti-dep edge from
a branch.


Index: ddg.c
===
--- ddg.c   (revision 173785)
+++ ddg.c   (working copy)
@@ -197,11 +197,6 @@ create_ddg_dep_from_intra_loop_link (ddg
 }
 }

-  /* If a true dep edge enters the branch create an anti edge in the
- opposite direction to prevent the creation of reg-moves.  */
-  if ((DEP_TYPE (link) == REG_DEP_TRUE) && JUMP_P (dest_node->insn))
-create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP, 1);
-
latency = dep_cost (link);
e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
add_edge_to_ddg (g, e);
@@ -306,8 +301,11 @@ add_cross_iteration_register_deps (ddg_p

  gcc_assert (first_def_node);

+ /* Always create the edge if the use node is a branch in
+order to prevent the creation of reg-moves.  */
   if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
-  || !flag_modulo_sched_allow_regmoves)
+  || !flag_modulo_sched_allow_regmoves
+ || (flag_modulo_sched_allow_regmoves && JUMP_P (use_node->insn)))
 create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
 REG_DEP, 1);


[PATCH, SMS 3/4] Optimize stage count

2011-05-18 Thread Revital Eres
Hello,

The attached patch tries to achieve an optimized SC by normalizing the
partial schedule (having the cycles start from cycle zero). The branch
must be placed in row ii-1 in the final schedule.  If that is not the
case after the normalization, the patch tries to move the branch to
that row if possible, while preserving the scheduling of the rest of
the instructions.
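
The arithmetic behind normalization and the row check can be sketched
as follows.  All four helpers are hypothetical and simplified; in
particular toy_stage is only an illustration for non-negative cycles
and is not the macro the patch uses to compute stages.

```c
/* Non-negative remainder of X modulo Y (Y > 0).  */
static int
toy_smodulo (int x, int y)
{
  int r = x % y;

  return r < 0 ? r + y : r;
}

/* Shift CYCLE so that the schedule starts at cycle zero.  */
static int
toy_normalize (int cycle, int min_cycle)
{
  return cycle - min_cycle;
}

/* Return nonzero if a branch at normalized cycle CYCLE already sits
   in the last row, II - 1, of the kernel.  */
static int
toy_branch_in_last_row_p (int cycle, int ii)
{
  return toy_smodulo (cycle, ii) == ii - 1;
}

/* Stage of a normalized (non-negative) cycle: which group of II
   cycles it falls into.  */
static int
toy_stage (int cycle, int ii)
{
  return cycle / ii;
}
```

For example, with ii = 4 a branch at normalized cycle 7 is in row 3
and needs no move, whereas a branch at cycle 6 is in row 2 and would
have to be rescheduled into row 3.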

The patch was tested together with the rest of the patches in this series.
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi regtested on c,c++. Bootstrap c language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

Changelog:

   * modulo-sched.c (calculate_stage_count,
calculate_must_precede_follow, get_sched_window,
try_scheduling_node_in_cycle, remove_node_from_ps): Add
declaration.
(update_node_sched_params, set_must_precede_follow, optimize_sc):
New functions.
(reset_sched_times): Call update_node_sched_params.
(sms_schedule): Call optimize_sc.
(get_sched_window): Change function arguments.
(sms_schedule_by_order): Update call to get_sched_window.
Call set_must_precede_follow.
(calculate_stage_count): Add function argument.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173786)
+++ modulo-sched.c  (working copy)
@@ -198,7 +198,16 @@ static void generate_prolog_epilog (part
 rtx, rtx);
 static void duplicate_insns_of_cycles (partial_schedule_ptr,
   int, int, int, rtx);
-static int calculate_stage_count (partial_schedule_ptr ps);
+static int calculate_stage_count (partial_schedule_ptr, int);
+static void calculate_must_precede_follow (ddg_node_ptr, int, int,
+  int, int, sbitmap, sbitmap, sbitmap);
+static int get_sched_window (partial_schedule_ptr, ddg_node_ptr, 
+sbitmap, int, int *, int *, int *);
+static bool try_scheduling_node_in_cycle (partial_schedule_ptr, ddg_node_ptr,
+ int, int, sbitmap, int *, sbitmap,
+ sbitmap);
+static bool remove_node_from_ps (partial_schedule_ptr, ps_insn_ptr);
+
 #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
 #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
 #define SCHED_FIRST_REG_MOVE(x) \
@@ -572,6 +581,33 @@ free_undo_replace_buff (struct undo_repl
 }
 }
 
+/* Update the sched_params for node U using the II,
+   the CYCLE of U and MIN_CYCLE.  */
+static void
+update_node_sched_params (ddg_node_ptr u, int ii, int cycle, int min_cycle)
+{
+  int sc_until_cycle_zero;
+  int stage;
+
+  SCHED_TIME (u) = cycle;
+  SCHED_ROW (u) = SMODULO (cycle, ii);
+
+  /* The calculation of stage count is done adding the number
+ of stages before cycle zero and after cycle zero.  */
+  sc_until_cycle_zero = CALC_STAGE_COUNT (-1, min_cycle, ii);
+
+  if (SCHED_TIME (u) < 0)
+{
+  stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii);
+  SCHED_STAGE (u) = sc_until_cycle_zero - stage;
+}
+  else
+{
+  stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii);
+  SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1;
+}
+}
+
 /* Bump the SCHED_TIMEs of all nodes by AMOUNT.  Set the values of
SCHED_ROW and SCHED_STAGE.  Instruction scheduled on cycle AMOUNT
will move to cycle zero.  */
@@ -588,7 +624,6 @@ reset_sched_times (partial_schedule_ptr 
ddg_node_ptr u = crr_insn->node;
int normalized_time = SCHED_TIME (u) - amount;
int new_min_cycle = PS_MIN_CYCLE (ps) - amount;
-int sc_until_cycle_zero, stage;
 
 if (dump_file)
   {
@@ -604,23 +639,9 @@ reset_sched_times (partial_schedule_ptr 

gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
gcc_assert (SCHED_TIME (u) <= ps->max_cycle);
-   SCHED_TIME (u) = normalized_time;
-   SCHED_ROW (u) = SMODULO (normalized_time, ii);
-  
-/* The calculation of stage count is done adding the number
-   of stages before cycle zero and after cycle zero.  */
-   sc_until_cycle_zero = CALC_STAGE_COUNT (-1, new_min_cycle, ii);
-   
-   if (SCHED_TIME (u) < 0)
- {
-   stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii);
-   SCHED_STAGE (u) = sc_until_cycle_zero - stage;
- }
-   else
- {
-   stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii);
-   SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1;
- }
+
+   crr_insn->cycle = normalized_time;
+   update_node_sched_params (u, ii, normalized_time, new_min_cycle);
+   update_node_sched_params (u, ii, normalized_time, new_min_cycle);
   }
 }
  
@@ -657,6 +678,206 @@ permute_partial_schedule (partial_schedu

[PATCH, SMS 4/4] Misc. fixes

2011-05-18 Thread Revital Eres
Hello,

The attached patch contains misc. fixes and changes.

The patch was tested together with the rest of the patches in this series.
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi regtested on c,c++. Bootstrap c language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital


Changelog:

* modulo-sched.c: Change comment.
(reset_sched_times): Fix print message.
(print_partial_schedule): Add print info.


Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173786)
+++ modulo-sched.c  (working copy)
@@ -84,13 +84,14 @@ along with GCC; see the file COPYING3.
   II cycles (i.e. use register copies to prevent a def from overwriting
   itself before reaching the use).

-SMS works with countable loops whose loop count can be easily
-adjusted.  This is because we peel a constant number of iterations
-into a prologue and epilogue for which we want to avoid emitting
-the control part, and a kernel which is to iterate that constant
-number of iterations less than the original loop.  So the control
-part should be a set of insns clearly identified and having its
-own iv, not otherwise used in the loop (at-least for now), which
+SMS works with countable loops (1) whose control part can be easily
+decoupled from the rest of the loop and (2) whose loop count can
+be easily adjusted.  This is because we peel a constant number of
+iterations into a prologue and epilogue for which we want to avoid
+emitting the control part, and a kernel which is to iterate that
+constant number of iterations less than the original loop.  So the
+control part should be a set of insns clearly identified and having
+its own iv, not otherwise used in the loop (at-least for now), which
 initializes a register before the loop to the number of iterations.
 Currently SMS relies on the do-loop pattern to recognize such loops,
 where (1) the control part comprises of all insns defining and/or
@@ -595,8 +596,8 @@ reset_sched_times (partial_schedule_ptr
 /* Print the scheduling times after the rotation.  */
 fprintf (dump_file, "crr_insn->node=%d (insn id %d), "
  "crr_insn->cycle=%d, min_cycle=%d", crr_insn->node->cuid,
- INSN_UID (crr_insn->node->insn), SCHED_TIME (u),
- normalized_time);
+ INSN_UID (crr_insn->node->insn), normalized_time,
+ new_min_cycle);
 if (JUMP_P (crr_insn->node->insn))
   fprintf (dump_file, " (branch)");
 fprintf (dump_file, "\n");
@@ -2530,8 +2531,13 @@ print_partial_schedule (partial_schedule
   fprintf (dump, "\n[ROW %d ]: ", i);
   while (ps_i)
{
- fprintf (dump, "%d, ",
-  INSN_UID (ps_i->node->insn));
+ if (JUMP_P (ps_i->node->insn))
+   fprintf (dump, "%d (branch), ",
+INSN_UID (ps_i->node->insn));
+ else
+   fprintf (dump, "%d, ",
+INSN_UID (ps_i->node->insn));
+   
  ps_i = ps_i->next_in_row;
}
 }


Re: [PR testsuite/47013] Fix SMS testsuite faliures

2011-05-16 Thread Revital Eres
Hello,

Thanks for testing the patch.

 FAIL: gcc.dg/sms-8.c scan-rtl-dump-times sms SMS loop with subreg in lhs 1

Does the attached patch resolve the failure with sms-8.c?
If so I'll re-submit it.

Thanks,
Revital
Index: testsuite/gcc.dg/sms-2.c
===
--- testsuite/gcc.dg/sms-2.c(revision 173659)
+++ testsuite/gcc.dg/sms-2.c(working copy)
@@ -4,12 +4,11 @@
 /* { dg-do compile } */
 /* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms } */
 
-
+int th, h, em, nlwm, nlwS, nlw, sy;
 void
 fun (nb)
  int nb;
 {
-  int th, h, em, nlwm, nlwS, nlw, sy;
 
   while (nb--)
 while (h--)
@@ -33,5 +32,5 @@ fun (nb)
   }
 }
 
-/* { dg-final { scan-rtl-dump-times SMS succeeded 1 sms { target spu-*-* 
powerpc*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times SMS loop many exits 1 sms { target 
spu-*-* powerpc*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
Index: testsuite/gcc.dg/sms-6.c
===
--- testsuite/gcc.dg/sms-6.c(revision 173659)
+++ testsuite/gcc.dg/sms-6.c(working copy)
@@ -1,5 +1,7 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms  --param sms-min-sc=1  } */
+/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms  -mno-update --param 
sms-min-sc=1 -fmodulo-sched-allow-regmoves  { target powerpc*-*-*} } */
+
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-3.c
===
--- testsuite/gcc.dg/sms-3.c(revision 173659)
+++ testsuite/gcc.dg/sms-3.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -funroll-loops -fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -funroll-loops -fdump-rtl-sms --param 
sms-min-sc=1 -fmodulo-sched-allow-regmoves } */
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-7.c
===
--- testsuite/gcc.dg/sms-7.c(revision 173659)
+++ testsuite/gcc.dg/sms-7.c(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms } */
+/* { dg-options -O3 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms 
-fmodulo-sched-allow-regmoves --param sms-min-sc=1 } */
+/* { dg-options -O2 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms --param 
sms-min-sc=1 -mno-update -fmodulo-sched-allow-regmoves { target powerpc*-*-*} 
} */
 
 extern void abort (void);
 
@@ -44,7 +45,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump-times SMS succeeded 1 sms  { target spu-*-* 
} } } */
+/* { dg-final { scan-rtl-dump-times SMS succeeded 2 sms  { target spu-*-* 
} } } */
 /* { dg-final { scan-rtl-dump-times SMS succeeded 3  sms { target 
powerpc*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
 
Index: testsuite/gcc.dg/sms-4.c
===
--- testsuite/gcc.dg/sms-4.c(revision 173659)
+++ testsuite/gcc.dg/sms-4.c(working copy)
@@ -1,6 +1,7 @@
 /* Inspired from sbitmap_a_or_b_and_c_cg function in sbitmap.c.  */
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms   } */
+/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms --param sms-min-sc=1 -mno-update { target powerpc*-*-*} } */
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-8.c
===
--- testsuite/gcc.dg/sms-8.c(revision 173659)
+++ testsuite/gcc.dg/sms-8.c(working copy)
@@ -3,7 +3,8 @@
 that was not fixed by reg-moves.  */
 
  /* { dg-do run } */
- /* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms } */
+ /* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms --param sms-min-sc=1 } */
+ /* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms { target powerpc*-*-*} } */
 
 extern void abort (void);
 
@@ -35,7 +36,7 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump-times SMS succeeded 1 sms { target 
powerpc*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times SMS succeeded 0 sms { target 
powerpc-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
 
 
Index: testsuite/gcc.dg/sms-5.c
===
--- testsuite/gcc.dg/sms-5.c(revision 173659)
+++ testsuite/gcc.dg/sms-5.c(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-funroll-loops -fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-funroll-loops -fdump-rtl-sms --param sms-min-sc=1 } */
+ /* { dg-options -O2 -fmodulo-sched 

[PR testsuite/47013] Fix SMS testsuite faliures (re-submission)

2011-05-16 Thread Revital Eres
Hello,

Attached is a new version of the patch.

Thanks to Dominique Dhumieres for testing on
powerpc-apple-darwin9 and x86_64-apple-darwin10.
Tested on ppc64-redhat-linux with both -m32 and -m64, and on SPU.

OK for mainline?

Thanks,
Revital

testsuite/Changelog

PR rtl-optimization/47013
* gcc.dg/sms-2.c: Change scan-tree-dump-times and the code itself
to preserve the function.
* gcc.dg/sms-6.c: Add --param sms-min-sc=1
 -fmodulo-sched-allow-regmoves,
 and -mno-update for PowerPC.
* gcc.dg/sms-3.c: Add --param sms-min-sc=1 and
-fmodulo-sched-allow-regmoves flags.
* gcc.dg/sms-7.c: Add -fmodulo-sched-allow-regmoves and
--param sms-min-sc=1 flags and -mno-update for
PowerPC. Increase the SMS succeeded loops for the SPU.
* gcc.dg/sms-4.c: Add --param sms-min-sc=1 -mno-update
for PowerPC.
* gcc.dg/sms-8.c: Add --param sms-min-sc=1.
Change scan-rtl-dump-times for PowerPC.
* gcc.dg/sms-5.c: Add --param sms-min-sc=1 flag and
-mno-update for PowerPC.
Index: testsuite/gcc.dg/sms-2.c
===
--- testsuite/gcc.dg/sms-2.c(revision 173659)
+++ testsuite/gcc.dg/sms-2.c(working copy)
@@ -4,12 +4,11 @@
 /* { dg-do compile } */
 /* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms } */
 
-
+int th, h, em, nlwm, nlwS, nlw, sy;
 void
 fun (nb)
  int nb;
 {
-  int th, h, em, nlwm, nlwS, nlw, sy;
 
   while (nb--)
 while (h--)
@@ -33,5 +32,5 @@ fun (nb)
   }
 }
 
-/* { dg-final { scan-rtl-dump-times SMS succeeded 1 sms { target spu-*-* 
powerpc*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times SMS loop many exits 1 sms { target 
spu-*-* powerpc*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
Index: testsuite/gcc.dg/sms-6.c
===
--- testsuite/gcc.dg/sms-6.c(revision 173659)
+++ testsuite/gcc.dg/sms-6.c(working copy)
@@ -1,5 +1,7 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms  --param sms-min-sc=1  } */
+/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms  -mno-update --param 
sms-min-sc=1 -fmodulo-sched-allow-regmoves  { target powerpc*-*-*} } */
+
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-3.c
===
--- testsuite/gcc.dg/sms-3.c(revision 173659)
+++ testsuite/gcc.dg/sms-3.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -funroll-loops -fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -funroll-loops -fdump-rtl-sms --param 
sms-min-sc=1 -fmodulo-sched-allow-regmoves } */
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-7.c
===
--- testsuite/gcc.dg/sms-7.c(revision 173659)
+++ testsuite/gcc.dg/sms-7.c(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms } */
+/* { dg-options -O3 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms 
-fmodulo-sched-allow-regmoves --param sms-min-sc=1 } */
+/* { dg-options -O2 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms --param 
sms-min-sc=1 -mno-update -fmodulo-sched-allow-regmoves { target powerpc*-*-*} 
} */
 
 extern void abort (void);
 
@@ -44,7 +45,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump-times SMS succeeded 1 sms  { target spu-*-* 
} } } */
+/* { dg-final { scan-rtl-dump-times SMS succeeded 2 sms  { target spu-*-* 
} } } */
 /* { dg-final { scan-rtl-dump-times SMS succeeded 3  sms { target 
powerpc*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
 
Index: testsuite/gcc.dg/sms-4.c
===
--- testsuite/gcc.dg/sms-4.c(revision 173659)
+++ testsuite/gcc.dg/sms-4.c(working copy)
@@ -1,6 +1,7 @@
 /* Inspired from sbitmap_a_or_b_and_c_cg function in sbitmap.c.  */
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms   } */
+/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms --param sms-min-sc=1 -mno-update { target powerpc*-*-*} } */
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-8.c
===
--- testsuite/gcc.dg/sms-8.c(revision 173659)
+++ testsuite/gcc.dg/sms-8.c(working copy)
@@ -3,7 +3,8 @@
 that was not fixed by reg-moves.  */
 
  /* { dg-do run } */
- /* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms } */
+ /* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms --param sms-min-sc=1 } */
+ /* { dg-options -O2 -fmodulo-sched 

Re: [PATCH, SMS 2/3] Skip DEBUG_INSNs while recognizing doloop

2011-05-12 Thread Revital Eres
Hello,

 +    if (reg_mentioned_p (reg, insn) && !DEBUG_INSN_P (insn))

 It probably makes sense to test for !DEBUG_INSN_P first, since it's much
 cheaper.
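
The ordering point can be illustrated with a small stand-alone sketch.
It is not GCC code: struct insn, is_debug, mentions_reg and the
expensive_calls counter are hypothetical stand-ins for the real RTL
predicates.  Because && short-circuits, putting the cheap test first
means the costly one never runs for debug insns.

```c
/* Hypothetical, simplified insn: a debug flag and one register.  */
struct insn
{
  int debug;
  int reg;
};

/* Counts how often the "expensive" predicate is evaluated.  */
int expensive_calls;

/* Cheap test, analogous in spirit to DEBUG_INSN_P.  */
int
is_debug (const struct insn *insn)
{
  return insn->debug;
}

/* Stand-in for a costly walk such as reg_mentioned_p.  */
int
mentions_reg (const struct insn *insn, int reg)
{
  expensive_calls++;
  return insn->reg == reg;
}

/* Count non-debug insns that mention REG.  The cheap test comes
   first, so mentions_reg is skipped for every debug insn.  */
int
count_mentions (const struct insn *insns, int n, int reg)
{
  int count = 0;

  for (int k = 0; k < n; k++)
    if (!is_debug (&insns[k]) && mentions_reg (&insns[k], reg))
      count++;
  return count;
}
```

With one debug insn and two ordinary insns, the expensive predicate is
evaluated only twice; the result is the same with either operand order.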

Thanks, will commit the following to fix that:

* modulo-sched.c (doloop_register_get): Check !DEBUG_INSN_P
 first.

Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173693)
+++ modulo-sched.c  (working copy)
@@ -316,7 +316,7 @@ doloop_register_get (rtx head ATTRIBUTE_
  : prev_nondebug_insn (tail));

   for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN (insn))
-if (reg_mentioned_p (reg, insn) && !DEBUG_INSN_P (insn))
+if (!DEBUG_INSN_P (insn) && reg_mentioned_p (reg, insn))
   {
 if (dump_file)
 {

Revital


[PR testsuite/47013] Fix SMS testsuite faliures

2011-05-12 Thread Revital Eres
Hello,

The attached patch fixes SMS testsuite failures seen on PowerPC and SPU.

Tested on ppc64-redhat-linux with both -m32 and -m64, and on SPU.

OK for mainline?

Thanks,
Revital

testsuite/Changelog

PR rtl-optimization/47013
* gcc.dg/sms-2.c: Change scan-tree-dump-times and the code itself
to preserve the function.
* gcc.dg/sms-6.c: Add --param sms-min-sc=1
 -fmodulo-sched-allow-regmoves,
 and -mno-update for PowerPC.
* gcc.dg/sms-3.c: Add --param sms-min-sc=1 and
-fmodulo-sched-allow-regmoves flags.
* gcc.dg/sms-7.c: Add -fmodulo-sched-allow-regmoves and
--param sms-min-sc=1 flags and -mno-update for
PowerPC. Increase the SMS succeeded loops for the SPU.
* gcc.dg/sms-4.c: Add --param sms-min-sc=1 -mno-update
for PowerPC.
* gcc.dg/sms-8.c: Add --param sms-min-sc=1 and -m32 for
PowerPC. Change scan-rtl-dump-times for PowerPC.
* gcc.dg/sms-5.c: Add --param sms-min-sc=1 flag and
-mno-update for PowerPC.
Index: testsuite/gcc.dg/sms-2.c
===
--- testsuite/gcc.dg/sms-2.c(revision 173693)
+++ testsuite/gcc.dg/sms-2.c(working copy)
@@ -4,12 +4,11 @@
 /* { dg-do compile } */
 /* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms } */
 
-
+int th, h, em, nlwm, nlwS, nlw, sy;
 void
 fun (nb)
  int nb;
 {
-  int th, h, em, nlwm, nlwS, nlw, sy;
 
   while (nb--)
 while (h--)
@@ -33,5 +32,5 @@ fun (nb)
   }
 }
 
-/* { dg-final { scan-rtl-dump-times SMS succeeded 1 sms { target spu-*-* 
powerpc*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times SMS loop many exits 1 sms { target 
spu-*-* powerpc*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
Index: testsuite/gcc.dg/sms-6.c
===
--- testsuite/gcc.dg/sms-6.c(revision 173693)
+++ testsuite/gcc.dg/sms-6.c(working copy)
@@ -1,5 +1,7 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms  --param sms-min-sc=1  } */
+/* { dg-options -O2 -fmodulo-sched -fdump-rtl-sms  -mno-update --param 
sms-min-sc=1 -fmodulo-sched-allow-regmoves  { target powerpc*-*-*} } */
+
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-3.c
===
--- testsuite/gcc.dg/sms-3.c(revision 173693)
+++ testsuite/gcc.dg/sms-3.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -funroll-loops -fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -funroll-loops -fdump-rtl-sms --param 
sms-min-sc=1 -fmodulo-sched-allow-regmoves } */
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-7.c
===
--- testsuite/gcc.dg/sms-7.c(revision 173693)
+++ testsuite/gcc.dg/sms-7.c(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms } */
+/* { dg-options -O3 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms 
-fmodulo-sched-allow-regmoves --param sms-min-sc=1 } */
+/* { dg-options -O2 -fmodulo-sched -fstrict-aliasing -fdump-rtl-sms --param 
sms-min-sc=1 -mno-update -fmodulo-sched-allow-regmoves { target powerpc*-*-*} 
} */
 
 extern void abort (void);
 
@@ -44,7 +45,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump-times SMS succeeded 1 sms  { target spu-*-* 
} } } */
+/* { dg-final { scan-rtl-dump-times SMS succeeded 2 sms  { target spu-*-* 
} } } */
 /* { dg-final { scan-rtl-dump-times SMS succeeded 3  sms { target 
powerpc*-*-* } } } */
 /* { dg-final { cleanup-rtl-dump sms } } */
 
Index: testsuite/gcc.dg/sms-4.c
===
--- testsuite/gcc.dg/sms-4.c(revision 173693)
+++ testsuite/gcc.dg/sms-4.c(working copy)
@@ -1,6 +1,7 @@
 /* Inspired from sbitmap_a_or_b_and_c_cg function in sbitmap.c.  */
 /* { dg-do run } */
-/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms } */
+/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms   } */
+/* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms --param sms-min-sc=1 -mno-update { target powerpc*-*-*} } */
 
 extern void abort (void);
 
Index: testsuite/gcc.dg/sms-8.c
===
--- testsuite/gcc.dg/sms-8.c(revision 173693)
+++ testsuite/gcc.dg/sms-8.c(working copy)
@@ -3,7 +3,8 @@
 that was not fixed by reg-moves.  */
 
  /* { dg-do run } */
- /* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms } */
+ /* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms --param sms-min-sc=1 } */
+ /* { dg-options -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves 
-fdump-rtl-sms --param 

Re: [PATCH, SMS 1/3] Support closing_branch_deps (second try)

2011-05-12 Thread Revital Eres
Hello Ramana,

Following our conversation, here is the patch again without
the ARM-specific flags.  Tested on an ARM machine configured with
[--with-arch=armv7-a] --with-mode=thumb.

OK for mainline?

Thanks,
Revital

testsuite/ChangeLog:

        * gcc.target/arm/sms-9.c: New file.
        * gcc.target/arm/sms-10.c: New file.
Index: arm/sms-9.c
===
--- arm/sms-9.c (revision 0)
+++ arm/sms-9.c (revision 0)
@@ -0,0 +1,73 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-O2 -fmodulo-sched -fdump-rtl-sms -fno-auto-inc-dec" } */
+
+extern void abort (void);
+
+int filter1[8][4] = {
+  {
+   23170, -23170, -23170, 23170,},
+  {
+   22005, -26319, -16846, 29621,},
+  {
+   22005, -26319, -16846, 29621,},
+  {
+   5, -26319, -16846, 29621,},
+  {
+   55, -26319, -16846, 29621,},
+  {
+   77, -26319, -16846, 29621,},
+  {
+   22005, -26319, -16846, 29621,},
+  {
+   22005, -26319, -16846, 29621,},
+
+};
+
+
+int out[32] = {
+  22, -22, -22, 22, 21, -25, -16, 28, 21, -25, -16, 28, 0, -25, -16, 28, 0,
+-25, -16, 28, 0, -25, -16, 28, 21, -25, -16, 28, 21, -25, -16, 28
+};
+
+__attribute__ ((noinline))
+static void
+foo (int *arr, int *accums)
+{
+  typedef int NN[8][4];
+  static NN *filter;
+  int i;
+  filter = filter1;
+
+  int *filterp;
+  int *arrp;
+  arrp = arr;
+  filterp = (int *) ((*filter)[0]);
+  i = 32;
+
+  while (i--)
+{
+  *accums++ = (arrp[0] * filterp[0] + arrp[8] * filterp[0]) / 32768;
+  filterp += 1;
+}
+}
+
+int
+main ()
+{
+  int inarr[32];
+  int accums[32];
+  int i;
+  for (i = 0; i < 32; i++)
+    inarr[i] = i << 2;
+  foo (inarr, accums);
+  for (i = 0; i < 32; i++)
+    if (out[i] != accums[i])
+      abort ();
+  return 0;
+}
+
+/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" } }  */
+/* { dg-final { cleanup-rtl-dump "sms" } } */
+
+
Index: arm/sms-10.c
===
--- arm/sms-10.c(revision 0)
+++ arm/sms-10.c(revision 0)
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-options "-O2 -fmodulo-sched -fdump-rtl-sms -fno-auto-inc-dec -fmodulo-sched-allow-regmoves -gtoggle" } */
+
+extern void abort (void);
+
+unsigned char filter1[8] = { 2, 3, 1, 2, 3, 2, 2, 1 };
+
+
+void
+foo (int val, unsigned int size, unsigned char *dest)
+{
+  while (size != 0)
+    {
+      *dest++ = val & 0xff;
+      --size;
+    }
+}
+
+
+int
+main ()
+{
+  int i;
+  foo (50, 4, filter1);
+  for (i = 0; i < 4; i++)
+    if (filter1[i] != 50)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { scan-rtl-dump-times "OK" 1 "sms" } }  */
+/* { dg-final { cleanup-rtl-dump "sms" } } */
+
+


Re: [PATCH, SMS 1/3] Support closing_branch_deps (second try)

2011-05-11 Thread Revital Eres
Hello,

 please clarify that, e.g., instruction scheduled on cycle AMOUNT will move
 to cycle zero.

OK, done.

 shouldn't normalized_time be used here instead of SCHED_TIME (u)?

SCHED_TIME (u) is set to normalized_time just before
it is used.
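
To make the normalization concrete, here is a minimal sketch of the
shift discussed above.  It is only an illustration: struct node_sched,
smodulo and reset_node_time are hypothetical names, not the functions
in modulo-sched.c.  A node scheduled on cycle AMOUNT ends up on cycle
zero, and its row is its new time modulo ii.

```c
/* Remainder that is never negative, so cycles below zero still map
   to a row in 0..ii-1.  Assumed helper, not GCC's own macro.  */
int
smodulo (int x, int y)
{
  int r = x % y;

  return r < 0 ? r + y : r;
}

/* Hypothetical stand-in for the per-node scheduling parameters.  */
struct node_sched
{
  int time;  /* absolute cycle */
  int row;   /* time modulo ii */
};

/* Shift NODE back by AMOUNT cycles and recompute its row.  */
void
reset_node_time (struct node_sched *node, int amount, int ii)
{
  node->time -= amount;
  node->row = smodulo (node->time, ii);
}
```

For example, with amount 7 and ii 5, a node on cycle 7 moves to cycle 0
(row 0), and a node on cycle 3 moves to cycle -4 (row 1).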

Thanks,
Revital


[PATCH, SMS 3/3] Skip DEBUG_INSN in loop-doloop.

2011-05-08 Thread Revital Eres
Hello,

(sorry for multiple copies of this email)

This small fix skips DEBUG_INSNs while recognizing the doloop
pattern in loop-doloop.c.  It is a fix on top of the already approved
do-loop patch (not in mainline yet,
http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html).

The patch was tested together with the rest of the patches in this series
and on top of the patch to support do-loop for ARM (not yet in mainline,
but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html).
On ppc64-redhat-linux: regtested, and bootstrapped with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi: regtested on C and C++, and bootstrapped the C
language with SMS flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

Changelog:

* loop-doloop.c (doloop_condition_get): Use prev_nondebug_insn
   instead of PREV_INSN.
--- loop-doloop.c   2011-05-07 16:08:27.0 +0300
+++ loop-doloop_new.c   2011-05-07 16:07:48.0 +0300
@@ -151,7 +151,7 @@ doloop_condition_get (rtx doloop_pat)
 inc = XVECEXP (PATTERN (prev_insn), 0, 1);
   }
  else
-inc = PATTERN (PREV_INSN (doloop_pat));
+inc = PATTERN (prev_insn);
  /* We expect the condition to be of the form (reg != 0)  */
  cond = XEXP (SET_SRC (cmp), 0);
  if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)

[PATCH, SMS 1/3] Support closing_branch_deps (second try)

2011-05-07 Thread Revital Eres
Hello,

The attached patch enhances SMS to support targets
whose doloop part is not decoupled from the rest of the loop's
instructions, as SMS currently requires.  In this case, the branch
cannot be placed wherever we want (as is currently done) because
it must honor dependencies.  We therefore schedule the branch instruction
with the rest of the loop's instructions and rotate it to be in row
ii-1 at the end of the scheduling procedure, to make sure it's the last
instruction in the iteration.

The attached patch changes the current implementation to always schedule
the branch in order to support the above case.

As explained in http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00250.html,
always scheduling the branch might affect code size, because
the SC can increase by 1, which means adding instructions from
at most one iteration to the prologue and epilogue.  Also, ii might
increase by one due to resource constraints, namely the
unavailability of free slots to schedule the branch.

The patch was tested together with the rest of the patches in this series
and on top of the patch to support do-loop for ARM (not yet in mainline,
but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html).
On ppc64-redhat-linux: regtested, and bootstrapped with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi: regtested on C and C++, and bootstrapped the C
language with SMS flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

ChangeLog:

* ddg.c (create_ddg_dep_from_intra_loop_link): If a true dep edge
enters the branch create an anti edge in the opposite direction
to prevent the creation of reg-moves.
* modulo-sched.c: Adjust comment to reflect the fact we are
scheduling closing branch.
(PS_STAGE_COUNT): Rename to CALC_STAGE_COUNT and redefine.
(stage_count): New field in struct partial_schedule.
(calculate_stage_count): New function.
(normalize_sched_times): Rename to reset_sched_times and handle
incrementing the sched time of the nodes by a constant value
passed as parameter.
(duplicate_insns_of_cycles): Skip closing branch.
(sms_schedule_by_order): Schedule closing branch.
(ps_insn_find_column): Handle closing branch.
(sms_schedule): Call reset_sched_times and adjust the code to
support scheduling of the closing branch.
(ps_insert_empty_row): Update calls to normalize_sched_times
and rotate_partial_schedule functions.

testsuite ChangeLog:

* gcc.target/arm/sms-9.c: New file.
* gcc.target/arm/sms-10.c: New file.
Index: ddg.c
===
--- ddg.c   (revision 173296)
+++ ddg.c   (working copy)
@@ -197,6 +197,11 @@ create_ddg_dep_from_intra_loop_link (ddg
 }
 }
 
+  /* If a true dep edge enters the branch create an anti edge in the
+ opposite direction to prevent the creation of reg-moves.  */
+  if ((DEP_TYPE (link) == REG_DEP_TRUE) && JUMP_P (dest_node->insn))
+create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP, 1);
+
latency = dep_cost (link);
e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
add_edge_to_ddg (g, e);
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173296)
+++ modulo-sched.c  (working copy)
@@ -84,14 +84,13 @@ along with GCC; see the file COPYING3.  
   II cycles (i.e. use register copies to prevent a def from overwriting
   itself before reaching the use).
 
-SMS works with countable loops (1) whose control part can be easily
-decoupled from the rest of the loop and (2) whose loop count can
-be easily adjusted.  This is because we peel a constant number of
-iterations into a prologue and epilogue for which we want to avoid
-emitting the control part, and a kernel which is to iterate that
-constant number of iterations less than the original loop.  So the
-control part should be a set of insns clearly identified and having
-its own iv, not otherwise used in the loop (at-least for now), which
+SMS works with countable loops whose loop count can be easily
+adjusted.  This is because we peel a constant number of iterations
+into a prologue and epilogue for which we want to avoid emitting
+the control part, and a kernel which is to iterate that constant
+number of iterations less than the original loop.  So the control
+part should be a set of insns clearly identified and having its
+own iv, not otherwise used in the loop (at-least for now), which
 initializes a register before the loop to the number of iterations.
 Currently SMS relies on the do-loop pattern to recognize such loops,
 where (1) the control part comprises of all insns defining 

[PATCH, SMS 2/3] Skip DEBUG_INSNs while recognizing doloop

2011-05-07 Thread Revital Eres
Hello,

The attached patch adds code to skip DEBUG_INSNs while recognizing
doloop pattern.
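
As a rough illustration of what skipping debug insns means when looking
for the insn that precedes the branch, consider the sketch below.  The
struct insn layout and the prev_nondebug helper are hypothetical and
much simpler than GCC's RTL chain; they only show the idea of walking
backwards past debug entries.

```c
#include <stddef.h>

/* Hypothetical, simplified insn chain entry.  */
struct insn
{
  struct insn *prev;
  int is_debug;
  int uid;
};

/* Return the closest insn before INSN that is not a debug insn,
   or NULL if there is none.  */
struct insn *
prev_nondebug (struct insn *insn)
{
  for (insn = insn->prev; insn != NULL && insn->is_debug; insn = insn->prev)
    ;
  return insn;
}
```

With a chain "decrement, debug, branch", looking backwards from the
branch yields the decrement, whereas the plain previous entry would be
the debug insn.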

The patch was tested together with the rest of the patches in this series
and on top of the patch to support do-loop for ARM (not yet in mainline,
but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html).
On ppc64-redhat-linux: regtested, and bootstrapped with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi: regtested on C and C++, and bootstrapped the C
language with SMS flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

Changelog:

* modulo-sched.c (doloop_register_get): Ignore DEBUG_INSNs while
recognizing doloop.


Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173368)
+++ modulo-sched.c  (working copy)
@@ -310,10 +313,10 @@ doloop_register_get (rtx head ATTRIBUTE_
  either a single (parallel) branch-on-count or a (non-parallel)
  branch immediately preceded by a single (decrement) insn.  */
   first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail
- : PREV_INSN (tail));
+ : prev_nondebug_insn (tail));

   for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN (insn))
-if (reg_mentioned_p (reg, insn))
+if (reg_mentioned_p (reg, insn) && !DEBUG_INSN_P (insn))
   {
 if (dump_file)
 {


Re: [PATCH, SMS] Avoid considering debug_insn when calculating SCCs

2011-05-04 Thread Revital Eres
Hello,

The following is a summary of discussion I had with Ayal regarding the patch:

Some background: currently, SMS supports only targets where the doloop
pattern is decoupled from the rest of the loop's instructions (for example
PowerPC); we'll call this 'case decoupled' for simplicity.  In this case,
the branch is not scheduled with the rest of the instructions but
rather placed in row ii-1 at the end of the scheduling procedure, after
all the other instructions have been scheduled.  The resulting kernel
is optimal with respect to the stage count (SC) because min_cycle is
placed in row 0 after normalizing the cycles to start from cycle zero.
This patch tries to extend SMS to support targets where the doloop
pattern is not decoupled from the rest of the loop's instructions (call
it 'case NEW' for simplicity).  In this case the branch cannot be placed
wherever we want because it must honor dependencies, and thus we
schedule the branch instruction with the rest of the loop's instructions
and rotate it to be in row ii-1 at the end of the scheduling procedure
to make sure it's the last instruction in the iteration.

The suggestion was to simplify the patch by always scheduling the branch
with the rest of the instructions.
For case decoupled (where we have the alternative of normalizing the
cycles and achieving the optimal SC), this should not affect performance
but rather code size, by increasing the SC by at most 1, which means
adding instructions from at most one iteration to the prologue and epilogue.

The following is my attempt to prove that the SC can increase by
at most one:
Consider the same loop with a decoupled branch part, scheduled once with
the branch instruction among the rest of the loop's instructions and
once ignoring it.  If the distance between min_cycle and max_cycle is
the same in both schedules, then the SC of the first schedule is at most
one larger.
This holds in one direction, as the branch instruction should not affect
the scheduling window of any other instruction, which is what we expect
for case decoupled.  The question is whether there are cases where the
branch can be scheduled outside the range of min_cycle and max_cycle.  I
think there is no such case because the branch will be scheduled at asap =
0, which means that it will fall in the range min_cycle..max_cycle.
In practice there is an edge with latency zero between memory references
and the branch instruction, which is inserted by haifa sched.  Also,
the branch might be scheduled outside the range of
min_cycle and max_cycle due to resource constraints.  For example, on
PowerPC the issue rate in SMS is always 1, which forces the branch to be
scheduled in a new cycle (and might also artificially influence ii).

Example of the resulting SMS kernel for the same loop:

The SMS kernel for case NEW, resulting in SC of 3 and ii 5:

cycle  node

-3     0
-2
-1     1
- start of SC 2
 0
 1
 2     3
 3
 4     5 (the branch)
- start of SC 3
 5     4

The SMS kernel for case decoupled, resulting in SC of 2 and ii 5:

cycle  node

 0     0
 1
 2     1
 3     3
 4     2
- start of SC 2
 5
 6     3

Thanks,
Revital


Re: [PATCH, SMS] Avoid considering debug_insn when calculating SCCs

2011-05-04 Thread Revital Eres
My apologies, the previous email refers to the patch:

http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00350.html
- Support closing_branch_deps

and not to 'Avoid considering debug_insn when calculating SCCs' as
the title implies.

Thanks,
Revital


[PATCH, SMS] Avoid considering debug_insn when calculating SCCs

2011-04-17 Thread Revital Eres
Hello,

The attached patch avoids considering debug_insn when calculating SCCs.
With this change the existence of debug_insn does not influence
the scheduling order and rec_MII.
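
The idea can be sketched on a toy graph.  This is not the ddg.c code:
struct graph, MAX_NODES and reachable_from are hypothetical, and the
traversal is a plain worklist walk over an adjacency matrix.  The point
is only that a debug node is never marked and never expanded, so paths
that exist solely through it do not contribute to reachability.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical dependence graph: adjacency matrix plus a per-node
   flag saying whether the node is a debug insn.  */
struct graph
{
  int n;
  bool edge[MAX_NODES][MAX_NODES];
  bool is_debug[MAX_NODES];
};

/* Mark in REACHED every non-debug node reachable from START along
   paths that never pass through a debug node.  */
void
reachable_from (const struct graph *g, int start, bool reached[MAX_NODES])
{
  int worklist[MAX_NODES];
  int top = 0;

  for (int i = 0; i < g->n; i++)
    reached[i] = false;
  if (g->is_debug[start])
    return;

  reached[start] = true;
  worklist[top++] = start;
  while (top > 0)
    {
      int u = worklist[--top];

      for (int v = 0; v < g->n; v++)
        if (g->edge[u][v] && !reached[v] && !g->is_debug[v])
          {
            /* Each node is pushed at most once, so the worklist
               never holds more than n entries.  */
            reached[v] = true;
            worklist[top++] = v;
          }
    }
}
```

For a graph with edges 0->1, 1->2 and 0->3 where node 1 is a debug
insn, only nodes 0 and 3 are reached from node 0; node 2 is not,
because its only path goes through the debug node.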

Bootstrapped and regtested on ppc64-redhat-linux; regtested on arm-linux-gnueabi.

OK for mainline?

Thanks,
Revital

Changelog:

* ddg.c (find_nodes_on_paths): Ignore DEBUG_INSNs.
=== modified file 'gcc/ddg.c'
--- gcc/ddg.c   2011-03-27 07:11:08 +
+++ gcc/ddg.c   2011-04-11 12:04:54 +
@@ -1116,13 +1116,19 @@ find_nodes_on_paths (sbitmap result, ddg
{
  ddg_edge_ptr e;
  ddg_node_ptr u_node = g->nodes[u];
-
+ 
+ /* Ignore DEBUG_INSNs when calculating the SCCs to avoid their
+influence on the scheduling order and rec_mii.  */
+ if (DEBUG_INSN_P (u_node->insn))
+   continue;
+ 
  for (e = u_node->out; e != (ddg_edge_ptr) 0; e = e->next_out)
{
  ddg_node_ptr v_node = e->dest;
  int v = v_node->cuid;
 
- if (!TEST_BIT (reachable_from, v))
+ /* Ignore DEBUG_INSN when calculating the SCCs.  */
+ if (!TEST_BIT (reachable_from, v) && !DEBUG_INSN_P (v_node->insn))
{
  SET_BIT (reachable_from, v);
  SET_BIT (tmp, v);
@@ -1146,12 +1152,18 @@ find_nodes_on_paths (sbitmap result, ddg
  ddg_edge_ptr e;
  ddg_node_ptr u_node = g->nodes[u];
 
+ /* Ignore DEBUG_INSNs when calculating the SCCs to avoid their
+influence on the scheduling order and rec_mii.  */
+ if (DEBUG_INSN_P (u_node->insn))
+   continue;
+ 
  for (e = u_node->in; e != (ddg_edge_ptr) 0; e = e->next_in)
{
  ddg_node_ptr v_node = e->src;
  int v = v_node->cuid;
 
- if (!TEST_BIT (reach_to, v))
+ /* Ignore DEBUG_INSN when calculating the SCCs.  */
+ if (!TEST_BIT (reach_to, v) && !DEBUG_INSN_P (v_node->insn))
{
  SET_BIT (reach_to, v);
  SET_BIT (tmp, v);
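
The effect of the two hunks can be illustrated on a toy graph. What
follows is a minimal sketch, not the ddg.c code: struct toy_graph,
is_debug (a stand-in for DEBUG_INSN_P) and mark_reachable are names
made up for the illustration, and an adjacency matrix replaces the
ddg edge lists and sbitmaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 8

/* Toy dependence graph; all names here are illustrative.  */
struct toy_graph
{
  int num_nodes;
  bool edge[MAX_NODES][MAX_NODES];  /* edge[u][v] means u -> v.  */
  bool is_debug[MAX_NODES];         /* Stand-in for DEBUG_INSN_P.  */
};

/* Set REACHABLE[v] for every node v that can be reached from START
   along paths that never pass through a debug node.  */
void
mark_reachable (const struct toy_graph *g, int start,
                bool reachable[MAX_NODES])
{
  int worklist[MAX_NODES];
  int top = 0;

  assert (g->num_nodes <= MAX_NODES);
  assert (start >= 0 && start < g->num_nodes);
  memset (reachable, 0, MAX_NODES * sizeof (bool));

  /* A debug node contributes nothing, even as the starting point.  */
  if (g->is_debug[start])
    return;

  reachable[start] = true;
  worklist[top++] = start;
  while (top > 0)
    {
      int u = worklist[--top];

      for (int v = 0; v < g->num_nodes; v++)
        if (g->edge[u][v] && !reachable[v] && !g->is_debug[v])
          {
            /* Each node is pushed at most once, so the worklist
               cannot overflow.  */
            reachable[v] = true;
            worklist[top++] = v;
          }
    }
}
```

In a graph 0 -> 1 -> 2 where node 1 is a debug node, node 2 is no
longer reported as reachable from node 0, so the debug node cannot
pull extra nodes into an SCC.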



Re: [PATCH] New flag to apply SMS when SC equals 1 (Second try)

2011-04-17 Thread Revital Eres
Hello,


 New params need documentation in doc/invoke.texi.  Please also change the
 maximum value to 2, not 1.  OK with those changes.

When I change the maximum value to 2, any value greater than
2 is rejected, per the following comment in params.def:

 - The maximum acceptable value for the parameter (if greater than
 the minimum).

So I'll leave the maximum value at 1 if that's OK.
I added the documentation in invoke.texi.
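
To make the rule concrete, here is a sketch of the range check as the
params.def comment describes it. It is not the code GCC uses to
validate --param values; struct toy_param and param_value_ok are
hypothetical names, and only the "maximum applies if greater than the
minimum" rule comes from the comment quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* The three numbers that close a DEFPARAM entry.  */
struct toy_param
{
  int default_value;
  int min;
  int max;
};

/* Return true if VALUE is acceptable for P.  The upper bound is only
   enforced when it is greater than the minimum.  */
bool
param_value_ok (const struct toy_param *p, int value)
{
  assert (p != 0);
  if (value < p->min)
    return false;
  if (p->max > p->min && value > p->max)
    return false;
  return true;
}
```

With "2, 1, 1" the maximum equals the minimum, so no upper bound is
applied and a value such as 3 is accepted. With "2, 1, 2" the same
value would be rejected.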

Thanks,
Revital

* params.def (sms-min-sc): New param flag.
* modulo-sched.c (sms_schedule): Use it.
* doc/invoke.texi (sms-min-sc): Document it.
=== modified file 'gcc/doc/invoke.texi'
--- gcc/doc/invoke.texi 2011-03-07 03:00:04 +
+++ gcc/doc/invoke.texi 2011-04-17 11:28:33 +
@@ -8718,6 +8718,10 @@ through which the instruction may be pip
 The maximum number of best instructions in the ready list that are considered
 for renaming in the selective scheduler.  The default value is 2.
 
+@item sms-min-sc
+The minimum value of stage count that swing modulo scheduler will
+generate.  The default value is 2.
+
 @item max-last-value-rtl
 The maximum size measured as number of RTLs that can be recorded in an expression
 in combiner for a pseudo register as last known value of that register.  The default

=== modified file 'gcc/modulo-sched.c'
--- gcc/modulo-sched.c  2011-03-27 07:11:08 +
+++ gcc/modulo-sched.c  2011-04-17 10:53:03 +
@@ -1222,9 +1222,10 @@ sms_schedule (void)
  PS_STAGE_COUNT(ps) = stage_count;
}
   
-  /* Stage count of 1 means that there is no interleaving between
- iterations, let the scheduling passes do the job.  */
-  if (stage_count <= 1
+  /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
+1 means that there is no interleaving between iterations thus
+we let the scheduling passes do the job in this case.  */
+  if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
  || (count_init && (loop_count <= stage_count))
  || (flag_branch_probabilities && (trip_count <= stage_count)))
{

=== modified file 'gcc/params.def'
--- gcc/params.def  2011-02-02 15:52:08 +
+++ gcc/params.def  2011-04-17 11:16:03 +
@@ -344,6 +344,11 @@ DEFPARAM(PARAM_SMS_MAX_II_FACTOR,
 "sms-max-ii-factor",
 "A factor for tuning the upper bound that swing modulo scheduler uses for scheduling a loop",
 100, 0, 0)
+/* The minimum value of stage count that swing modulo scheduler will generate.  */
+DEFPARAM(PARAM_SMS_MIN_SC,
+"sms-min-sc",
+"The minimum value of stage count that swing modulo scheduler will generate.",
+2, 1, 1)
 DEFPARAM(PARAM_SMS_DFA_HISTORY,
 "sms-dfa-history",
 "The number of cycles the swing modulo scheduler considers when checking conflicts using DFA",



[PATCH, SMS] Support instructions with REG_INC_NOTE

2011-04-17 Thread Revital Eres
Hello,

The attached patch extends the current implementation to analyze
instructions with REG_INC_NOTE.

Tested on ppc64-redhat-linux (bootstrap and regtest), SPU (only regtest),
and arm-linux-gnueabi (bootstrap C and regtest) configured with
--with-arch=armv7-a --with-mode=thumb.

OK for mainline?

Thanks,
Revital

Changelog:

* modulo-sched.c (record_inc_dec_insn_info,
free_node_sched_params): New functions.
(SCHED_FIRST_REG_MOVE, SCHED_NREG_MOVES): Remove.
(struct regmove_info): New.
(insn_regmove_info): New field in node_sched_params.
(print_node_sched_params): Print information for all the
definitions in the instructions.
(generate_reg_moves, duplicate_insns_of_cycles,
set_node_sched_params): Adjust the code to handle instructions
that have multiple definitions.
(sms_schedule): Handle loops that contain instructions with
FIND_REG_INC_NOTE and call free_node_sched_params.
=== modified file 'gcc/modulo-sched.c'
--- gcc/modulo-sched.c  2011-03-27 07:11:08 +
+++ gcc/modulo-sched.c  2011-04-17 10:29:24 +
@@ -201,32 +201,50 @@ static void duplicate_insns_of_cycles (p
   int, int, int, rtx);
 static int calculate_stage_count (partial_schedule_ptr ps);
 
+static int record_inc_dec_insn_info (rtx, rtx, rtx, rtx, rtx, void *);
+
+
 #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
 #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
-#define SCHED_FIRST_REG_MOVE(x) \
-   (((node_sched_params_ptr)(x)->aux.info)->first_reg_move)
-#define SCHED_NREG_MOVES(x) \
-   (((node_sched_params_ptr)(x)->aux.info)->nreg_moves)
 #define SCHED_ROW(x) (((node_sched_params_ptr)(x)->aux.info)->row)
 #define SCHED_STAGE(x) (((node_sched_params_ptr)(x)->aux.info)->stage)
 #define SCHED_COLUMN(x) (((node_sched_params_ptr)(x)->aux.info)->column)
 
-/* The scheduling parameters held for each node.  */
-typedef struct node_sched_params
+/* Information about register-move generated for a definition.  */
+struct regmove_info
 {
-  int asap;/* A lower-bound on the absolute scheduling cycle.  */
-  int time;/* The absolute scheduling cycle (time >= asap).  */
-
+  /* The definition for which the register-move is generated for.  */
+  rtx def;
+  
   /* The following field (first_reg_move) is a pointer to the first
- register-move instruction added to handle the modulo-variable-expansion
- of the register defined by this node.  This register-move copies the
- original register defined by the node.  */
+ register-move instruction added to handle the
+ modulo-variable-expansion of the register defined by this node.
+ This register-move copies the original register defined by the node.
+  */
   rtx first_reg_move;
-
+  
   /* The number of register-move instructions added, immediately preceding
  first_reg_move.  */
   int nreg_moves;
+  
+  /* Auxiliary info used in the calculation of the register-moves.  */
+  void *aux;
+};
+
+typedef struct regmove_info *regmove_info_ptr;
+DEF_VEC_P (regmove_info_ptr);
+DEF_VEC_ALLOC_P (regmove_info_ptr, heap);
 
+/* The scheduling parameters held for each node.  */
+typedef struct node_sched_params
+{
+  int asap;/* A lower-bound on the absolute scheduling cycle.  */
+  int time;/* The absolute scheduling cycle (time >= asap).  */
+  
+  /* Information about register-moves needed for
+ definitions in the instruction.  */
+  VEC (regmove_info_ptr, heap) *insn_regmove_info;
+  
   int row;/* Holds time % ii.  */
   int stage;  /* Holds time / ii.  */
 
@@ -423,12 +441,58 @@ set_node_sched_params (ddg_ptr g)
  appropriate sched_params structure.  */
   for (i = 0; i < g->num_nodes; i++)
 {
+  rtx insn = g->nodes[i].insn;
+  rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
+  rtx set = single_set (insn);
+  
   /* Watch out for aliasing problems?  */
   node_sched_params[i].asap = g->nodes[i].aux.count;
+  node_sched_params[i].insn_regmove_info = NULL;
+  
+  /* Record the definition(s) in the instruction.  These will be
+later used to calculate the register-moves needed for each
+definition. */
+  if (set && REG_P (SET_DEST (set)))
+   { 
+ regmove_info_ptr elt = 
+   (regmove_info_ptr) xcalloc (1, sizeof (struct regmove_info));
+ 
+ elt->def = SET_DEST (set);
+ VEC_safe_push (regmove_info_ptr, heap, 
+node_sched_params[i].insn_regmove_info,
+elt);
+   }
+  
+  if (note)
+   for_each_inc_dec (&insn, record_inc_dec_insn_info, 
+ &node_sched_params[i]);
+  
   g->nodes[i].aux.info = &node_sched_params[i];
 }
 }
 
+/* Free the sched_params information allocated for each node.  */
+static void
+free_node_sched_params (ddg_ptr g)
+{
+  int i;
+  regmove_info_ptr def;
+
+  for (i = 0; i < g->num_nodes; 

[PATCH, SMS] Avoid unfreed memory when SMS fails

2011-04-15 Thread Revital Eres
Hello,

This patch fixes the scenario where SMS fails to
schedule a loop and continues to the next one without
freeing the data structures allocated while scheduling
the first loop.
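
The shape of the problem and of the fix can be shown in isolation.
This is a sketch only and does not reproduce sms_schedule:
tracked_alloc, tracked_free, live_allocations and process_loops are
hypothetical names used to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of blocks currently allocated through tracked_alloc.  */
int live_allocations = 0;

void *
tracked_alloc (size_t size)
{
  void *p = malloc (size);

  if (p != NULL)
    live_allocations++;
  return p;
}

void
tracked_free (void *p)
{
  if (p != NULL)
    live_allocations--;
  free (p);
}

/* Process NUM_LOOPS loops; CAN_SCHEDULE[i] is nonzero when loop i can
   be scheduled.  Return the number of loops scheduled.  */
int
process_loops (const int *can_schedule, int num_loops)
{
  int scheduled = 0;

  assert (can_schedule != NULL && num_loops >= 0);
  for (int i = 0; i < num_loops; i++)
    {
      void *per_loop_data = tracked_alloc (64);

      if (per_loop_data == NULL)
        break;

      if (can_schedule[i])
        scheduled++;
      /* There is deliberately no 'continue' on the failure path:
         both outcomes fall through to the cleanup below.  */

      tracked_free (per_loop_data);
    }
  return scheduled;
}
```

Had the failure branch used 'continue', every loop that could not be
scheduled would leave its per-loop data allocated.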

Bootstrapped and regtested on ppc64-redhat-linux.

OK for mainline?

Thanks,
Revital

Changelog:

* modulo-sched.c (sms_schedule): Avoid unfreed memory when SMS fails.

Index: modulo-sched.c
===
--- modulo-sched.c  (revision 170464)
+++ modulo-sched.c  (working copy)
@@ -1177,7 +1177,6 @@ sms_schedule (void)
  fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count);
  fprintf (dump_file, ")\n");
}
- continue;
}
   else
{


[PATCH, SMS] Free sccs field

2011-04-15 Thread Revital Eres
Hello,

The attached patch adds a missing free operation for storage
allocated while calculating SCCs.

Bootstrapped and regtested on ppc64-redhat-linux.

OK for mainline?

Thanks,
Revital

Changelog:

* ddg.c (free_ddg_all_sccs): Free sccs field in struct ddg_all_sccs.

Index: ddg.c
===
--- ddg.c   (revision 171573)
+++ ddg.c   (working copy)
@@ -1011,6 +1082,8 @@ free_ddg_all_sccs (ddg_all_sccs_ptr all_
   for (i = 0; i < all_sccs->num_sccs; i++)
 free_scc (all_sccs->sccs[i]);

+  if (all_sccs->sccs)
+free (all_sccs->sccs);
   free (all_sccs);
 }


Re: [PATCH, SMS] Free sccs field

2011-04-15 Thread Revital Eres
Hello,

On 15 April 2011 18:53, Nathan Froyd froy...@codesourcery.com wrote:
 On Fri, Apr 15, 2011 at 06:27:05PM +0300, Revital Eres wrote:
 +  if (all_sccs->sccs)
 +    free (all_sccs->sccs);

 No need to check for non-NULL prior to free'ing.

OK, I'll commit the patch without the check then.
(after re-testing)
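
For reference, the C standard specifies that free does nothing when
passed a null pointer, so the guard is redundant. A minimal sketch of
the cleanup without it follows; struct toy_all_sccs and
free_toy_all_sccs are hypothetical stand-ins, not the ddg.c types.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a container of SCCs.  */
struct toy_all_sccs
{
  int num_sccs;
  int **sccs;   /* May be NULL when no SCC was recorded.  */
};

/* Release ALL and everything it owns.  Return the number of per-SCC
   arrays that were handed to free.  */
int
free_toy_all_sccs (struct toy_all_sccs *all)
{
  int released = 0;

  assert (all != NULL);
  for (int i = 0; i < all->num_sccs; i++)
    {
      free (all->sccs[i]);
      released++;
    }

  /* No NULL check needed: free (NULL) is a no-op.  */
  free (all->sccs);
  free (all);
  return released;
}
```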

Thanks,
Revital



 -Nathan




Re: [PATCH, SMS] New flag to apply SMS when SC equals 1

2011-04-15 Thread Revital Eres
Hello,

 If it's for debugging, can you use a --param instead (like
 modulo-sched-min-sc or similar)?


I think I can use --param for debugging purposes in this case.
(I might add modulo-sched-max-sc as well)

Thanks,
Revital

 Thanks,
 Richard.

 Thanks,
 Revital

 Changelog:

        * common.opt (fmodulo-sched-allow-sc-one): New flag.
        * modulo-sched.c (sms_schedule): Allow SMS when stage count
        equals one and -fmodulo-sched-allow-sc-one flag is set.