[PATCH, SMS] Fix PR51794

2012-01-10 Thread Revital1 Eres

Hello,

The patch below fixes ICE reported in PR51794.
It avoids creating DDG edges  for register uses of class DF_REF_ARTIFICIAL
as
the latter does not have real instructions for them and thus calling
BLOCK_FOR_INSN fails.

Tested and bootstrap on ppc64-redhat-linux, enabling SMS on loops with SC
1.

OK for mainline?

Thanks,
Revital

Chanelog:

PR rtl-optimization/51794
* ddg.c (add_cross_iteration_register_deps): Avoid
creating edges for uses of class DF_REF_ARTIFICIAL.

testsuite/
PR rtl-optimization/51794
* gcc.dg/sms-12.c: New test.

(See attached file: patch_fix_pr51794.txt)Index: ddg.c
===
--- ddg.c   (revision 183001)
+++ ddg.c   (working copy)
@@ -315,7 +315,12 @@ add_cross_iteration_register_deps (ddg_p
   /* Create inter-loop true dependences and anti dependences.  */
   for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use-next)
 {
-  rtx use_insn = DF_REF_INSN (r_use-ref);
+  rtx use_insn;
+
+  if (r_use-ref == NULL || DF_REF_CLASS (r_use-ref) == DF_REF_ARTIFICIAL)
+   continue;
+
+  use_insn = DF_REF_INSN (r_use-ref);
 
   if (BLOCK_FOR_INSN (use_insn) != g-bb)
continue;
Index: testsuite/gcc.dg/sms-12.c
===
--- testsuite/gcc.dg/sms-12.c   (revision 0)
+++ testsuite/gcc.dg/sms-12.c   (revision 0)
@@ -0,0 +1,13 @@
+ /* { dg-do compile } */
+ /* { dg-options -O -fmodulo-sched } */
+
+
+int
+res_inverse (int a)
+{
+  int i;
+  char **b = (char **) __builtin_alloca (a * sizeof (char *));
+  for (i = 0; i  a; i++)
+b[i] = (char *) __builtin_alloca (sizeof (*b[i]));
+  return 0;
+}


Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-01-03 Thread Revital1 Eres

Hello,

 On Mon, Jan 2, 2012 at 3:30 PM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
  Ayal Zaks ayal.z...@gmail.com writes:
  +  for (i = 0; i  ira_pressure_classes_num; i++)
  +    {
  +      enum reg_class pressure_class;
  +
  +      pressure_class = ira_pressure_classes[i];
  +
  +      if (max_reg_pressure[pressure_class] == 0)
  +     continue;
  +
  +      if (dump_file)
  +     fprintf (dump_file, %s=%d  %d , reg_class_names
[pressure_class],
  +              max_reg_pressure[pressure_class],
  +              ira_available_class_regs[pressure_class]);
  +
  +      if (max_reg_pressure[pressure_class]
  +        ira_class_hard_regs_num[pressure_class])
  +     {
  +       if (dump_file)
  +         fprintf (dump_file, (pressure)\n);
  +
  +       pressure_p = true;
 
  you can break; now.
 
  FWIW, I thought the same thing at first, but I think Revital's wayis
better.
  It's nice to know when looking at dumps whether there is excess
pressure
  in more than one class.  This isn't performance-critical code after
all.
 

 ok

  however, you have everything setup to compute the amount of spill, so
  suggest to do the following instead:
 
            pressure += max_reg_pressure[pressure_class]
                        - ira_class_hard_regs_num[pressure_class]);
 
  or better call the variable spillage instead of pressure, and
  return it. Current use will only check if it's zero or positive.
 
  I read this suggestion in the same way as Revital seems to have done:
  that we sum the pressure change over all classes.  But that isn't the
idea.
  Using N too many registers in one pressure class cannot be mitigated by
  leaving N registers in another pressure class unused.
 

 of-course (wasn't thinking of spilling from one register file to
 another); meant to suggest doing:

 if (max_reg_pressure[pressure_class]  ira_class_hard_regs_num
 [pressure_class])
   spillage += max_reg_pressure[pressure_class] -
 ira_class_hard_regs_num[pressure_class]);


  TBH, the original version of this function looked better to me.
 

 ok, it would surely suffice for now.


Attached is an updated version with the two changes mentioned above taken
from the previous patch.

Tested and bootstrap with the other patch in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Thanks again,
Revital

2012-01-03  Richard Sandiford  richard.sandif...@linaro.org
Revital Eres  revital.e...@linaro.org

* loop-invariant.c (get_regno_pressure_class): Move function to...
* ira.c: Here.
* common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
New flags.
* doc/invoke.texi (fmodulo-sched-reg-pressure,
-fmodulo-sched-verbose): Document the flags.
* ira.h (get_regno_pressure_class,
reset_pseudo_classes_defined_p): Declare.
* ira-costs.c (reset_pseudo_classes_defined_p): New function.
* Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
(modulo-sched-pressure.o): New.
* modulo-sched.c (ira.h, modulo-sched.h): New includes.
(partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
struct ps_reg_move_info, struct partial_schedule): Move to
modulo-sched.h.
(ps_rtl_insn, ps_reg_move): Remove static.
(apply_reg_moves): Remove static and call df_insn_rescan only
if PS is final.
(undo_reg_moves): New function.
(sms_schedule): Call register pressure estimation.
* modulo-sched.h: New file.
* modulo-sched-pressure.c: New file.

(See attached file: patch_pressure_3_1_12.txt)Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 182766)
+++ doc/invoke.texi (working copy)
@@ -374,6 +374,7 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure -fmodulo-sched-verbose=@var{n} @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg 
@gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6476,6 +6477,16 @@ deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Do not apply @option{-fmodulo-sched} to loops if the result would lead
+to register spilling within the loop.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
+@item -fmodulo-sched-verbose=@var{n}
+@opindex fmodulo-sched-verbose
+Set up how verbose dump file for the SMS will be.  
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: modulo-sched.h

Re: Adjust debug output from SMS's get_schdedule_window

2011-08-04 Thread Revital1 Eres
Hello Richard,

 This patch adjusts the dump output from
modulo-sched.c:get_schdedule_window.
 Dump output is very much down to personal preference, so please feel free
 just to reject the change.

The output format looks great to me although I cannot approve it.

Thanks,
Revital



Re: Patches ping

2011-08-01 Thread Revital1 Eres

Hi,

Thanks for the review!


 Changelog:
 
        (sms_schedule_by_order): Update call to get_sched_window.
        all set_must_precede_follow.
     ^^^
     call

Done.

 +/* Set bitmaps TMP_FOLLOW and TMP_PRECEDE to MUST_FOLLOW and
MUST_PRECEDE
 +   respectively only if cycle C falls in the scheduling window
boundaries
     ^^
  on the border of

Done.



    sbitmap tmp_precede = NULL;
    sbitmap tmp_follow = NULL;
 are redundantly reset in set_must_precede_follow().

Done. I removed the setting of tmp_precede and tmp_follow before calling
set_must_precede_follow. (they are reset in set_must_precede_follow)

 +/* Update the sched_params for node U using the II,
     ^
 (time, row and stage)
  +   the CYCLE of U and MIN_CYCLE.  */
 Please also clarify that we're not simply taking
   SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii);
 because the stages are aligned on cycle 0.

Done. I assume it should be because the stages may not be aligned on cycle
0.

 +  /* First, normailize the partial schedualing.  */
    ^   ^

 +   /* Try to achieve optimized SC by normalizing the partial
 +  schedule (having the cycles start from cycle zero). The branch
    ^
 +  location must be placed in row ii-1 in the final scheduling.

 +  If that's not the case after the normalization then try to
     ^^
 +  move the branch to that row if possible.  */
     ^
     If failed, shift all instructions to position the branch in row
ii-1.

Done.


 For consistency and clarity, may be instead of:

 +   /* Bring the branch to cycle -1.  */
 +   int amount = SCHED_TIME (g-closing_branch) + 1;
 it would be better to have:

 +   /* Bring the branch to cycle ii-1.  */
 +   int amount = SCHED_TIME (g-closing_branch) - (ii - 1);


OK, the attached patch contains this fix in other places
as well where we bring the branch to cycle -1.

 Some thoughts on possible improvements (not mandatory; especially
 given the delay in approval, sorry..thanks for the ping):
 o Have optimize_sc() take care of all possible rotations doing the
 best it can, without returning an indication which leaves subsequent
 (suboptimal) processing to the caller.
 o Instead of removing the branch and then trying to get it back into
 the same cycle if you can't place it in row ii-1, consider keeping it
 in its place and removing it only if you succeed to schedule it
 (also..) in row ii-1.
 o May be worthwhile to apply more refactoring, so that the code to
 reschedule the branch in row ii-1 reuses more of the general code for
 scheduling an instruction (possibly setting end = start + end), but
 avoid splitting a row.
 o Would be interesting to learn of loops that still have suboptimal
 SC's, to see if it's still an issue.

Thanks for the suggestions, will try to address them.

I'm now re-testing the attached patch on PowerPC and will commit it if
there will be no further comments.

Thanks,
Revital

(See attached file: patch_opt_sc_1_8.txt)
Index: ChangeLog
===
--- ChangeLog   (revision 176998)
+++ ChangeLog   (working copy)
@@ -1,3 +1,18 @@
+2011-08-01  Revital Eres  revital.e...@linaro.org
+
+   * modulo-sched.c (calculate_stage_count,
+   calculate_must_precede_follow, get_sched_window,
+   try_scheduling_node_in_cycle, remove_node_from_ps): Add
+   declaration.
+   (update_node_sched_params, set_must_precede_follow, optimize_sc):
+   New functions.
+   (reset_sched_times): Call update_node_sched_params.
+   (sms_schedule): Call optimize_sc.
+   (get_sched_window): Change function arguments.
+   (sms_schedule_by_order): Update call to get_sched_window.
+   Call set_must_precede_follow.
+   (calculate_stage_count): Add function argument.
+
 2011-07-31  Richard Henderson  r...@redhat.com
 
* config/h8300/crti.asm: Add flags to .section directive.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 176998)
+++ modulo-sched.c  (working copy)
@@ -203,7 +203,16 @@ static void generate_prolog_epilog (part
 rtx, rtx);
 static void duplicate_insns_of_cycles (partial_schedule_ptr,
   int, int, int, rtx);
-static int calculate_stage_count (partial_schedule_ptr ps);
+static int calculate_stage_count (partial_schedule_ptr, int);
+static void calculate_must_precede_follow (ddg_node_ptr, int, int,
+  int, int, sbitmap, sbitmap, sbitmap);
+static int get_sched_window 

Re: [PATCH 3/9] [SMS] Eliminate redundant edges

2011-07-24 Thread Revital1 Eres
Hi Roman,

 While building a data dependency graph for loop a ddg edge for some pair
 of instructions with inter-loop dependency should be created only if
 there is no edge for intra-loop dependency between these instructions.
 Creating both of edges leads sometimes to the fact that function

It would be much appreciated if you could provide an example for the
problematic scenario which leads to the miscompilation.

Thanks,
Revital





Re: [PATCH 5/9] [SMS] Support new loop pattern

2011-07-24 Thread Revital1 Eres
Hello Roman,

 This patch should be applied only after pending patches by Revital. This
patch
 significantly enhances the existing implementation of the SMS.  Patch
adds
 support of scheduling loops without doloop pattern.  The loop should meet
the
 following requirements.

Thanks for the patch!
I will try to go through it in more details soon.
I'm testing now whether the recent bootstrap
failure on ARM machine (PR49789) is revolved with your patch.
I also plan to see the effect of it on PowerPC and SPU which currently
support doloop pattern.

Thanks,
Revital



[PATCH, SMS 3/3] Skip DEBUG_INSN in loop-doloo​​​​p.

2011-05-08 Thread Revital1 Eres

Hello,

(sorry for multiple copies of this email)

This small fix was inserted to skip DEBUG_INSNs while
recognizing doloop pattern in loop-doloop.c file.  It's a fix
for the already approved do-loop patch (not in mainline yet,
http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) in loop-doloop.c

The patch was tested together with the rest of the patches in this series
and on top of the patch to support do-loop for ARM (not yet in mainline,
but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html).
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

Changelog:

   * loop-doloop.c (doloop_condition_get): Use prev_nondebug_insn
  instead of PREV_INSN.



--- loop-doloop.c   2011-05-07 16:08:27.0 +0300
+++ loop-doloop_new.c   2011-05-07 16:07:48.0 +0300
@@ -151,7 +151,7 @@ doloop_condition_get (rtx doloop_pat)
 inc = XVECEXP (PATTERN (prev_insn), 0, 1);
   }
  else
-inc = PATTERN (PREV_INSN (doloop_pat));
+inc = PATTERN (prev_insn);
  /* We expect the condition to be of the form (reg != 0)  */
  cond = XEXP (SET_SRC (cmp), 0);
  if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)



Re: debug insns in SMS (was: Re: Debug_insn)

2011-05-04 Thread Revital1 Eres
Hello Alexandre

 I think this will restore proper functioning to SMS in the presence of
 debug insns.  A while ago, we'd never generate deps of non-debug insns
 on debug insns.  I introduced them to enable sched to adjust (reset)
 debug insns when non-debug insns were moved before them.  I believe it
 is safe to leave them out of the SCCs.  Even though this will end up
 causing some loss of debug info, that's probably unavoidable, and the
 end result after this change is pobably the best we can hope for.  Your
 thoughts?

Thanks for the patch!

I actually discussed this issue with Ayal yesterday.
Ayal also suggested to reconsider the edges that are created in
the DDG between real instructions and debug_insns. Currently, we create
bidirectional anti deps edges between them. This leads to the problem you
were trying to solve in the current patch (described below) where these
extra edges influence the construction of the strongly connected component
and the code generated with and w\o -g. Your patch seems to solve this
problem.
However I can not approve it as I'm not the maintainer (Ayal is).

Another problem with the handling of debug insns in SMS is that
debug_insns instructions are been ignored while scheduling which means
that they do not delimit the scheduling window of the real instructions
they depend on. This could lead to a scenario where the dependencies
between real instruction and debug_insn could be violated as we leave
the debug_insns in their original place after scheduling. I'll try to send
you
an example of this problem as well.

Example of code gen difference when using -g and without it which this
patch tries to solve:
The following nodes are all belong to the same SCC running with -g.
node 1 is not present in this SCC when not using -g.
(the nodes marked in [] are the one that do not
exist with your patch and prevent from node 1 to be added to the SCC
when compiling with -g)

 node 0
Debug_insn  (ior(MEM(ivtmp.24),MEM(ivtmp.46))
in edges: 1-0, 2-0
out edges: [0-1], [0-4], [0-2]


node 1
Reg 220 = MEM (pre_inc (ivtmp.24))
in edges: [0-1], 1-1
out edges: 1-0, 1-1, 1-3

node 2
Reg 221= MEM (pre_inc (ivtmp.46))
in edges: [0-2], 4-2, 2-2
out edges: 2-0, 2-3, 2-4, 2-2

node 3
Reg 222= ior (220,221)
in edges: 2-3, 1-3
out edges: 3-4

node 4
MEM[pre_inc(196)] = 222
in edges: 3-4, 4-4, [0-4], 2-4
out edges: 4-4, 4-2

Thanks,
Revital



[PATCH, SMS] Support closing_branch_deps

2011-03-07 Thread Revital1 Eres

Hello,

The attached patch includes enhancements for SMS to support targets
that their doloop part is not decoupled from the rest of the loop's
instructions, as SMS currently requires. (ARM is an example for such
target, where the loop's instructions might use CC which is used in the
doloop part)

The patch uses already existing closing_branch_deps field in the DDG to
indicate whether the closing branch should be scheduled with the rest
of the loop instructions to preserve the dependencies between them.
In the current implementation, closing_branch_deps is always false which
enables to leave the closing branch outside of the scheduling process and
place it at ii-1 row at the end of the process.  This patch extends the
implementation by identifying loops where closing_branch_deps  should be
true and scheduling the closing branch with the rest of the instructions
for such loops, rotating the branch to be in the ii-1 row at the end of
the scheduling process.

The patch also enables the scheduling when debug_insn is generated for
instructions in the do-loop part. To enable this, a small fix was inserted
on top of the already approved do-loop patch (not in mainline yet,
http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) in loop-doloop.c
file to consider prev_nondebug_insn instead of PREV_INSN as follows:
(the diff is against the patch in the above link and thus it is not
included in the attached file, however it is part of the tested patch
and thus I appreciate an approval for this part also)


@@ -610,7 +614,7 @@
 +inc = XVECEXP (PATTERN (prev_insn), 0, 1);
 +  }
 +  else
-+inc = PATTERN (PREV_INSN (doloop_pat));
++inc = PATTERN (prev_insn);
/* We expect the condition to be of the form (reg != 0)  */
cond = XEXP (SET_SRC (cmp), 0);
if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)

The attached patch was tested with the patch to support do-loop for ARM
(http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) including the
fix mentioned above; as following:
On ppc64-redhat-linux regtest as well as bootstrap with flags(*);
enabling SMS also on loops with stage count 1.
Regtested on SPU.
On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language
with flags (*)  configured  w and w\o --with-arch=armv7-a; and w and w\o
--with-mode=thumb
(four different configurations in total) enabling SMS also on loops with
stage count 1.

(*) flags used for bootstrap: -O2 -fmodulo-sched
-fmodulo-sched-allow-regmoves
-fno-auto-inc-dec -funsafe-math-optimizations

OK for trunk once stage 1 will be open?

Thanks,
Revital

ChangeLog:

   * ddg.c (check_closing_branch_deps, get_node_of_insn_uid):
New functions.
(create_ddg): Pass sbitmap containing do-loop related
instructions instead of closing_branch_deps parameter and call
check_closing_branch_deps function.
* ddg.h (create_ddg): Adjust the function declaration.
* modulo-sched.c (PS_STAGE_COUNT): Rename to CALC_STAGE_COUNT
and redefine.
(doloop_register_get): Handle NONDEBUG_INSN_P.
(stage_count): New field in struct partial_schedule.
(mark_doloop_insns, calculate_stage_count): New functions.
(normalize_sched_times): Rename to reset_sched_times and handle
incrementing the sched time of the nodes by a constant value
passed as parameter.
(duplicate_insns_of_cycles): Skip closing branch.
(sms_schedule_by_order): Schedule closing branch when
closing_branch_deps is true.
(ps_insn_find_column): Handle closing branch.
(sms_schedule): Call reset_sched_times and handle case where
do-loop pattern is not decoupled from the other loop instructions.
(ps_insert_empty_row): Update calls to normalize_sched_times
and rotate_partial_schedule functions.
   * loop-doloop.c (doloop_condition_get): Use prev_nondebug_insn
instead of PREV_INSN.


testsuite Changlog:

* gcc.target/arm/sms-9.c: New file.
* gcc.target/arm/sms-10.c: New file.

(See attached file: patch_doloop_fix_7_3_new.txt)

(See attached file: sms-10.c)(See attached file: sms-9.c)Index: ddg.c
===
--- ddg.c   (revision 170464)
+++ ddg.c   (working copy)
@@ -60,6 +60,8 @@ static void create_ddg_dep_no_link (ddg_
 static ddg_edge_ptr create_ddg_edge (ddg_node_ptr, ddg_node_ptr, dep_type,
 dep_data_type, int, int);
 static void add_edge_to_ddg (ddg_ptr g, ddg_edge_ptr);
+static ddg_node_ptr get_node_of_insn_uid (ddg_ptr, int);
+
 
 /* Auxiliary variable for mem_read_insn_p/mem_write_insn_p.  */
 static bool mem_ref_p;
@@ -488,12 +490,65 @@ build_intra_loop_deps (ddg_ptr g)
   sched_free_deps (head, tail, false);
 }
 
+/* Given DOLOOP_INSNS which holds the instructions that
+   belong to the do-loop part; mark closing_branch_deps field in ddg G
+   as TRUE if the