Re: [PATCH] Port GCC documentation to Sphinx
Hi Martin,

On 29.06.2021 13:09, Martin Liška wrote:
> On 6/28/21 5:33 PM, Joseph Myers wrote:
>> Are formatted manuals (HTML, PDF, man, info) corresponding to this patch
>> version also available for review?
>
> I've just uploaded them here:
> https://splichal.eu/gccsphinx-final/
>
> Martin

I've randomly looked at the PDF version of the GCC internals manual, and the
table of contents there only has an introduction and an index (it looks like
all the other chapters went under the introduction).  The other PDFs have
their first-level chapters in the contents.  Maybe it would be a good idea to
include the lower-level chapters as well, or at least to fix the gccint
one :)

Best,
Andrey
Fix PR 89676
Hello,

As explained in the PR trail, we incorrectly update the availability sets in
the rare case of several successors where one of them has another fence.
Fixed as follows.  Ok for trunk?

Best,
Andrey

2019-03-15  Andrey Belevantsev

	PR middle-end/89676
	* sel-sched.c (compute_av_set_at_bb_end): When we have an ineligible
	successor, use NULL as its av set.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 315f2c0c0ab..2053694b196 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -2820,10 +2820,12 @@ compute_av_set_at_bb_end (insn_t insn, ilist_t p, int ws)
   FOR_EACH_VEC_ELT (sinfo->succs_ok, is, succ)
     {
       basic_block succ_bb = BLOCK_FOR_INSN (succ);
+      av_set_t av_succ = (is_ineligible_successor (succ, p)
+			  ? NULL
+			  : BB_AV_SET (succ_bb));

       gcc_assert (BB_LV_SET_VALID_P (succ_bb));
-      mark_unavailable_targets (av1, BB_AV_SET (succ_bb),
-				BB_LV_SET (succ_bb));
+      mark_unavailable_targets (av1, av_succ, BB_LV_SET (succ_bb));
     }

   /* Finally, check liveness restrictions on paths leaving the region.  */
Re: [PATCH] sched-ebb.c: avoid moving table jumps (PR rtl-optimization/88423)
Hi David,

On 18.01.2019 19:58, David Malcolm wrote:
> On Fri, 2019-01-18 at 12:32 -0500, David Malcolm wrote:
> [CCing Abel]
>
>> PR rtl-optimization/88423 reports an ICE within sched-ebb.c's
>> begin_move_insn, failing the assertion at line 175, where there's
>> no fall-through edge:
>>
>> 171	  rtx_insn *x = NEXT_INSN (insn);
>> 172	  if (e)
>> 173	    gcc_checking_assert (NOTE_P (x) || LABEL_P (x));
>> 174	  else
>> 175	    gcc_checking_assert (BARRIER_P (x));
>>
>> "insn" is a jump_insn for a table jump, and its NEXT_INSN is the
>> placeholder code_label, followed by the jump_table_data.

This code was added at the time of our work on the speculation support for
ia64.  I would guess it was just never designed to support tablejumps, but
rather jumps to recovery blocks or some such.  But I might be wrong -- it
was back in 2006.  If you look at fix_jump_move, which was added around that
time, it doesn't have any traces of tablejumps either.

>> It's not clear to me if such a jump_insn can be repositioned within
>> the insn stream, or if the scheduler is able to do so.  I believe a
>> tablejump is always at the end of such a head/tail insn sub-stream.
>> Is it a requirement that the placeholder code_label for the jump_insn
>> is always its NEXT_INSN?
>>
>> The loop at the start of schedule_ebb adjusts the head and tail
>> of the insns to be scheduled so that it skips leading and trailing
>> notes and debug insns.
>>
>> This patch adjusts that loop to also skip trailing jump_insn instances
>> that are table jumps, so that we don't attempt to move such table jumps.
>>
>> This fixes the ICE, but I'm not very familiar with this part of the
>> compiler - so I have two questions:
>>
>> (1) What does GCC mean by "ebb" in this context?
>>
>> My understanding is that the normal definition of an "extended basic
>> block" (e.g. Muchnick's book, pp. 175-177) is that it's a maximal
>> grouping of basic blocks where only one BB in each group has multiple
>> in-edges and all other BBs in the group have a single in-edge (and thus
>> e.g. there's a tree-like structure of BBs within each EBB).
>>
>> From what I can tell, schedule_ebbs is iterating over BBs, looking for
>> runs of BBs joined by next_bb that are connected by fallthrough edges
>> and don't have labels (and aren't flagged with BB_DISABLE_SCHEDULE).
>> It uses this run of BBs to generate a run of instructions within the
>> NEXT_INSN/PREV_INSN doubly-linked list, which it passes as "head"
>> and "tail" to schedule_ebb.
>>
>> This sounds like it will be a group of basic blocks with single in-edges
>> internally, but it isn't a *maximal* group of such BBs - but perhaps
>> it's "maximal" in the sense of what the NEXT_INSN/PREV_INSN
>> representation can cope with?

You are right, it's not a tree structure in the sense of classical EBBs but
rather a trace.  There used to be code to also perform tail duplication in
order to have better traces for scheduling, but the corresponding option was
deprecated back in 2010.

>> There (presumably) can't be a fallthrough edge after a table jump, so
>> a table jump could only ever be at the end of such a chain, never in
>> the middle.
>>
>> (2) Is it OK to omit "tail" from consideration here, from a dataflow
>> and insn-dependency point-of-view?  Presumably the scheduler is written
>> to ensure that data used by subsequent basic blocks will still be
>> available after the insns within an "EBB" are reordered, so presumably
>> any data uses *within* the jump_insn are still going to be available -
>> but, as I said, I'm not very familiar with this part of the code.
>> (I think I'm also assuming that the jump_insn can't clobber data, just
>> the PC)

For that, I'm not sure.  Your patch will leave the tablejump unscheduled at
all, i.e. any fields like INSN_TICK would be unfilled and thus the later
passes like bundling on ia64 will not work.  Maybe we can just stop
tablejumps from moving within their block, Alexander?

Andrey

>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>>
>> OK for trunk?
>>
>> gcc/ChangeLog:
>> 	PR rtl-optimization/88423
>> 	* sched-ebb.c (schedule_ebb): Don't move the jump_insn for a table
>> 	jump.
>>
>> gcc/testsuite/ChangeLog:
>> 	PR rtl-optimization/88423
>> 	* gcc.c-torture/compile/pr88423.c: New test.
>> ---
>>  gcc/sched-ebb.c                               | 4
>>  gcc/testsuite/gcc.c-torture/compile/pr88423.c | 5 +
>>  2 files changed, 9 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr88423.c
>>
>> diff --git a/gcc/sched-ebb.c b/gcc/sched-ebb.c
>> index d459e09..1fe0b76 100644
>> --- a/gcc/sched-ebb.c
>> +++ b/gcc/sched-ebb.c
>> @@ -485,6 +485,10 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail,
>> bool modulo_scheduling)
>>  	tail = PREV_INSN (tail);
>>        else if (LABEL_P (head))
>>  	head = NEXT_INSN (head);
>> +      else if (tablejump_p (tail, NULL,
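The "trace, not a tree" point above can be illustrated with a few lines of C.
This is a toy sketch under invented names (struct toy_bb is not GCC's
basic_block): a scheduling run extends only while the next block is entered
solely through a fallthrough edge from its predecessor, which is why a
tablejump can only ever sit at the end of such a run.

```c
#include <assert.h>

/* Hypothetical block descriptor: number of incoming edges, and whether the
   textually previous block falls through into this one.  */
struct toy_bb
{
  int n_preds;
  int fallthru_from_prev;
};

/* Length of the scheduling "trace" starting at blocks[start]: keep
   extending while the next block has exactly one in-edge and that edge is
   a fallthrough from the previous block, mirroring the run-of-BBs
   selection described for schedule_ebbs.  */
static int
trace_len (const struct toy_bb *blocks, int n, int start)
{
  int len = 1;
  for (int i = start + 1; i < n; i++)
    {
      if (!blocks[i].fallthru_from_prev || blocks[i].n_preds != 1)
	break;
      len++;
    }
  return len;
}
```

A block with two in-edges (a join point) terminates the trace even when a
fallthrough edge exists, so the run is a linear trace rather than a
tree-shaped classical EBB.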
Fix PR 85423 (Re: Fix PRs 80463, 83972, 83480)
Hello, So this PR shows that I have incorrectly mirrored the conditional from sched-deps.c that creates the dependence from a debug insn on the previous insn (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80463#c3 for the hunk). Thus we have incorrectly discarded some legitimate debug-debug dependencies. The straightforward fix works for all of four PRs, tested on x86-64. I have put the test in gcc.dg though it requires -march=nano. Do you want me to create an extra machine-dependent test? Best, Andrey 2018-04-23 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/85423 * sel-sched-ir.c (has_dependence_note_mem_dep): Only discard dependencies to debug insns when previous insn is non-debug. * gcc.dg/pr85423.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index ee970522890..85ff5bd3eb4 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3308,7 +3308,7 @@ has_dependence_note_dep (insn_t pro, ds_t ds ATTRIBUTE_UNUSED) that a bookkeeping copy should be movable as the original insn. Detect that here and allow that movement if we allowed it before in the first place. */ - if (DEBUG_INSN_P (real_con) + if (DEBUG_INSN_P (real_con) && !DEBUG_INSN_P (real_pro) && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con)) return; diff --git a/gcc/testsuite/gcc.dg/pr85423.c b/gcc/testsuite/gcc.dg/pr85423.c new file mode 100644 index 000..21d4a2eb4b9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr85423.c @@ -0,0 +1,26 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fselective-scheduling2 -fvar-tracking-assignments -fno-guess-branch-probability -fno-peephole2 -fno-ssa-phiopt -fno-tree-pre --param max-jump-thread-duplication-stmts=8 -w" } */ + +int vn, xm; + +void +i1 (int); + +void +mb (int *ap, int ev) +{ + while (vn < 1) +{ + i1 (vn); + + ev += *ap && ++vn; + + while (xm < 1) +++xm; + + if (*ap == 0) +*ap = ev; + + ++vn; +} +}
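The corrected condition can be modeled in isolation.  This is only a sketch
with hypothetical names (the real check lives in sel-sched-ir.c and uses
DEBUG_INSN_P and INSN_UID): a dependence on a debug consumer may be ignored
only when the producer is itself a non-debug insn immediately preceding it.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the fixed predicate.  Before the patch the pro_is_debug test
   was missing, so legitimate debug->debug dependencies were wrongly
   discarded as well.  */
static bool
may_discard_dep (bool pro_is_debug, bool con_is_debug, bool con_follows_pro)
{
  return con_is_debug && !pro_is_debug && con_follows_pro;
}
```

With the extra test, a debug producer never loses its dependence on a debug
consumer, which is exactly the case the four PRs exercised.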
Add test from PR 83852 (was Re: Fix PR 83962)
On 09.04.2018 12:16, Andrey Belevantsev wrote:
> On 06.04.2018 18:59, Alexander Monakov wrote:
>> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>>
>>> Hello,
>>>
>>> This issue is about the correct order in which we need to call the
>>> routines that fix up the control flow for us.
>>
>> OK with formatting both in the new comment and the Changelog fixed.
>
> Thanks, fixed that in rev. 259229.

I've found out that this patch also fixes PR 83852, so I've committed the
test from that PR as obvious after verifying that it works on a ppc cross
compiler and on x86-64.

Andrey

Index: gcc.dg/pr83852.c
===================================================================
*** gcc.dg/pr83852.c	(revision 0)
--- gcc.dg/pr83852.c	(revision 259373)
***************
*** 0 ****
--- 1,33 ----
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-std=gnu99 -O2 -fselective-scheduling -fno-if-conversion -fno-tree-dse -w" } */
+ long long int uo;
+ unsigned int vt;
+ 
+ void
+ r5 (long long int h8, long long int pu)
+ {
+   short int wj;
+   long long int *mh = h8;
+ 
+   for (wj = 0; wj < 3; ++wj)
+     {
+       int oq;
+       long long int ns, xf;
+ 
+       h8 += 2;
+       oq = !!h8 && !!wj;
+       ++uo;
+       vt ^= oq + uo;
+       ns = !!uo && !!vt;
+       xf = (h8 != 0) ? mh : 1;
+       pu += ns < xf;
+     }
+ 
+   for (pu = 0; pu < 1; ++pu)
+     {
+       int *sc;
+ 
+       sc = (int *)
+       *sc = 0;
+     }
+ }
Index: ChangeLog
===================================================================
*** ChangeLog	(revision 259372)
--- ChangeLog	(revision 259373)
***************
*** 1,3 ****
--- 1,8 ----
+ 2018-04-13  Andrey Belevantsev  <a...@ispras.ru>
+ 
+ 	PR rtl-optimization/83852
+ 	* gcc.dg/pr83852.c: New testcase.
+ 
  2018-04-13  Andreas Krebbel  <kreb...@linux.ibm.com>
  
  	PR testsuite/85326
Re: [PATCH] sel-sched: run cleanup_cfg just before loop_optimizer_init (PR 84659)
On 12.04.2018 0:55, Alexander Monakov wrote:
> As noted in PR 85354, we cannot simply invoke cleanup_cfg after dominators
> are computed, because they may become invalid but neither freed nor
> recomputed, so this trips checking in flow_loops_find.
>
> We can move cleanup_cfg earlier (and run it for all sel-sched invocations,
> not only when pipelining).

OK.  Sorry, I should have noticed that before; our ia64 tester also misses
the libraries required for graphite.

Best,
Andrey

> Bootstrapped/regtested on x86_64 and ppc64 (my previous testing missed
> this issue: the testcase requires graphite, but libisl wasn't present).
>
> 	PR rtl-optimization/85354
> 	* sel-sched-ir.c (sel_init_pipelining): Move cleanup_cfg call...
> 	* sel-sched.c (sel_global_init): ... here.
>
> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
> index 50a7daafba6..ee970522890 100644
> --- a/gcc/sel-sched-ir.c
> +++ b/gcc/sel-sched-ir.c
> @@ -30,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgrtl.h"
>  #include "cfganal.h"
>  #include "cfgbuild.h"
> -#include "cfgcleanup.h"
>  #include "insn-config.h"
>  #include "insn-attr.h"
>  #include "recog.h"
> @@ -6122,9 +6121,6 @@ make_regions_from_loop_nest (struct loop *loop)
>  void
>  sel_init_pipelining (void)
>  {
> -  /* Remove empty blocks: their presence can break assumptions elsewhere,
> -     e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
> -  cleanup_cfg (0);
>    /* Collect loop information to be used in outer loops pipelining.  */
>    loop_optimizer_init (LOOPS_HAVE_PREHEADERS
> 		       | LOOPS_HAVE_FALLTHRU_PREHEADERS
> diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
> index cd29df35666..59762964c6e 100644
> --- a/gcc/sel-sched.c
> +++ b/gcc/sel-sched.c
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm_p.h"
>  #include "regs.h"
>  #include "cfgbuild.h"
> +#include "cfgcleanup.h"
>  #include "insn-config.h"
>  #include "insn-attr.h"
>  #include "params.h"
> @@ -7661,6 +7662,10 @@ sel_sched_region (int rgn)
>  static void
>  sel_global_init (void)
>  {
> +  /* Remove empty blocks: their presence can break assumptions elsewhere,
> +     e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
> +  cleanup_cfg (0);
> +
>    calculate_dominance_info (CDI_DOMINATORS);
>    alloc_sched_pools ();
Re: [PATCH] sched-deps: respect deps->readonly in macro-fusion (PR 84566)
On 10.04.2018 13:40, Alexander Monakov wrote:
> Hi,
>
> this fixes a simple "regression" under the qsort_chk umbrella: sched-deps
> analysis has the deps->readonly flag, but the macro-fusion code does not
> respect it and mutates instructions.  This breaks an assumption in
> sel_rank_for_schedule and manifests as a qsort checking error.
>
> Since sched_macro_fuse_insns is only called to set SCHED_GROUP_P on
> suitable insns, guard the call with !deps->readonly.
>
> Bootstrapped/regtested on x86_64 with sel-sched active and
> --with-cpu=sandybridge to exercise the macro-fusion code, and verified on
> an aarch64 cross-compiler that the failing testcase given in the PR is
> fixed.
>
> OK to apply?

Fine with me, but you need a scheduler maintainer's approval.

Andrey

> Thanks.
> Alexander
>
> 	PR rtl-optimization/84566
> 	* sched-deps.c (sched_analyze_insn): Check deps->readonly when
> 	invoking sched_macro_fuse_insns.
>
> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index 9a5cbebea40..120b5f0ddc1 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -2897,7 +2897,8 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn)
>  	      && code == SET);
>
>    /* Group compare and branch insns for macro-fusion.  */
> -  if (targetm.sched.macro_fusion_p
> +  if (!deps->readonly
> +      && targetm.sched.macro_fusion_p
>        && targetm.sched.macro_fusion_p ())
>      sched_macro_fuse_insns (insn);
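The invariant the patch restores is easy to state in miniature: an
analysis-only dependence context must never mutate the insn stream.  A toy
sketch with invented names (this is not GCC's deps_desc; the flag stands in
for SCHED_GROUP_P):

```c
#include <assert.h>

/* Returns the possibly-updated SCHED_GROUP_P-like flag.  Mirroring the
   fixed guard, the flag may only be set when the deps context is not
   readonly, i.e. when the pass is allowed to change instructions.  */
static int
toy_fuse (int deps_readonly, int fusion_possible, int sched_group_p)
{
  if (!deps_readonly && fusion_possible)
    sched_group_p = 1;
  return sched_group_p;
}
```

Checking readonly first keeps a purely analytical invocation (such as the
one sel-sched performs) side-effect free, which is what sel_rank_for_schedule
relies on.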
Re: [PATCH] sched-rgn: run add_branch_dependencies for sel-sched (PR 84301)
On 10.04.2018 14:09, Alexander Monakov wrote:
> Hi,
>
> The add_branch_dependencies function is fairly unusual in that it creates
> dependence edges "out of thin air" for all sorts of instructions preceding
> the BB end.  I think that is really unfortunate (explicit barriers in RTL
> would be more natural), but I've already complained about that in the PR.
>
> The bug/regression is that this function was not run for sel-sched, but
> the testcase uncovers that moving a USE away from the return insn can
> break assumptions in the mode-switching code.
>
> Solve this by running the first part of add_branch_dependencies, where it
> sets CANT_MOVE flags on immovable non-branch insns.
>
> Bootstrapped/regtested on x86_64 with sel-sched active.  OK to apply?

Looks fine to me, but I cannot approve -- maybe Vladimir can take a look?

Andrey

> Alexander
>
>
> 	PR target/84301
> 	* sched-rgn.c (add_branch_dependences): Move sel_sched_p check
> 	here...
> 	(compute_block_dependences): ... from here.
>
> 	* gcc.target/i386/pr84301.c: New test.
>
> diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
> index 8c3a740b70e..3c67fccb9b1 100644
> --- a/gcc/sched-rgn.c
> +++ b/gcc/sched-rgn.c
> @@ -2497,6 +2497,11 @@ add_branch_dependences (rtx_insn *head, rtx_insn *tail)
>        while (insn != head && DEBUG_INSN_P (insn));
>      }
>
> +  /* Selective scheduling handles control dependencies by itself, and
> +     CANT_MOVE flags ensure that other insns will be kept in place.  */
> +  if (sel_sched_p ())
> +    return;
> +
>    /* Make sure these insns are scheduled last in their block.  */
>    insn = last;
>    if (insn != 0)
> @@ -2725,9 +2730,7 @@ compute_block_dependences (int bb)
>
>    sched_analyze (&tmp_deps, head, tail);
>
> -  /* Selective scheduling handles control dependencies by itself.  */
> -  if (!sel_sched_p ())
> -    add_branch_dependences (head, tail);
> +  add_branch_dependences (head, tail);
>
>    if (current_nr_blocks > 1)
>      propagate_deps (bb, &tmp_deps);
> diff --git a/gcc/testsuite/gcc.target/i386/pr84301.c b/gcc/testsuite/gcc.target/i386/pr84301.c
> index e69de29bb2d..f1708b8ea6c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr84301.c
> +++ b/gcc/testsuite/gcc.target/i386/pr84301.c
> @@ -0,0 +1,15 @@
> +/* PR target/84301 */
> +/* { dg-do compile } */
> +/* { dg-options "-march=bdver1 -O1 -fexpensive-optimizations -fschedule-insns -fselective-scheduling -fno-dce -fno-tree-dce --param max-pending-list-length=0 --param selsched-max-lookahead=2" } */
> +
> +int lr;
> +long int xl;
> +
> +int
> +v4 (void)
> +{
> +  int mp;
> +
> +  ++xl;
> +  mp = (lr - xl) > 1;
> +}
Re: [PATCH] sel-sched: run cleanup_cfg just before loop_optimizer_init (PR 84659)
Hello,

On 10.04.2018 17:40, Alexander Monakov wrote:
> Hi,
>
> We have this code in sel-sched.c sel_region_init():
>
> 6918   /* Init correct liveness sets on each instruction of a single-block loop.
> 6919      This is the only situation when we can't update liveness when calling
> 6920      compute_live for the first insn of the loop.  */
> 6921   if (current_loop_nest)
> 6922     {
> 6923       int header =
> 6924 	(sel_is_loop_preheader_p (BASIC_BLOCK_FOR_FN (cfun, BB_TO_BLOCK (0)))
> 6925 	 ? 1
> 6926 	 : 0);
> 6927
> 6928       if (current_nr_blocks == header + 1)
> 6929 	update_liveness_on_insn
> 6930 	  (sel_bb_head (BASIC_BLOCK_FOR_FN (cfun, BB_TO_BLOCK (header))));
> 6931     }
>
> It appears it does not account for the presence of empty BBs between the
> preheader and the "actual header" BB hosting the first insn of the loop.
> The testcase in the PR provides an example where we don't run
> update_liveness_on_insn here, but then remove the empty BB and recurse
> endlessly in compute_live_after_bb.
>
> This patch solves this in a brute-but-straightforward manner by invoking
> cleanup_cfg just before loop_optimizer_init.
>
> Bootstrapped/regtested on x86_64 together with the other two patches and
> also checked on a powerpc cross-compiler that the testcase is fixed.  OK
> to apply?

OK.  A comment before the cleanup_cfg call would not hurt, but do as you
wish.

Thank you,
Andrey

> Thanks.
> Alexander
>
> 	PR rtl-optimization/85659
> 	* sel-sched-ir.c (sel_init_pipelining): Invoke cleanup_cfg.
>
> 	* gcc.dg/pr84659.c: New test.
>
>
> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
> index ee970522890..aef380a7e80 100644
> --- a/gcc/sel-sched-ir.c
> +++ b/gcc/sel-sched-ir.c
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgrtl.h"
>  #include "cfganal.h"
>  #include "cfgbuild.h"
> +#include "cfgcleanup.h"
>  #include "insn-config.h"
>  #include "insn-attr.h"
>  #include "recog.h"
> @@ -6121,6 +6122,7 @@ make_regions_from_loop_nest (struct loop *loop)
>  void
>  sel_init_pipelining (void)
>  {
> +  cleanup_cfg (0);
>    /* Collect loop information to be used in outer loops pipelining.  */
>    loop_optimizer_init (LOOPS_HAVE_PREHEADERS
> 		       | LOOPS_HAVE_FALLTHRU_PREHEADERS
> diff --git a/gcc/testsuite/gcc.dg/pr84659.c b/gcc/testsuite/gcc.dg/pr84659.c
> index e69de29bb2d..94c885f3869 100644
> --- a/gcc/testsuite/gcc.dg/pr84659.c
> +++ b/gcc/testsuite/gcc.dg/pr84659.c
> @@ -0,0 +1,19 @@
> +/* PR rtl-optimization/84659 */
> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-O2 -fselective-scheduling -fsel-sched-pipelining -fno-split-wide-types -fno-strict-aliasing -fno-tree-dce" } */
> +
> +void
> +jk (int **lq, int *k4, long long int qv, int od)
> +{
> +  while (**lq < 1)
> +    {
> +      int uo;
> +
> +      uo = ((od == 0) ? qv : *k4) != 1;
> +      ++**lq;
> +    }
> +
> +  for (;;)
> +    {
> +    }
> +}
Re: Fix PRs 80463, 83972, 83480
On 09.04.2018 21:23, Jakub Jelinek wrote:
> On Mon, Apr 09, 2018 at 01:30:12PM +0300, Andrey Belevantsev wrote:
>> I think that should be fine, that is, as long as the insn moved up
>> through all those debug insns, the copy will do that as well.  It's that
>> problematic conditional in sched-deps.c that we should take care of.
>>
>> I've reworded the comment and committed the attached patch.  Thanks for
>> your help.
>
> The C++ testcase FAILs everywhere:
> FAIL: g++.dg/pr80463.C  -std=gnu++98 (test for excess errors)
> Excess errors:
> cc1plus: warning: var-tracking-assignments changes selective scheduling
>
> The other testcases in the patch used -w, probably to disable the same
> warning, so I've committed the following as obvious to trunk after
> regtesting it on x86_64-linux and i686-linux:

Thank you very much, I missed this when committing.  Sorry for the noise.

Andrey

> 2018-04-09  Jakub Jelinek  <ja...@redhat.com>
>
> 	PR rtl-optimization/80463
> 	* g++.dg/pr80463.C: Add -w to dg-options.
>
> --- gcc/testsuite/g++.dg/pr80463.C.jj	2018-04-09 20:15:47.226631780 +0200
> +++ gcc/testsuite/g++.dg/pr80463.C	2018-04-09 20:19:43.783616136 +0200
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments" } */
> +/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments -w" } */
>
>  int *a;
>  int b, c;
>
>
> 	Jakub
Re: Fix PRs 80463, 83972, 83480
On 06.04.2018 19:18, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>
>> Hello,
>>
>> In these PRs we deal with the dependencies we are forced to make between
>> a debug insn and its previous insn (unless at a bb head).  In the latter
>> case, if such an insn has been moved, we fix up its movement so it
>> aligns with the sel-sched invariants.  We also carefully adjust seqnos
>> in the case we had a single debug insn left in the block.
>
> This is OK with overlong lines fixed and the new comment reworded for
> clarity (see below).
>
>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  <a...@ispras.ru>
>>
>> 	PR rtl-optimization/80463
>> 	PR rtl-optimization/83972
>> 	PR rtl-optimization/83480
>>
>> 	* sel-sched-ir.c (has_dependence_note_mem_dep): Take into account the
>> 	correct producer for the insn.
>> 	(tidy_control_flow): Fixup seqnos in case of debug insns.
>>
>> 	* gcc.dg/pr80463.c: New test.
>> 	* g++.dg/pr80463.C: Likewise.
>> 	* gcc.dg/pr83972.c: Likewise.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -3293,11 +3293,22 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED,
>>
>>  /* Note a dependence.  */
>>  static void
>> -has_dependence_note_dep (insn_t pro ATTRIBUTE_UNUSED,
>> +has_dependence_note_dep (insn_t pro,
>>  			 ds_t ds ATTRIBUTE_UNUSED)
>>  {
>> -  if (!sched_insns_conditions_mutex_p (has_dependence_data.pro,
>> -				       VINSN_INSN_RTX (has_dependence_data.con)))
>> +  insn_t real_pro = has_dependence_data.pro;
>> +  insn_t real_con = VINSN_INSN_RTX (has_dependence_data.con);
>> +
>> +  /* We do not allow for debug insns to move through others unless they
>> +     are at the start of a bb.  Such insns may create bookkeeping copies
>> +     that would not be able to move up breaking sel-sched invariants.
>
> I have trouble parsing this; it seems a word is accidentally missing
> between "move up" and "breaking".  Also the "such" is a bit ambiguous.
>
>> +     Detect that here and allow that movement if we allowed it before
>> +     in the first place.  */
>> +  if (DEBUG_INSN_P (real_con)
>> +      && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con))
>> +    return;
>
> Should we be concerned about debug insns appearing in sequence here?
> E.g. if pro and real_con are not consecutive, but all insns in between
> are debug insns?

I think that should be fine, that is, as long as the insn moved up through
all those debug insns, the copy will do that as well.  It's that problematic
conditional in sched-deps.c that we should take care of.

I've reworded the comment and committed the attached patch.  Thanks for your
help.

Andrey

>> +
>> +  if (!sched_insns_conditions_mutex_p (real_pro, real_con))
>>      {
>>        ds_t *dsp = &has_dependence_data.has_dep_p[has_dependence_data.where];
>>
>> @@ -3890,6 +3905,19 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
>>
>>    gcc_assert (EDGE_SUCC (xbb->prev_bb, 0)->flags & EDGE_FALLTHRU);
>>
>> +  /* We could have skipped some debug insns which did not get removed
>> +     with the block, and the seqnos could become incorrect.  Fix them up
>> +     here.  */
>> +  if (MAY_HAVE_DEBUG_INSNS
>> +      && (sel_bb_head (xbb) != first || sel_bb_end (xbb) != last))
>> +    {
>> +      if (!sel_bb_empty_p (xbb->prev_bb))
>> +	{
>> +	  int prev_seqno = INSN_SEQNO (sel_bb_end (xbb->prev_bb));
>> +	  if (prev_seqno > INSN_SEQNO (sel_bb_head (xbb)))
>> +	    for (insn_t insn = sel_bb_head (xbb); insn != first;
>> +		 insn = NEXT_INSN (insn))
>> +	      INSN_SEQNO (insn) = prev_seqno + 1;
>> +	}
>> +    }
>> +
>>    /* It can turn out that after removing unused jump, basic block
>>       that contained that jump, becomes empty too.  In such case
>>       remove it too.  */
>> diff --git a/gcc/testsuite/g++.dg/pr80463.C b/gcc/testsuite/g++.dg/pr80463.C
>> new file mode 100644
>> index 000..5614c28ca45
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/pr80463.C
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-g -fselective-scheduling
Re: Fix PR 83913
On 06.04.2018 19:10, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>
>> Hello,
>>
>> This issue ended up being fixed in a way different from the one
>> described in the PR.  We do not want to walk away from the invariant
>> "zero SCHED_TIMES -- insn is not scheduled" even for bookkeeping copies
>> (testing showed it trips over asserts designed to catch this).  Rather,
>> we choose to merge exprs in such a way that the larger sched-times wins.
>
> My understanding is this is not a "complete" solution to the problem, and
> a chance for a similar blowup on some other testcase remains.  Still,
> avoiding picking the minimum sched-times value should be a good
> mitigation.

Well, it's not much different from any other situation where we pose a limit
on pipelining with the sched-times values.  At least for now I can't think
of something better.  Adjusted the comment as per your suggestion and
committed the attached patch.

Andrey

> Please consider adding a comment that the average sched-times value is
> taken as a compromise to thwart "endless" pipelining of
> bookkeeping-producing insns available anywhere vs. pipelining of useful
> insns, or something like that?
>
> OK with that considered/added.
>
>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  <a...@ispras.ru>
>>
>> 	PR rtl-optimization/83913
>>
>> 	* sel-sched-ir.c (merge_expr_data): Choose the middle between two
>> 	different sched-times when merging exprs.
>>
>> 	* gcc.dg/pr83913.c: New test.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -1837,8 +1837,9 @@ merge_expr_data (expr_t to, expr_t from, insn_t split_point)
>>    if (EXPR_PRIORITY (to) < EXPR_PRIORITY (from))
>>      EXPR_PRIORITY (to) = EXPR_PRIORITY (from);
>>
>> -  if (EXPR_SCHED_TIMES (to) > EXPR_SCHED_TIMES (from))
>> -    EXPR_SCHED_TIMES (to) = EXPR_SCHED_TIMES (from);
>> +  if (EXPR_SCHED_TIMES (to) != EXPR_SCHED_TIMES (from))
>> +    EXPR_SCHED_TIMES (to) = ((EXPR_SCHED_TIMES (from) + EXPR_SCHED_TIMES (to)
>> +			      + 1) / 2);
>>
>>    if (EXPR_ORIG_BB_INDEX (to) != EXPR_ORIG_BB_INDEX (from))
>>      EXPR_ORIG_BB_INDEX (to) = 0;
>> @@ -3293,11 +3294,22 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED,
>>
>>  /* Note a dependence.  */
>>  static void
>> diff --git a/gcc/testsuite/gcc.dg/pr83913.c b/gcc/testsuite/gcc.dg/pr83913.c
>> new file mode 100644
>> index 000..c898d71a261
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr83913.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-O2 -funroll-all-loops -fselective-scheduling -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-forward-propagate -fno-rerun-cse-after-loop -fno-web" } */
>> +
>> +int jo, z4;
>> +
>> +int
>> +be (long unsigned int l7, int nt)
>> +{
>> +  int en;
>> +
>> +  jo = l7;
>> +  for (en = 0; en < 24; ++en)
>> +    {
>> +      jo = (jo / z4) * (!!jo >= ((!!nt) & 2));
>> +      if (jo == 0)
>> +	nt = 0;
>> +      else
>> +	{
>> +	  nt = z4;
>> +	  ++z4;
>> +	  nt = (long unsigned int) nt == (l7 + 1);
>> +	}
>> +    }
>> +
>> +  return nt;
>> +}

Index: gcc/ChangeLog
===================================================================
*** gcc/ChangeLog	(revision 259229)
--- gcc/ChangeLog	(revision 259230)
***************
*** 1,5 ****
--- 1,12 ----
  2018-04-09  Andrey Belevantsev  <a...@ispras.ru>
  
+ 	PR rtl-optimization/83913
+ 
+ 	* sel-sched-ir.c (merge_expr_data): Choose the middle between two
+ 	different sched-times when merging exprs.
+ 
+ 2018-04-09  Andrey Belevantsev  <a...@ispras.ru>
+ 
  	PR rtl-optimization/83962
  
  	* sel-sched-ir.c (tidy_control_flow): Correct the order in which we
  	call
Index: gcc/testsuite/gcc.dg/pr83913.c
===================================================================
*** gcc/testsuite/gcc.dg/pr83913.c	(revision 0)
--- gcc/testsuite/gcc.dg/pr83913.c	(revision 259230)
***************
*** 0 ****
--- 1,26 ----
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O2 -funroll-all-loops -fselective-scheduling -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-for
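The arithmetic of the new merge rule is easy to check in isolation.  A
minimal C sketch (plain ints, not GCC's expr_t) of the rounded-up midpoint
the patch installs in merge_expr_data:

```c
#include <assert.h>

/* Rounded-up midpoint of two EXPR_SCHED_TIMES counts.  Merging a pipelined
   expr with a never-scheduled bookkeeping copy no longer resets the count
   to the minimum (zero), which is what allowed "endless" pipelining.  */
static int
merged_sched_times (int to, int from)
{
  return (from + to + 1) / 2;
}
```

With the old rule, merging counts 3 and 0 gave 0 and the pipelining limit
never triggered; with the midpoint it gives 2, so the count still advances.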
Re: Fix PR 83962
On 06.04.2018 18:59, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>
>> Hello,
>>
>> This issue is about the correct order in which we need to call the
>> routines that fix up the control flow for us.
>
> OK with formatting both in the new comment and the Changelog fixed.

Thanks, fixed that in rev. 259229.

Andrey

>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  <a...@ispras.ru>
>>
>> 	PR rtl-optimization/83962
>>
>> 	* sel-sched-ir.c (tidy_control_flow): Correct the order in which we
>> 	call tidy_fallthru_edge and tidy_control_flow.
>>
>> 	* gcc.dg/pr83962.c: New test.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -3839,9 +3839,13 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
>>        && INSN_SCHED_TIMES (BB_END (xbb)) == 0
>>        && !IN_CURRENT_FENCE_P (BB_END (xbb)))
>>      {
>> -      if (sel_remove_insn (BB_END (xbb), false, false))
>> -	return true;
>> +      /* We used to call sel_remove_insn here that can trigger
>> +	 tidy_control_flow before we fix up the fallthru edge.  Correct
>> +	 that ordering by explicitly doing the latter before the former.  */
>> +      clear_expr (INSN_EXPR (BB_END (xbb)));
>>        tidy_fallthru_edge (EDGE_SUCC (xbb, 0));
>> +      if (tidy_control_flow (xbb, false))
>> +	return true;
>>      }
>>
>>    first = sel_bb_head (xbb);
>> diff --git a/gcc/testsuite/gcc.dg/pr83962.c b/gcc/testsuite/gcc.dg/pr83962.c
>> new file mode 100644
>> index 000..0547e218715
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr83962.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-std=gnu99 -O1 -fselective-scheduling2 -fschedule-insns2 -fcse-follow-jumps -fno-ssa-phiopt -fno-guess-branch-probability" } */
>> +unsigned int ca;
>> +
>> +void
>> +v6 (long long unsigned int as, int p9)
>> +{
>> +  while (p9 < 1)
>> +    as = (as != ca) || (as > 1);
>> +}
Re: Fix PR 83530
On 06.04.2018 18:55, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>
>> Hello,
>>
>> This issue is when we cannot correctly make insn tick data for the
>> unscheduled code (but processed by the modulo scheduler).  Fixed by
>> closely following the usual scheduling process as described in the PR.
>
> This is ok with the following nit-picks fixed.

Thanks, I've committed the attached.

Andrey

>> 2018-04-03  Andrey Belevantsev  <a...@ispras.ru>
>>
>> 	PR rtl-optimization/83530
>>
>> 	* sel-sched.c (force_next_insn): New global variable.
>> 	(remove_insn_for_debug): When force_next_insn is true, also leave only
>> 	next insn in the ready list.
>> 	(sel_sched_region): When the region wasn't scheduled, make another pass
>> 	over it with force_next_insn set to 1.
>
> Overlong lines.
>
>> 	* gcc.dg/pr8350.c: New test.
>
> Typo in test name.
>
>> --- a/gcc/sel-sched.c
>> +++ b/gcc/sel-sched.c
>> @@ -5004,12 +5004,16 @@ remove_temp_moveop_nops (bool full_tidying)
>>     distinguishing between bookkeeping copies and original insns.  */
>>  static int max_uid_before_move_op = 0;
>>
>> +/* When true, we're always scheduling next insn on the already scheduled code
>> +   to get the right insn data for the following bundling or other passes.  */
>> +static int force_next_insn = 0;
>> +
>>  /* Remove from AV_VLIW_P all instructions but next when debug counter
>>     tells us so.  Next instruction is fetched from BNDS.  */
>>  static void
>>  remove_insns_for_debug (blist_t bnds, av_set_t *av_vliw_p)
>>  {
>> -  if (! dbg_cnt (sel_sched_insn_cnt))
>> +  if (! dbg_cnt (sel_sched_insn_cnt) || force_next_insn)
>>      /* Leave only the next insn in av_vliw.  */
>>      {
>>        av_set_iterator av_it;
>> @@ -7642,7 +7646,13 @@ sel_sched_region (int rgn)
>>      sel_sched_region_1 ();
>>    else
>>      /* Force initialization of INSN_SCHED_CYCLEs for correct bundling.  */
>
> I believe this comment needs updating.
>
> Please also consider moving both assignments of reset_sched_cycles_p to
> after the if-else statement, just before the sel_region_finish call.
>
>> -    reset_sched_cycles_p = true;
>> +    {
>> +      reset_sched_cycles_p = false;
>> +      pipelining_p = false;
>> +      force_next_insn = 1;
>> +      sel_sched_region_1 ();
>> +      force_next_insn = 0;
>> +    }
>>
>>    sel_region_finish (reset_sched_cycles_p);
>>  }

Index: gcc/ChangeLog
===================================================================
*** gcc/ChangeLog	(revision 259227)
--- gcc/ChangeLog	(revision 259228)
***************
*** 1,3 ****
--- 1,13 ----
+ 2018-04-09  Andrey Belevantsev  <a...@ispras.ru>
+ 
+ 	PR rtl-optimization/83530
+ 
+ 	* sel-sched.c (force_next_insn): New global variable.
+ 	(remove_insn_for_debug): When force_next_insn is true, also leave only
+ 	next insn in the ready list.
+ 	(sel_sched_region): When the region wasn't scheduled, make another pass
+ 	over it with force_next_insn set to 1.
+ 
  2018-04-08  Monk Chiang  <sh.chian...@gmail.com>
  
  	* config.gcc (nds32le-*-*, nds32be-*-*): Add nds32/nds32_intrinsic.h
Index: gcc/testsuite/gcc.dg/pr83530.c
===================================================================
*** gcc/testsuite/gcc.dg/pr83530.c	(revision 0)
--- gcc/testsuite/gcc.dg/pr83530.c	(revision 259228)
***************
*** 0 ****
--- 1,15 ----
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O2 -fmodulo-sched -fselective-scheduling2" } */
+ int vm, z0;
+ short int mz;
+ 
+ int
+ ny (void)
+ {
+   int ch;
+ 
+   for (ch = 0; ch < 6; ++ch)
+     vm += ch / vm;
+ 
+   return z0 + mz;
+ }
Index: gcc/testsuite/ChangeLog
===================================================================
*** gcc/testsuite/ChangeLog	(revision 259227)
--- gcc/testsuite/ChangeLog	(revision 259228)
***************
*** 1,3 ****
--- 1,8 ----
+ 2018-04-09  Andrey Belevantsev  <a...@ispras.ru>
+ 
+ 	PR rtl-optimization/83530
+ 	* gcc.dg/pr83530.c: New test.
+ 
  2018-04-07  Thomas Koenig  <tkoe...@gcc.gnu.org>
  
  	PR middle-end/82976
Index: gcc/sel-sched.c
===================================================================
*** gcc/sel-sched.c	(revision 259227)
--- gcc/sel-sched.c	(revision 259228)
*** remove_temp_moveop_nops (bool full_tidyi
*** 5004,5015 ****
     distinguishing between bookkeeping copies and original insns.  */
  static int max_uid_before_move_op = 0;
  
  /* Remove from AV_VLIW_P
Re: Fix PR 83913
On 03.04.2018 19:02, Alexander Monakov wrote: > On Tue, 3 Apr 2018, Andrey Belevantsev wrote: > >> Hello, >> >> This issue ended up being fixed the way different from described in the PR. >> We do not want to walk away from the invariant "zero SCHED_TIMES -- insn >> is not scheduled" even for bookkeeping copies (testing showed it trips over >> asserts designed to catch this). Rather we choose merging exprs in the way >> the larger sched-times wins. > > ... but the Changelog and the actual patch take the average rather than the > maximum sched-time? :) I believe either way would be acceptable, but please > clarify the intent. Sorry, the average is the intent. Just to have a bit more of pipelining chances. Andrey > > Alexander >
Fix PRs 80463, 83972, 83480
Hello, In these PRs we deal with the dependencies we are forced to make between a debug insn and its previous insn (unless it is a bb head). In the latter case, if such an insn has been moved, we fix up its movement so it aligns with the sel-sched invariants. We also carefully adjust seqnos in the case we had a single debug insn left in the block. Best, Andrey 2018-04-03 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/80463 PR rtl-optimization/83972 PR rtl-optimization/83480 * sel-sched-ir.c (has_dependence_note_mem_dep): Take into account the correct producer for the insn. (tidy_control_flow): Fixup seqnos in case of debug insns. * gcc.dg/pr80463.c: New test. * g++.dg/pr80463.C: Likewise. * gcc.dg/pr83972.c: Likewise. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index a965d2ec42f..f6de96a7f3d 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3293,11 +3293,22 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED, /* Note a dependence. */ static void -has_dependence_note_dep (insn_t pro ATTRIBUTE_UNUSED, +has_dependence_note_dep (insn_t pro, ds_t ds ATTRIBUTE_UNUSED) { - if (!sched_insns_conditions_mutex_p (has_dependence_data.pro, - VINSN_INSN_RTX (has_dependence_data.con))) + insn_t real_pro = has_dependence_data.pro; + insn_t real_con = VINSN_INSN_RTX (has_dependence_data.con); + + /* We do not allow for debug insns to move through others unless they + are at the start of bb. Such insns may create bookkeeping copies + that would not be able to move up breaking sel-sched invariants. + Detect that here and allow that movement if we allowed it before + in the first place.
*/ + if (DEBUG_INSN_P (real_con) + && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con)) +return; + + if (!sched_insns_conditions_mutex_p (real_pro, real_con)) { ds_t *dsp = &has_dependence_data.has_dep_p[has_dependence_data.where]; @@ -3890,6 +3905,19 @@ tidy_control_flow (basic_block xbb, bool full_tidying) gcc_assert (EDGE_SUCC (xbb->prev_bb, 0)->flags & EDGE_FALLTHRU); + /* We could have skipped some debug insns which did not get removed with the block, + and the seqnos could become incorrect. Fix them up here. */ + if (MAY_HAVE_DEBUG_INSNS && (sel_bb_head (xbb) != first || sel_bb_end (xbb) != last)) + { + if (!sel_bb_empty_p (xbb->prev_bb)) + { + int prev_seqno = INSN_SEQNO (sel_bb_end (xbb->prev_bb)); + if (prev_seqno > INSN_SEQNO (sel_bb_head (xbb))) + for (insn_t insn = sel_bb_head (xbb); insn != first; insn = NEXT_INSN (insn)) + INSN_SEQNO (insn) = prev_seqno + 1; + } + } + /* It can turn out that after removing unused jump, basic block that contained that jump, becomes empty too. In such case remove it too.
*/ diff --git a/gcc/testsuite/g++.dg/pr80463.C b/gcc/testsuite/g++.dg/pr80463.C new file mode 100644 index 000..5614c28ca45 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr80463.C @@ -0,0 +1,20 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments" } */ + +int *a; +int b, c; +void +d () +{ + for (int e; c; e++) +switch (e) + { + case 0: + a[e] = 1; + case 1: + b = 2; + break; + default: + a[e] = 3; + } +} diff --git a/gcc/testsuite/gcc.dg/pr80463.c b/gcc/testsuite/gcc.dg/pr80463.c new file mode 100644 index 000..cebf2fef1f3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr80463.c @@ -0,0 +1,54 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-g -O2 -fvar-tracking-assignments -fselective-scheduling2 -ftree-loop-vectorize -fnon-call-exceptions -fno-tree-vrp -fno-gcse-lm -fno-tree-loop-im -fno-reorder-blocks-and-partition -fno-reorder-blocks -fno-move-loop-invariants -w" } */ + +short int t2; +int cd, aa, ft; + +void +dh (void) +{ + int qs = 0; + + if (t2 < 1) +{ + int bq = 0; + + while (bq < 1) +{ +} + + while (t2 < 1) +{ + if (t2 == 0) +{ + bq = 0; + cd = !!cd; +} + else +{ + bq = 1; + cd = bq > qs; +} + + t2 += cd; + bq = (t2 / qs) == bq; + + if (aa != ft) +{ + qs %= 0; + while (bq != 0) +{ + ro: + ; +} +} + + ++t2; +} + + ia: + goto ro; +} + + goto ia; +} diff --git a/gcc/testsuite/gcc.dg/pr83972.c b/gcc/testsuite/gcc.dg/pr83972.c new file mode 100644 index 000..b8de42cef0a ---
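The seqno fixup in the tidy_control_flow hunk above renumbers the leading insns of a block when the predecessor block now ends with a larger seqno. A simplified stand-alone sketch of that renumbering, with a plain array standing in for the insn stream (the function name is hypothetical, not from the GCC sources):

```c
/* Sketch of the seqno fixup: when the predecessor block now ends with
   a seqno larger than the seqno at our block head, renumber the
   leading insns to prev_end_seqno + 1 so seqnos stay non-decreasing
   along the block chain, as in the tidy_control_flow hunk above.  */
static void
fixup_seqnos (int *seqnos, int n, int prev_end_seqno)
{
  /* Only fix up when the invariant is actually violated.  */
  if (n > 0 && prev_end_seqno > seqnos[0])
    for (int i = 0; i < n; i++)
      seqnos[i] = prev_end_seqno + 1;
}
```

As in the patch, all of the skipped insns receive the same new seqno, one past the predecessor's end.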
Fix PR 83913
Hello, This issue ended up being fixed in a way different from the one described in the PR. We do not want to walk away from the invariant "zero SCHED_TIMES -- insn is not scheduled" even for bookkeeping copies (testing showed it trips over asserts designed to catch this). Rather, we choose to merge exprs in such a way that the larger sched-times wins. Best, Andrey 2018-04-03 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/83913 * sel-sched-ir.c (merge_expr_data): Choose the middle between two different sched-times when merging exprs. * gcc.dg/pr83913.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index a965d2ec42f..f6de96a7f3d 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -1837,8 +1837,9 @@ merge_expr_data (expr_t to, expr_t from, insn_t split_point) if (EXPR_PRIORITY (to) < EXPR_PRIORITY (from)) EXPR_PRIORITY (to) = EXPR_PRIORITY (from); - if (EXPR_SCHED_TIMES (to) > EXPR_SCHED_TIMES (from)) -EXPR_SCHED_TIMES (to) = EXPR_SCHED_TIMES (from); + if (EXPR_SCHED_TIMES (to) != EXPR_SCHED_TIMES (from)) +EXPR_SCHED_TIMES (to) = ((EXPR_SCHED_TIMES (from) + EXPR_SCHED_TIMES (to) + + 1) / 2); if (EXPR_ORIG_BB_INDEX (to) != EXPR_ORIG_BB_INDEX (from)) EXPR_ORIG_BB_INDEX (to) = 0; @@ -3293,11 +3294,22 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED, /* Note a dependence.
*/ static void diff --git a/gcc/testsuite/gcc.dg/pr83913.c b/gcc/testsuite/gcc.dg/pr83913.c new file mode 100644 index 000..c898d71a261 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr83913.c @@ -0,0 +1,26 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -funroll-all-loops -fselective-scheduling -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-forward-propagate -fno-rerun-cse-after-loop -fno-web" } */ + +int jo, z4; + +int +be (long unsigned int l7, int nt) +{ + int en; + + jo = l7; + for (en = 0; en < 24; ++en) +{ + jo = (jo / z4) * (!!jo >= ((!!nt) & 2)); + if (jo == 0) +nt = 0; + else +{ + nt = z4; + ++z4; + nt = (long unsigned int) nt == (l7 + 1); +} +} + + return nt; +}
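The merge_expr_data hunk above replaces "take the minimum" with the midpoint of the two sched-times counts, rounding up. A minimal stand-alone sketch of that arithmetic (the helper name is hypothetical, not from the GCC sources):

```c
/* Midpoint of two sched-times counts, rounding up: mirrors the
   (from + to + 1) / 2 expression in the merge_expr_data hunk above.
   Rounding up guarantees that merging a scheduled expr (times > 0)
   with an unscheduled one (times == 0) never yields zero, preserving
   the invariant "zero SCHED_TIMES -- insn is not scheduled".  */
static int
merged_sched_times (int to_times, int from_times)
{
  if (to_times == from_times)
    return to_times;
  return (from_times + to_times + 1) / 2;
}
```

Note how merging sched-times 0 and 1 yields 1, so the merged expr still counts as scheduled.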
Fix PR 83962
Hello, This issue is about the correct order in which we need to call the routines that fix up the control flow for us. Best, Andrey 2018-04-03 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/83962 * sel-sched-ir.c (tidy_control_flow): Correct the order in which we call tidy_fallthru_edge and tidy_control_flow. * gcc.dg/pr83962.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index a965d2ec42f..f6de96a7f3d 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3839,9 +3839,13 @@ tidy_control_flow (basic_block xbb, bool full_tidying) && INSN_SCHED_TIMES (BB_END (xbb)) == 0 && !IN_CURRENT_FENCE_P (BB_END (xbb))) { - if (sel_remove_insn (BB_END (xbb), false, false)) -return true; + /* We used to call sel_remove_insn here that can trigger tidy_control_flow + before we fix up the fallthru edge. Correct that ordering by +explicitly doing the latter before the former. */ + clear_expr (INSN_EXPR (BB_END (xbb))); tidy_fallthru_edge (EDGE_SUCC (xbb, 0)); + if (tidy_control_flow (xbb, false)) + return true; } first = sel_bb_head (xbb); diff --git a/gcc/testsuite/gcc.dg/pr83962.c b/gcc/testsuite/gcc.dg/pr83962.c new file mode 100644 index 000..0547e218715 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr83962.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-std=gnu99 -O1 -fselective-scheduling2 -fschedule-insns2 -fcse-follow-jumps -fno-ssa-phiopt -fno-guess-branch-probability" } */ +unsigned int ca; + +void +v6 (long long unsigned int as, int p9) +{ + while (p9 < 1) +as = (as != ca) || (as > 1); +}
Fix PR 83530
Hello, This issue arises when we cannot correctly compute insn tick data for the unscheduled code (which was nevertheless processed by the modulo scheduler). Fixed by closely following the usual scheduling process as described in the PR. Best, Andrey 2018-04-03 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/83530 * sel-sched.c (force_next_insn): New global variable. (remove_insn_for_debug): When force_next_insn is true, also leave only next insn in the ready list. (sel_sched_region): When the region wasn't scheduled, make another pass over it with force_next_insn set to 1. * gcc.dg/pr8350.c: New test. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index 76092f9587a..fca2b69c5ee 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -5004,12 +5004,16 @@ remove_temp_moveop_nops (bool full_tidying) distinguishing between bookkeeping copies and original insns. */ static int max_uid_before_move_op = 0; +/* When true, we're always scheduling next insn on the already scheduled code + to get the right insn data for the following bundling or other passes. */ +static int force_next_insn = 0; + /* Remove from AV_VLIW_P all instructions but next when debug counter tells us so. Next instruction is fetched from BNDS. */ static void remove_insns_for_debug (blist_t bnds, av_set_t *av_vliw_p) { - if (! dbg_cnt (sel_sched_insn_cnt)) + if (! dbg_cnt (sel_sched_insn_cnt) || force_next_insn) /* Leave only the next insn in av_vliw. */ { av_set_iterator av_it; @@ -7642,7 +7646,13 @@ sel_sched_region (int rgn) sel_sched_region_1 (); else /* Force initialization of INSN_SCHED_CYCLEs for correct bundling.
*/ -reset_sched_cycles_p = true; +{ + reset_sched_cycles_p = false; + pipelining_p = false; + force_next_insn = 1; + sel_sched_region_1 (); + force_next_insn = 0; +} sel_region_finish (reset_sched_cycles_p); } diff --git a/gcc/testsuite/gcc.dg/pr83530.c b/gcc/testsuite/gcc.dg/pr83530.c new file mode 100644 index 000..f4d8927de92 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr83530.c @@ -0,0 +1,15 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fmodulo-sched -fselective-scheduling2" } */ +int vm, z0; +short int mz; + +int +ny (void) +{ + int ch; + + for (ch = 0; ch < 6; ++ch) +vm += ch / vm; + + return z0 + mz; +}
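The force_next_insn mechanism above reuses the debug-counter path: when the flag is set, remove_insns_for_debug keeps only the chosen next insn in the ready set, so the extra pass replays the already-made scheduling decisions insn by insn while recomputing the tick data. A simplified stand-alone sketch of that filtering (plain ints stand in for insns; the function name is hypothetical):

```c
/* Keep only the chosen next insn in the ready set, mimicking what
   remove_insns_for_debug does when force_next_insn is set: every
   candidate except NEXT is dropped, so scheduling proceeds in the
   original insn order.  Returns the new size of the set.  */
static int
keep_only_next (int *ready, int n, int next)
{
  int kept = 0;
  for (int i = 0; i < n; i++)
    if (ready[i] == next)
      ready[kept++] = ready[i];
  return kept;  /* 1 if NEXT was present, else 0.  */
}
```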
Selective scheduling fixes
Hello, I will post patches to various recent selective scheduling PRs as replies to this mail. The patches have been tested both on x86-64 (default languages) and ia64 (c, c++); for the ppc issues I've checked with the ppc cross compiler. I will also run the ia64 bootstrap with sel-sched enabled, but it will take more time. Best, Andrey
Re: [PATCH] sel-sched: fix zero-usefulness case in sel_rank_for_schedule (PR 83513)
On 25.12.2017 19:47, Alexander Monakov wrote: Hello, we need the following follow-up fix for priority comparison in sel_rank_for_schedule as demonstrated by PR 83513. Checked on x86_64 by running a bootstrap and also checking for no regressions in make -k check-gcc RUNTESTFLAGS="--target_board=unix/-fselective-scheduling/-fschedule-insns" OK to apply? Yes. Andrey PR rtl-optimization/83513 * sel-sched.c (sel_rank_for_schedule): Order by non-zero usefulness before priority comparison. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index c1be0136551..be3813717ba 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -3396,17 +3396,22 @@ sel_rank_for_schedule (const void *x, const void *y) else if (control_flow_insn_p (tmp2_insn) && !control_flow_insn_p (tmp_insn)) return 1; + /* Prefer an expr with non-zero usefulness. */ + int u1 = EXPR_USEFULNESS (tmp), u2 = EXPR_USEFULNESS (tmp2); + + if (u1 == 0) +{ + if (u2 == 0) +u1 = u2 = 1; + else +return 1; +} + else if (u2 == 0) +return -1; + /* Prefer an expr with greater priority. */ - if (EXPR_USEFULNESS (tmp) != 0 || EXPR_USEFULNESS (tmp2) != 0) -{ - int p2 = EXPR_PRIORITY (tmp2) + EXPR_PRIORITY_ADJ (tmp2), - p1 = EXPR_PRIORITY (tmp) + EXPR_PRIORITY_ADJ (tmp); - - val = p2 * EXPR_USEFULNESS (tmp2) - p1 * EXPR_USEFULNESS (tmp); -} - else -val = EXPR_PRIORITY (tmp2) - EXPR_PRIORITY (tmp) - + EXPR_PRIORITY_ADJ (tmp2) - EXPR_PRIORITY_ADJ (tmp); + val = (u2 * (EXPR_PRIORITY (tmp2) + EXPR_PRIORITY_ADJ (tmp2)) + - u1 * (EXPR_PRIORITY (tmp) + EXPR_PRIORITY_ADJ (tmp))); if (val) return val;
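The reordered comparison above can be sketched stand-alone as follows; the struct and field names are simplified stand-ins for the EXPR_* accessors, with EXPR_PRIORITY_ADJ folded into the priority field:

```c
/* Sketch of the patched sel_rank_for_schedule ordering: an expr with
   non-zero usefulness always ranks before one with zero usefulness;
   otherwise usefulness-weighted priorities are compared.  A negative
   result means X ranks first.  */
struct expr { int usefulness; int priority; };

static int
rank (const struct expr *x, const struct expr *y)
{
  int u1 = x->usefulness, u2 = y->usefulness;

  if (u1 == 0)
    {
      if (u2 == 0)
        u1 = u2 = 1;  /* Both zero: fall back to plain priorities.  */
      else
        return 1;     /* Y is useful, X is not: Y first.  */
    }
  else if (u2 == 0)
    return -1;        /* X is useful, Y is not: X first.  */

  return u2 * y->priority - u1 * x->priority;
}
```

This avoids the pre-patch behavior where a zero-usefulness expr could zero out the weighted priority product and win the comparison by accident.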
Re: [PATCH 6/9] sel-sched: Don't mess with register restores
Hello, On 01.08.2016 4:42, Segher Boessenkool wrote: > If selective scheduling copies register restores it confuses dwarf2cfi. > > 2016-06-07 Segher Boessenkool> > * sel-sched-ir.c (init_global_and_expr_for_insn): Don't copy > instructions with a REG_CFA_RESTORE note. OK from sel-sched POV. Best, Andrey > --- > gcc/sel-sched-ir.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c > index 83f813a..4a3984a 100644 > --- a/gcc/sel-sched-ir.c > +++ b/gcc/sel-sched-ir.c > @@ -3015,6 +3015,7 @@ init_global_and_expr_for_insn (insn_t insn) >/* TRAP_IF though have an INSN code is control_flow_insn_p (). */ >|| control_flow_insn_p (insn) >|| volatile_insn_p (PATTERN (insn)) > + || find_reg_note (insn, REG_CFA_RESTORE, NULL) >|| (targetm.cannot_copy_insn_p >&& targetm.cannot_copy_insn_p (insn))) > force_unique_p = true; >
Re: Various selective scheduling fixes
Hi Christophe, On 01.04.2016 10:33, Christophe Lyon wrote: On 31 March 2016 at 16:43, Andrey Belevantsev <a...@ispras.ru> wrote: Hello, On 14.03.2016 12:10, Andrey Belevantsev wrote: Hello, In this thread I will be posting the patches for the fixed selective scheduling PRs (except the one that was already kindly checked in by Jeff). The patches were tested both on x86-64 and ia64 with the following combination: 1) the usual bootstrap/regtest, which only utilizes sel-sched on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the bootstrap/regtest with the second scheduler forced to sel-sched; 3) both schedulers forced to sel-sched. In all cases everything seemed to be fine. Three of the PRs are regressions, the other two showed different errors across the variety of releases tested by submitters; I think all of them are appropriate at this stage -- they do not touch anything outside of selective scheduling except the first patch where a piece of code from sched-deps.c needs to be refactored into a function to be called from sel-sched.c. I've backported all regression PRs to gcc-5-branch after testing there again with selective scheduling force enabled: PRs 64411, 0, 69032, 69102. The first one was not marked as a regression as such but the test for PR 70292, which is duplicate, works for me on gcc 5.1 thus making it a regression, too. 
Hi, The backport for pr69102 shows that the new testcase fails to compile (ICE) when GCC is configured as: --target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm --with-cpu=cortex-a15 --with-fpu=neon-vfpv4 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c: In function 'foo': /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1: internal compiler error: Segmentation fault 0xa64d15 crash_signal /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383 0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int) /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752 0xa31cd2 invoke_dfa_lookahead_guard /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212 0xa31cd2 find_best_expr /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415 0xa343fb fill_insns /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570 0xa343fb schedule_on_fences /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395 0xa36010 sel_sched_region_2 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533 0xa36f2a sel_sched_region_1 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575 0xa36f2a sel_sched_region(int) /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676 0xa37589 run_selective_scheduling() /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752 0xa14aed rest_of_handle_sched2 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647 0xa14aed execute /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791 See http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt Can you have a look? That's because A15 is the only place which enables autopref_multipass_dfa_lookahead_guard as the DFA lookahead guard hook. But autoprefetch modeling doesn't work for selective scheduling, it uses haifa structures that are not kept up to date during sel-sched. 
So this is not supposed to work as soon as the param value for prefetcher lookahead depth is positive. The following patch works for me. Could you check it with your testing? If it works fine for you, I would install the patch both for trunk and gcc-5. It would be great to force sel-sched to be enabled, too. I could do that but I don't have the hardware or cross-arm target tools at the moment. * haifa-sched.c (autopref_multipass_dfa_lookahead_guard): Disable for selective scheduler. Best, Andrey Christophe Andrey Andrey diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index ad2450b..c790830 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -5691,6 +5691,10 @@ autopref_multipass_dfa_lookahead_guard (rtx_insn *insn1, int ready_index) { int r = 0; + /* Autoprefetcher modeling is not supported by selective scheduler. */ + if (sel_sched_p ()) +return 0; + if (PARAM_VALUE (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH) <= 0) return 0;
Re: [PATCH][PR rtl-optimization/69307] Handle hard registers in modes that span more than one register properly
Hello, On 12.03.2016 20:13, Jeff Law wrote: As Andrey outlined in the PR, selective-scheduling was missing a check & handling of hard registers in modes that span more than one hard reg. This caused an incorrect register selection during renaming. I verified removing the printf call from the test would not compromise the test. Then I did a normal x86 bootstrap & regression test with the patch. Of course that's essentially useless, so I also did another bootstrap and regression test with -fselective-scheduling in BOOT_CFLAGS with and without this patch. In both cases there were no regressions. I'm installing Andrey's patch on the trunk. I'm not sure this is worth addressing in gcc-5. I've looked at the patch again and as it fixes general code and has a regression marker I've included it in the bunch of other PRs that were backported to gcc-5. I forgot you were hesitant putting it to gcc-5 though :) So I can revert it from the branch if you want me to. Andrey Jeff
Re: Various selective scheduling fixes
Hello, On 14.03.2016 12:10, Andrey Belevantsev wrote: Hello, In this thread I will be posting the patches for the fixed selective scheduling PRs (except the one that was already kindly checked in by Jeff). The patches were tested both on x86-64 and ia64 with the following combination: 1) the usual bootstrap/regtest, which only utilizes sel-sched on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the bootstrap/regtest with the second scheduler forced to sel-sched; 3) both schedulers forced to sel-sched. In all cases everything seemed to be fine. Three of the PRs are regressions, the other two showed different errors across the variety of releases tested by submitters; I think all of them are appropriate at this stage -- they do not touch anything outside of selective scheduling except the first patch where a piece of code from sched-deps.c needs to be refactored into a function to be called from sel-sched.c. I've backported all regression PRs to gcc-5-branch after testing there again with selective scheduling force enabled: PRs 64411, 0, 69032, 69102. The first one was not marked as a regression as such but the test for PR 70292, which is a duplicate, works for me on gcc 5.1 thus making it a regression, too. Andrey Andrey
[committed] Add patch for PR 70292
Hello, PR 70292 turned out to be a duplicate of PR 64411, which is already fixed on trunk, but still the testcase from the PR is worth adding. It can be tortured instead of being a target test, and also this PR showed that 64411 is a regression and should be backported to gcc-5-branch. Committed the test to trunk. I don't think we need it for gcc-5 though. Andrey Index: gcc/testsuite/gcc.dg/pr70292.c === *** gcc/testsuite/gcc.dg/pr70292.c (revision 0) --- gcc/testsuite/gcc.dg/pr70292.c (revision 234627) *** *** 0 --- 1,12 + /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ + /* { dg-options "-O1 -fno-inline -fno-dce -fschedule-insns -fselective-scheduling -fno-tree-dce" } */ + + void bar() {} + + int t106_1mul(unsigned int x, unsigned int y) { + int r; + if (__builtin_mul_overflow(x, y, &r)) { + bar(); + } + return r; + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 234626) --- gcc/testsuite/ChangeLog (revision 234627) *** *** 1,3 --- 1,8 + 2016-03-31 Andrey Belevantsev <a...@ispras.ru> + + PR target/70292 + * gcc.c-torture/pr70292.c: New test. + 2016-03-31 Marek Polacek <pola...@redhat.com> PR c/70297
Re: [PATCH][PR rtl-optimization/69307] Handle hard registers in modes that span more than one register properly
On 14.03.2016 21:39, Jeff Law wrote: On 03/14/2016 03:56 AM, Andrey Belevantsev wrote: Thank you for checking this in. I've also tested this patch in a similar way (forcing selective scheduling for 2nd and both schedulers) both on x86-64 and ia64. I've posted the patches for the remaining sel-sched PRs just now -- it took some time bringing our Itaniums back to life. No problem. I found it trolling the P4/P5 regression list. It was the only one that I could wrap my head around easily that night. I forgot I had a test from BZ for this PR, it wasn't attached to the patch so it didn't get committed. I've committed it now as rev. 234360. Andrey Thanks for following-up on the others. Hopefully between Alexander, Bernd and myself we can get them reviewed and work towards getting those BZs resolved. jeff Index: gcc/testsuite/gcc.target/arm/pr69307.c === *** gcc/testsuite/gcc.target/arm/pr69307.c (revision 0) --- gcc/testsuite/gcc.target/arm/pr69307.c (revision 234360) *** *** 0 --- 1,34 + /* { dg-do run } */ + /* { dg-options "-O2 -fselective-scheduling -mtune=arm1136j-s" } */ + + typedef unsigned char uint8_t; + typedef unsigned short int uint16_t; + typedef unsigned int uint32_t; + typedef unsigned long long int uint64_t; + typedef uint8_t u8; + typedef uint16_t u16; + typedef uint32_t u32; + typedef uint64_t u64; + u64 __attribute__((noinline, noclone)) + foo(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3) + { + u8 *p8_2 = &u8_2; + u16 *p16_2 = &u16_2; + u8 *p8_3 = &u8_3; + u64 *p64_3 = &u64_3; + p8_2 = &u8_3; + *p8_3 -= *p64_3; + *p8_2 = (u64)*p8_2 % ((u64)*p8_2 | 3); + u8_2 = (u64)u8_2 / ((u64)*p16_2 | 1); + u16_0 = (u64)u16_0 % ((u64)*p8_2 | 3); + return u8_0 + u16_0 + u32_0 + u64_0 + u8_1 + u16_1 + u32_1 + u64_1 + u8_2 + u16_2 + u32_2 + u64_2 + u8_3 + u16_3 + u32_3 + u64_3; + } + int main() + { + u64 x = 0; + x += foo(3llu, 6llu, 15llu, 28llu, 5llu, 11llu, 20llu,
44llu, 7llu, 10llu, 20llu, 55llu, 0llu, 9llu, 17llu, 48llu); + __builtin_printf("%02x%02x%02x%02x%02x%02x%02x%02x\n", (unsigned)((x >> 0) & 0xff), (unsigned)((x >> 8) & 0xff), (unsigned)((x >> 16) & 0xff), (unsigned)((x >> 24) & 0xff), (unsigned)((x >> 32) & 0xff), (unsigned)((x >> 40) & 0xff), (unsigned)((x >> 48) & 0xff), (unsigned)((x >> 56) & 0xff)); + if (x != 0x1f3) + __builtin_abort(); + return 0; + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 234359) --- gcc/testsuite/ChangeLog (revision 234360) *** *** 1,5 --- 1,10 2016-03-21 Andrey Belevantsev <a...@ispras.ru> + PR rtl-optimization/69307 + * gcc.target/arm/pr69307.c: New test. + + 2016-03-21 Andrey Belevantsev <a...@ispras.ru> + PR rtl-optimization/69102 * gcc.c-torture/compile/pr69102.c: New test.
Re: [02/05] Fix PR 63384
On 15.03.2016 20:44, Alexander Monakov wrote: On Tue, 15 Mar 2016, Marek Polacek wrote: This test fails for me due to cc1plus: warning: var-tracking-assignments changes selective scheduling Thanks for the heads-up Marek, and sorry for the trouble. Like I said in the adjacent reply, the warning is expected (I didn't realize the testsuite would notice that, though). I think the right fix is to simply add "-w" to dg-options, and while we are at it, we should probably change -fvta-toggle to just -fvta as well (because VTA is active either way, right?). Yes, the -fvta should work. Sorry for the breakage, I guess I've misread the compare-tests output when also checking the run with forced sel-sched enabled. I can take care of the test tomorrow morning or you can do it now. Best, Andrey Andrey? Thanks. Alexander
Re: [05/05] Fix PR 69102
Hello, On 14.03.2016 12:52, Andrey Belevantsev wrote: Hello, The problem here is readonly dependence contexts in selective scheduler. We're trying to cache the effect of initializing a dependence context with remembering that context and setting a readonly bit on it. When first moving the insn 43 with REG_ARGS_SIZE note through the insn 3 (a simple eax set) we also set the last_args_size field of the context. Later, when we make a copy of insn 43 and try to move it again through insn 3, we take the cached dependency context and notice the (fake) dep with last_args_size insn, which is the old insn 43. Then the assert saying that we should be able to lift the bookkeeping copy up the same way as we did with the original insn breaks. Fixed by the attached patch that makes us notice only deps with the current producer insn. Ok for trunk? We've discussed the patch with Alexander a bit and decided to take a different approach. The core issue here is not handling the new last_args_size field in the readonly contexts. In general, the readonly context approach, when analyzing an insn with a readonly context, would create the necessary dependencies with all of the last_ fields but refrain from modifying those fields. The reason is we need to capture the effect of only the single producer in the readonly context. Failing to do so may update the last_ fields with the effect of subsequent analyses and produce fake dependencies with the insns that got into those fields instead of dependencies with the current producer. So the right fix here is to guard setting of the last_args_size field with the !deps->readonly test as it is done elsewhere in sched-deps.c. In stage 1 we will also want to add asserts in the sel-sched dependency hooks (where I had placed early returns in the previous version of the patch) checking that the dependency is always created with the current producer, so that such cases will be caught sooner.
The new patch bootstrapped and tested on x86-64 with the selective scheduler force-enabled with no regressions. It needs review from a maintainer outside of sel-sched as we touch the sched-deps.c file. Ok for trunk? The test is the same as in the previous patch. Andrey 2016-03-15 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * sched-deps.c (sched_analyze_insn): Do not set last_args_size field when we have a readonly dependency context. gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * sel-sched.c (has_dependence_note_dep): Only take into account dependencies produced by the current producer insn. (has_dependence_note_mem_dep): Likewise. testsuite/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * gcc.c-torture/compile/pr69102.c: New test. Best, Andrey diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 3d4a1d5..77ffcd0 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -3495,7 +3495,8 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn) { if (deps->last_args_size) add_dependence (insn, deps->last_args_size, REG_DEP_OUTPUT); - deps->last_args_size = insn; + if (!deps->readonly) + deps->last_args_size = insn; } }
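The discipline described above (a cached readonly context may be consulted to create dependencies, but its state must never be updated) can be sketched with a toy context holding a single last_args_size-like field; the struct and function names are hypothetical, not the sched-deps.c ones:

```c
/* Toy model of the readonly dependence context discipline: a cached
   (readonly) context may be used to find the producer a dependence
   should be created with, but must not record a new producer, or
   later queries against the cache would see stale producers instead
   of the current one.  */
struct dep_context
{
  int readonly;        /* Set once the context is cached.  */
  int last_args_size;  /* UID of the recorded producer, 0 if none.  */
};

/* Return the insn UID a dependence should be created with (0 if
   none); record INSN as the new producer only on writable contexts,
   mirroring the !deps->readonly guard in the patch above.  */
static int
note_args_size (struct dep_context *ctx, int insn)
{
  int dep_with = ctx->last_args_size;
  if (!ctx->readonly)
    ctx->last_args_size = insn;
  return dep_with;
}
```

Without the readonly guard, analyzing a second producer against the cached context would overwrite the field, which is exactly the stale-insn-43 situation the mail describes.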
Re: [01/05] Fix PR 64411
Hello, On 14.03.2016 19:45, Bernd Schmidt wrote: On 03/14/2016 05:23 PM, Alexander Monakov wrote: On Mon, 14 Mar 2016, Andrey Belevantsev wrote: In this case, we get an inconsistency between the sched-deps interface, saying we can't move an insn writing the si register through a vector insn, and the liveness analysis, saying we can. The latter doesn't take into account implicit_reg_pending_clobbers set calculated in sched-deps before register allocation. The solution is to reflect this set in our insn data (sets/uses/clobbers). Ok for trunk? One nit; the prototype of the new function: extern void get_implicit_reg_pending_clobbers (rtx_insn *, HARD_REG_SET *); has source operand on the left, destination on the right; it's probably nicer to swap them around. OK as far as selective scheduler changes go, but this also needs a general scheduler maintainer ack for the sched-deps.c change. Vladimir, can you have a look? Needs better documentation of the new function's arguments (as per general requirements for such things), but otherwise that part is ok (either arg order). The sel-sched parts should also have proper function comments however, and here: +{ + SET_REGNO_REG_SET (IDATA_REG_SETS (id), regno); +} we don't use braces around single statements. I've incorporated both yours and Alexander's comments and committed the patch as rev. 234216. Andrey Bernd
Re: [PATCH][PR rtl-optimization/69307] Handle hard registers in modes that span more than one register properly
Hello Jeff, On 12.03.2016 20:13, Jeff Law wrote: As Andrey outlined in the PR, selective-scheduling was missing a check & handling of hard registers in modes that span more than one hard reg. This caused an incorrect register selection during renaming. I verified removing the printf call from the test would not compromise the test. Then I did a normal x86 bootstrap & regression test with the patch. Of course that's essentially useless, so I also did another bootstrap and regression test with -fselective-scheduling in BOOT_CFLAGS with and without this patch. In both cases there were no regressions. Thank you for checking this in. I've also tested this patch in the similar way (forcing selective scheduling for 2nd and both schedulers) both on x86-64 and ia64. I've posted the patches for remaining sel-sched PRs just now -- it took some time bringing our Itaniums back to life. Andrey I'm installing Andrey's patch on the trunk. I'm not sure this is worth addressing in gcc-5. Jeff
[05/05] Fix PR 69102
Hello, The problem here is readonly dependence contexts in selective scheduler. We're trying to cache the effect of initializing a dependence context with remembering that context and setting a readonly bit on it. When first moving the insn 43 with REG_ARGS_SIZE note through the insn 3 (a simple eax set) we also set the last_args_size field of the context. Later, when we make a copy of insn 43 and try to move it again through insn 3, we take the cached dependency context and notice the (fake) dep with last_args_size insn, which is the old insn 43. Then the assert saying that we should be able to lift the bookkeeping copy up the same way as we did with the original insn breaks. Fixed by the attached patch that makes us notice only deps with the current producer insn. Ok for trunk? gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * sel-sched.c (has_dependence_note_dep): Only take into account dependencies produced by the current producer insn. (has_dependence_note_mem_dep): Likewise. testsuite/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * gcc.c-torture/compile/pr69102.c: New test. Best, Andrey diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index c1a9e55..b4aa933 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3277,9 +3277,14 @@ has_dependence_note_reg_use (int regno) static void has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED, rtx pending_mem ATTRIBUTE_UNUSED, - insn_t pending_insn ATTRIBUTE_UNUSED, + insn_t pending_insn, ds_t ds ATTRIBUTE_UNUSED) { + /* We're only interested in dependencies with the current producer. + We might get other insns that were saved in dependence context + as last_* or pending_* fields. 
*/ + if (INSN_UID (pending_insn) != INSN_UID (has_dependence_data.pro)) +    return; if (!sched_insns_conditions_mutex_p (has_dependence_data.pro, VINSN_INSN_RTX (has_dependence_data.con))) { @@ -3291,9 +3296,14 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED, /* Note a dependence. */ static void -has_dependence_note_dep (insn_t pro ATTRIBUTE_UNUSED, +has_dependence_note_dep (insn_t pro, ds_t ds ATTRIBUTE_UNUSED) { + /* We're only interested in dependencies with the current producer. + We might get other insns that were saved in dependence context + as last_* or pending_* fields. */ + if (INSN_UID (pro) != INSN_UID (has_dependence_data.pro)) +    return; if (!sched_insns_conditions_mutex_p (has_dependence_data.pro, VINSN_INSN_RTX (has_dependence_data.con))) { diff --git a/gcc/testsuite/gcc.c-torture/compile/pr69102.c b/gcc/testsuite/gcc.c-torture/compile/pr69102.c new file mode 100644 index 000..b1328ca --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr69102.c @@ -0,0 +1,21 @@ +/* { dg-options "-Og -fPIC -fschedule-insns2 -fselective-scheduling2 -fno-tree-fre --param=max-sched-extend-regions-iters=10" } */ +void bar (unsigned int); + +void +foo (void) +{ + char buf[1] = { 3 }; + const char *p = buf; + const char **q = &p; + unsigned int ch; + switch (**q) +{ +case 1: ch = 5; break; +case 2: ch = 4; break; +case 3: ch = 3; break; +case 4: ch = 2; break; +case 5: ch = 1; break; +default: ch = 0; break; +} + bar (ch); +}
[04/05] Fix PR 69032
Hello, We fail to find the proper seqno for the fresh bookkeeping copy in this PR. The problem is that in get_seqno_by_preds we are iterating over bb from the given insn backwards up to the first bb insn. We skip the initial insn when iterating over bb, yet we should take seqno from it. The code in question originally didn't include bb head when iterating, and was patched to do so in 2011. The patch was wrong and instead of including bb head managed to exclude the original insn itself. By reading the original and patched code I've convinced myself that the right fix will be to do what the patch intended and include both the initial insn and the bb head in the iteration. Ok for trunk? gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69032 * sel-sched-ir.c (get_seqno_by_preds): Include both tmp and head when looping backwards over basic block insns. testsuite/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69032 * gcc.dg/pr69032.c: New test. Best, Andrey diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index ec59280..c1a9e55 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -4103,11 +4103,14 @@ get_seqno_by_preds (rtx_insn *insn) insn_t *preds; int n, i, seqno; - while (tmp != head) + /* Loop backwards from insn to head including both. */ + while (1) { - tmp = PREV_INSN (tmp); if (INSN_P (tmp)) return INSN_SEQNO (tmp); + if (tmp == head) + break; + tmp = PREV_INSN (tmp); } cfg_preds (bb, &preds, &n); diff --git a/gcc/testsuite/gcc.dg/pr69032.c b/gcc/testsuite/gcc.dg/pr69032.c new file mode 100644 index 000..e0925cd --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr69032.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fsched-pressure -fsel-sched-pipelining -fselective-scheduling" } */ + +void foo (long long i) +{ + while (i != -1) + { + ++i; + __asm__ (""); + } +}
[03/05] Fix PR 66660
Hello, We speculate an insn in the PR but we do not make a check for it though we should. The thing that broke this was the fix for PR 45472. In that PR, we have moved a volatile insn too far up because we failed to merge the bits describing its volatility when we have processed a control flow split. The code to propagate the insn pattern with the insn merging was added for when the volatility of the two insns from both split branches differs. However, the volatility of the speculated insn and its original also differ: the original insn may trap while the speculated version may not. Thus, we replace a speculative pattern with the original one per the PR 45472 fix for no reason. The patch for this problem just limits the original fix for PR 45472 to apply to non-speculative insns only. There is no test as it is not so easy to construct one -- we could count the number of speculation checks in the resulting assembly, but there is no way to force speculation to happen. Ok for trunk? gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR target/66660 * sel-sched-ir.c (merge_expr): Do not propagate trap bits into speculative insns. Best, Andrey commit 53ef39496acc26cc0021555e403068e93343aa20 Author: Andrey Belevantsev <a...@ispras.ru> Date: Wed Jan 27 17:20:27 2016 +0300 Fix pr66660: do not propagate trap bits into speculative insns diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index e181cb9..ec59280 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -1871,12 +1871,12 @@ merge_expr (expr_t to, expr_t from, insn_t split_point) /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. This will provide us with consistent speculative bits and speculative patterns inside expr. */ - if ((EXPR_SPEC_DONE_DS (from) != 0 - && EXPR_SPEC_DONE_DS (to) == 0) - /* Do likewise for volatile insns, so that we always retain - the may_trap_p bit on the resulting expression. */ - || (VINSN_MAY_TRAP_P (EXPR_VINSN (from)) - && !VINSN_MAY_TRAP_P (EXPR_VINSN (to)))) + if (EXPR_SPEC_DONE_DS (to) == 0 + && (EXPR_SPEC_DONE_DS (from) != 0 + /* Do likewise for volatile insns, so that we always retain + the may_trap_p bit on the resulting expression. */ + || (VINSN_MAY_TRAP_P (EXPR_VINSN (from)) + && !VINSN_MAY_TRAP_P (EXPR_VINSN (to))))) change_vinsn_in_expr (to, EXPR_VINSN (from)); merge_expr_data (to, from, split_point);
[02/05] Fix PR 63384
Hello, Here we're looping because we decrease the counter of the insns we still can issue on a DEBUG_INSN thus rendering the counter negative. The fix is to not count debug insns in the corresponding code. The selective scheduling is known to spoil the result of var tracking, but still it is not the reason to hang in there. The toggle option used in the test seems to be the equivalent of just enabling var-tracking-assignments which should lead to the same situation; however, if specified as is, var-tracking-assignments will be disabled by the toplev.c:1460 code. Maybe we also need the same treatment for flag_var_tracking_assignments_toggle. Ok for trunk? gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/63384 * sel-sched.c (invoke_aftermath_hooks): Do not decrease issue_more on DEBUG_INSN_P insns. testsuite/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/63384 * testsuite/g++.dg/pr63384.C: New test. Best, Andrey diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index c798935..893a3e5 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -4249,7 +4249,8 @@ invoke_aftermath_hooks (fence_t fence, rtx_insn *best_insn, int issue_more) issue_more); memcpy (FENCE_STATE (fence), curr_state, dfa_state_size); } - else if (GET_CODE (PATTERN (best_insn)) != USE + else if (! 
DEBUG_INSN_P (best_insn) + && GET_CODE (PATTERN (best_insn)) != USE && GET_CODE (PATTERN (best_insn)) != CLOBBER) issue_more--; diff --git a/gcc/testsuite/g++.dg/pr63384.C b/gcc/testsuite/g++.dg/pr63384.C new file mode 100644 index 000..b4e0784 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr63384.C @@ -0,0 +1,12 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fselective-scheduling2 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fsel-sched-reschedule-pipelined -fvar-tracking-assignments-toggle -ftree-vectorize" } */ + +template T **make_test_matrix() { + T **data = new T *; + for (int i = 0; i < 1000; i++) +; +} + +template void test() { T **c = make_test_matrix(); } + +main() { test(); }
[01/05] Fix PR 64411
Hello, In this case, we get an inconsistency between the sched-deps interface, saying we can't move an insn writing the si register through a vector insn, and the liveness analysis, saying we can. The latter doesn't take into account implicit_reg_pending_clobbers set calculated in sched-deps before register allocation. The solution is to reflect this set in our insn data (sets/uses/clobbers). Ok for trunk? gcc/ 2016-01-15 Andrey Belevantsev <a...@ispras.ru> PR target/64411 * sched-deps.c (get_implicit_reg_pending_clobbers): New function, factored out from ... (sched_analyze_insn): ... here. * sched-int.h (get_implicit_reg_pending_clobbers): Declare it. * sel-sched-ir.c (setup_id_implicit_regs): New function, use get_implicit_reg_pending_clobbers in it. (setup_id_reg_sets): Use setup_id_implicit_regs. (deps_init_id): Ditto. testsuite/ 2016-01-15 Andrey Belevantsev <a...@ispras.ru> PR target/64411 * gcc.target/i386/pr64411.C: New test. Best, Andrey diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 4961dfb..3d4a1d5 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -2860,6 +2860,17 @@ sched_macro_fuse_insns (rtx_insn *insn) } +/* Get the implicit reg pending clobbers for INSN. */ +void +get_implicit_reg_pending_clobbers (rtx_insn *insn, HARD_REG_SET *temp) +{ + extract_insn (insn); + preprocess_constraints (insn); + alternative_mask preferred = get_preferred_alternatives (insn); + ira_implicitly_set_insn_hard_regs (temp, preferred); + AND_COMPL_HARD_REG_SET (*temp, ira_no_alloc_regs); +} + /* Analyze an INSN with pattern X to find all dependencies. */ static void sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn) @@ -2872,12 +2883,7 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn) if (! 
reload_completed) { HARD_REG_SET temp; - - extract_insn (insn); - preprocess_constraints (insn); - alternative_mask prefrred = get_preferred_alternatives (insn); - ira_implicitly_set_insn_hard_regs (&temp, prefrred); - AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); + get_implicit_reg_pending_clobbers (insn, &temp); IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp); } diff --git a/gcc/sched-int.h b/gcc/sched-int.h index 378c3aa..d0e2c0e 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -1351,6 +1351,7 @@ extern void finish_deps_global (void); extern void deps_analyze_insn (struct deps_desc *, rtx_insn *); extern void remove_from_deps (struct deps_desc *, rtx_insn *); extern void init_insn_reg_pressure_info (rtx_insn *); +extern void get_implicit_reg_pending_clobbers (rtx_insn *, HARD_REG_SET *); extern dw_t get_dep_weak (ds_t, ds_t); extern ds_t set_dep_weak (ds_t, ds_t, dw_t); diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index d6c86b8..e181cb9 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -2650,6 +2650,24 @@ maybe_downgrade_id_to_use (idata_t id, insn_t insn) IDATA_TYPE (id) = USE; } +/* Setup implicit register clobbers calculated by sched-deps before reload. */ +static void +setup_id_implicit_regs (idata_t id, insn_t insn) +{ + if (reload_completed) +    return; + + HARD_REG_SET temp; + unsigned regno; + hard_reg_set_iterator hrsi; + + get_implicit_reg_pending_clobbers (insn, &temp); + EXECUTE_IF_SET_IN_HARD_REG_SET (temp, 0, regno, hrsi) +    { + SET_REGNO_REG_SET (IDATA_REG_SETS (id), regno); +    } +} + /* Setup register sets describing INSN in ID. */ static void setup_id_reg_sets (idata_t id, insn_t insn) @@ -2704,6 +2722,9 @@ setup_id_reg_sets (idata_t id, insn_t insn) } } + /* Also get implicit reg clobbers from sched-deps. */ + setup_id_implicit_regs (id, insn); + return_regset_to_pool (tmp); } @@ -2735,20 +2756,18 @@ deps_init_id (idata_t id, insn_t insn, bool force_unique_p) deps_init_id_data.force_use_p = false; init_deps (dc, false); - memcpy (&deps_init_id_sched_deps_info, &const_deps_init_id_sched_deps_info, sizeof (deps_init_id_sched_deps_info)); - if (spec_info != NULL) deps_init_id_sched_deps_info.generate_spec_deps = 1; - sched_deps_info = &deps_init_id_sched_deps_info; deps_analyze_insn (dc, insn); + /* Implicit reg clobbers received from sched-deps separately. */ + setup_id_implicit_regs (id, insn); free_deps (dc); - deps_init_id_data.id = NULL; } diff --git a/gcc/testsuite/gcc.target/i386/pr64411.C b/gcc/testsuite/gcc.target/i386/pr64411.C new file mode 100644 index 000..55858fb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr64411.C @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-Os -mcmodel=medium -fPIC -fschedule-insns -fselective-scheduling" } */ + +typedef __SIZE_TYPE__ size_t; + +extern "C" long strtol () + { return 0; } + +static struct { + void *sp[2]; +} info; + +union S813 +{ + void * c[5]; +} +s813; + +S813 a81
Various selective scheduling fixes
Hello, In this thread I will be posting the patches for the fixed selective scheduling PRs (except the one that was already kindly checked in by Jeff). The patches were tested both on x86-64 and ia64 with the following combination: 1) the usual bootstrap/regtest, which only utilizes sel-sched on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the bootstrap/regtest with the second scheduler forced to sel-sched; 3) both schedulers forced to sel-sched. In all cases everything seemed to be fine. Three of the PRs are regressions, the other two showed different errors across the variety of releases tested by submitters; I think all of them are appropriate at this stage -- they do not touch anything outside of selective scheduling except the first patch where a piece of code from sched-deps.c needs to be refactored into a function to be called from sel-sched.c. Andrey
Re: [PATCH 1/5] Fix asymmetric comparison functions
On 17.12.2015 15:13, Yury Gribov wrote: On 12/17/2015 02:58 PM, Andrey Belevantsev wrote: Hello, On 17.12.2015 11:58, Yury Gribov wrote: Some obvious symmetry fixes. Cc-ing * Andrey (Belevantsev) for bb_top_order_comparator Here, as Jakub mentioned, we assume that the argument addresses will never be equal, The problem is that this is not guaranteed. Well, if the consensus is that this is indeed the case, you're free to change both places as you suggest. Yours, Andrey thus that would always be different basic blocks (the comparator is used for providing a custom sort over loop body bbs) and you don't need a return 0 there. You can put there gcc_unreachable instead as in ... * Andrew (MacLeod) for compare_case_labels * Andrew (Pinski) for resort_field_decl_cmp * Diego for pair_cmp * Geoff for resort_method_name_cmp * Jakub for compare_case_labels * Jason for method_name_cmp * Richard for insert_phi_nodes_compare_var_infos, compare_case_labels * Steven for cmp_v_in_regset_pool ... this case -- here gcc_unreachable () marks that we're sorting pool pointers and their values are always different. Please do not remove it. Same here. /Yury
Re: [PATCH 1/5] Fix asymmetric comparison functions
Hello, On 17.12.2015 11:58, Yury Gribov wrote: Some obvious symmetry fixes. Cc-ing * Andrey (Belevantsev) for bb_top_order_comparator Here, as Jakub mentioned, we assume that the argument addresses will never be equal, thus that would always be different basic blocks (the comparator is used for providing a custom sort over loop body bbs) and you don't need a return 0 there. You can put there gcc_unreachable instead as in ... * Andrew (MacLeod) for compare_case_labels * Andrew (Pinski) for resort_field_decl_cmp * Diego for pair_cmp * Geoff for resort_method_name_cmp * Jakub for compare_case_labels * Jason for method_name_cmp * Richard for insert_phi_nodes_compare_var_infos, compare_case_labels * Steven for cmp_v_in_regset_pool ... this case -- here gcc_unreachable () marks that we're sorting pool pointers and their values are always different. Please do not remove it. Yours, Andrey /Yury
Re: [PATCH] Fix -Wlogical-not-parentheses warning in sel-sched-ir.c (PR c/61271)
On 26.08.2014 18:03, Marek Polacek wrote: Another wrongdoing detected by -Wlogical-not-parentheses. From my reading of the code it seems that simple != was meant here. Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. Heck, we traced this code back to the May 2007 commit :) Andrey 2014-08-26 Marek Polacek pola...@redhat.com PR c/61271 * sel-sched-ir.c (make_regions_from_the_rest): Fix condition. diff --git gcc/sel-sched-ir.c gcc/sel-sched-ir.c index c36658f..dd777fa 100644 --- gcc/sel-sched-ir.c +++ gcc/sel-sched-ir.c @@ -6185,7 +6185,7 @@ make_regions_from_the_rest (void) FOR_EACH_BB_FN (bb, cfun) { - if (bb->loop_father && !bb->loop_father->num == 0 + if (bb->loop_father && bb->loop_father->num != 0 && !(bb->flags & BB_IRREDUCIBLE_LOOP)) loop_hdr[bb->index] = bb->loop_father->num; } Marek
[PATCH] Fix PR 60901
Hello, This ICE comes from the ix86_dependencies_evaluation_hook code assumption that any scheduling region will be connected. This assumption is not correct in case of the outer loops pipelining of the selective scheduler as explained in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60901#c3. Trying to add dependencies between insns from the different scheduling regions results in a segfault within the dependency analyzer code. The fix is to adjust the code to account for the situation when basic block's predecessors do not belong to the same scheduling region. Bootstrapped and tested on x86-64, OK for trunk? Branches? The fix is low risk as the additional test should always be true for the regular scheduler. Yours, Andrey gcc/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * config/i386/i386.c (ix86_dependencies_evaluation_hook): Check that bb predecessor belongs to the same scheduling region. Adjust comment. testsuite/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * gcc.dg/pr60901.c: New test. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 99f0657..0274288 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -26254,13 +26254,17 @@ ix86_dependencies_evaluation_hook (rtx head, rtx tail) { edge e; edge_iterator ei; - /* Assume that region is SCC, i.e. all immediate predecessors of non-head block are in the same region. */ + + /* Regions are SCCs with the exception of selective scheduling with pipelining of outer blocks enabled. So also check that immediate predecessors of a non-head block are in the same region. */ FOR_EACH_EDGE (e, ei, bb->preds) { /* Avoid creating of loop-carried dependencies through - using topological odering in region. */ - if (BLOCK_TO_BB (bb->index) < BLOCK_TO_BB (e->src->index)) + using topological ordering in the region. */ + if (rgn == CONTAINING_RGN (e->src->index) + && BLOCK_TO_BB (bb->index) < BLOCK_TO_BB (e->src->index)) add_dependee_for_func_arg (first_arg, e->src); } } diff --git a/gcc/testsuite/gcc.dg/pr60901.c b/gcc/testsuite/gcc.dg/pr60901.c new file mode 100644 index 000..1350f16 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr60901.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options "-O -fselective-scheduling -fschedule-insns -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fno-tree-dominator-opts" } */ + +extern int n; +extern void bar (void); +extern int baz (int); + +void +foo (void) +{ + int i, j; + for (j = 0; j < n; j++) +{ + for (i = 1; i < j; i++) + bar (); + baz (0); +} +}
[PATCH] Fix PR 60866
Hello, As described in the PR in more details, this ICE is about the scheduler not being able to initialize its data structures for the new unconditional jump created when redirecting an edge and simplifying control flow. It so happens that the new jump is the only unscheduled instruction being left so we need to pass some of its data from the old jump. The difficulty is to support only this particular situation to retain the existing checking code and to pass down the required information to the point where it is needed. I assume that we are able to use default parameters as I haven't heard otherwise. Bootstrapped and tested on x86-64, OK for trunk and branches? Branches should also be safe as we're fixing only very specific corner case (hey, we regressed four releases and nobody noticed until now). Yours, Andrey gcc/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60866 * sel-sched-ir (sel_init_new_insn): New parameter old_seqno. Default it to -1. Pass it down to init_simplejump_data. (init_simplejump_data): New parameter old_seqno. Pass it down to get_seqno_for_a_jump. (get_seqno_for_a_jump): New parameter old_seqno. Use it for initializing new jump seqno as a last resort. Add comment. (sel_redirect_edge_and_branch): Save old seqno of the conditional jump and pass it down to sel_init_new_insn. (sel_redirect_edge_and_branch_force): Likewise. testsuite/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60866 * gcc.dg/pr60866.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index 868083b..3cba326 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -162,7 +162,7 @@ static void create_initial_data_sets (basic_block); static void free_av_set (basic_block); static void invalidate_av_set (basic_block); static void extend_insn_data (void); -static void sel_init_new_insn (insn_t, int); +static void sel_init_new_insn (insn_t, int, int = -1); static void finish_insns (void); /* Various list functions. 
*/ @@ -4007,9 +4007,10 @@ get_seqno_by_succs (rtx insn) return seqno; } -/* Compute seqno for INSN by its preds or succs. */ +/* Compute seqno for INSN by its preds or succs. Use OLD_SEQNO to compute + seqno in corner cases. */ static int -get_seqno_for_a_jump (insn_t insn) +get_seqno_for_a_jump (insn_t insn, int old_seqno) { int seqno; @@ -4065,8 +4066,16 @@ get_seqno_for_a_jump (insn_t insn) if (seqno < 0) seqno = get_seqno_by_succs (insn); - gcc_assert (seqno >= 0); + if (seqno < 0) +    { + /* The only case where this could be here legally is that the only + unscheduled insn was a conditional jump that got removed and turned + into this unconditional one. Initialize from the old seqno + of that jump passed down to here. */ + seqno = old_seqno; +    } + gcc_assert (seqno >= 0); return seqno; } @@ -4246,22 +4255,24 @@ init_insn_data (insn_t insn) } /* This is used to initialize spurious jumps generated by - sel_redirect_edge (). */ + sel_redirect_edge (). OLD_SEQNO is used for initializing seqnos + in corner cases within get_seqno_for_a_jump. */ static void -init_simplejump_data (insn_t insn) +init_simplejump_data (insn_t insn, int old_seqno) { init_expr (INSN_EXPR (insn), vinsn_create (insn, false), 0, REG_BR_PROB_BASE, 0, 0, 0, 0, 0, 0, vNULL, true, false, false, false, true); - INSN_SEQNO (insn) = get_seqno_for_a_jump (insn); + INSN_SEQNO (insn) = get_seqno_for_a_jump (insn, old_seqno); init_first_time_insn_data (insn); } /* Perform deferred initialization of insns. This is used to process - a new jump that may be created by redirect_edge. */ -void -sel_init_new_insn (insn_t insn, int flags) + a new jump that may be created by redirect_edge. OLD_SEQNO is used + for initializing simplejumps in init_simplejump_data. */ +static void +sel_init_new_insn (insn_t insn, int flags, int old_seqno) { /* We create data structures for bb when the first insn is emitted in it. 
*/ if (INSN_P (insn) @@ -4288,7 +4299,7 @@ sel_init_new_insn (insn_t insn, int flags) if (flags INSN_INIT_TODO_SIMPLEJUMP) { extend_insn_data (); - init_simplejump_data (insn); + init_simplejump_data (insn, old_seqno); } gcc_assert (CONTAINING_RGN (BLOCK_NUM (insn)) @@ -5575,14 +5586,14 @@ sel_merge_blocks (basic_block a, basic_block b) } /* A wrapper for redirect_edge_and_branch_force, which also initializes - data structures for possibly created bb and insns. Returns the newly - added bb or NULL, when a bb was not needed. */ + data structures for possibly created bb and insns. */ void sel_redirect_edge_and_branch_force (edge e, basic_block to) { basic_block jump_bb, src, orig_dest = e-dest; int prev_max_uid; rtx jump; + int old_seqno = -1; /* This function is now used only
Re: [PATCH] Fix PR 60901
On 14.05.2014 13:09, Uros Bizjak wrote: On Wed, May 14, 2014 at 10:57 AM, Andrey Belevantsev a...@ispras.ru wrote: This ICE comes from the ix86_dependencies_evaluation_hook code assumption that any scheduling region will be connected. This assumption is not correct in case of the outer loops pipelining of the selective scheduler as explained in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60901#c3. Trying to add dependencies between insns from the different scheduling regions results in a segfault within the dependency analyzer code. The fix is to adjust the code to account for the situation when basic block's predecessors do not belong to the same scheduling region. Bootstrapped and tested on x86-64, OK for trunk? Branches? The fix is low risk as the additional test should always be true for the regular scheduler. I don't know all scheduler details, so your opinion counts there. Let's put this fix to mainline first and after a week without problems, backport it to all release branches. gcc/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * config/i386/i386.c (ix86_dependencies_evaluation_hook): Check that bb predecessor belongs to the same scheduling region. Adjust comment. testsuite/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * gcc.dg/pr60901.c: New test. +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options -O -fselective-scheduling -fschedule-insns -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fno-tree-dominator-opts } */ + As this is clearly a target bug, let's put the test in gcc.target/i386 directory. You can remove target selector, and the test will run for 64bit and 32bit targets automatically. Thanks for noticing, I forgot about this. Updated the test and committed in 210414. Andrey
Re: [PATCH] Fix PR 60268
On 25.02.2014 13:14, Andreas Schwab wrote: Andrey Belevantsev a...@ispras.ru writes: Fixed by placing the initialization properly at the end of sched_rgn_init and also moving the check for sched_pressure != NONE outside of the if statement in schedule_region as discussed in the PR trail with Jakub. Bootstrapped and tested on x86-64, ok? This breaks m68k: $ gcc/xgcc -Bgcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -flive-range-shrinkage -c -o pr60268.o ../gcc/testsuite/gcc.c-torture/compile/pr60268.c ../gcc/testsuite/gcc.c-torture/compile/pr60268.c: In function ‘f’: ../gcc/testsuite/gcc.c-torture/compile/pr60268.c:6:1: internal compiler error: in m68k_sched_issue_rate, at config/m68k/m68k.c:5978 The patch itself has nothing to do with the ICE, probably it means that -flive-range-shrinkage was never tried without tuning on m68k, because in m68k.c there is 630 /* Setup scheduling options. */ 631 if (TUNE_CFV1) 632 m68k_sched_cpu = CPU_CFV1; 633 else if (TUNE_CFV2) 634 m68k_sched_cpu = CPU_CFV2; 635 else if (TUNE_CFV3) 636 m68k_sched_cpu = CPU_CFV3; 637 else if (TUNE_CFV4) 638 m68k_sched_cpu = CPU_CFV4; 639 else 640 { 641 m68k_sched_cpu = CPU_UNKNOWN; 642 flag_schedule_insns = 0; 643 flag_schedule_insns_after_reload = 0; 644 flag_modulo_sched = 0; 645 } And on line 641 m68k_sched_cpu set to CPU_UNKNOWN is causing the ICE. 
I guess you need to turn the live range shrinkage off in that piece of code, the below patch fixes the ICE for me: diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c index f20d071..ea1bcd4 100644 --- a/gcc/config/m68k/m68k.c +++ b/gcc/config/m68k/m68k.c @@ -642,6 +642,7 @@ m68k_option_override (void) flag_schedule_insns = 0; flag_schedule_insns_after_reload = 0; flag_modulo_sched = 0; + flag_live_range_shrinkage = 0; } if (m68k_sched_cpu != CPU_UNKNOWN) Yours, Andrey 0xbabc8b m68k_sched_issue_rate ../../gcc/config/m68k/m68k.c:5978 0xc3d9dc sched_init() ../../gcc/haifa-sched.c:6657 0xc3eecf haifa_sched_init() ../../gcc/haifa-sched.c:6719 0x8e807c schedule_insns ../../gcc/sched-rgn.c:3407 0x8e87cb schedule_insns ../../gcc/sched-rgn.c:3401 0x8e87cb rest_of_handle_live_range_shrinkage ../../gcc/sched-rgn.c:3614 0x8e87cb execute ../../gcc/sched-rgn.c:3704 Andreas.
[PATCH, committed] Fix PR 60292
Hello, This patch fixes the ICE on assert recently introduced by me and by itself is a continuation of PR 57422. When resetting the target availability bit, we do not always recompute it as this means renaming support and it is expensive. Thus, we cannot rely on the recomputation to check that the fence insn will always get the bit right. So the patch avoids resetting the bit for the fence insn. Bootstrapped and tested on x86-64 with selective scheduling enabled and on ia64, Dominique has kindly tested on darwin. Committed after offline discussion with Alexander. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 208108) --- gcc/ChangeLog (revision 208109) *** *** 1,3 --- 1,9 + 2014-02-25 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/60292 + * sel-sched.c (fill_vec_av_set): Do not reset target availability + bit for the fence instruction. + 2014-02-24 Alangi Derick alangider...@gmail.com * calls.h: Fix typo in comment. Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 208108) --- gcc/sel-sched.c (revision 208109) *** fill_vec_av_set (av_set_t av, blist_t bn *** 3823,3829 /* If insn was already scheduled on the current fence, set TARGET_AVAILABLE to -1 no matter what expr's attribute says. */ ! if (vinsn_vec_has_expr_p (vec_target_unavailable_vinsns, expr)) target_available = -1; /* If the availability of the EXPR is invalidated by the insertion of --- 3823,3830 /* If insn was already scheduled on the current fence, set TARGET_AVAILABLE to -1 no matter what expr's attribute says. */ ! if (vinsn_vec_has_expr_p (vec_target_unavailable_vinsns, expr) ! && !fence_insn_p) target_available = -1; /* If the availability of the EXPR is invalidated by the insertion of
Print out final scheduling time for sel-sched
Hello, While looking at PR 60086 I noticed that we do not print the total scheduling time for the region in the selective scheduler as we do for the Haifa scheduler. This is helpful for PR analysis and needs only a tiny adjustment, so I went ahead and installed the following. Bootstrapped and tested with the patch for PR 60292. The patch does not actually fix 60086, it just helps with analyzing it, so there is no mention of the PR in the changelog. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 208109) --- gcc/ChangeLog (revision 208110) *** *** 1,4 --- 1,11 2014-02-25 Andrey Belevantsev a...@ispras.ru + + * sel-sched.c (calculate_new_fences): New parameter ptime. + Calculate it as a maximum over all fence cycles. + (sel_sched_region_2): Adjust the call to calculate_new_fences. + Print the final schedule timing when sched_verbose. + + 2014-02-25 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60292 * sel-sched.c (fill_vec_av_set): Do not reset target availability Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 208109) --- gcc/sel-sched.c (revision 208110) *** find_min_max_seqno (flist_t fences, int *** 7467,7478 } } ! /* Calculate new fences from FENCES. */ static flist_t ! calculate_new_fences (flist_t fences, int orig_max_seqno) { flist_t old_fences = fences; struct flist_tail_def _new_fences, *new_fences = _new_fences; flist_tail_init (new_fences); for (; fences; fences = FLIST_NEXT (fences)) --- 7467,7479 } } ! /* Calculate new fences from FENCES. Write the current time to PTIME. */ static flist_t ! 
calculate_new_fences (flist_t fences, int orig_max_seqno, int *ptime) { flist_t old_fences = fences; struct flist_tail_def _new_fences, *new_fences = _new_fences; + int max_time = 0; flist_tail_init (new_fences); for (; fences; fences = FLIST_NEXT (fences)) *** calculate_new_fences (flist_t fences, in *** 7501,7509 --- 7502,7512 } else extract_new_fences_from (fences, new_fences, orig_max_seqno); + max_time = MAX (max_time, FENCE_CYCLE (fence)); } flist_clear (old_fences); + *ptime = max_time; return FLIST_TAIL_HEAD (new_fences); } *** static void *** 7558,7563 --- 7561,7567 sel_sched_region_2 (int orig_max_seqno) { int highest_seqno_in_use = orig_max_seqno; + int max_time = 0; stat_bookkeeping_copies = 0; stat_insns_needed_bookkeeping = 0; *** sel_sched_region_2 (int orig_max_seqno) *** 7573,7591 find_min_max_seqno (fences, min_seqno, max_seqno); schedule_on_fences (fences, max_seqno, scheduled_insns_tailp); ! fences = calculate_new_fences (fences, orig_max_seqno); highest_seqno_in_use = update_seqnos_and_stage (min_seqno, max_seqno, highest_seqno_in_use, scheduled_insns); } if (sched_verbose >= 1) ! sel_print ("Scheduled %d bookkeeping copies, %d insns needed " ! "bookkeeping, %d insns renamed, %d insns substituted\n", ! stat_bookkeeping_copies, ! stat_insns_needed_bookkeeping, ! stat_renamed_scheduled, ! stat_substitutions_total); } /* Schedule a region. When pipelining, search for possibly never scheduled --- 7577,7598 find_min_max_seqno (fences, min_seqno, max_seqno); schedule_on_fences (fences, max_seqno, scheduled_insns_tailp); ! fences = calculate_new_fences (fences, orig_max_seqno, &max_time); highest_seqno_in_use = update_seqnos_and_stage (min_seqno, max_seqno, highest_seqno_in_use, scheduled_insns); } if (sched_verbose >= 1) ! { ! sel_print ("Total scheduling time: %d cycles\n", max_time); ! 
sel_print ("Scheduled %d bookkeeping copies, %d insns needed " ! "bookkeeping, %d insns renamed, %d insns substituted\n", ! stat_bookkeeping_copies, ! stat_insns_needed_bookkeeping, ! stat_renamed_scheduled, ! stat_substitutions_total); ! } } /* Schedule a region. When pipelining, search for possibly never scheduled
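The timing computation in the patch above is simply a running maximum of `FENCE_CYCLE` over the fence list. A stand-alone sketch with hypothetical types (`struct fence` here is not GCC's fence structure, just an illustrative singly linked list):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: walk a list of fences and remember the
   largest cycle seen, mirroring the MAX (max_time, FENCE_CYCLE (fence))
   accumulation inside calculate_new_fences.  */
struct fence
{
  int cycle;            /* cycle this fence has reached */
  struct fence *next;   /* next fence in the list */
};

static int
max_fence_cycle (const struct fence *f)
{
  int max_time = 0;
  for (; f != NULL; f = f->next)
    if (f->cycle > max_time)
      max_time = f->cycle;
  return max_time;
}
```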
[PATCH] Fix PR 60268
Hello, While fixing PR 58960 I forgot about single-block regions, placing the initialization of the new nr_regions_initial variable in the wrong place. Thus for single-block regions we ended up with nr_regions = 1 and nr_regions_initial = 0 and effectively turned off sched-pressure immediately. No worries for the usual scheduling path, but with -flive-range-shrinkage we broke an assert that sched-pressure is in a specific mode. Fixed by placing the initialization properly at the end of sched_rgn_init and also moving the check for sched_pressure != NONE outside of the if statement in schedule_region, as discussed in the PR trail with Jakub. Bootstrapped and tested on x86-64, ok? Andrey gcc/ 2014-02-21 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60268 * sched-rgn.c (haifa_find_rgns): Move the nr_regions_initial init to ... (sched_rgn_init) ... here. (schedule_region): Check for SCHED_PRESSURE_NONE earlier. testsuite/ 2014-02-21 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60268 * gcc.c-torture/compile/pr60268.c: New test. diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c index 0573b6a..dc6fa16 100644 --- a/gcc/sched-rgn.c +++ b/gcc/sched-rgn.c @@ -1067,7 +1067,6 @@ haifa_find_rgns (void) BLOCK_TO_BB (bb->index) = 0; } - nr_regions_initial = nr_regions; free (max_hdr); free (degree); free (stack); @@ -2997,10 +2996,10 @@ schedule_region (int rgn) /* Do not support register pressure sensitive scheduling for the new regions as we don't update the liveness info for them. 
*/ - if (rgn >= nr_regions_initial) + if (sched_pressure != SCHED_PRESSURE_NONE + && rgn >= nr_regions_initial) { - if (sched_pressure != SCHED_PRESSURE_NONE) - free_global_sched_pressure_data (); + free_global_sched_pressure_data (); sched_pressure = SCHED_PRESSURE_NONE; } @@ -3166,6 +3165,7 @@ sched_rgn_init (bool single_blocks_p) RGN_BLOCKS (nr_regions) = (RGN_BLOCKS (nr_regions - 1) + RGN_NR_BLOCKS (nr_regions - 1)); + nr_regions_initial = nr_regions; } /* Free data structures for region scheduling. */ diff --git a/gcc/testsuite/gcc.c-torture/compile/pr60268.c b/gcc/testsuite/gcc.c-torture/compile/pr60268.c new file mode 100644 index 000..c3a6f94 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr60268.c @@ -0,0 +1,6 @@ +/* { dg-options "-flive-range-shrinkage" } */ +void f() +{ + int i = 0; + void *p = 0; +}
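The reordered condition in the patch above can be isolated as a small predicate. This is a hedged sketch with invented names (`disable_pressure_p`, a two-value enum standing in for GCC's sched_pressure modes), not the actual GCC code: pressure-sensitive scheduling is dropped only for regions created after the initial region computation, and only when it is actually enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's sched_pressure setting.  */
enum sched_pressure_mode { PRESSURE_NONE, PRESSURE_WEIGHTED };

/* Sketch of the PR 60268 check: disable pressure scheduling only for
   newly generated regions (rgn >= nr_regions_initial) and only when
   pressure scheduling is on at all, so the SCHED_PRESSURE_NONE case
   never reaches the "disable" path.  */
static bool
disable_pressure_p (enum sched_pressure_mode mode, int rgn,
                    int nr_regions_initial)
{
  return mode != PRESSURE_NONE && rgn >= nr_regions_initial;
}
```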
Re: [PATCH] Fix PR 58960
On 30.01.2014 9:42, Andrey Belevantsev wrote: Hello, Ping. As detailed in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58960#c6, we fail to use the DF liveness info in the register pressure sensitive scheduling for the new blocks as we do not properly compute it in this case. The patch fixes this by avoiding to use the sched-pressure for the new regions, as currently these are only ia64 recovery blocks and supposed to be cold. In the case we'd get other cases of the new blocks, this may be reconsidered. The other options of computing the DF info sketched at the above link do not seem plausible for this stage. Bootstrapped and tested on ia64, also tested by Andreas Schwab on ia64 (see PR log). OK for trunk? Andrey 2013-01-30 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/58960 * haifa-sched.c (alloc_global_sched_pressure_data): New, factored out from ... (sched_init) ... here. (free_global_sched_pressure_data): New, factored out from ... (sched_finish): ... here. * sched-int.h (free_global_sched_pressure_data): Declare. * sched-rgn.c (nr_regions_initial): New static global. (haifa_find_rgns): Initialize it. (schedule_region): Disable sched-pressure for the newly generated regions.
Re: PATCH: PR tree-optimization/60024: global-buffer-overflow in init_regs_for_mode
Hello, On 01.02.2014 19:16, H.J. Lu wrote: Hi, init_regs_for_mode accesses global_regs before checking if regno is OK for a mode, which leads to buffer overflow. This patch moves the HARD_REGNO_MODE_OK check before checking global_regs. There is no functional change. OK for trunk? This is fine, you'd need to change tree-optimization/60024 to rtl-optimization/60024 in the ChangeLog. Thanks for doing this. Yours, Andrey Thanks. H.J. --- 2014-02-01 H.J. Lu hongjiu...@intel.com PR tree-optimization/60024 * sel-sched.c (init_regs_for_mode): Check if mode is OK first. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index ad4a0aa..7b1a183 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -1116,9 +1116,16 @@ init_regs_for_mode (enum machine_mode mode) for (cur_reg = 0; cur_reg < FIRST_PSEUDO_REGISTER; cur_reg++) { - int nregs = hard_regno_nregs[cur_reg][mode]; + int nregs; int i; + /* See whether it accepts all modes that occur in + original insns. */ + if (! HARD_REGNO_MODE_OK (cur_reg, mode)) + continue; + + nregs = hard_regno_nregs[cur_reg][mode]; + for (i = nregs - 1; i >= 0; --i) if (fixed_regs[cur_reg + i] || global_regs[cur_reg + i] @@ -1140,11 +1147,6 @@ init_regs_for_mode (enum machine_mode mode) if (i >= 0) continue; - /* See whether it accepts all modes that occur in - original insns. */ - if (! HARD_REGNO_MODE_OK (cur_reg, mode)) - continue; - if (HARD_REGNO_CALL_PART_CLOBBERED (cur_reg, mode)) SET_HARD_REG_BIT (sel_hrd.regs_for_call_clobbered[mode], cur_reg);
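The overflow pattern the patch above removes is the classic "index with a mode-dependent count before validating the mode". A self-contained sketch, with made-up stand-ins (`mode_ok` for HARD_REGNO_MODE_OK, `regs_needed` for hard_regno_nregs, a tiny fixed-size array): validating first guarantees the loop bound can no longer run past the array.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 8

/* Stand-ins for HARD_REGNO_MODE_OK and hard_regno_nregs: registers
   6 and 7 do not accept the mode, and an invalid combination would
   claim 4 hard registers -- indexing 7..10 out of an 8-entry array.  */
static bool mode_ok (int reg)     { return reg < 6; }
static int  regs_needed (int reg) { return reg < 6 ? 1 : 4; }

/* Sketch of the PR 60024 fix: check the register/mode combination
   *before* using the register count to index the fixed-size array.  */
static bool
reg_usable_p (int reg, const bool fixed[NUM_REGS])
{
  if (!mode_ok (reg))          /* bail out before any array access */
    return false;
  for (int i = regs_needed (reg) - 1; i >= 0; --i)
    if (fixed[reg + i])        /* now guaranteed to stay in bounds */
      return false;
  return true;
}
```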
Re: [PATCH] Fix PR 57662
Hello, On 14.08.2013 17:12, Andrey Belevantsev wrote: Hello, As noted in Bugzilla, the problem in this PR is the rare case of removing the unconditional jump that is used for traversing CFG when simplifying control flow. We rescan the remaining successors when we know the flow below the current point has changed, and when we have removed the jump we are currently on, we segfault. The fix is just to break out of the successor iterator loop, as in the unconditional jump case we have already visited the only successor. I've recently found another test case for this PR which exhibits the segfault in the other spot of the CFG traversal driver -- we try to mark the basic block as visited when it is already gone. Fixed with the below patch, the test was reduced as much as multidelta could. Bootstrapped and tested on ia64, committed to trunk. Yours, Andrey gcc/ PR rtl-optimization/57662 * sel-sched.c (code_motion_path_driver): Do not mark already not existing blocks in the visiting bitmap. testsuite/ PR rtl-optimization/57662 * g++.dg/pr57662.C: New test. Bootstrapped and tested on x86-64, ok for trunk? If yes, Alexander, could you please commit this for me? Andrey 2013-08-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/57662 gcc/ * sel-sched.c (code_motion_process_successors): When the current insn is removed after the recursive traversal, break from the loop. Add comments and debug printouts. testsuite/ * gcc.dg/pr57662.c: New test. Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 207420) --- gcc/ChangeLog (revision 207422) *** *** 1,5 --- 1,11 2014-02-03 Andrey Belevantsev a...@ispras.ru + PR rtl-optimization/57662 + * sel-sched.c (code_motion_path_driver): Do not mark already not + existing blocks in the visiting bitmap. + + 2014-02-03 Andrey Belevantsev a...@ispras.ru + * sel-sched-ir.c (sel_gen_insn_from_expr_after): Reset INSN_DELETED_P on the insn being emitted. 
Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 207420) --- gcc/testsuite/ChangeLog (revision 207422) *** *** 1,3 --- 1,8 + 2014-02-03 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/57662 + * g++.dg/pr57662.C: New test. + 2014-02-02 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/vmx/3b-15.c: Remove special handling for little endian. Index: gcc/testsuite/g++.dg/pr57662.C === *** gcc/testsuite/g++.dg/pr57662.C (revision 0) --- gcc/testsuite/g++.dg/pr57662.C (revision 207422) *** *** 0 --- 1,339 + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options "-O2 -fselective-scheduling2 -fsel-sched-pipelining" } */ + + extern "C" { + typedef struct _IO_FILE FILE; + extern int putc(int __c, FILE * __stream); + extern int strcmp(__const char *__s1, __const char *__s2) throw() + __attribute__ ((__pure__)) __attribute__ ((__nonnull__(1, 2))); + } typedef union tree_node *tree; + struct gcc_options { + int x_flag_openmp; + }; + extern struct gcc_options global_options; + struct ht_identifier { + const unsigned char *str; + }; + enum cpp_ttype { + CPP_SEMICOLON, CPP_NAME + }; + struct vl_embed { + }; + struct va_heap { + }; + struct va_gc { + typedef vl_embed default_layout; + }; + template <typename T, typename A = va_heap, typename L = + typename A::default_layout> struct vec { + }; + enum tree_code { + ERROR_MARK, + IDENTIFIER_NODE, + OMP_SIMD, + CILK_SIMD, + MAX_TREE_CODES + }; + struct tree_identifier { + struct ht_identifier + id; + }; + union tree_node { + struct tree_identifier + identifier; + }; + inline tree + tree_check(tree __t, const char *__f, int __l, const char *__g, tree_code __c) + { + } + + extern tree chainon(tree, tree); + extern vec<tree, va_gc> *make_tree_vector(void); + typedef unsigned long omp_clause_mask; + enum c_omp_clause_split { + C_OMP_CLAUSE_SPLIT_TARGET = 0, C_OMP_CLAUSE_SPLIT_COUNT + }; + typedef struct cxx_saved_binding { + tree attributes; + } cp_decl_specifier_seq; + typedef enum 
pragma_kind { + PRAGMA_NONE = 0, PRAGMA_OMP_DECLARE_REDUCTION, PRAGMA_OMP_TARGET + } pragma_kind; + typedef enum pragma_omp_clause { + PRAGMA_OMP_CLAUSE_NONE = + 0, PRAGMA_OMP_CLAUSE_DEVICE, PRAGMA_OMP_CLAUSE_IF, + PRAGMA_OMP_CLAUSE_MAP + } pragma_omp_clause; + typedef struct cp_token { + enum cpp_ttype type:8; + union cp_token_value { + tree value; + } u; + } cp_token; + typedef struct cp_token *cp_token_position; + typedef struct cp_lexer { + cp_token_position next_token; + bool debugging_p; + cp_lexer *lexer; + } cp_parser; + static FILE *cp_lexer_debug_stream; + static inline bool cp_lexer_debugging_p(cp_lexer * lexer) + { + return lexer
[PATCH] Reset INSN_DELETED_P when reusing insns in the selective scheduler
Hello, While testing PR 58960 on ia64, I've enabled selective scheduling for bootstrap and found two small issues. One is the other instance of PR 57662 and another one is a miscompile that happens because we emit an insn with INSN_DELETED_P flag set; the insn then obviously does not get output in the assembly file. The reason for this is that we've cached this insn in the transformation cache and reused it after it was deleted from the insn stream. The patch resetting the flag qualifies as obvious, but I've discussed it additionally with Alexander, and will be committing it shortly. (The bootstrap fail is actually a regression, also I don't see how to make a (small) test case.) Steven, we've been adding insns through add_insn_after and since you've touched the code last, I wanted to ask you two things. First, do you want an assert there that we do not add INSN_DELETED_P insns (like we have for the after parameter in add_insn_after_nobb?) I have added one and it didn't fail on the x86-64 bootstrap, which is no wonder since this one is directly used by reorg and sel-sched only. Second, the comment for add_insn_after about the BB parameter does not apply -- the code immediately does 3871 void 3872 add_insn_after (rtx insn, rtx after, basic_block bb) 3873 { 3874 add_insn_after_nobb (insn, after); 3875 if (!BARRIER_P (after) 3876 && !BARRIER_P (insn) 3877 && (bb = BLOCK_FOR_INSN (after))) so the bb parameter gets overwritten. Should that be fixed in line with add_insn_before code? Yours, Andrey 2013-01-31 Andrey Belevantsev a...@ispras.ru * sel-sched-ir.c (sel_gen_insn_from_expr_after): Reset INSN_DELETED_P on the insn being emitted. Index: sel-sched-ir.c === *** sel-sched-ir.c (revision 207299) --- sel-sched-ir.c (working copy) *** sel_gen_insn_from_expr_after (expr_t exp *** 1398,1403 --- 1398,1408 emit_expr = set_insn_init (expr, vinsn ? 
vinsn : EXPR_VINSN (expr), seqno); insn = EXPR_INSN_RTX (emit_expr); + + /* The insn may come from the transformation cache, which may hold already + deleted insns, so mark it as not deleted. */ + INSN_DELETED_P (insn) = 0; + add_insn_after (insn, after, BLOCK_FOR_INSN (insn)); flags = INSN_INIT_TODO_SSID;
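The bug the patch above fixes is a general hazard of object caches: a cached object can carry stale state from its previous life. A hedged sketch with invented types (`struct insn` here is a toy singly linked stream, not GCC's rtx): clear the "deleted" mark before splicing the reused object back in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy insn stream node; "deleted" plays the role of INSN_DELETED_P.  */
struct insn
{
  int uid;
  bool deleted;
  struct insn *next;
};

/* Sketch of the fix: an insn fetched from a transformation cache may
   still be marked deleted from its earlier removal, so un-delete it
   before re-inserting it into the stream.  */
static void
emit_after (struct insn *insn, struct insn *after)
{
  insn->deleted = false;      /* may come from the cache: reset the mark */
  insn->next = after->next;   /* splice into the stream */
  after->next = insn;
}
```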
[PATCH] Fix PR 58960
Hello, As detailed in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58960#c6, we fail to use the DF liveness info in the register pressure sensitive scheduling for the new blocks as we do not properly compute it in this case. The patch fixes this by avoiding to use the sched-pressure for the new regions, as currently these are only ia64 recovery blocks and supposed to be cold. In the case we'd get other cases of the new blocks, this may be reconsidered. The other options of computing the DF info sketched at the above link do not seem plausible for this stage. Bootstrapped and tested on ia64, also tested by Andreas Schwab on ia64 (see PR log). OK for trunk? Andrey 2013-01-30 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/58960 * haifa-sched.c (alloc_global_sched_pressure_data): New, factored out from ... (sched_init) ... here. (free_global_sched_pressure_data): New, factored out from ... (sched_finish): ... here. * sched-int.h (free_global_sched_pressure_data): Declare. * sched-rgn.c (nr_regions_initial): New static global. (haifa_find_rgns): Initialize it. (schedule_region): Disable sched-pressure for the newly generated regions. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 4d51984..45398bf 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -6553,6 +6553,53 @@ setup_sched_dump (void) ? stderr : dump_file); } +/* Allocate data for register pressure sensitive scheduling. */ +static void +alloc_global_sched_pressure_data (void) +{ + if (sched_pressure != SCHED_PRESSURE_NONE) +{ + int i, max_regno = max_reg_num (); + + if (sched_dump != NULL) + /* We need info about pseudos for rtl dumps about pseudo + classes and costs. */ + regstat_init_n_sets_and_refs (); + ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL); + sched_regno_pressure_class + = (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class)); + for (i = 0; i < max_regno; i++) + sched_regno_pressure_class[i] + = (i < FIRST_PSEUDO_REGISTER + ? 
ira_pressure_class_translate[REGNO_REG_CLASS (i)] + : ira_pressure_class_translate[reg_allocno_class (i)]); + curr_reg_live = BITMAP_ALLOC (NULL); + if (sched_pressure == SCHED_PRESSURE_WEIGHTED) + { + saved_reg_live = BITMAP_ALLOC (NULL); + region_ref_regs = BITMAP_ALLOC (NULL); + } +} +} + +/* Free data for register pressure sensitive scheduling. */ +void +free_global_sched_pressure_data (void) +{ + if (sched_pressure != SCHED_PRESSURE_NONE) +{ + if (regstat_n_sets_and_refs != NULL) + regstat_free_n_sets_and_refs (); + if (sched_pressure == SCHED_PRESSURE_WEIGHTED) + { + BITMAP_FREE (region_ref_regs); + BITMAP_FREE (saved_reg_live); + } + BITMAP_FREE (curr_reg_live); + free (sched_regno_pressure_class); +} +} + /* Initialize some global state for the scheduler. This function works with the common data shared between all the schedulers. It is called from the scheduler specific initialization routine. */ @@ -6656,29 +6703,7 @@ sched_init (void) if (targetm.sched.init_global) targetm.sched.init_global (sched_dump, sched_verbose, get_max_uid () + 1); - if (sched_pressure != SCHED_PRESSURE_NONE) -{ - int i, max_regno = max_reg_num (); - - if (sched_dump != NULL) - /* We need info about pseudos for rtl dumps about pseudo - classes and costs. */ - regstat_init_n_sets_and_refs (); - ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL); - sched_regno_pressure_class - = (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class)); - for (i = 0; i < max_regno; i++) - sched_regno_pressure_class[i] - = (i < FIRST_PSEUDO_REGISTER - ? 
ira_pressure_class_translate[REGNO_REG_CLASS (i)] - : ira_pressure_class_translate[reg_allocno_class (i)]); - curr_reg_live = BITMAP_ALLOC (NULL); - if (sched_pressure == SCHED_PRESSURE_WEIGHTED) - { - saved_reg_live = BITMAP_ALLOC (NULL); - region_ref_regs = BITMAP_ALLOC (NULL); - } -} + alloc_global_sched_pressure_data (); curr_state = xmalloc (dfa_state_size); } @@ -6777,18 +6802,7 @@ void sched_finish (void) { haifa_finish_h_i_d (); - if (sched_pressure != SCHED_PRESSURE_NONE) -{ - if (regstat_n_sets_and_refs != NULL) - regstat_free_n_sets_and_refs (); - if (sched_pressure == SCHED_PRESSURE_WEIGHTED) - { - BITMAP_FREE (region_ref_regs); - BITMAP_FREE (saved_reg_live); - } - BITMAP_FREE (curr_reg_live); - free (sched_regno_pressure_class); -} + free_global_sched_pressure_data (); free (curr_state); if (targetm.sched.finish_global) diff --git a/gcc/sched-int.h b/gcc/sched-int.h index 3b1106f..b5481b9 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -1337,6 +1337,7 @@ extern void debug_ds (ds_t); extern void initialize_live_range_shrinkage (void); extern void finish_live_range_shrinkage (void
Re: [PATCH, committed] Fix PR 57422
On 23.12.2013 10:52, Andrey Belevantsev wrote: Hello, As described in the PR, the ICE reason was the typo made when introducing calls to add_hard_reg_set. Fixed by the first attached patch, bootstrapped and tested on both ia64 and x86_64, committed as obvious. The test case is very sensitive to the scheduler decisions (e.g. it didn't fail on trunk but only on the revision reported for me), so instead of adding the test I have put in the code two asserts checking that we can always schedule the fence instruction as is. This hunk was tested together with the first but committed separately. The first patch can be safely committed to 4.8, the second can stay on trunk only. Jakub, will it be fine with you? Now the first hunk is also committed to 4.8 and 4.7. Andrey
Re: [PATCH, committed] Fix PR 57422
On 23.12.2013 16:24, H.J. Lu wrote: On Sun, Dec 22, 2013 at 10:52 PM, Andrey Belevantsev a...@ispras.ru wrote: Hello, As described in the PR, the ICE reason was the typo made when introducing calls to add_hard_reg_set. Fixed by the first attached patch, bootstrapped and tested on both ia64 and x86_64, committed as obvious. The test case is very sensitive to the scheduler decisions (e.g. it didn't fail on trunk but only on the revision reported for me), so instead of adding the test I have put in the code two asserts checking that we can always schedule the fence instruction as is. This hunk was tested together with the first but committed separately. Testcase is very small. Why not add it? Frankly, I think that the chances of this test uncovering similar issues in the future are very small. It needs lots of options to make it trigger, and even then only a specific revision. The chance of triggering the asserts I added on other code is much higher. In the past, I have also avoided adding tests that require 5+ options to trigger the issue, adding only those that have some hope of more or less reliably reproducing the required issue. The best solution of course is having an infrastructure to test the specific scheduler decisions, which we don't have. You are welcome to add the test if you feel so strongly about us needing it. Andrey
[PATCH, committed] Fix PR 57422
Hello, As described in the PR, the ICE reason was the typo made when introducing calls to add_hard_reg_set. Fixed by the first attached patch, bootstrapped and tested on both ia64 and x86_64, committed as obvious. The test case is very sensitive to the scheduler decisions (e.g. it didn't fail on trunk but only on the revision reported for me), so instead of adding the test I have put in the code two asserts checking that we can always schedule the fence instruction as is. This hunk was tested together with the first but committed separately. The first patch can be safely committed to 4.8, the second can stay on trunk only. Jakub, will it be fine with you? Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 206172) --- gcc/ChangeLog (revision 206173) *** *** 1,3 --- 1,9 + 2013-12-23 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/57422 + * sel-sched.c (mark_unavailable_hard_regs): Fix typo when calling + add_to_hard_reg_set. + 2013-12-20 Sharad Singhai sing...@google.com * Makefile.in: Add optinfo.texi. Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 206172) --- gcc/sel-sched.c (revision 206173) *** mark_unavailable_hard_regs (def_t def, s *** 1253,1259 if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) add_to_hard_reg_set (reg_rename_p->unavailable_hard_regs, ! Pmode, HARD_FRAME_POINTER_IS_FRAME_POINTER); } #ifdef STACK_REGS --- 1253,1259 if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) add_to_hard_reg_set (reg_rename_p->unavailable_hard_regs, ! Pmode, HARD_FRAME_POINTER_REGNUM); } #ifdef STACK_REGS Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 206173) --- gcc/ChangeLog (revision 206174) *** *** 1,6 --- 1,12 2013-12-23 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/57422 + * sel-sched.c (fill_vec_av_set): Assert that the fence insn + can always be scheduled in its current form. + + 2013-12-23 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/57422 * sel-sched.c (mark_unavailable_hard_regs): Fix typo when calling add_to_hard_reg_set. 
Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 206173) --- gcc/sel-sched.c (revision 206174) *** fill_vec_av_set (av_set_t av, blist_t bn *** 3801,3806 --- 3801,3807 signed char target_available; bool is_orig_reg_p = true; int need_cycles, new_prio; + bool fence_insn_p = INSN_UID (insn) == INSN_UID (FENCE_INSN (fence)); /* Don't allow any insns other than from SCHED_GROUP if we have one. */ if (FENCE_SCHED_NEXT (fence) && insn != FENCE_SCHED_NEXT (fence)) *** fill_vec_av_set (av_set_t av, blist_t bn *** 3855,3863 if (sched_verbose >= 4) sel_print ("Expr %d has no suitable target register\n", INSN_UID (insn)); ! continue; } /* Filter expressions that need to be renamed or speculated when pipelining, because compensating register copies or speculation checks are likely to be placed near the beginning of the loop, --- 3856,3871 if (sched_verbose >= 4) sel_print ("Expr %d has no suitable target register\n", INSN_UID (insn)); ! ! /* A fence insn should not get here. */ ! gcc_assert (!fence_insn_p); ! continue; } + /* At this point a fence insn should always be available. */ + gcc_assert (!fence_insn_p + || INSN_UID (FENCE_INSN (fence)) == INSN_UID (EXPR_INSN_RTX (expr))); + /* Filter expressions that need to be renamed or speculated when pipelining, because compensating register copies or speculation checks are likely to be placed near the beginning of the loop,
[PATCH] Fix PR 57662
Hello, As noted in Bugzilla, the problem in this PR is the rare case of removing the unconditional jump that is used for traversing CFG when simplifying control flow. We rescan the remaining successors when we know the flow below the current point has changed, and when we have removed the jump we are currently on, we segfault. The fix is just to break out of the successor iterator loop, as in the unconditional jump case we have already visited the only successor. Bootstrapped and tested on x86-64, ok for trunk? If yes, Alexander, could you please commit this for me? Andrey 2013-08-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/57662 gcc/ * sel-sched.c (code_motion_process_successors): When the current insn is removed after the recursive traversal, break from the loop. Add comments and debug printouts. testsuite/ * gcc.dg/pr57662.c: New test. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index fb9386f..a7b8897 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -6424,10 +6424,23 @@ code_motion_process_successors (insn_t insn, av_set_t orig_ops, res = b; /* We have simplified the control flow below this point. In this case, - the iterator becomes invalid. We need to try again. + the iterator becomes invalid. We need to try again. + If we have removed the insn itself, it could be only an + unconditional jump. Thus, do not rescan but break immediately -- + we have already visited the only successor block. 
*/ + if (!BLOCK_FOR_INSN (insn)) + { + if (sched_verbose >= 6) + sel_print ("Not doing rescan: already visited the only successor " + "of block %d\n", old_index); + break; + } if (BLOCK_FOR_INSN (insn)->index != old_index || EDGE_COUNT (bb->succs) != old_succs) { + if (sched_verbose >= 6) + sel_print ("Rescan: control flow simplified below insn %d, block %d\n", + INSN_UID (insn), BLOCK_FOR_INSN (insn)->index); insn = sel_bb_end (BLOCK_FOR_INSN (insn)); goto rescan; } diff --git a/gcc/testsuite/gcc.dg/pr57662.c b/gcc/testsuite/gcc.dg/pr57662.c new file mode 100644 index 000..7af8455 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr57662.c @@ -0,0 +1,47 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options "-O -fno-guess-branch-probability -fpeel-loops -freorder-blocks-and-partition -fschedule-insns2 -fsel-sched-pipelining -fselective-scheduling2 -ftree-pre" } */ + +struct intC +{ + short x; + short y; +}; + +void Get(void); + +int size_x; + +struct +{ + int *depot_table; + struct intC *ti; + int size; +} dummy; + +static inline int +GetRotatedTileFromOffset (int *a, struct intC tidc) +{ + if (!*a) +Get (); + switch (*a) +{ +case 0: + return (tidc.y size_x) + tidc.x; +case 1: + return tidc.y + (dummy.size - tidc.x) * size_x; +case 2: + return tidc.x + (dummy.size - tidc.y) * size_x; +case 3: + return (dummy.size - tidc.x); +} + return 0; +} + +int +GetHangarNum (int *a, int i) +{ + while (dummy.size) +if (GetRotatedTileFromOffset (a, dummy.ti[i])) + return *dummy.depot_table; + return 0; +}
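The control-flow hazard the patch above guards against is a form of iterator invalidation: the element the loop iterates from can be deleted by the work done inside the loop. A hedged, self-contained sketch with invented types (`struct succ`, an `insn_alive` flag standing in for `BLOCK_FOR_INSN (insn)` becoming NULL), not GCC's traversal code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy successor list; removes_insn models a recursive call that
   simplifies the CFG and deletes the insn we iterate from.  */
struct succ
{
  struct succ *next;
  bool removes_insn;
};

/* Sketch of the PR 57662 fix: after processing each successor, check
   whether the insn we were iterating from still exists; if not, break
   out instead of consulting the now-stale iterator.  */
static int
process_successors (struct succ *s, bool *insn_alive)
{
  int visited = 0;
  for (; s != NULL; s = s->next)
    {
      visited++;
      if (s->removes_insn)
        *insn_alive = false;  /* simplification deleted the insn */
      if (!*insn_alive)
        break;                /* its only successor is already done */
    }
  return visited;
}
```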
[PATCH] Fix PRs 56957 and 57105
Hello, After Steven's changes tightening the add/remove insn interface, we found a problem in the selective scheduling when we incorrectly determined whether an insn should be removed (and a fresh one emitted) or just moved to a scheduling point. Fixed by just comparing INSN_UIDs of the insn below and the chosen expr above and abandoning other too clever code for checking whether an insn was changed or extra insns were emitted. Some extra cleanups are possible after this patch, but I'd wait a bit to see if there is any fallout first. Bootstrapped and tested on ia64 and x86-64, committed after offline approval from Alexander. Andrey 2013-04-30 Andrey Belevantsev a...@ispras.ru gcc: PR rtl-optimization/56957 PR rtl-optimization/57105 * sel-sched.c (move_op_orig_expr_found): Remove insn_emitted variable. Use just INSN_UID for determining whether an insn should be only disconnected from the insn stream. * sel-sched-ir.h (EXPR_WAS_CHANGED): Remove. gcc/testsuite: PR rtl-optimization/57105 * gcc.dg/pr57105.c: New test. Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 198448) --- gcc/ChangeLog (revision 198449) *** *** 1,3 --- 1,13 + 2013-04-30 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/56957 + PR rtl-optimization/57105 + + * sel-sched.c (move_op_orig_expr_found): Remove insn_emitted + variable. Use just INSN_UID for determining whether an insn + should be only disconnected from the insn stream. + * sel-sched-ir.h (EXPR_WAS_CHANGED): Remove. 
+ 2013-04-30 Jakub Jelinek ja...@redhat.com PR tree-optimization/57104 Index: gcc/testsuite/gcc.dg/pr57105.c === *** gcc/testsuite/gcc.dg/pr57105.c (revision 0) --- gcc/testsuite/gcc.dg/pr57105.c (revision 198449) *** *** 0 --- 1,18 + /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ + /* { dg-options "-Os -fselective-scheduling2 -g" } */ + int bar (int); + int *baz (int *); + + void + foo (int a) + { + while (bar (0)) + { + int *c = baz (0); + if (a) + { + int i = *baz (c); + } + bar (*baz (c)); + } + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 198448) --- gcc/testsuite/ChangeLog (revision 198449) *** *** 1,3 --- 1,8 + 2013-04-30 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/57105 + * gcc.dg/pr57105.c: New test. + 2013-04-30 Jakub Jelinek ja...@redhat.com PR tree-optimization/57104 Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 198448) --- gcc/sel-sched.c (revision 198449) *** move_op_orig_expr_found (insn_t insn, ex *** 6051,6064 cmpd_local_params_p lparams ATTRIBUTE_UNUSED, void *static_params) { ! bool only_disconnect, insn_emitted; moveop_static_params_p params = (moveop_static_params_p) static_params; copy_expr_onside (params->c_expr, INSN_EXPR (insn)); track_scheduled_insns_and_blocks (insn); ! insn_emitted = handle_emitting_transformations (insn, expr, params); ! only_disconnect = (params->uid == INSN_UID (insn) ! && ! insn_emitted && ! EXPR_WAS_CHANGED (expr)); /* Mark that we've disconnected an insn. */ if (only_disconnect) --- 6051,6063 cmpd_local_params_p lparams ATTRIBUTE_UNUSED, void *static_params) { ! bool only_disconnect; moveop_static_params_p params = (moveop_static_params_p) static_params; copy_expr_onside (params->c_expr, INSN_EXPR (insn)); track_scheduled_insns_and_blocks (insn); ! handle_emitting_transformations (insn, expr, params); ! only_disconnect = params->uid == INSN_UID (insn); /* Mark that we've disconnected an insn. 
*/ if (only_disconnect) Index: gcc/sel-sched-ir.h === *** gcc/sel-sched-ir.h (revision 198448) --- gcc/sel-sched-ir.h (revision 198449) *** typedef expr_def *expr_t; *** 191,198 #define EXPR_WAS_RENAMED(EXPR) ((EXPR)-was_renamed) #define EXPR_CANT_MOVE(EXPR) ((EXPR)-cant_move) - #define EXPR_WAS_CHANGED(EXPR) (EXPR_HISTORY_OF_CHANGES (EXPR).length () 0) - /* Insn definition for list of original insns in find_used_regs. */ struct _def { --- 191,196
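The patch above replaces the EXPR_WAS_CHANGED/insn_emitted bookkeeping with a single INSN_UID comparison. A minimal sketch of the resulting decision rule, with illustrative names rather than the GCC API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched move_op_orig_expr_found logic: the insn
   found at the original point is merely disconnected from the insn stream
   (rather than removed and regenerated) only when its UID is exactly the UID
   recorded for the chosen expression.  Any insn that was transformed or
   freshly emitted along the way necessarily has a different UID.  */
static bool
only_disconnect_p (int chosen_expr_uid, int found_insn_uid)
{
  return chosen_expr_uid == found_insn_uid;
}
```

The point of the cleanup is that this one comparison subsumes the previous checks for emitted or changed insns.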
Re: Fix PR 56077
On 01.04.2013 12:38, Andrey Belevantsev wrote: On 22.02.2013 17:30, Andrey Belevantsev wrote: Hello, As found by Jakub and explained in the PR audit trail by Alexander, this patch fixes the selective scheduler merge glitch of 2008 that added the unnecessary JUMP_P check to the flush_pending_lists call. I have removed the check and expanded the binary negation for clarity. The patch was tested on x86-64, ia64, and ppc64 to be safe. The patch should be conservatively safe at this stage as it adds more flushes and thus more dependencies to the scheduler. The original test is fixed, but I don't know how to add the test checking assembly insns order to the testsuite. OK for trunk? Andrey 2012-02-22 Alexander Monakov amona...@ispras.ru Andrey Belevantsev a...@ispras.ru PR middle-end/56077 * sched-deps.c (sched_analyze_insn): When reg_pending_barrier, flush pending lists also on non-jumps. Now backported to 4.7 and 4.6. Leonid has reported that this patch breaks the kernel build on mipsel-linux. I have quickly analyzed the issue. The patch removes the JUMP_P check when flushing pending lists at the reg_pending_barrier, restoring the 2008 year behavior. But at that time we didn't have debug insns. Now with the removed check we flush pending lists also on debug insns, which results in freeing pending_read_mems but not in freeing pending_read_insns, as in add_dependence_list_and_free we explicitly forbid freeing the dependency list for debug insns (that was for fixing PR 44013). This is inconsistent, and when proceeding with analysis we immediately hit 2451 t = canon_rtx (t); 2452 pending = deps-pending_read_insns; 2453 pending_mem = deps-pending_read_mems; 2454 while (pending) 2455 { 2456 if (read_dependence (XEXP (pending_mem, 0), t) and the line 2456 segfaults because pending is not NULL, but pending mem is NULL. I am testing the revert of this backport for 4.6 and will commit it in about an hour or so. 
However, I am surprised we don't hit this either on 4.7, 4.8 or trunk. Some flush_pending_lists calls are protected from debug insns as they check CALL_P or JUMP_P, but not all of them. It looks like flush_pending_lists should not be called on debug insns at all. And indeed, the attached patch fixes Leonid's test case. Jakub, you don't happen to remember any changes in this area that could hide the problem for 4.7 and later? Andrey Index: gcc/sched-deps.c === *** gcc/sched-deps.c (revision 197492) --- gcc/sched-deps.c (working copy) *** sched_analyze_insn (struct deps_desc *de *** 3044,3050 /* Don't flush pending lists on speculative checks for selective scheduling. */ ! if (!sel_sched_p () || !sel_insn_is_speculation_check (insn)) flush_pending_lists (deps, insn, true, true); if (!deps->readonly) --- 3044,3050 /* Don't flush pending lists on speculative checks for selective scheduling. */ ! if (NONDEBUG_INSN_P (insn) && (!sel_sched_p () || !sel_insn_is_speculation_check (insn))) flush_pending_lists (deps, insn, true, true); if (!deps->readonly)
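The crash mechanism described above comes down to a broken pairing invariant. A toy model, with illustrative names standing in for the sched-deps structures:

```c
#include <assert.h>
#include <stddef.h>

/* sched-deps keeps two lists that are walked in lockstep
   (pending_read_insns / pending_read_mems).  When a flush on a debug insn
   frees the mems list but add_dependence_list_and_free refuses to free the
   insns list, the next lockstep walk finds a non-NULL insns node paired with
   a NULL mems node and segfaults.  */
struct node { struct node *next; };

static size_t
list_length (struct node *p)
{
  size_t n = 0;
  for (; p != NULL; p = p->next)
    n++;
  return n;
}

/* The invariant the fix restores: flush both lists or neither.  */
static int
lists_consistent_p (struct node *insns, struct node *mems)
{
  return list_length (insns) == list_length (mems);
}
```

With the NONDEBUG_INSN_P guard, debug insns never reach the flush, so the two lists can only be freed together.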
Re: Fix PR 56077
On 05.04.2013 14:10, Olivier Hainque wrote: On Apr 5, 2013, at 10:13 , Eric Botcazou ebotca...@adacore.com wrote: We do have regressions on the 4.7 branch in the scheduler (CCed Olivier who has more information). Right: we do see a SEGV while compiling the attached monitor.i (preprocessed output from a qemu tree) with -O2 -g. ./cc1 -m32 -O2 -g -quiet monitor.i .../monitor.c: In function ‘memory_dump’: .../monitor.c:1109:1: internal compiler error: Segmentation fault As already mentioned upthread, this is triggered by a call to flush_pending_lists with a DEBUG_INSN. We get into: if (for_write) { add_dependence_list_and_free (deps, insn, &deps->pending_read_insns, 1, REG_DEP_ANTI); if (!deps->readonly) { free_EXPR_LIST_list (&deps->pending_read_mems); deps->pending_read_list_length = 0; } } add_dependence_list_and_free doesn't free *LISTP when operating on DEBUG_INSNs, so we end up with pending_read_mems freed but pending_read_insns not freed. This was cured on mainline by: Author: mkuvyrkov Date: Mon Aug 27 22:11:48 2012 + * sched-deps.c (add_dependence_list_and_free): Simplify. (flush_pending_list_and_free): Fix a hack that was fixing a hack. Free lists when add_dependence_list_and_free doesn't free them. (svn+ssh://gcc.gnu.org/svn/gcc/trunk@190733) http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01625.html I don't know whether backporting this would be better than reverting the offending change as just done on 4.7. I'd say for 4.6 the best way is to revert. PR 56077 is not that important, and this 4.6 release will be the last one. For 4.7, we can additionally backport Maxim's patch or revert this one. I'm fine with both options, but I'll test the 4.7 backport too, just to be ready for that. Andrey
Re: [PATCH, committed] Fix PR 45472
On 27.02.2013 13:03, Andrey Belevantsev wrote: Hello, For this volatile-related issue (no volatile bits on volatile fields of a non-volatile struct) AFAIU there is an agreement of fixing the front-ends if needed, but anyways the patch to the selective scheduler is required that properly merges expressions so that the may_trap_p bit is preserved. So the following patch was successfully tested on ia64 and x86-64, approved by Alexander offline, committed to trunk. The patch needs backport to other branches in about two weeks. Andrey PR middle-end/45472 gcc/ * sel-sched-ir.c (merge_expr): Also change vinsn of merged expr when the may_trap_p bit of the exprs being merged differs. Reorder tests for speculativeness in the logical and operator. testsuite/ * gcc.dg/45472.c: New test. Now backported to 4.7 and 4.6 with Jakub's patch for the sel-sched-ir.c memory leak added. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 197298) --- gcc/ChangeLog (revision 197299) *** *** 1,6 --- 1,25 2013-04-01 Andrey Belevantsev a...@ispras.ru Backport from mainline + 2013-02-27 Andrey Belevantsev a...@ispras.ru + + PR middle-end/45472 + + * sel-sched-ir.c (merge_expr): Also change vinsn of merged expr + when the may_trap_p bit of the exprs being merged differs. + Reorder tests for speculativeness in the logical and operator. + + Backport from mainline +2013-03-05 Jakub Jelinek ja...@redhat.com + + PR middle-end/56461 + * sel-sched-ir.c (free_sched_pools): Release + succs_info_pool.stack[succs_info_pool.max_top] vectors too + if succs_info_pool.max_top isn't -1. 
+ + 2013-04-01 Andrey Belevantsev a...@ispras.ru + + Backport from mainline 2012-02-19 Andrey Belevantsev a...@ispras.ru PR middle-end/55889 Index: gcc/testsuite/gcc.dg/pr45472.c === *** gcc/testsuite/gcc.dg/pr45472.c (revision 0) --- gcc/testsuite/gcc.dg/pr45472.c (revision 197299) *** *** 0 --- 1,63 + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options -O -fschedule-insns2 -fselective-scheduling2 } */ + + struct S + { + volatile long vl; + int i; + }; + struct S s1, s2; + + void + foo (int j, int c) + { + int i; + for (i = 0; i = j; i++) + { + if (c) + s2.vl += s1.vl; + s1 = s2; + } + } + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options -O -fschedule-insns2 -fselective-scheduling2 } */ + + struct S + { + volatile long vl; + int i; + }; + struct S s1, s2; + + void + foo (int j, int c) + { + int i; + for (i = 0; i = j; i++) + { + if (c) + s2.vl += s1.vl; + s1 = s2; + } + } + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options -O -fschedule-insns2 -fselective-scheduling2 } */ + + struct S + { + volatile long vl; + int i; + }; + struct S s1, s2; + + void + foo (int j, int c) + { + int i; + for (i = 0; i = j; i++) + { + if (c) + s2.vl += s1.vl; + s1 = s2; + } + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 197298) --- gcc/testsuite/ChangeLog (revision 197299) *** *** 1,3 --- 1,11 + 2013-04-01 Andrey Belevantsev a...@ispras.ru + + Backport from mainline + 2013-02-27 Andrey Belevantsev a...@ispras.ru + + PR middle-end/45472 + * gcc.dg/pr45472.c: New test. + 2013-03-26 Richard Biener rguent...@suse.de Backport from mainline Index: gcc/sel-sched-ir.c === *** gcc/sel-sched-ir.c (revision 197298) --- gcc/sel-sched-ir.c (revision 197299) *** merge_expr (expr_t to, expr_t from, insn *** 1862,1869 /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. 
This will provide us with consistent speculative bits and speculative patterns inside expr. */ ! if (EXPR_SPEC_DONE_DS (to) == 0 !EXPR_SPEC_DONE_DS (from) != 0) change_vinsn_in_expr (to, EXPR_VINSN (from)); merge_expr_data (to, from, split_point); --- 1862,1873 /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. This will provide us with consistent speculative bits and speculative patterns inside expr. */ ! if ((EXPR_SPEC_DONE_DS (from) != 0 ! EXPR_SPEC_DONE_DS (to) == 0) ! /* Do likewise for volatile insns, so that we always retain !the may_trap_p bit on the resulting expression
Re: Fix PR 56077
On 22.02.2013 17:30, Andrey Belevantsev wrote: Hello, As found by Jakub and explained in the PR audit trail by Alexander, this patch fixes the selective scheduler merge glitch of 2008 that added the unnecessary JUMP_P check to the flush_pending_lists call. I have removed the check and expanded the binary negation for clarity. The patch was tested on x86-64, ia64, and ppc64 to be safe. The patch should be conservatively safe at this stage as it adds more flushes and thus more dependencies to the scheduler. The original test is fixed, but I don't know how to add the test checking assembly insns order to the testsuite. OK for trunk? Andrey 2012-02-22 Alexander Monakov amona...@ispras.ru Andrey Belevantsev a...@ispras.ru PR middle-end/56077 * sched-deps.c (sched_analyze_insn): When reg_pending_barrier, flush pending lists also on non-jumps. Now backported to 4.7 and 4.6. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 197296) --- gcc/ChangeLog (revision 197297) *** *** 1,3 --- 1,13 + 2013-04-01 Andrey Belevantsev a...@ispras.ru + + Backport from mainline + 2013-02-25 Andrey Belevantsev a...@ispras.ru + Alexander Monakov amona...@ispras.ru + + PR middle-end/56077 + * sched-deps.c (sched_analyze_insn): When reg_pending_barrier, + flush pending lists also on non-jumps. Adjust comment. + 2013-03-30 Gerald Pfeifer ger...@pfeifer.com * doc/invoke.texi (AVR Options): Tweak link for AVR-LibC user manual. Index: gcc/sched-deps.c === *** gcc/sched-deps.c(revision 197296) --- gcc/sched-deps.c(revision 197297) *** sched_analyze_insn (struct deps_desc *de *** 3262,3270 SET_REGNO_REG_SET (deps-reg_last_in_use, i); } ! /* Flush pending lists on jumps, but not on speculative checks. */ ! if (JUMP_P (insn) !(sel_sched_p () ! sel_insn_is_speculation_check (insn))) flush_pending_lists (deps, insn, true, true); reg_pending_barrier = NOT_A_BARRIER; --- 3262,3270 SET_REGNO_REG_SET (deps-reg_last_in_use, i); } ! 
/* Don't flush pending lists on speculative checks for !selective scheduling. */ ! if (!sel_sched_p () || !sel_insn_is_speculation_check (insn)) flush_pending_lists (deps, insn, true, true); reg_pending_barrier = NOT_A_BARRIER;
Re: [PATCH] Fix PR 55889
On 19.02.2013 17:13, Andrey Belevantsev wrote: Hello, As we discussed in the PR, the problem here is that the selective scheduler does not expect that renaming a hard register to a pseudo would result in extra dependencies. The dependencies come from implicit clobbers code of sched-deps.c, so I've made a minimal patch that checks only for that case using the same function from IRA and avoids it. The patch fixes the test case on AIX as reported by David and also bootstraps and tests fine on x86-64 and ia64, ok for trunk? The second patch is restoring debug printing in the scheduler that was accidentally broken, possibly by Steven's cleanups (now a pattern is expected where an insn was previously). I will commit is as obvious separately. Andrey 2012-02-19 Andrey Belevantsev a...@ispras.ru PR middle-end/55889 * sel-sched.c: Include ira.h. (implicit_clobber_conflict_p): New function. (moveup_expr): Use it. * Makefile.in (sel-sched.o): Depend on ira.h. Now backported this to 4.7 and 4.6. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 197297) --- gcc/ChangeLog (revision 197298) *** *** 1,6 --- 1,18 2013-04-01 Andrey Belevantsev a...@ispras.ru Backport from mainline + 2012-02-19 Andrey Belevantsev a...@ispras.ru + + PR middle-end/55889 + + * sel-sched.c: Include ira.h. + (implicit_clobber_conflict_p): New function. + (moveup_expr): Use it. + * Makefile.in (sel-sched.o): Depend on ira.h. + + 2013-04-01 Andrey Belevantsev a...@ispras.ru + + Backport from mainline 2013-02-25 Andrey Belevantsev a...@ispras.ru Alexander Monakov amona...@ispras.ru Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 197297) --- gcc/sel-sched.c (revision 197298) *** along with GCC; see the file COPYING3. 
*** 45,50 --- 45,51 #include rtlhooks-def.h #include output.h #include emit-rtl.h + #include ira.h #ifdef INSN_SCHEDULING #include sel-sched-ir.h *** moving_insn_creates_bookkeeping_block_p *** 2113,2118 --- 2114,2174 return TRUE; } + /* Return true when the conflict with newly created implicit clobbers +between EXPR and THROUGH_INSN is found because of renaming. */ + static bool + implicit_clobber_conflict_p (insn_t through_insn, expr_t expr) + { + HARD_REG_SET temp; + rtx insn, reg, rhs, pat; + hard_reg_set_iterator hrsi; + unsigned regno; + bool valid; + + /* Make a new pseudo register. */ + reg = gen_reg_rtx (GET_MODE (EXPR_LHS (expr))); + max_regno = max_reg_num (); + maybe_extend_reg_info_p (); + + /* Validate a change and bail out early. */ + insn = EXPR_INSN_RTX (expr); + validate_change (insn, SET_DEST (PATTERN (insn)), reg, true); + valid = verify_changes (0); + cancel_changes (0); + if (!valid) + { + if (sched_verbose = 6) + sel_print (implicit clobbers failed validation, ); + return true; + } + + /* Make a new insn with it. */ + rhs = copy_rtx (VINSN_RHS (EXPR_VINSN (expr))); + pat = gen_rtx_SET (VOIDmode, reg, rhs); + start_sequence (); + insn = emit_insn (pat); + end_sequence (); + + /* Calculate implicit clobbers. */ + extract_insn (insn); + preprocess_constraints (); + ira_implicitly_set_insn_hard_regs (temp); + AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); + + /* If any implicit clobber registers intersect with regular ones in + through_insn, we have a dependency and thus bail out. */ + EXECUTE_IF_SET_IN_HARD_REG_SET (temp, 0, regno, hrsi) + { + vinsn_t vi = INSN_VINSN (through_insn); + if (bitmap_bit_p (VINSN_REG_SETS (vi), regno) + || bitmap_bit_p (VINSN_REG_CLOBBERS (vi), regno) + || bitmap_bit_p (VINSN_REG_USES (vi), regno)) + return true; + } + + return false; + } + /* Modifies EXPR so it can be moved through the THROUGH_INSN, performing necessary transformations. Record the type of transformation made in PTRANS_TYPE, when it is not NULL. 
When INSIDE_INSN_GROUP, *** moveup_expr (expr_t expr, insn_t through *** 2245,2250 --- 2301,2317 if (!enable_schedule_as_rhs_p || !EXPR_SEPARABLE_P (expr)) return MOVEUP_EXPR_NULL; + /* When renaming a hard register to a pseudo before reload, extra +dependencies can occur from the implicit clobbers of the insn. +Filter out such cases here. */ + if (!reload_completed REG_P (EXPR_LHS (expr)) + HARD_REGISTER_P (EXPR_LHS (expr)) + implicit_clobber_conflict_p (through_insn, expr)) + { + if (sched_verbose = 6) + sel_print (implicit clobbers conflict detected, ); + return MOVEUP_EXPR_NULL
Re: [PATCH] Fix sel-sched memory leak (PR middle-end/56461)
Hi Jakub, On 2013-03-05 01:20, Jakub Jelinek wrote: Hi! alloc_succs_info creates vectors up to and including succs_info_pool.max_top, so without the following fix we leak the last set of 3 vectors. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? This is fine, thanks a lot. I can take care of backporting this to the older branches with the other sel-sched patches. Andrey 2013-03-04 Jakub Jelinek ja...@redhat.com PR middle-end/56461 * sel-sched-ir.c (free_sched_pools): Release succs_info_pool.stack[succs_info_pool.max_top] vectors too if succs_info_pool.max_top isn't -1. --- gcc/sel-sched-ir.c.jj 2013-02-27 14:59:51.0 +0100 +++ gcc/sel-sched-ir.c 2013-03-04 17:47:34.705686637 +0100 @@ -5020,7 +5020,7 @@ free_sched_pools (void) free_alloc_pool (sched_lists_pool); gcc_assert (succs_info_pool.top == -1); - for (i = 0; i < succs_info_pool.max_top; i++) + for (i = 0; i <= succs_info_pool.max_top; i++) { succs_info_pool.stack[i].succs_ok.release (); succs_info_pool.stack[i].succs_other.release (); Jakub
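The leak is a classic inclusive-bound off-by-one: vectors exist for every index up to and including max_top, so the release loop must use `<=`. A small self-contained illustration (freed[] stands in for the per-index release calls; POOL_SIZE is an arbitrary bound for the sketch):

```c
#include <assert.h>

#define POOL_SIZE 4

/* Model of the corrected free_sched_pools loop: release indices
   0 .. max_top inclusive.  With max_top == -1 (nothing allocated),
   the body never runs.  Returns how many slots were released.  */
static int
count_released (int max_top)
{
  int freed[POOL_SIZE] = { 0 };
  for (int i = 0; i <= max_top; i++)   /* the patch: <= rather than < */
    freed[i] = 1;
  int n = 0;
  for (int i = 0; i < POOL_SIZE; i++)
    n += freed[i];
  return n;
}
```

With the old `<` bound, count_released(2) would be 2 and the vector at index max_top would leak, which is exactly the reported leak of the last set of vectors.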
[PATCH, committed] Fix PR 45472
Hello, For this volatile-related issue (no volatile bits on volatile fields of a non-volatile struct) AFAIU there is an agreement of fixing the front-ends if needed, but anyways the patch to the selective scheduler is required that properly merges expressions so that the may_trap_p bit is preserved. So the following patch was successfully tested on ia64 and x86-64, approved by Alexander offline, committed to trunk. The patch needs backport to other branches in about two weeks. Andrey PR middle-end/45472 gcc/ * sel-sched-ir.c (merge_expr): Also change vinsn of merged expr when the may_trap_p bit of the exprs being merged differs. Reorder tests for speculativeness in the logical and operator. testsuite/ * gcc.dg/45472.c: New test. Index: gcc/testsuite/gcc.dg/pr45472.c === *** gcc/testsuite/gcc.dg/pr45472.c (revision 0) --- gcc/testsuite/gcc.dg/pr45472.c (revision 196308) *** *** 0 --- 1,21 + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options -O -fschedule-insns2 -fselective-scheduling2 } */ + + struct S + { + volatile long vl; + int i; + }; + struct S s1, s2; + + void + foo (int j, int c) + { + int i; + for (i = 0; i = j; i++) + { + if (c) + s2.vl += s1.vl; + s1 = s2; + } + } Index: gcc/sel-sched-ir.c === *** gcc/sel-sched-ir.c (revision 196307) --- gcc/sel-sched-ir.c (revision 196308) *** merge_expr (expr_t to, expr_t from, insn *** 1866,1873 /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. This will provide us with consistent speculative bits and speculative patterns inside expr. */ ! if (EXPR_SPEC_DONE_DS (to) == 0 !EXPR_SPEC_DONE_DS (from) != 0) change_vinsn_in_expr (to, EXPR_VINSN (from)); merge_expr_data (to, from, split_point); --- 1866,1877 /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. This will provide us with consistent speculative bits and speculative patterns inside expr. */ ! if ((EXPR_SPEC_DONE_DS (from) != 0 ! 
EXPR_SPEC_DONE_DS (to) == 0) ! /* Do likewise for volatile insns, so that we always retain ! the may_trap_p bit on the resulting expression. */ ! || (VINSN_MAY_TRAP_P (EXPR_VINSN (from)) ! && !VINSN_MAY_TRAP_P (EXPR_VINSN (to)))) change_vinsn_in_expr (to, EXPR_VINSN (from)); merge_expr_data (to, from, split_point);
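The merge rule above can be sketched in plain C to show why the may_trap_p bit survives the merge. These structs are illustrative stand-ins, not the sel-sched-ir types:

```c
#include <assert.h>
#include <stdbool.h>

struct vinsn { bool may_trap_p; };
struct expr  { struct vinsn *vinsn; int spec_done_ds; };

/* Model of the patched merge_expr: keep the vinsn that carries either a
   speculative pattern or the may_trap_p bit, so the merged expression
   stays conservative.  */
static void
merge_expr_model (struct expr *to, struct expr *from)
{
  if ((from->spec_done_ds != 0 && to->spec_done_ds == 0)
      /* Do likewise for volatile insns: never lose may_trap_p.  */
      || (from->vinsn->may_trap_p && !to->vinsn->may_trap_p))
    to->vinsn = from->vinsn;
}
```

Before the fix only the speculative-pattern clause existed, so merging a trapping (volatile) expression into a non-trapping one silently dropped the trap bit, which is the PR 45472 failure mode.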
Fix PR 56077
Hello, As found by Jakub and explained in the PR audit trail by Alexander, this patch fixes the selective scheduler merge glitch of 2008 that added the unnecessary JUMP_P check to the flush_pending_lists call. I have removed the check and expanded the binary negation for clarity. The patch was tested on x86-64, ia64, and ppc64 to be safe. The patch should be conservatively safe at this stage as it adds more flushes and thus more dependencies to the scheduler. The original test is fixed, but I don't know how to add the test checking assembly insns order to the testsuite. OK for trunk? Andrey 2012-02-22 Alexander Monakov amona...@ispras.ru Andrey Belevantsev a...@ispras.ru PR middle-end/56077 * sched-deps.c (sched_analyze_insn): When reg_pending_barrier, flush pending lists also on non-jumps. Index: gcc/sched-deps.c === *** gcc/sched-deps.c (revision 196136) --- gcc/sched-deps.c (working copy) *** sched_analyze_insn (struct deps_desc *de *** 3318,3325 } /* Flush pending lists on jumps, but not on speculative checks. */ ! if (JUMP_P (insn) && !(sel_sched_p () && sel_insn_is_speculation_check (insn))) flush_pending_lists (deps, insn, true, true); reg_pending_barrier = NOT_A_BARRIER; --- 3318,3324 } /* Flush pending lists on jumps, but not on speculative checks. */ ! if (!sel_sched_p () || !sel_insn_is_speculation_check (insn)) flush_pending_lists (deps, insn, true, true); reg_pending_barrier = NOT_A_BARRIER;
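The expanded negation in the patch gives a simple truth table: with a pending register barrier, pending lists are flushed for every insn except a selective-scheduling speculation check. A sketch with booleans standing in for the sel_sched_p / sel_insn_is_speculation_check predicates:

```c
#include <assert.h>
#include <stdbool.h>

/* The fixed barrier condition from sched_analyze_insn, after removing the
   JUMP_P check and expanding !(a && b) into !a || !b for clarity.  */
static bool
must_flush_p (bool sel_sched_active, bool is_spec_check)
{
  return !sel_sched_active || !is_spec_check;
}
```

Only the (sel-sched, speculation-check) combination skips the flush; everything else, jump or not, now flushes, which is why the patch can only add dependencies.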
[PATCH] Fix PR 55889
Hello, As we discussed in the PR, the problem here is that the selective scheduler does not expect that renaming a hard register to a pseudo would result in extra dependencies. The dependencies come from implicit clobbers code of sched-deps.c, so I've made a minimal patch that checks only for that case using the same function from IRA and avoids it. The patch fixes the test case on AIX as reported by David and also bootstraps and tests fine on x86-64 and ia64, ok for trunk? The second patch is restoring debug printing in the scheduler that was accidentally broken, possibly by Steven's cleanups (now a pattern is expected where an insn was previously). I will commit is as obvious separately. Andrey 2012-02-19 Andrey Belevantsev a...@ispras.ru PR middle-end/55889 * sel-sched.c: Include ira.h. (implicit_clobber_conflict_p): New function. (moveup_expr): Use it. * Makefile.in (sel-sched.o): Depend on ira.h. And the second patch is: 2012-02-19 Andrey Belevantsev a...@ispras.ru * sel-sched-dump.c (dump_insn_rtx_flags): Explicitly set DUMP_INSN_RTX_UID. (dump_insn_rtx_1): Pass PATTERN (insn) to str_pattern_slim. 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index f3bb168..557ab08 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3338,7 +3338,7 @@ sel-sched.o : sel-sched.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(FUNCTION_H) $(INSN_ATTR_H) $(RECOG_H) $(EXCEPT_H) $(PARAMS_H) \ $(TM_P_H) output.h $(TARGET_H) $(TREE_PASS_H) \ $(SCHED_INT_H) $(GGC_H) $(TREE_H) langhooks.h rtlhooks-def.h \ - $(SEL_SCHED_IR_H) $(SEL_SCHED_DUMP_H) sel-sched.h $(DBGCNT_H) $(EMIT_RTL_H) + $(SEL_SCHED_IR_H) $(SEL_SCHED_DUMP_H) sel-sched.h $(DBGCNT_H) $(EMIT_RTL_H) ira.h sel-sched-dump.o : sel-sched-dump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(RTL_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h \ $(FUNCTION_H) $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) $(RECOG_H) $(EXCEPT_H) $(PARAMS_H) \ diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index 6f60d70..2944489 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see #include langhooks.h #include rtlhooks-def.h #include emit-rtl.h +#include ira.h #ifdef INSN_SCHEDULING #include sel-sched-ir.h @@ -2101,6 +2102,61 @@ moving_insn_creates_bookkeeping_block_p (insn_t insn, return TRUE; } +/* Return true when the conflict with newly created implicit clobbers + between EXPR and THROUGH_INSN is found because of renaming. */ +static bool +implicit_clobber_conflict_p (insn_t through_insn, expr_t expr) +{ + HARD_REG_SET temp; + rtx insn, reg, rhs, pat; + hard_reg_set_iterator hrsi; + unsigned regno; + bool valid; + + /* Make a new pseudo register. */ + reg = gen_reg_rtx (GET_MODE (EXPR_LHS (expr))); + max_regno = max_reg_num (); + maybe_extend_reg_info_p (); + + /* Validate a change and bail out early. 
*/ + insn = EXPR_INSN_RTX (expr); + validate_change (insn, SET_DEST (PATTERN (insn)), reg, 1); + valid = verify_changes (0); + cancel_changes (0); + if (!valid) +{ + if (sched_verbose = 6) + sel_print (implicit clobbers failed validation, ); + return true; +} + + /* Make a new insn with it. */ + rhs = copy_rtx (VINSN_RHS (EXPR_VINSN (expr))); + pat = gen_rtx_SET (VOIDmode, reg, rhs); + start_sequence (); + insn = emit_insn (pat); + end_sequence (); + + /* Calculate implicit clobbers. */ + extract_insn (insn); + preprocess_constraints (); + ira_implicitly_set_insn_hard_regs (temp); + AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); + + /* If any implicit clobber registers intersect with regular ones in + through_insn, we have a dependency and thus bail out. */ + EXECUTE_IF_SET_IN_HARD_REG_SET (temp, 0, regno, hrsi) +{ + vinsn_t vi = INSN_VINSN (through_insn); + if (bitmap_bit_p (VINSN_REG_SETS (vi), regno) + || bitmap_bit_p (VINSN_REG_CLOBBERS (vi), regno) + || bitmap_bit_p (VINSN_REG_USES (vi), regno)) + return true; +} + + return false; +} + /* Modifies EXPR so it can be moved through the THROUGH_INSN, performing necessary transformations. Record the type of transformation made in PTRANS_TYPE, when it is not NULL. When INSIDE_INSN_GROUP, @@ -2233,6 +2289,17 @@ moveup_expr (expr_t expr, insn_t through_insn, bool inside_insn_group, if (!enable_schedule_as_rhs_p || !EXPR_SEPARABLE_P (expr)) return MOVEUP_EXPR_NULL; + /* When renaming a hard register to a pseudo before reload, extra + dependencies can occur from the implicit clobbers of the insn. + Filter out such cases here. */ + if (!reload_completed REG_P (EXPR_LHS (expr)) + HARD_REGISTER_P (EXPR_LHS (expr)) + implicit_clobber_conflict_p (through_insn, expr)) + { + if (sched_verbose = 6) + sel_print (implicit clobbers conflict detected, ); + return MOVEUP_EXPR_NULL; + } EXPR_TARGET_AVAILABLE (expr
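The core of implicit_clobber_conflict_p is a set intersection: after renaming the destination to a pseudo, the insn may pick up implicit hard-register clobbers, and the move must be refused if any of them collides with a register the through-insn touches. A bitmask sketch of that final check (64-bit masks stand in for HARD_REG_SET; not the real API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the conflict test at the end of implicit_clobber_conflict_p:
   the renamed insn's implicit clobbers must be disjoint from the registers
   set, clobbered, or used by the insn we want to move through.  */
static bool
implicit_clobber_conflict_model (uint64_t implicit_clobbers,
                                 uint64_t through_sets,
                                 uint64_t through_clobbers,
                                 uint64_t through_uses)
{
  return (implicit_clobbers
          & (through_sets | through_clobbers | through_uses)) != 0;
}
```

In the real function the implicit clobber set additionally has ira_no_alloc_regs masked out first, so fixed registers never count as conflicts.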
Re: [PATCH] PR 54472
On 30.10.2012 12:09, Andrey Belevantsev wrote: Hello, This PR is due to the selective scheduling missing the dependencies with implicit_sets. From the sched-deps.c code it looks like implicit sets generate anti dependencies with either sets, uses or clobbers, so that's that I've done with the below patch. Vlad, as it looks you've added implicit sets, does the above conclusion look correct? I will commit the patch then after bootstrapping and testing will complete. Now committed after offline discussion with Vlad and Alexander. Andrey
[PATCH] PR 54472
Hello, This PR is due to the selective scheduling missing the dependencies with implicit_sets. From the sched-deps.c code it looks like implicit sets generate anti dependencies with either sets, uses or clobbers, so that's what I've done with the below patch. Vlad, as it looks you've added implicit sets, does the above conclusion look correct? I will then commit the patch after bootstrapping and testing complete. Yours, Andrey 2012-10-30 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/54472 * sel-sched-ir.c (has_dependence_note_reg_set): Handle implicit sets. (has_dependence_note_reg_clobber, has_dependence_note_reg_use): Likewise. 2012-10-30 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/54472 * gcc.dg/pr54472.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index 2a7a170..220568a 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3185,7 +3185,7 @@ has_dependence_note_reg_set (int regno) || reg_last->clobbers != NULL) *dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT; - if (reg_last->uses) + if (reg_last->uses || reg_last->implicit_sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI; } } @@ -3205,7 +3205,7 @@ has_dependence_note_reg_clobber (int regno) if (reg_last->sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT; - if (reg_last->uses) + if (reg_last->uses || reg_last->implicit_sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI; } } @@ -3225,7 +3225,7 @@ has_dependence_note_reg_use (int regno) if (reg_last->sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_TRUE; - if (reg_last->clobbers) + if (reg_last->clobbers || reg_last->implicit_sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI; /* Merge BE_IN_SPEC bits into *DSP when the dependency producer diff --git a/gcc/testsuite/gcc.dg/pr54472.c b/gcc/testsuite/gcc.dg/pr54472.c new file mode 100644 index 000..9395203 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr54472.c @@ -0,0 +1,9 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options -O -fschedule-insns -fselective-scheduling } */ + +int main ()
+{ + int a[3][3][3]; + __builtin_memset (a, 0, sizeof a); + return 0; +}
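The rule the PR 54472 patch implements, that an implicit set behaves like a use and therefore raises an anti dependence against later sets, uses, and clobbers, can be modelled in a few lines. The flag values are illustrative stand-ins for the DEP_* bits, not the GCC definitions:

```c
#include <assert.h>

#define DEP_ANTI   1u
#define DEP_OUTPUT 2u

/* Sketch of the patched has_dependence_note_reg_set: a new set of REGNO gets
   an output dependence from earlier sets/clobbers, and, after the fix, an
   anti dependence from earlier uses OR implicit sets.  */
static unsigned
deps_for_new_set (int had_uses, int had_implicit_sets, int had_sets)
{
  unsigned ds = 0;
  if (had_sets)
    ds |= DEP_OUTPUT;
  if (had_uses || had_implicit_sets)   /* the patched condition */
    ds |= DEP_ANTI;
  return ds;
}
```

Before the fix the `had_implicit_sets` operand was missing, so the selective scheduler saw no dependence at all between an implicit set and a following real set, the miscompilation behind the PR.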
Re: Fix PR 53701
On 16.10.2012 11:50, Andrey Belevantsev wrote: The below is the port of this patch to 4.7, took longer than expected but still. Will commit after retesting on x86-64 (testing on ia64 is already fine) and with the fix for PR 53975. Now the same patch is also committed to 4.6 after more wait and testing. Andrey
Re: [PATCH] PR 53975
Hello, On 31.07.2012 15:08, Andrey Belevantsev wrote: Hello, This PR is about wrong speculation of an insn that doesn't support storing NaT bits done by the selective scheduler (more details in the PR audit trail). The reason for this is the wrong one-liner patch committed last year, the fix is to revert that patch and to clarify the comment before the patched code. Bootstrapped and tested on ia64, approved by Alexander offline. No test as I don't know how to check whether an insn got moved through another insn. Finally I've got to porting this to 4.7. The patch was successfully tested on ia64, I'll commit this after double checking on x86-64. The patch should be committed with the fix for PR 53701, which I will do right after this one. Andrey 2012-10-16 Andrey Belevantsev a...@ispras.ru Backport from mainline 2012-07-31 Andrey Belevantsev a...@ispras.ru PR target/53975 * sel-sched-ir.c (has_dependence_note_reg_use): Clarify comment. Revert 2011-08-04 Sergey Grechanik mouseent...@ispras.ru * sel-sched-ir.c (has_dependence_note_reg_use): Call ds_full_merge only if producer writes to the register given by regno. Index: gcc/sel-sched-ir.c === *** gcc/sel-sched-ir.c (revision 192488) --- gcc/sel-sched-ir.c (working copy) *** has_dependence_note_reg_use (int regno) *** 3224,3230 if (reg_last-clobbers) *dsp = (*dsp ~SPECULATIVE) | DEP_ANTI; ! /* Handle BE_IN_SPEC. */ if (reg_last-uses) { ds_t pro_spec_checked_ds; --- 3224,3234 if (reg_last-clobbers) *dsp = (*dsp ~SPECULATIVE) | DEP_ANTI; ! /* Merge BE_IN_SPEC bits into *DSP when the dependency producer !is actually a check insn. We need to do this for any register !read-read dependency with the check unless we track properly !all registers written by BE_IN_SPEC-speculated insns, as !we don't have explicit dependence lists. See PR 53975. 
*/ if (reg_last-uses) { ds_t pro_spec_checked_ds; *** has_dependence_note_reg_use (int regno) *** 3232,3240 pro_spec_checked_ds = INSN_SPEC_CHECKED_DS (has_dependence_data.pro); pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds); ! if (pro_spec_checked_ds != 0 ! bitmap_bit_p (INSN_REG_SETS (has_dependence_data.pro), regno)) ! /* Merge BE_IN_SPEC bits into *DSP. */ *dsp = ds_full_merge (*dsp, pro_spec_checked_ds, NULL_RTX, NULL_RTX); } --- 3236,3242 pro_spec_checked_ds = INSN_SPEC_CHECKED_DS (has_dependence_data.pro); pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds); ! if (pro_spec_checked_ds != 0) *dsp = ds_full_merge (*dsp, pro_spec_checked_ds, NULL_RTX, NULL_RTX); }
Re: Fix PR 53701
Hello, On 09.08.2012 17:19, Alexander Monakov wrote: On Thu, 9 Aug 2012, Andrey Belevantsev wrote: Hello, The problem in question is uncovered by the recent speculation patch, it is in the handling of expressions blocked by bookkeeping. Those are expressions that become unavailable due to the newly created bookkeeping copies. In the original algorithm the supported insns and transformations cannot lead to this result, but when handling non-separable insns or creating speculative checks that unpredictably block certain insns the situation can arise. We just filter out all such expressions from the final availability set for correctness. The PR happens because the expression being filtered out can be transformed while being moved up, thus we need to look up not only its exact pattern but also all its previous forms saved in its history of changes. The patch does exactly that, I also clarified the comments w.r.t. this situation. Bootstrapped and tested on ia64 and x86-64, the PR testcase is minimized, too. OK for trunk? Also need to backport this to 4.7 with PR 53975, say on the next week. This is OK. The below is the port of this patch to 4.7, took longer than expected but still. Will commit after retesting on x86-64 (testing on ia64 is already fine) and with the fix for PR 53975. Andrey 2012-10-16 Andrey Belevantsev a...@ispras.ru Backport from mainline 2012-08-09 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/53701 * sel-sched.c (vinsn_vec_has_expr_p): Clarify function comment. Process not only expr's vinsns but all old vinsns from expr's history of changes. (update_and_record_unavailable_insns): Clarify comment. testsuite: 2012-10-16 Andrey Belevantsev a...@ispras.ru Backport from mainline 2012-08-09 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/53701 * gcc.dg/pr53701.c: New test. 
Index: gcc/testsuite/gcc.dg/pr53701.c
===================================================================
*** gcc/testsuite/gcc.dg/pr53701.c	(revision 0)
--- gcc/testsuite/gcc.dg/pr53701.c	(revision 0)
***************
*** 0 ****
--- 1,59 ----
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O3 -fselective-scheduling2 -fsel-sched-pipelining" } */
+ typedef unsigned short int uint16_t;
+ typedef unsigned long int uintptr_t;
+ typedef struct GFX_VTABLE
+ {
+   int color_depth;
+   unsigned char *line[];
+ }
+ BITMAP;
+ extern int _drawing_mode;
+ extern BITMAP *_drawing_pattern;
+ extern int _drawing_y_anchor;
+ extern unsigned int _drawing_x_mask;
+ extern unsigned int _drawing_y_mask;
+ extern uintptr_t bmp_write_line (BITMAP *, int);
+ void
+ _linear_hline15 (BITMAP * dst, int dx1, int dy, int dx2, int color)
+ {
+   int w;
+   if (_drawing_mode == 0)
+     {
+       int x, curw;
+       unsigned short *sline =
+ 	(unsigned short *) (_drawing_pattern->
+ 			    line[((dy) -
+ 				  _drawing_y_anchor) & _drawing_y_mask]);
+       unsigned short *s;
+       unsigned short *d =
+ 	((unsigned short *) (bmp_write_line (dst, dy)) + (dx1));
+       s = ((unsigned short *) (sline) + (x));
+       if (_drawing_mode == 2)
+ 	{
+ 	}
+       else if (_drawing_mode == 3)
+ 	{
+ 	  do
+ 	    {
+ 	      w -= curw;
+ 	      do
+ 		{
+ 		  unsigned long c = (*(s));
+ 		  if (!((unsigned long) (c) == 0x7C1F))
+ 		    {
+ 		      (*((uint16_t *) ((uintptr_t) (d))) = ((color)));
+ 		    }
+ 		  ((s)++);
+ 		}
+ 	      while (--curw > 0);
+ 	      s = sline;
+ 	      curw =
+ 		(((w) < ((int) _drawing_x_mask +
+ 			 1)) ? (w) : ((int) _drawing_x_mask + 1));
+ 	    }
+ 	  while (curw > 0);
+ 	}
+     }
+ }
Index: gcc/sel-sched.c
===================================================================
*** gcc/sel-sched.c	(revision 192488)
--- gcc/sel-sched.c	(working copy)
*************** process_use_exprs (av_set_t *av_ptr)
*** 3567,3595 ****
  	return NULL;
      }
  
! /* Lookup EXPR in VINSN_VEC and return TRUE if found.  */
  static bool
  vinsn_vec_has_expr_p (vinsn_vec_t vinsn_vec, expr_t expr)
  {
!   vinsn_t vinsn;
    int n;
  
!   FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
!     if (VINSN_SEPARABLE_P (vinsn))
!       {
! 	if (vinsn_equal_p (vinsn, EXPR_VINSN (expr)))
! 	  return true;
!       }
!     else
!       {
! 	/* For non-separable instructions, the blocking insn can have
! 	   another pattern due to substitution, and we can't choose
! 	   different register as in the above case.  Check all registers
! 	   being written instead.  */
! 	if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
! 				VINSN_REG_SETS (EXPR_VINSN (expr))))
! 	  return true;
!       }
  
    return false;
  }
--- 3567,3607 ----
  	return NULL;
      }
  
! /* Lookup EXPR in VINSN_VEC and return
Re: [PATCH] Fix sel-sched ICE with asm goto (PR rtl-optimization/54455)
Hello,

On 05.09.2012 19:38, Jakub Jelinek wrote:

Hi!  As discussed in the PR, sel-sched doesn't handle correctly tidying of empty blocks if a fallthru predecessor ends with asm goto that has some labels on the empty block in addition to the fallthru edge.  cfgrtl.c can handle that, so this patch just gives up on it on the sel-sched side.  The testcase is new since the patch in the PR, tested with unpatched and patched gcc (fails vs. works).  Bootstrapped/regtested on x86_64-linux and i686-linux (as usually, with rtl checking).  Ok for trunk?

OK.  Sorry for being slower on this than you.  Btw, the max-sched-extend-regions-iters param can be limited to 2 in the test, I've checked that my build still segfaults with it.  (I've mentioned once to Zdenek in some PR comment that this param's value of 3-4 covers most of the weird cfgs, so the larger values should not be tried in his opt searches.)

Andrey

2012-09-05  Jakub Jelinek  <ja...@redhat.com>

	PR rtl-optimization/54455
	* sel-sched-ir.c (maybe_tidy_empty_bb): Give up if previous fallthru
	bb ends up with asm goto referencing bb's label.

	* gcc.dg/54455.c: New test.

--- gcc/sel-sched-ir.c.jj	2012-08-15 10:55:30.0 +0200
+++ gcc/sel-sched-ir.c	2012-09-03 09:56:59.352233243 +0200
@@ -3686,6 +3686,22 @@ maybe_tidy_empty_bb (basic_block bb)
   FOR_EACH_EDGE (e, ei, bb->preds)
     if (e->flags & EDGE_COMPLEX)
       return false;
+    else if (e->flags & EDGE_FALLTHRU)
+      {
+	rtx note;
+	/* If prev bb ends with asm goto, see if any of the
+	   ASM_OPERANDS_LABELs don't point to the fallthru
+	   label.  Do not attempt to redirect it in that case.  */
+	if (JUMP_P (BB_END (e->src))
+	    && (note = extract_asm_operands (PATTERN (BB_END (e->src)))))
+	  {
+	    int i, n = ASM_OPERANDS_LABEL_LENGTH (note);
+
+	    for (i = 0; i < n; ++i)
+	      if (XEXP (ASM_OPERANDS_LABEL (note, i), 0) == BB_HEAD (bb))
+		return false;
+	  }
+      }
 
   free_data_sets (bb);
--- gcc/testsuite/gcc.dg/54455.c.jj	2012-06-15 19:53:34.312404791 +0200
+++ gcc/testsuite/gcc.dg/54455.c	2012-09-05 15:05:02.328728962 +0200
@@ -0,0 +1,25 @@
+/* PR rtl-optimization/54455 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fschedule-insns -fselective-scheduling --param max-sched-extend-regions-iters=8" } */
+
+extern void fn1 (void), fn2 (void);
+
+static inline __attribute__((always_inline)) int
+foo (int *x, long y)
+{
+  asm goto ("" : : "r" (x), "r" (y) : "memory" : lab);
+  return 0;
+lab:
+  return 1;
+}
+
+void
+bar (int *x)
+{
+  if (foo (x, 23))
+    fn1 ();
+  else
+    fn2 ();
+
+  foo (x, 2);
+}

	Jakub
Re: Scheduler: Save state at the end of a block
On 13.08.2012 14:32, Bernd Schmidt wrote: This is a small patch for sched-rgn that attempts to save DFA state at the end of a basic block and re-use it in successor blocks. This was a customer-requested optimization; I've not seen it make much of a difference in any macro benchmarks. FWIW, this was definitely helpful for sel-sched on ia64, as far as I recall, and likewise on some of the smaller tests. Andrey Bootstrapped and tested on x86_64-linux and also tested on c6x-elf. OK? Bernd
Fix PR 53701
Hello,

The problem in question is uncovered by the recent speculation patch, it is in the handling of expressions blocked by bookkeeping.  Those are expressions that become unavailable due to the newly created bookkeeping copies.  In the original algorithm the supported insns and transformations cannot lead to this result, but when handling non-separable insns or creating speculative checks that unpredictably block certain insns the situation can arise.  We just filter out all such expressions from the final availability set for correctness.

The PR happens because the expression being filtered out can be transformed while being moved up, thus we need to look up not only its exact pattern but also all its previous forms saved in its history of changes.  The patch does exactly that, I also clarified the comments w.r.t. this situation.

Bootstrapped and tested on ia64 and x86-64, the PR testcase is minimized, too.  OK for trunk?  Also need to backport this to 4.7 with PR 53975, say on the next week.

Yours,
Andrey

gcc:

2012-08-09  Andrey Belevantsev  <a...@ispras.ru>

	PR rtl-optimization/53701
	* sel-sched.c (vinsn_vec_has_expr_p): Clarify function comment.
	Process not only expr's vinsns but all old vinsns from expr's
	history of changes.
	(update_and_record_unavailable_insns): Clarify comment.

testsuite:

2012-08-09  Andrey Belevantsev  <a...@ispras.ru>

	PR rtl-optimization/53701
	* gcc.dg/pr53701.c: New test.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 3099b92..f0c6eaf 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -3564,29 +3564,41 @@ process_use_exprs (av_set_t *av_ptr)
   return NULL;
 }
 
-/* Lookup EXPR in VINSN_VEC and return TRUE if found.  */
+/* Lookup EXPR in VINSN_VEC and return TRUE if found.  Also check patterns from
+   EXPR's history of changes.
 */
 static bool
 vinsn_vec_has_expr_p (vinsn_vec_t vinsn_vec, expr_t expr)
 {
-  vinsn_t vinsn;
+  vinsn_t vinsn, expr_vinsn;
   int n;
+  unsigned i;
 
-  FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
-    if (VINSN_SEPARABLE_P (vinsn))
-      {
-	if (vinsn_equal_p (vinsn, EXPR_VINSN (expr)))
-	  return true;
-      }
-    else
-      {
-	/* For non-separable instructions, the blocking insn can have
-	   another pattern due to substitution, and we can't choose
-	   different register as in the above case.  Check all registers
-	   being written instead.  */
-	if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
-				VINSN_REG_SETS (EXPR_VINSN (expr))))
-	  return true;
-      }
+  /* Start with checking expr itself and then proceed with all the old forms
+     of expr taken from its history vector.  */
+  for (i = 0, expr_vinsn = EXPR_VINSN (expr);
+       expr_vinsn;
+       expr_vinsn = (i < VEC_length (expr_history_def,
+				     EXPR_HISTORY_OF_CHANGES (expr))
+		     ? VEC_index (expr_history_def,
+				  EXPR_HISTORY_OF_CHANGES (expr),
+				  i++)->old_expr_vinsn
+		     : NULL))
+    FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
+      if (VINSN_SEPARABLE_P (vinsn))
+	{
+	  if (vinsn_equal_p (vinsn, expr_vinsn))
+	    return true;
+	}
+      else
+	{
+	  /* For non-separable instructions, the blocking insn can have
+	     another pattern due to substitution, and we can't choose
+	     different register as in the above case.  Check all registers
+	     being written instead.  */
+	  if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
+				  VINSN_REG_SETS (expr_vinsn)))
+	    return true;
+	}
 
   return false;
 }
@@ -5694,8 +5706,8 @@ update_and_record_unavailable_insns (basic_block book_block)
 	      || EXPR_TARGET_AVAILABLE (new_expr)
 		 != EXPR_TARGET_AVAILABLE (cur_expr))
 	    /* Unfortunately, the below code could be also fired up on
-	       separable insns.
-	       FIXME: add an example of how this could happen.  */
+	       separable insns, e.g. when moving insns through the new
+	       speculation check as in PR 53701.  */
 	    vinsn_vec_add (&vec_bookkeeping_blocked_vinsns, cur_expr);
 	}
diff --git a/gcc/testsuite/gcc.dg/pr53701.c b/gcc/testsuite/gcc.dg/pr53701.c
new file mode 100644
index 000..2c85223
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr53701.c
@@ -0,0 +1,59 @@
+/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3 -fselective-scheduling2 -fsel-sched-pipelining" } */
+typedef unsigned short int uint16_t;
+typedef unsigned long int uintptr_t;
+typedef struct GFX_VTABLE
+{
+  int color_depth;
+  unsigned char *line[];
+}
+BITMAP;
+extern int _drawing_mode;
+extern BITMAP *_drawing_pattern;
+extern int _drawing_y_anchor;
+extern unsigned int _drawing_x_mask;
+extern unsigned int _drawing_y_mask;
+extern uintptr_t bmp_write_line (BITMAP *, int);
+ void
+_linear_hline15 (BITMAP * dst, int dx1, int dy, int dx2, int color)
+{
+  int w;
+  if (_drawing_mode == 0)
+    {
+      int x, curw;
+      unsigned
Re: [patch] one more sched-vis fix
Hi Steven,

On 03.08.2012 12:02, Steven Bosscher wrote:

Hello, I also need to handle SEQUENCE, of course, or slim dumping fails in targets with branch delay slots.  But who knew SEQUENCE can also appear as a REG_NOTE value?!  The cfgrtl.c fix is purely cosmetic and obvious.  Bootstrapped & tested on powerpc64-unknown-linux-gnu.  Committed as obvious.

You can even close PR 47489 as fixed then :)

Andrey

Ciao!
Steven

	* sched-vis.c (print_pattern): Handle SEQUENCE also.
	* cfgrtl.c (print_rtl_with_bb): Do not print a newline between insns.

Index: sched-vis.c
===================================================================
--- sched-vis.c	(revision 190016)
+++ sched-vis.c	(working copy)
@@ -610,8 +610,19 @@ print_pattern (char *buf, const_rtx x, int verbose
 	}
       break;
     case SEQUENCE:
-      /* Should never see SEQUENCE codes until after reorg.  */
-      gcc_unreachable ();
+      {
+	int i;
+
+	sprintf (t1, "sequence{");
+	for (i = 0; i < XVECLEN (x, 0); i++)
+	  {
+	    print_pattern (t2, XVECEXP (x, 0, i), verbose);
+	    sprintf (t3, "%s%s;", t1, t2);
+	    strcpy (t1, t3);
+	  }
+	sprintf (buf, "%s}", t1);
+      }
+      break;
     case ASM_INPUT:
       sprintf (buf, "asm {%s}", XSTR (x, 0));
       break;
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 190112)
+++ cfgrtl.c	(working copy)
@@ -1958,10 +1958,9 @@ print_rtl_with_bb (FILE *outf, const_rtx rtx_first
 	      dump_bb_info (outf, bb, 0, dump_flags | TDF_COMMENT, false, true);
 	      if (df && (flags & TDF_DETAILS))
 		df_dump_bottom (bb, outf);
+	      putc ('\n', outf);
 	    }
 	}
-
-      putc ('\n', outf);
     }
 
   free (start);
[PATCH] PR 53975
Hello,

This PR is about the selective scheduler wrongly speculating an insn that doesn't support storing NaT bits (more details in the PR audit trail).  The reason for this is the wrong one-liner patch committed last year; the fix is to revert that patch and to clarify the comment before the patched code.  Bootstrapped and tested on ia64, approved by Alexander offline.  No test as I don't know how to check whether an insn got moved through another insn.

Andrey

	PR target/53975
	* sel-sched-ir.c (has_dependence_note_reg_use): Clarify comment.

	Revert
	2011-08-04  Sergey Grechanik  <mouseent...@ispras.ru>

	* sel-sched-ir.c (has_dependence_note_reg_use): Call ds_full_merge
	only if producer writes to the register given by regno.

Index: gcc/sel-sched-ir.c
===================================================================
*** gcc/sel-sched-ir.c	(revision 190004)
--- gcc/sel-sched-ir.c	(revision 190005)
*************** has_dependence_note_reg_use (int regno)
*** 3228,3234 ****
        if (reg_last->clobbers)
  	*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
  
!       /* Handle BE_IN_SPEC.  */
        if (reg_last->uses)
  	{
  	  ds_t pro_spec_checked_ds;
--- 3228,3238 ----
        if (reg_last->clobbers)
  	*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
  
!       /* Merge BE_IN_SPEC bits into *DSP when the dependency producer
! 	 is actually a check insn.  We need to do this for any register
! 	 read-read dependency with the check unless we track properly
! 	 all registers written by BE_IN_SPEC-speculated insns, as
! 	 we don't have explicit dependence lists.  See PR 53975.  */
        if (reg_last->uses)
  	{
  	  ds_t pro_spec_checked_ds;
*************** has_dependence_note_reg_use (int regno)
*** 3236,3244 ****
  	  pro_spec_checked_ds = INSN_SPEC_CHECKED_DS (has_dependence_data.pro);
  	  pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds);
  
! 	  if (pro_spec_checked_ds != 0
! 	      && bitmap_bit_p (INSN_REG_SETS (has_dependence_data.pro), regno))
! 	    /* Merge BE_IN_SPEC bits into *DSP.  */
  	    *dsp = ds_full_merge (*dsp, pro_spec_checked_ds,
  				  NULL_RTX, NULL_RTX);
  	}
--- 3240,3246 ----
  	  pro_spec_checked_ds = INSN_SPEC_CHECKED_DS (has_dependence_data.pro);
  	  pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds);
  
! 	  if (pro_spec_checked_ds != 0)
  	    *dsp = ds_full_merge (*dsp, pro_spec_checked_ds,
  				  NULL_RTX, NULL_RTX);
  	}
Re: [patch] un-#ifdef GATHER_STATISTICS
On 24.07.2012 21:13, Steven Bosscher wrote:
>> AFAIR the qsort is just for getting a stable ordering over two pools to find the leaked regsets afterwards, the type of ordering doesn't matter.
>
> What matters is that the compare function gives a reliable result.  You can't subtract pointers like that for qsort.  After consulting the experts on IRC, I'm going with a fix along the lines of return (x == y ? 0 : (x < y ? -1 : 1));.

Yeah, I agree the code was dubious, so thanks for uncovering this.  If you have already tested the patch, consider it preapproved; otherwise I can fix it this week.

>> Anyways, how come this is related to your patch?  We don't use statistics in sel-sched...  Something got miscompiled?
>
> No, just allocated slightly differently.  A bitmap_head is one pointer bigger than before.  I'm unsure how that causes this problem, though.  I suspect you would have seen the same failure with GATHER_STATISTICS enabled.

Still interesting to know why your first patch fixed the issue in the first place.  I will try taking a look at the generated code in my spare time.

Yours,
Andrey

> Ciao!
> Steven
Re: [patch] un-#ifdef GATHER_STATISTICS
Hello,

On 24.07.2012 18:56, Steven Bosscher wrote:
> On Tue, Jul 24, 2012 at 4:37 PM, Steven Bosscher <stevenb@gmail.com> wrote:
>> On Tue, Jul 24, 2012 at 3:08 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> This patch (r189803) regressed a bunch of tests on x86_64 [1], [2].
>>>
>>> [1] http://gcc.gnu.org/ml/gcc-testresults/2012-07/msg02066.html
>>> [2] http://gcc.gnu.org/ml/gcc-regression/2012-07/msg00177.html
>>
>> These are all selective-scheduler test cases.  It looks like qsort is being used incorrectly.
>
> This seems to fix it for me, but I don't understand why.  Apparently, a pointer subtraction doesn't result in a signed value??  In any case, the sort on those arrays wasn't correct.  Comments?
>
> Index: sel-sched-ir.c
> ===================================================================
> --- sel-sched-ir.c	(revision 189808)
> +++ sel-sched-ir.c	(working copy)
> @@ -954,7 +954,9 @@ return_regset_to_pool (regset rs)
>  static int
>  cmp_v_in_regset_pool (const void *x, const void *xx)
>  {
> -  return *((const regset *) x) - *((const regset *) xx);
> +  ptrdiff_t d = (ptrdiff_t) *((const regset *) x);
> +  ptrdiff_t dd = (ptrdiff_t) *((const regset *) xx);
> +  return d - dd;

AFAIR the qsort is just for getting a stable ordering over two pools to find the leaked regsets afterwards, the type of ordering doesn't matter.

Anyways, how come this is related to your patch?  We don't use statistics in sel-sched...  Something got miscompiled?

Andrey

>  }
>  #endif
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 18:22, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev <a...@ispras.ru>:

On 12.04.2012 17:54, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev <a...@ispras.ru>:

On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamya...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther <richard.guent...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amona...@ispras.ru> wrote:

Can atom execute two IMUL in parallel?  Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall.  In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")
;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too.  The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions.

Why does that not happen without your patch?  Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering.
That said, it is easy to imagine the cases where this patch will make things worse rather than better. That surprises me. What is so specific about this reordering? I mean that the scheduler does things like sort the ready list according to a number of heuristics and to the insn priority, then choose the insn that would allow the maximum of ready insns to be issued on the current cycle. The heuristics are rather general. The scheduler does not do things like if some random insn is ready, then choose some other random insn from the ready list and schedule it (which is what the patch does). This is what scheduler hooks are for, to account for some target-specific heuristic. The problem is that this particular implementation looks somewhat like an overkill and also motivated by a single benchmark. Testing on a wider set of benchmarks and checking compile-time hit would make the motivation more convincing. Yeah, and especially looking _why_ the generic heuristics are not working and if they could be improved. After all the issue seems to be properly modeled in the DFA. It is, but the DFA only models the actual stalls that you get when an imul is scheduled. What you need here is to increase priority for ready insns that have imuls in their critical paths, and those imuls should be close enough to fill in the gap (actual consumers in the case of the patch). The lookahead done in max_issue can help with this dynamic priority change in general, but it considers just the ready insns, not their immediate consumers. You need to make the deeper lookahead if you want to do this in a target independent way -- no-no from compile time POV. 
A number of already existing hooks can help:

- sched_reorder, used in this patch, just sorts the ready list in any way the target wishes;
- adjust_priority/adjust_cost help to boost or to lower priority in such subtle cases;
- is_costly_dependence (now rs6000 only) can guide the early queue insn removal and its addition to the ready list;
- dfa_lookahead_guard (now ia64 only) can ban some insns from being issued on the current try.

Using sched_reorder is existing practice, like in s390, spu, mips, etc.  Usually though an implementation again only considers the actual ready insns and not the successors.  I'd say that trying more tests would help, or other hooks can be tried, too, but after all this is Atom specific, so it's the decision of i386 maintainers.

Andrey

Richard.

Igor, why not try different subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list?  E.g. increase the producer's priority.  The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when you have more than one imul in the ready list?  Don't you want to bump the priority of the other imul found?

Andrey

And MD allows us only prefer scheduling of successive IMUL instructions, i.e. if IMUL was just scheduled and ready list
Re: [PATCH] Fix PR 52203 and 52715
Hello,

On 07.03.2012 15:46, Alexander Monakov wrote:

On Wed, 7 Mar 2012, Andrey Belevantsev wrote:

Hello,

This PR is again about insns that are recog'ed as >= 0 but do not change the processor state.  As explained in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52203#c8, I've tried experimenting with an attribute marking those empty insns in MD files and asserting that all other insns do have reservations.  As this doesn't seem to be interesting, I give up on the idea, and the below patch makes sel-sched do exactly what the Haifa scheduler does, i.e. not count such insns against issue_rate when modelling clock cycles.

But of course I wrongly microoptimized the decision of whether an insn is empty, as shown in PR 52715, so the right fix is to check the emptiness right before issuing the insn.  Thus, the following patch is really needed (tested on ia64 and x86 again), OK?  The new hunk is the last one in the patch.

Andrey

2012-04-13  Andrey Belevantsev  <a...@ispras.ru>

	PR rtl-optimization/52203
	PR rtl-optimization/52715

	Revert the 2012-03-07 fix for PR 52203.
	* sel-sched.c (reset_sched_cycles_in_current_ebb): Check that
	the insn does not modify DFA right before issuing, adjust
	issue_rate accordingly.

Tested on ia64 and x86-64, OK for trunk?  No testcase again because of the amount of flags needed.

Andrey

2012-03-07  Andrey Belevantsev  <a...@ispras.ru>

	PR rtl-optimization/52203
	* sel-sched.c (estimate_insn_cost): New parameter pempty.  Adjust
	all callers to pass NULL except ...
	(reset_sched_cycles_in_current_ebb): ... here, save the value
	in new variable 'empty'.  Increase issue_rate only for non-empty
	insns.

This is OK.  Thanks.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 4e13230..ce38fa0 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -4265,10 +4265,9 @@ invoke_aftermath_hooks (fence_t fence, rtx best_insn, int issue_more)
   return issue_more;
 }
 
-/* Estimate the cost of issuing INSN on DFA state STATE.  Write to PEMPTY
-   true when INSN does not change the processor state.  */
+/* Estimate the cost of issuing INSN on DFA state STATE.  */
 static int
-estimate_insn_cost (rtx insn, state_t state, bool *pempty)
+estimate_insn_cost (rtx insn, state_t state)
 {
   static state_t temp = NULL;
   int cost;
@@ -4278,8 +4277,6 @@ estimate_insn_cost (rtx insn, state_t state, bool *pempty)
   memcpy (temp, state, dfa_state_size);
   cost = state_transition (temp, insn);
-  if (pempty)
-    *pempty = (memcmp (temp, state, dfa_state_size) == 0);
 
   if (cost < 0)
     return 0;
@@ -4310,7 +4307,7 @@ get_expr_cost (expr_t expr, fence_t fence)
 	  return 0;
 	}
   else
-    return estimate_insn_cost (insn, FENCE_STATE (fence), NULL);
+    return estimate_insn_cost (insn, FENCE_STATE (fence));
 }
 
 /* Find the best insn for scheduling, either via max_issue or just take
@@ -7023,7 +7020,7 @@ reset_sched_cycles_in_current_ebb (void)
     {
       int cost, haifa_cost;
       int sort_p;
-      bool asm_p, real_insn, after_stall, all_issued, empty;
+      bool asm_p, real_insn, after_stall, all_issued;
       int clock;
 
       if (!INSN_P (insn))
@@ -7050,7 +7047,7 @@ reset_sched_cycles_in_current_ebb (void)
 	  haifa_cost = 0;
 	}
       else
-	haifa_cost = estimate_insn_cost (insn, curr_state, &empty);
+	haifa_cost = estimate_insn_cost (insn, curr_state);
 
       /* Stall for whatever cycles we've stalled before.  */
       after_stall = 0;
@@ -7084,7 +7081,7 @@ reset_sched_cycles_in_current_ebb (void)
 	      if (!after_stall
 		  && real_insn
 		  && haifa_cost > 0
-		  && estimate_insn_cost (insn, curr_state, NULL) == 0)
+		  && estimate_insn_cost (insn, curr_state) == 0)
 		break;
 
 	      /* When the data dependency stall is longer than the DFA stall,
@@ -7096,7 +7093,7 @@ reset_sched_cycles_in_current_ebb (void)
 	      if ((after_stall || all_issued)
 		  && real_insn
 		  && haifa_cost == 0)
-		haifa_cost = estimate_insn_cost (insn, curr_state, NULL);
+		haifa_cost = estimate_insn_cost (insn, curr_state);
 	    }
 
 	  haifa_clock += i;
@@ -7127,8 +7124,14 @@ reset_sched_cycles_in_current_ebb (void)
       if (real_insn)
 	{
+	  static state_t temp = NULL;
+
+	  if (!temp)
+	    temp = xmalloc (dfa_state_size);
+	  memcpy (temp, curr_state, dfa_state_size);
+
 	  cost = state_transition (curr_state, insn);
-	  if (!empty)
+	  if (memcmp (temp, curr_state, dfa_state_size))
 	    issued_insns++;
 
 	  if (sched_verbose >= 2)
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 13.04.2012 14:18, Igor Zamyatin wrote:

On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev <a...@ispras.ru> wrote:

On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamya...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther <richard.guent...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amona...@ispras.ru> wrote:

Can atom execute two IMUL in parallel?  Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall.  In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")
;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too.  The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions.

Why does that not happen without your patch?  Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering.  That said, it is easy to imagine the cases where this patch will make things worse rather than better.
Igor, why not try different subtler mechanisms like adjust_priority, which is get called when an insn is added to the ready list? E.g. increase the producer's priority. The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found? Could you provide some examples when this patch would harm the performance? I thought of the cases when the other ready insns can fill up the hole and that would be more beneficial because e.g. they would be on more critical paths than the producer of your second imul. I don't know enough of Atom to give an example -- maybe some long divisions? Sched_reorder was chosen since it is used in other ports and looks most suitable for such case, e.g. it provides access to the whole ready list. BTW, just increasing producer's priority seems to be more risky in performance sense - we can incorrectly start delaying some instructions. Yes, but exactly because of the above example you can start incorrectly delaying other insns, too, as you force the insn to be the first in the list. While bumping priority still leaves the scheduler sorting heuristics in place and actually lowers that risk. Thought ready list doesn't contain DEBUG_INSN... Is it so? If it contains them - this could be added easily It does, but I'm not sure the sched_reorder hook gets them or they are immediately removed -- I saw similar checks in one of the targets' hooks. Anyways, my main thought was that it is better to test on more benchmarks to alleviate the above concerns, so as long as the i386 maintainers are happy, I don't see major problems here. A good idea could be to generalize the patch to handle other long latency insns as second consumers, not only imuls, if this is relevant for Atom. 
Andrey The case with more than one imul in the ready list wasn't considered just because the goal was to catch the particular case when there is a risk to get the following picture: imul-producer-imul which is less effective than producer-imul-imul for Atom Andrey And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling. at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander Thanks, Igor
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamya...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther <richard.guent...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amona...@ispras.ru> wrote:

Can atom execute two IMUL in parallel?  Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall.  In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")
;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too.  The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions.

Why does that not happen without your patch?  Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering.  That said, it is easy to imagine the cases where this patch will make things worse rather than better.

Igor, why not try different subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list?  E.g. increase the producer's priority.  The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when you have more than one imul in the ready list?  Don't you want to bump the priority of the other imul found?

Andrey

And MD allows us only prefer scheduling of successive IMUL instructions, i.e. if IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling.

at least from my very limited guessing of what the above does.  So, did you analyze why the scheduler does not properly schedule things for you?

Richard.

From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do.  Hope that helps.

Alexander
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 17:54, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev a...@ispras.ru:

On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.com wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote:

Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycle latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")
;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: if we are going to schedule an IMUL instruction (it is on the top of the ready list), we try to find the producer of another (independent) IMUL instruction that is in the ready list too. The goal is to schedule such a producer to get another IMUL into the ready list and get scheduling of two successive IMUL instructions.

Why does that not happen without your patch? Does it never happen without your patch, or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine cases where this patch will make things worse rather than better.

That surprises me. What is so specific about this reordering?

I mean that the scheduler does things like sorting the ready list according to a number of heuristics and to the insn priority, then choosing the insn that would allow the maximum of ready insns to be issued on the current cycle. The heuristics are rather general. The scheduler does not do things like "if some random insn is ready, then choose some other random insn from the ready list and schedule it" (which is what the patch does). This is what scheduler hooks are for, to account for some target-specific heuristic. The problem is that this particular implementation looks somewhat like overkill and is also motivated by a single benchmark. Testing on a wider set of benchmarks and checking the compile-time hit would make the motivation more convincing.

Andrey

Igor, why not try different, subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list? E.g. increase the producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found?

Andrey

And the MD allows us only to prefer scheduling of successive IMUL instructions, i.e. if an IMUL was just scheduled and the ready list contains another IMUL instruction, then it will be chosen as the best candidate for scheduling, at least from my very limited guessing of what the above does.

So, did you analyze why the scheduler does not properly schedule things for you?

Richard.

From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps.

Alexander
Re: [SMS] Support new loop pattern
Hello Ayal,

First of all, thanks for your feedback. Now to your questions:

On 31.03.2012 3:20, Ayal Zaks wrote:

Roman, Andrey, sorry for the delayed response. It would indeed be good to have SMS apply to more loop patterns, still within the realm of *countable* loops. SMS was originally designed to handle doloops, with a specific pattern controlling the loop, easily identified and separable from the loop's body. The newly proposed change to support new loop patterns is pretty invasive and sizable, taking place entirely within modulo-sched.c. The main issue I've been considering is whether it would be possible instead to transform the new loop patterns we want SMS to handle into doloops (potentially introducing additional induction variables to feed other uses), and then feed the resulting loop into SMS as is? In other words, could you fold it into doloop.c? And if so, will doing so introduce significant overheads?

Let me perhaps explain better. The patch itself is one core patch (this thread) adding the new functionality for detecting more complex loop patterns, plus three fixes to SMS found while working on the main patch (the fixes are in the mails pinged at the very end of this message). The three fixes are worthwhile to commit separately anyway; they are split out from the main patch for this purpose, so I would suggest considering them in any case.

For the main patch, its core is as small as we could get it. It stays with countable loops, as in the cases where we could get overflow behavior or infinite loops we bail out early. We handle only the case of simple same-step affine counters. The main reason why we add support to SMS and not to the doloop pass is that when we do not pipeline a loop newly transformed to the doloop form, this loop actually slows down on platforms not having a true doloop pattern. One has to undo the doloop form and get back to the original loop form to avoid this, which seems rather strange.

Also, the separate decrement insn that changes the induction variable is better scheduled separately to get a more precise schedule. And yes, I believe that making an extra induction variable just to have the control part without uses in the loop core will be unnecessary overhead. Thus, I believe that if we do want SMS to handle more complex loops, then it is inevitable that SMS itself will be somewhat more complex.

I would welcome your suggestions to make the patch clearer. One way I see is that the function for getting the condition of the new loop form can be moved to the generic RTL loop code, given the agreement of other RTL maintainers. Also, some new helpers can be introduced for handling these specific loop forms. But it seems that the distinction between doloop/non-doloop loops has to stay in the code.

Yours, Andrey

2012/3/29 Andrey Belevantsev a...@ispras.ru:

Hello, I'd like to ping again those SMS patches once we're back to Stage 1. Ayal, maybe it would remove some burden for you if you'd review the general SMS functionality of those patches, and we'd ask RTL folks to look at the pieces related to RTL pattern matching and generation?

It definitely would ... especially in light of the above issue. Thanks (for your patches, patience, pings..), Ayal.

Yours, Andrey

On 10.02.2012 16:15, Roman Zhuykov wrote:

Ping. Ayal, please review this patch and these three patches too:
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html

--
Roman Zhuykov zhr...@ispras.ru
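The trade-off discussed above (transforming a generic counted loop into doloop form by adding a control-only induction variable) can be sketched in plain code. This is an illustrative Python model, not GCC output; the loop bodies are invented for the example.

```python
# Two equivalent loop forms. The "generic" form tests an affine counter
# that is also used inside the body; the "doloop" form adds a separate
# count-down variable so the exit test is a bare decrement-and-branch,
# leaving the original variable intact for the body's uses. The extra
# variable is exactly the overhead the thread worries about on targets
# without a hardware doloop pattern.

def generic_form(n):
    out = []
    i = 0
    while i < n:          # exit test uses the body-visible counter
        out.append(i * 2)
        i += 1
    return out

def doloop_form(n):
    out = []
    i = 0
    count = n             # extra induction variable, control only
    while count > 0:      # decrement-and-branch pattern
        out.append(i * 2)
        i += 1
        count -= 1
    return out
```

Both forms compute the same sequence; the difference is only which variable the exit test consumes, which is what makes the doloop form separable for SMS but a pure cost when the loop is not pipelined.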
Re: [PATCH, RTL] Fix PR 51106
On 03.04.2012 13:36, Jakub Jelinek wrote:

On Mon, Apr 02, 2012 at 06:56:25PM +0400, Andrey Belevantsev wrote:

After Richi's RTL generation related cleanups went in, the extra cleanup_cfg call was added, so we are no longer lucky to have the proper fallthru edge in this case. The PR trail has the patch to purge_dead_edges making it consider this situation, but I still prefer the below patch that fixes only the invalid asm case. The reason is that I think it unlikely that after initial RTL expansion (of which the instantiate virtual regs pass seems to be a part) we will get the problematic situation. However, I'm happy to test the PR trail patch, too.

I don't like blindly changing an edge into FALLTHRU; generally the edge could be abnormal, or EH, etc., and making that fallthru is not a good idea. I'd prefer if, wherever the fallthru edge is removed, the other normal edge(s) are adjusted.

Well, as I mentioned in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51106#c18, the removal happens in try_optimize_cfg around line 2617, in the code that deals with removing a trivially empty block (in the PR case, the successor of the asm goto block is trivially empty). After that we have an asm goto acting as an unconditional jump, with a single successor edge and no fallthru bit, which seems perfectly fine. We get into trouble when we actually realize that the asm is bogus; thus I've tried to fix it up at this point.

The options that we briefly discussed on IRC with Richi are as follows:

- Fix up try_optimize_cfg in the case of asm gotos, but it is not clear to me how we would do this: we don't yet distinguish between good and bad asm gotos at this point;

- Fix up function.c as I did, but make it more robust. Either handle more cases with strange edges (EH seems also possible after introducing the throw attribute for asms), or just remove the asm and insert an unconditional jump pointing to the place where the asm was, retaining all the flags;

- Fix up purge_dead_edges, as I did initially in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51106#c16. When we find the situation of no jump at the end of a block and no fallthru edge, assert this happens only with single_succ_p and actually make this edge fallthru. (Probably also watch out whether we need to make it fake or whatever.) Or, as Richi tried, just accept the situation of having no successors here, as in: no fallthru edge on entry to purge_dead_edges and no jump means no successors, period. I think that just nobody deleted unconditional jumps with delete_insn_and_edges previously, otherwise I don't understand why this did not trigger.

Thoughts?

Andrey

Jakub
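The repair being debated (when a block's trailing jump is deleted, its lone remaining successor edge must become a fallthru edge) can be shown on a toy CFG. This is a minimal Python model with an invented block/edge representation, not the actual rtl CFG API.

```python
# Minimal CFG model of the situation discussed: after deleting a jump
# that acted as an unconditional branch, its block is left with one
# successor edge that lacks the FALLTHRU flag. The repair mirrors the
# purge_dead_edges option: if the block now ends without a jump and has
# exactly one successor, that edge must become a fallthru edge.

FALLTHRU = 0x1

class Edge:
    def __init__(self, dest, flags=0):
        self.dest, self.flags = dest, flags

class Block:
    def __init__(self, name):
        self.name, self.succs, self.ends_in_jump = name, [], False

def delete_trailing_jump(bb):
    bb.ends_in_jump = False
    # The jump is gone; fix up the remaining edge as discussed above.
    if len(bb.succs) == 1:
        bb.succs[0].flags |= FALLTHRU

a, b = Block("a"), Block("b")
a.succs = [Edge(b)]          # non-fallthru edge left behind by the jump
a.ends_in_jump = True
delete_trailing_jump(a)
```

The point of the model is only the invariant: a block that does not end in a jump must reach its single successor by falling through, so the edge flags have to say so.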
Re: [PATCH, RTL] Fix PR 51106
Hello,

On 19.01.2012 18:40, Jakub Jelinek wrote:

On Thu, Jan 19, 2012 at 06:13:58PM +0400, Andrey Belevantsev wrote:

2012-01-19  Andrey Belevantsev  a...@ispras.ru

	PR target/51106
	* function.c (instantiate_virtual_regs_in_insn): Use
	delete_insn_and_edges when removing a wrong asm insn.

This is ok for trunk.

After Richi's RTL generation related cleanups went in, the extra cleanup_cfg call was added, so we are no longer lucky to have the proper fallthru edge in this case. The PR trail has the patch to purge_dead_edges making it consider this situation, but I still prefer the below patch that fixes only the invalid asm case. The reason is that I think it unlikely that after initial RTL expansion (of which the instantiate virtual regs pass seems to be a part) we will get the problematic situation. However, I'm happy to test the PR trail patch, too.

Tested fine on x86-64, ok for trunk?

Andrey

2012-04-02  Andrey Belevantsev  a...@ispras.ru

	PR target/51106
	PR middle-end/52650
	* function.c (instantiate_virtual_regs_in_insn): Make sure to set
	the proper fallthru bits when removing a wrong jump asm.

diff --git a/gcc/function.c b/gcc/function.c
index 3e903ef..a2638bb 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -1730,6 +1730,15 @@ instantiate_virtual_regs_in_insn (rtx insn)
       if (!check_asm_operands (PATTERN (insn)))
 	{
 	  error_for_asm (insn, "impossible constraint in %<asm%>");
+	  if (JUMP_P (insn))
+	    {
+	      basic_block bb = BLOCK_FOR_INSN (insn);
+	      edge e;
+
+	      if (single_succ_p (bb)
+		  && !((e = single_succ_edge (bb))->flags & EDGE_FALLTHRU))
+		e->flags |= EDGE_FALLTHRU;
+	    }
 	  delete_insn_and_edges (insn);
 	}
     }
Re: [SMS] Support new loop pattern
Hello,

I'd like to ping again those SMS patches once we're back to Stage 1. Ayal, maybe it would remove some burden for you if you'd review the general SMS functionality of those patches, and we'd ask RTL folks to look at the pieces related to RTL pattern matching and generation?

Yours, Andrey

On 10.02.2012 16:15, Roman Zhuykov wrote:

Ping. Ayal, please review this patch and these three patches too:
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html

--
Roman Zhuykov zhr...@ispras.ru
[PATCH] Fix PR 52203
Hello,

This PR is again about insns that are recog'ed as >= 0 but do not change the processor state. As explained in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52203#c8, I've tried experimenting with an attribute marking those empty insns in MD files and asserting that all other insns do have reservations. As this doesn't seem to be interesting, I give up on that idea, and the below patch makes sel-sched do exactly what the Haifa scheduler does, i.e. does not count such insns against issue_rate when modelling clock cycles.

Tested on ia64 and x86-64, OK for trunk? No testcase again because of the amount of flags needed.

Andrey

2012-03-07  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/52203
	* sel-sched.c (estimate_insn_cost): New parameter pempty.  Adjust
	all callers to pass NULL except ...
	(reset_sched_cycles_in_current_ebb): ... here, save the value
	in new variable 'empty'.  Increase issue_rate only for non-empty insns.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 2af01ae..2829f60 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -4265,9 +4265,10 @@ invoke_aftermath_hooks (fence_t fence, rtx best_insn, int issue_more)
   return issue_more;
 }
 
-/* Estimate the cost of issuing INSN on DFA state STATE.  */
+/* Estimate the cost of issuing INSN on DFA state STATE.  Write to PEMPTY
+   true when INSN does not change the processor state.  */
 static int
-estimate_insn_cost (rtx insn, state_t state)
+estimate_insn_cost (rtx insn, state_t state, bool *pempty)
 {
   static state_t temp = NULL;
   int cost;
@@ -4277,6 +4278,8 @@ estimate_insn_cost (rtx insn, state_t state)
   memcpy (temp, state, dfa_state_size);
   cost = state_transition (temp, insn);
+  if (pempty)
+    *pempty = (memcmp (temp, state, dfa_state_size) == 0);
 
   if (cost < 0)
     return 0;
@@ -4307,7 +4310,7 @@ get_expr_cost (expr_t expr, fence_t fence)
       return 0;
     }
   else
-    return estimate_insn_cost (insn, FENCE_STATE (fence));
+    return estimate_insn_cost (insn, FENCE_STATE (fence), NULL);
 }
 
 /* Find the best insn for scheduling, either via max_issue or just take
@@ -7020,7 +7023,7 @@ reset_sched_cycles_in_current_ebb (void)
     {
       int cost, haifa_cost;
       int sort_p;
-      bool asm_p, real_insn, after_stall, all_issued;
+      bool asm_p, real_insn, after_stall, all_issued, empty;
       int clock;
 
       if (!INSN_P (insn))
@@ -7047,7 +7050,7 @@ reset_sched_cycles_in_current_ebb (void)
	  haifa_cost = 0;
	}
       else
-	haifa_cost = estimate_insn_cost (insn, curr_state);
+	haifa_cost = estimate_insn_cost (insn, curr_state, &empty);
 
       /* Stall for whatever cycles we've stalled before.  */
       after_stall = 0;
@@ -7081,7 +7084,7 @@ reset_sched_cycles_in_current_ebb (void)
	      if (!after_stall
		  && real_insn
		  && haifa_cost > 0
-		  && estimate_insn_cost (insn, curr_state) == 0)
+		  && estimate_insn_cost (insn, curr_state, NULL) == 0)
		break;
 
	      /* When the data dependency stall is longer than the DFA stall,
@@ -7093,7 +7096,7 @@ reset_sched_cycles_in_current_ebb (void)
	      if ((after_stall || all_issued)
		  && real_insn
		  && haifa_cost == 0)
-		haifa_cost = estimate_insn_cost (insn, curr_state);
+		haifa_cost = estimate_insn_cost (insn, curr_state, NULL);
	    }
 
	  haifa_clock += i;
@@ -7125,7 +7128,8 @@ reset_sched_cycles_in_current_ebb (void)
       if (real_insn)
	{
	  cost = state_transition (curr_state, insn);
-	  issued_insns++;
+	  if (!empty)
+	    issued_insns++;
 
	  if (sched_verbose >= 2)
	    {
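The accounting the patch fixes (state-preserving "empty" insns must not count against issue_rate) can be illustrated with a small self-contained model. This is toy Python, not GCC code; the state representation and the insn stream are invented for the example.

```python
# Toy cycle model for the fix above: insns whose DFA transition leaves
# the automaton state unchanged ("empty" insns) must not be counted
# against issue_rate, otherwise the model closes cycles too early.
# State is modelled as a tuple; empty insns map the state to itself,
# and the equality check plays the role of the memcmp in the patch.

def state_transition(state, insn):
    if insn["empty"]:
        return state                  # processor state unchanged
    return state + (insn["name"],)    # any state change will do here

def count_cycles(insns, issue_rate):
    state, issued, cycles = (), 0, 1
    for insn in insns:
        new_state = state_transition(state, insn)
        empty = new_state == state    # detect a state-preserving insn
        state = new_state
        if not empty:
            issued += 1
        if issued == issue_rate:      # cycle is full: advance the clock
            issued, state, cycles = 0, (), cycles + 1
    return cycles

insns = [{"name": "r1", "empty": False}, {"name": "r2", "empty": False},
         {"name": "n1", "empty": True},  {"name": "n2", "empty": True},
         {"name": "r3", "empty": False}, {"name": "r4", "empty": False}]
cycles = count_cycles(insns, issue_rate=2)
```

With issue_rate 2, the four real insns take two full cycles and the two empty insns in between add none; counting the empty insns would have inflated the cycle count by one.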
[PATCH] [4.7?] Fix PR 52250
Hello,

The PR is about selective scheduling not trying hard enough to find a neighbor block to stick the note list from an empty block being removed. Fixed by a) trying harder and b) making sure that if we fail to find the right place, we just drop the notes instead of asserting. Tested on x86-64 and ia64, approved by Alexander offline. No test, as it has quite an insane number of flags that would likely stop reproducing the core issue in a few weeks.

Richard, Jakub, I can commit this to trunk today or wait until the release. The patch is very safe and so can be committed in case we will need an RC2, if you want one bug less in a release, but I would feel equally happy to postpone it until 4.7.1 as the problematic situation is quite rare (and probably requires pipelining outer loops to trigger, which is disabled with ia64/-O3).

Yours, Andrey

2012-03-05  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/52250
	* sel-sched-ir.c (maybe_tidy_empty_bb): Try harder to find a bb
	to put note list into.  Unconditionally call move_bb_info.
	(move_bb_info): Do not assert the blocks being in the same region,
	just drop the note list if they are not.

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index a93cd68..c53d2e1 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3658,7 +3658,7 @@ sel_recompute_toporder (void)
 static bool
 maybe_tidy_empty_bb (basic_block bb)
 {
-  basic_block succ_bb, pred_bb;
+  basic_block succ_bb, pred_bb, note_bb;
   VEC (basic_block, heap) *dom_bbs;
   edge e;
   edge_iterator ei;
@@ -3697,6 +3697,17 @@ maybe_tidy_empty_bb (basic_block bb)
   pred_bb = NULL;
   dom_bbs = NULL;
 
+  /* Save a pred/succ from the current region to attach the notes to.  */
+  note_bb = NULL;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (in_current_region_p (e->src))
+      {
+	note_bb = e->src;
+	break;
+      }
+  if (note_bb == NULL)
+    note_bb = succ_bb;
+
   /* Redirect all non-fallthru edges to the next bb.  */
   while (rescan_p)
     {
@@ -3746,10 +3757,8 @@ maybe_tidy_empty_bb (basic_block bb)
   else
     {
       /* This is a block without fallthru predecessor.  Just delete it.  */
-      gcc_assert (pred_bb != NULL);
-
-      if (in_current_region_p (pred_bb))
-	move_bb_info (pred_bb, bb);
+      gcc_assert (note_bb);
+      move_bb_info (note_bb, bb);
       remove_empty_bb (bb, true);
     }
 
@@ -5231,10 +5240,9 @@ sel_remove_bb (basic_block bb, bool remove_from_cfg_p)
 static void
 move_bb_info (basic_block merge_bb, basic_block empty_bb)
 {
-  gcc_assert (in_current_region_p (merge_bb));
-
-  concat_note_lists (BB_NOTE_LIST (empty_bb),
-		     &BB_NOTE_LIST (merge_bb));
+  if (in_current_region_p (merge_bb))
+    concat_note_lists (BB_NOTE_LIST (empty_bb),
+		       &BB_NOTE_LIST (merge_bb));
 
   BB_NOTE_LIST (empty_bb) = NULL_RTX;
 }
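The two-part fix above (prefer an in-region predecessor for the notes, fall back to the successor, and silently drop the notes if the chosen block is out of region) can be sketched abstractly. This is an illustrative Python model; the block names and the dict-based note lists are invented for the example.

```python
# Sketch of the note destination selection from the fix above: prefer a
# predecessor inside the current region to receive the empty block's
# notes, fall back to the successor, and drop the notes (instead of
# asserting) when the chosen block is outside the region.

def pick_note_bb(preds, succ_bb, in_region):
    for p in preds:
        if in_region(p):
            return p
    return succ_bb

def move_notes(notes, merge_bb, in_region, note_lists):
    # Unconditional call, as in the patch; the region check moved here.
    if in_region(merge_bb):
        note_lists[merge_bb] = note_lists.get(merge_bb, []) + notes
    # Otherwise the notes are simply dropped.

region = {"p2", "s"}
in_region = lambda b: b in region
note_lists = {}
chosen = pick_note_bb(["p1", "p2"], "s", in_region)   # p1 out of region
move_notes(["loop-note"], chosen, in_region, note_lists)
```

The key behavioral change mirrored here is that an out-of-region destination no longer trips an assertion; the notes just disappear, which is safe because they only matter within the region.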
Re: [PATCH] Fix PR 51505
On 30.01.2012 11:38, Paolo Bonzini wrote:

On 01/29/2012 04:09 PM, Eric Botcazou wrote:

As discussed in Bugzilla, this is the patch implementing Paolo's suggestion of killing REG_EQUAL/REG_EQUIV notes from df_kill_notes.

The code assumes there is at most one such note per insn. That's wrong though and wreaks havoc during reload, e.g.:

(insn 169 60 62 4 (set (reg:TF 158)
        (mem/c:TF (plus:SI (reg/f:SI 101 %sfp)
                (const_int -16 [0xfff0])) [3 S16 A64])) 960513-1.c:13 97 {*movtf_insn_sp32}
     (expr_list:REG_EQUIV (mem/c:TF (plus:SI (reg/f:SI 101 %sfp)
                (const_int -16 [0xfff0])) [3 S16 A64])
        (expr_list:REG_EQUAL (mult:TF (reg/v:TF 110 [ d ])
                (reg:TF 154))
            (nil))))

because the REG_EQUIV note disappears behind reload's back and it isn't prepared for that. This is the cause of the following regression on SPARC:

FAIL: gcc.c-torture/execute/960513-1.c execution, -Os

As well as:

FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O2
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O3 -g
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -Os
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O2 -flto -flto-partition=none
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O2 -flto

for the exact same reason. Does this help?

That would fix the problem of multiple notes per insn (as we wanted to do initially), but I didn't understand whether this is the real problem or the problem is reload not being happy with disappearing notes. Also, I can't reproduce it with a cross to sparc64-linux: when I put a breakpoint on the code removing the notes, I find only the similarly looking insn 148, which gets removed by the df_analyze call at the start of IRA. Though I see the fail in the SPARC test results on the ML, so I guess I'm missing something...

Andrey

Paolo
Re: [PATCH] Fix PR 51505
On 30.01.2012 17:47, Paolo Bonzini wrote:

On 01/30/2012 09:44 AM, Andrey Belevantsev wrote:

Does this help?

That would fix the problem of multiple notes per insn (as we wanted to do initially), but I didn't understand whether this is the real problem or the problem is reload not being happy with disappearing notes. Also, I can't reproduce it with a cross to sparc64-linux: when I put a breakpoint on the code removing the notes, I find only the similarly looking insn 148, which gets removed by the df_analyze call at the start of IRA. Though I see the fail in the SPARC test results on the ML, so I guess I'm missing something...

The REG_EQUAL note can go, but the REG_EQUIV note should not (by definition: they are valid throughout the entire function). In fact, we could just as well apply the loop to REG_EQUAL notes only, but that would have been a bit too clever and more risky.

Eric, Paolo, thanks for the explanations!

Andrey
Re: [PATCH] Fix PR 51389
On 25.01.2012 18:21, Richard Guenther wrote:

2012/1/25 Andrey Belevantsev a...@ispras.ru:

... Sure, I've just had the impression that the datarefs vector is of no use without the dependencies themselves. The possibility of making find_data_references_in_loop static also kind of hints in this direction. Anyways, I don't have any problem with fixing compute_all_dependences instead.

I'd prefer that.

The below tests fine and passes the test case. compute_all_dependences now returns bool and has the hunk from compute_data_dependences_for_loop for creating a single don't-know datadep. The only caller that needs to be fixed is compute_data_dependences_for_bb; others look fine.

Tree prefetching has its own knob for the same goal, PREFETCH_MAX_MEM_REFS_PER_LOOP, which is 200. Do we want to remove it, to change it to the new parameter (might be too big), or to leave it as is? (Interestingly enough, with data deps fixed, the most time on the test case is taken by RTL PRE, 27%.)

Andrey

2012-01-27  Andrey Belevantsev  a...@ispras.ru

	PR middle-end/51389
	* Makefile.in (tree-data-ref.o): Depend on $(PARAMS_H).
	* tree-data-ref.h (find_data_references_in_loop): Remove declaration.
	* tree-data-ref.c (find_data_references_in_loop): Make static.
	(compute_all_dependences): Change return type to bool.  Bail out
	for too many datarefs in a loop.  Move the hunk resetting the data
	dependences vector from ...
	(compute_data_dependences_for_loop): ... here.  Account for
	compute_all_dependences returning false.
	(compute_data_dependences_for_bb): Likewise.
	* params.def (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS): New param.
	* doc/invoke.texi (loop-max-datarefs-for-datadeps): Document it.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f450d3e..43295aa 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2598,7 +2598,7 @@ tree-scalar-evolution.o : tree-scalar-evolution.c $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_PASS_H) $(PARAMS_H) gt-tree-scalar-evolution.h
 tree-data-ref.o : tree-data-ref.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    gimple-pretty-print.h $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
-   $(TREE_PASS_H) langhooks.h tree-affine.h
+   $(TREE_PASS_H) langhooks.h tree-affine.h $(PARAMS_H)
 sese.o : sese.c sese.h $(CONFIG_H) $(SYSTEM_H) coretypes.h tree-pretty-print.h \
    $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) tree-pass.h value-prof.h
 graphite.o : graphite.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(DIAGNOSTIC_CORE_H) \
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e3d3789..8a916ef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9115,6 +9115,13 @@ with more basic blocks than this parameter won't have loop invariant
 motion optimization performed on them.  The default value of the
 parameter is 1000 for -O1 and 1 for -O2 and above.
 
+@item loop-max-datarefs-for-datadeps
+Building data dependencies is expensive for very large loops.  This
+parameter limits the number of data references in loops that are
+considered for data dependence analysis.  These large loops will not
+be handled then by the optimizations using loop data dependencies.
+The default value is 1000.
+
 @item max-vartrack-size
 Sets a maximum number of hash table slots to use during variable
 tracking dataflow analysis of any function.  If this limit is exceeded
diff --git a/gcc/params.def b/gcc/params.def
index 239b684..d7cdd7b 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -820,6 +820,12 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
	  "maximum number of basic blocks per function to be analyzed by Graphite",
	  100, 0, 0)
 
+/* Avoid data dependence analysis on very large loops.  */
+DEFPARAM (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS,
+	  "loop-max-datarefs-for-datadeps",
+	  "Maximum number of datarefs in loop for building loop data dependencies",
+	  1000, 0, 0)
+
 /* Avoid doing loop invariant motion on very large loops.  */
 DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 09929c7..b8dfa31 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "langhooks.h"
 #include "tree-affine.h"
+#include "params.h"
 
 static struct datadep_stats
 {
@@ -4134,9 +4135,10 @@ compute_affine_dependence (struct data_dependence_relation *ddr,
 /* Compute in DEPENDENCE_RELATIONS the data dependence graph for all the
    data references in DATAREFS, in the LOOP_NEST.  When
    COMPUTE_SELF_AND_RR is FALSE, don't compute read-read and self
-   relations.  */
+   relations.  Return true when successful, i.e. data references number
+   is small enough to be handled.  */
 
-void
+bool
 compute_all_dependences (VEC (data_reference_p, heap) *datarefs,
			 VEC (ddr_p, heap) **dependence_relations,
			 VEC (loop_p, heap) *loop_nest,
@@ -4146,6 +4148,18
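Richard's point that the quadratic cost lives in compute_all_dependences, not in finding the references, is easy to see in a toy model. The sketch below is illustrative Python, not GCC code; the pair-walk stands in for the real dependence computation and the default value is taken from the patch.

```python
# The cost this patch limits: dependence computation examines every
# pair of data references, so it is quadratic in their number, while
# collecting the references themselves is only linear. The sketch
# returns None (a "don't know" result) when the count exceeds the
# param, mirroring compute_all_dependences returning bool above.

LOOP_MAX_DATAREFS_FOR_DATADEPS = 1000   # default from the patch

def compute_all_dependences(datarefs,
                            max_refs=LOOP_MAX_DATAREFS_FOR_DATADEPS):
    if len(datarefs) > max_refs:
        return None                     # caller treats this as "don't know"
    deps = []
    for i, a in enumerate(datarefs):    # the quadratic pair walk
        for b in datarefs[i + 1:]:
            deps.append((a, b))         # real code analyzes the pair here
    return deps
```

For n references the walk produces n*(n-1)/2 pairs, which is why a cap of 1000 references bounds the work at roughly half a million pair analyses instead of hundreds of millions for the 20k-dataref loop in the PR.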
[PATCH] Fix PR 51389
Hello,

In this PR data dependence analysis goes wild by trying to process more than 20k datarefs, so the patch limits the number of datarefs per loop we handle to 1000 via a param. On the way, find_data_references_in_loop is made static and predcom/parloops are fixed for compute_data_dependences_for_loop returning false.

Bootstrapped and tested on x86_64-linux, no testcase as it is really a memory hog. Strictly speaking, this could be a regression only from the time when -O3 didn't have predcom/vectorization turned on by default, so -- ok for trunk or 4.8? Branches?

Andrey

2012-01-25  Andrey Belevantsev  a...@ispras.ru

	PR middle-end/51389
	* Makefile.in (tree-data-ref.o): Depend on $(PARAMS_H).
	* tree-data-ref.c (find_data_references_in_loop): Make static.
	Bail out for too many datarefs in a loop.
	* tree-data-ref.h (find_data_references_in_loop): Remove declaration.
	* tree-predcom.c (tree_predictive_commoning_loop): Bail out when
	compute_data_dependences_for_loop returns false.
	* tree-parloops.c (loop_parallel_p): Likewise.
	* params.def (PARAM_DATADEPS_MAX_DATAREFS_IN_LOOP): New param.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f450d3e..43295aa 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2598,7 +2598,7 @@ tree-scalar-evolution.o : tree-scalar-evolution.c $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_PASS_H) $(PARAMS_H) gt-tree-scalar-evolution.h
 tree-data-ref.o : tree-data-ref.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    gimple-pretty-print.h $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
-   $(TREE_PASS_H) langhooks.h tree-affine.h
+   $(TREE_PASS_H) langhooks.h tree-affine.h $(PARAMS_H)
 sese.o : sese.c sese.h $(CONFIG_H) $(SYSTEM_H) coretypes.h tree-pretty-print.h \
    $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) tree-pass.h value-prof.h
 graphite.o : graphite.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(DIAGNOSTIC_CORE_H) \
diff --git a/gcc/params.def b/gcc/params.def
index 239b684..e50fa26 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -820,6 +820,12 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
	  "maximum number of basic blocks per function to be analyzed by Graphite",
	  100, 0, 0)
 
+/* Avoid data dependence analysis on very large loops.  */
+DEFPARAM (PARAM_DATADEPS_MAX_DATAREFS_IN_LOOP,
+	  "datadeps-max-datarefs-in-loop",
+	  "Maximum number of datarefs in loop for loop data dependence analysis",
+	  1000, 0, 0)
+
 /* Avoid doing loop invariant motion on very large loops.  */
 DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a0d86ec..5a7659a 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "langhooks.h"
 #include "tree-affine.h"
+#include "params.h"
 
 static struct datadep_stats
 {
@@ -4338,7 +4339,7 @@ find_data_references_in_bb (struct loop *loop, basic_block bb,
    TODO: This function should be made smarter so that it can handle address
    arithmetic as if they were array accesses, etc.  */
 
-tree
+static tree
 find_data_references_in_loop (struct loop *loop,
			      VEC (data_reference_p, heap) **datarefs)
 {
@@ -4351,7 +4352,9 @@ find_data_references_in_loop (struct loop *loop,
     {
       bb = bbs[i];
 
-      if (find_data_references_in_bb (loop, bb, datarefs) == chrec_dont_know)
+      if (find_data_references_in_bb (loop, bb, datarefs) == chrec_dont_know
+	  || ((int) VEC_length (data_reference_p, *datarefs)
+	      > PARAM_VALUE (PARAM_DATADEPS_MAX_DATAREFS_IN_LOOP)))
	{
	  free (bbs);
	  return chrec_dont_know;
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 0f12962..66258ab 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -394,8 +394,6 @@ extern bool compute_data_dependences_for_loop (struct loop *, bool,
 extern bool compute_data_dependences_for_bb (basic_block, bool,
					     VEC (data_reference_p, heap) **,
					     VEC (ddr_p, heap) **);
-extern tree find_data_references_in_loop (struct loop *,
-					  VEC (data_reference_p, heap) **);
 extern void print_direction_vector (FILE *, lambda_vector, int);
 extern void print_dir_vectors (FILE *, VEC (lambda_vector, heap) *, int);
 extern void print_dist_vectors (FILE *, VEC (lambda_vector, heap) *, int);
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 339ddcc..c5b7655 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -387,8 +387,14 @@ loop_parallel_p (struct loop *loop, struct obstack * parloop_obstack)
   datarefs = VEC_alloc (data_reference_p, heap, 10);
   dependence_relations = VEC_alloc (ddr_p, heap, 10 * 10);
   loop_nest = VEC_alloc (loop_p, heap, 3);
-  compute_data_dependences_for_loop (loop, true, loop_nest
[PATCH] Fix PR 48374
Hello,

This patch fixes another problem with sel-sched not being happy having bbs with zero successors. Bootstrapped and tested on x86_64/linux. Again, this is not a regression as __builtin_unreachable did not exist before sel-sched, but the patch is very safe and obvious. It is my fault not committing this earlier (the patch in the audit trail was created in April 2011). What do release managers think?

Andrey

gcc/
2012-01-25  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/48374
	* sel-sched-ir.h (get_all_loop_exits): Check for zero successors.

testsuite/
2012-01-25  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/48374
	* gcc.dg/pr48374.c: New test.

diff --git a/gcc/sel-sched-ir.h b/gcc/sel-sched-ir.h
index c8f8be6..ede08e4 100644
--- a/gcc/sel-sched-ir.h
+++ b/gcc/sel-sched-ir.h
@@ -1119,7 +1119,8 @@ get_all_loop_exits (basic_block bb)
   /* If bb is empty, and we're skipping to loop exits, then
      consider bb as a possible gate to the inner loop now.  */
   while (sel_bb_empty_or_nop_p (bb)
-	 && in_current_region_p (bb))
+	 && in_current_region_p (bb)
+	 && EDGE_COUNT (bb->succs) > 0)
     {
       bb = single_succ (bb);
 
diff --git a/gcc/testsuite/gcc.dg/pr48374.c b/gcc/testsuite/gcc.dg/pr48374.c
new file mode 100644
index 000..24826d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr48374.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
+/* { dg-options "-O -fschedule-insns2 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fselective-scheduling2 --param max-sched-extend-regions-iters=2" } */
+
+void foo (int y)
+{
+  switch (y)
+    {
+    case 3:
+    case 5:
+    case 7:
+    case 11:
+      break;
+    default:
+      __builtin_unreachable ();
+    }
+}
+
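The guard added by the patch (stop the empty-block walk before calling single_succ on a successor-less block) can be modelled abstractly. The sketch is illustrative Python with an invented graph representation, not the GCC CFG API.

```python
# Model of the guard above: when skipping through empty blocks while
# looking for loop exits, a block created for __builtin_unreachable
# has no successors at all, so taking single_succ() would crash. The
# walk must stop as soon as a block has zero successor edges.

def skip_empty_blocks(bb, is_empty, in_region, succs):
    while is_empty(bb) and in_region(bb) and len(succs(bb)) > 0:
        bb = succs(bb)[0]     # single_succ in the real code
    return bb

graph = {"a": ["b"], "b": []}   # "b" ends in __builtin_unreachable
last = skip_empty_blocks("a",
                         is_empty=lambda b: True,
                         in_region=lambda b: True,
                         succs=lambda b: graph[b])
```

Without the `len(succs(bb)) > 0` conjunct the second iteration would index into an empty successor list, which is the abstract analogue of the ICE the PR reports.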
Re: [PATCH] Fix PR 51389
On 25.01.2012 16:38, Richard Guenther wrote:
> 2012/1/25 Andrey Belevantsev <a...@ispras.ru>:
>> Hello,
>>
>> In this PR data dependence analysis goes wild by trying to process
>> >20k datarefs, so the patch limits the number of datarefs per loop we
>> handle to 1000 via a param.  On the way find_data_references_in_loop
>> is made static and predcom/parloops are fixed for
>> compute_data_dependences_for_loop returning false.
>>
>> Bootstrapped and tested on x86_64-linux, no testcase as it is really a
>> memory hog.  Strictly speaking, this could be a regression only from
>> the time when -O3 didn't have predcom/vectorization turned on by
>> default, so -- ok for trunk or 4.8?  Branches?
>
> You limit the number of data references we find - but that isn't really
> the scalability issue (it's linear in the number of stmts).  It's
> really compute_all_dependences that hits the quadraticness, thus _that_
> function should fail instead (I realize it doesn't return any
> success/failure state yet, but the number of callers is limited).

Sure, I've just had the impression that the datarefs vector is of no use
without the dependencies themselves.  The possibility of making
find_data_references_in_loop static also kind of hints in this
direction.  Anyways, I don't have any problem with fixing
compute_all_dependences instead.

> Btw, new params need documentation in doc/invoke.texi.

Bah, I convinced myself params.def was the only documentation :-)

> The tree-predcom.c and tree-parloops.c changes are ok for trunk anyway.

I will commit them separately.

Andrey
[PATCH, RTL] Fix PR 51106
Hello,

As discussed with Jakub in the PR, the problem is that a jump asm goto
insn is removed without purging the dead edges.  Bootstrapped and tested
on x86-64.  I had to use dg-prune-output to remove the error message and
to make the test pass, as we are only checking for an ICE.

OK for trunk/4.6/4.5?

Andrey

gcc/
2012-01-19  Andrey Belevantsev  a...@ispras.ru

	PR target/51106
	* function.c (instantiate_virtual_regs_in_insn): Use
	delete_insn_and_edges when removing a wrong asm insn.

testsuite/
2012-01-19  Jakub Jelinek  ja...@redhat.com

	PR target/51106
	* gcc.c-torture/compile/pr51106.c: New test.

diff --git a/gcc/function.c b/gcc/function.c
index fcb79f5..94e51f4 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -1737,7 +1737,7 @@ instantiate_virtual_regs_in_insn (rtx insn)
       if (!check_asm_operands (PATTERN (insn)))
         {
           error_for_asm (insn, "impossible constraint in %<asm%>");
-          delete_insn (insn);
+          delete_insn_and_edges (insn);
         }
     }
   else
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr51106.c b/gcc/testsuite/gcc.c-torture/compile/pr51106.c
new file mode 100644
index 000..2f1c51d
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr51106.c
@@ -0,0 +1,19 @@
+/* { dg-options "-w" } */
+/* { dg-prune-output "impossible constraint" } */
+int
+foo (int x)
+{
+  asm goto ("" : : "i" (x) : : lab);
+  return 1;
+lab:
+  return 0;
+}
+
+int
+bar (int x)
+{
+  asm goto ("" : : "i" (x) : : lab);
+  __builtin_unreachable ();
+lab:
+  return 0;
+}
[PATCH] Fix PR 51505
Hello,

As discussed in Bugzilla, this is the patch implementing Paolo's
suggestion of killing REG_EQUAL/REG_EQUIV notes from df_kill_notes.
The code assumes there is at most one such note per insn.

Bootstrapped and tested on x86-64, ok for trunk?

Andrey

gcc:
2012-01-18  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/51505
	* df-problems.c (df_kill_notes): New parameter live.  Update
	comment.  Remove REG_EQUAL/REG_EQUIV notes referring to dead
	registers.
	(df_note_bb_compute): Update the call to df_kill_notes.

testsuite:
2012-01-18  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/51505
	* gcc.dg/pr51505.c: New test.

diff --git a/gcc/df-problems.c b/gcc/df-problems.c
index 8928454..f9b0bc1 100644
--- a/gcc/df-problems.c
+++ b/gcc/df-problems.c
@@ -2748,10 +2748,12 @@ df_ignore_stack_reg (int regno ATTRIBUTE_UNUSED)

 /* Remove all of the REG_DEAD or REG_UNUSED notes from INSN and add
-   them to OLD_DEAD_NOTES and OLD_UNUSED_NOTES.  */
+   them to OLD_DEAD_NOTES and OLD_UNUSED_NOTES.  Remove also
+   REG_EQUAL/REG_EQUIV notes referring to dead pseudos using LIVE
+   as the bitmap of currently live registers.  */

 static void
-df_kill_notes (rtx insn)
+df_kill_notes (rtx insn, bitmap live)
 {
   rtx *pprev = &REG_NOTES (insn);
   rtx link = *pprev;
@@ -2798,6 +2800,45 @@ df_kill_notes (rtx insn)
         }
         break;

+      case REG_EQUAL:
+      case REG_EQUIV:
+        {
+          /* Remove the notes that refer to dead registers.  As we have at most
+             one REG_EQUAL/EQUIV note, all of EQ_USES will refer to this note
+             so we need to purge the complete EQ_USES vector when removing
+             the note using df_notes_rescan.  */
+          df_ref *use_rec;
+          bool deleted = false;
+
+          for (use_rec = DF_INSN_EQ_USES (insn); *use_rec; use_rec++)
+            {
+              df_ref use = *use_rec;
+              if (DF_REF_REGNO (use) > FIRST_PSEUDO_REGISTER
+                  && (DF_REF_FLAGS (use) & DF_REF_IN_NOTE)
+                  && ! bitmap_bit_p (live, DF_REF_REGNO (use)))
+                {
+                  deleted = true;
+                  break;
+                }
+            }
+          if (deleted)
+            {
+              rtx next;
+#ifdef REG_DEAD_DEBUGGING
+              df_print_note ("deleting: ", insn, link);
+#endif
+              next = XEXP (link, 1);
+              free_EXPR_LIST_node (link);
+              *pprev = link = next;
+              df_notes_rescan (insn);
+            }
+          else
+            {
+              pprev = &XEXP (link, 1);
+              link = *pprev;
+            }
+          break;
+        }
       default:
         pprev = &XEXP (link, 1);
         link = *pprev;
@@ -3299,7 +3340,7 @@ df_note_bb_compute (unsigned int bb_index,
       debug_insn = DEBUG_INSN_P (insn);

       bitmap_clear (do_not_gen);
-      df_kill_notes (insn);
+      df_kill_notes (insn, live);

       /* Process the defs.  */
       if (CALL_P (insn))
diff --git a/gcc/testsuite/gcc.dg/pr51505.c b/gcc/testsuite/gcc.dg/pr51505.c
new file mode 100644
index 000..dbcd322
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr51505.c
@@ -0,0 +1,19 @@
+/* PR rtl-optimization/51505 */
+/* { dg-do compile } */
+/* { dg-options "-O --param max-cse-insns=1" } */
+struct S
+{
+  char a[256];
+};
+
+int bar (struct S, char[16]);
+
+void foo ()
+{
+  struct S u, s1, s2;
+  char e[256];
+  char i;
+  e[i] = ~s1.a[i] & s2.a[i];
+  if (bar (u, e))
+    __builtin_abort ();
+}
Re: [PATCH] Fix PR 51505
On 18.01.2012 21:28, Paolo Bonzini wrote:
> On 01/18/2012 05:41 PM, Andrey Belevantsev wrote:
>
> Ok, thanks for working on this.

Installed, do you want this for 4.6/4.5?

Andrey
Re: [PATCH] Copy over REG_ARGS_SIZE notes in sel-sched (PR debug/51557)
On 16.12.2011 11:53, Jakub Jelinek wrote:
> On Fri, Dec 16, 2011 at 09:47:52AM +0400, Andrey Belevantsev wrote:
>> Thus, if an insn has any note that prevents it from being cloned, we
>> need to fix the above place.  (If an insn should not even be moved
>> from its block, then CANT_MOVE should be set to 1 in the same
>> function.)  Then create_copy_of_insn_rtx can safely copy all the
>> notes as we already know that only clonable insns get to this
>> function.  The REG_EQUAL/REG_EQUIV notes may be dropped -- I thought
>> these are supposed to be recomputed by df these days, am I wrong?
>
> REG_DEAD/REG_UNUSED notes are recomputed by df, I don't think anything
> recomputes REG_EQUAL/REG_EQUIV.
>
> Eric agrees with you, so the patch is OK.

Thanks for the patch and for enlightening me.  And please feel free to
CC/assign me on any PRs mentioning selective scheduler.

Andrey
Re: [PATCH] Copy over REG_ARGS_SIZE notes in sel-sched (PR debug/51557)
On 16.12.2011 0:42, Jakub Jelinek wrote:
> On Thu, Dec 15, 2011 at 09:37:45AM -0800, Richard Henderson wrote:
>> On 12/15/2011 09:32 AM, Jakub Jelinek wrote:
>>> 	PR debug/51557
>>> 	* sel-sched-ir.c (create_copy_of_insn_rtx): Copy REG_ARGS_SIZE
>>> 	notes.
>>>
>>> 	* gcc.dg/pr51557.c: New test.
>>
>> There are plenty of other notes that could as well need duplication.
>> E.g. REG_INC, REG_FRAME_RELATED_EXPR, REG_CFA_*, REG_EH_REGION.
>
> I wasn't sure if it is safe to duplicate them.  I guess most of them
> are safe, not sure about REG_EQUAL/REG_EQUIV.  I can certainly try to
> copy all the notes, e.g. emit_copy_of_insn_after seems to copy all of
> them but REG_LABEL_OPERAND.

We need to clone only those insns that are safe.  In
sel-sched-ir.c:init_global_and_expr_for_insn we check a number of
conditions that indicate the insn being un-copyable (force_unique_p is
set to true for those; the relevant code starts from line 2954).
Previously I fixed this about a year ago wrt prologue and epilogue
insns, you can check the thread with the
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00013.html message.

Thus, if an insn has any note that prevents it from being cloned, we
need to fix the above place.  (If an insn should not even be moved from
its block, then CANT_MOVE should be set to 1 in the same function.)
Then create_copy_of_insn_rtx can safely copy all the notes as we
already know that only clonable insns get to this function.  The
REG_EQUAL/REG_EQUIV notes may be dropped -- I thought these are
supposed to be recomputed by df these days, am I wrong?

Jakub, I am happy to prepare a patch along these lines and to adjust it
to mark un-copyable the insns with any notes you or Richard think may
be problematic.  It would also be great to have this information
recorded in a general helper in rtlanal.c or somewhere, so we don't
have to guess every time this comes up; what do you think?

Andrey

> 	Jakub