Re: [PATCH] Port GCC documentation to Sphinx
Hi Martin,

On 29.06.2021 13:09, Martin Liška wrote:
> On 6/28/21 5:33 PM, Joseph Myers wrote:
>> Are formatted manuals (HTML, PDF, man, info) corresponding to this patch
>> version also available for review?
>
> I've just uploaded them here:
> https://splichal.eu/gccsphinx-final/
>
> Martin

I've randomly looked at the PDF version of the GCC internals manual, and the
table of contents there only has an introduction and an index (it looks like
all the other chapters went under the introduction).  The other PDFs have
their first-level chapters in the contents.  Maybe it would be a good idea to
include the lower-level chapters as well, or at least to fix the gccint
one :)

Best,
Andrey
Fix PR 89676
Hello,

As explained in the PR trail, we incorrectly update the availability sets in
the rare case of several successors where one of them has another fence.
Fixed as follows.  Ok for trunk?

Best,
Andrey

2019-03-15  Andrey Belevantsev

	PR middle-end/89676
	* sel-sched.c (compute_av_set_at_bb_end): When we have an ineligible
	successor, use NULL as its av set.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 315f2c0c0ab..2053694b196 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -2820,10 +2820,12 @@ compute_av_set_at_bb_end (insn_t insn, ilist_t p, int ws)
   FOR_EACH_VEC_ELT (sinfo->succs_ok, is, succ)
     {
       basic_block succ_bb = BLOCK_FOR_INSN (succ);
+      av_set_t av_succ = (is_ineligible_successor (succ, p)
+			  ? NULL
+			  : BB_AV_SET (succ_bb));

       gcc_assert (BB_LV_SET_VALID_P (succ_bb));
-      mark_unavailable_targets (av1, BB_AV_SET (succ_bb),
-				BB_LV_SET (succ_bb));
+      mark_unavailable_targets (av1, av_succ, BB_LV_SET (succ_bb));
     }

   /* Finally, check liveness restrictions on paths leaving the region.  */
Re: [PATCH] sched-ebb.c: avoid moving table jumps (PR rtl-optimization/88423)
Hi David,

On 18.01.2019 19:58, David Malcolm wrote:
> On Fri, 2019-01-18 at 12:32 -0500, David Malcolm wrote:
> [CCing Abel]
>
>> PR rtl-optimization/88423 reports an ICE within sched-ebb.c's
>> begin_move_insn, failing the assertion at line 175, where there's
>> no fall-through edge:
>>
>> 171	  rtx_insn *x = NEXT_INSN (insn);
>> 172	  if (e)
>> 173	    gcc_checking_assert (NOTE_P (x) || LABEL_P (x));
>> 174	  else
>> 175	    gcc_checking_assert (BARRIER_P (x));
>>
>> "insn" is a jump_insn for a table jump, and its NEXT_INSN is the
>> placeholder code_label, followed by the jump_table_data.

This code was added at the time of our work on the speculation support for
ia64.  I would guess it was just never designed to support tablejumps, but
rather jumps to recovery blocks or some such.  But I might be wrong -- it
was back in 2006.  If you look at fix_jump_move, which was added around that
time, it doesn't have any traces of tablejumps either.

>> It's not clear to me if such a jump_insn can be repositioned within
>> the insn stream, or if the scheduler is able to do so.  I believe a
>> tablejump is always at the end of such a head/tail insn sub-stream.
>> Is it a requirement that the placeholder code_label for the jump_insn
>> is always its NEXT_INSN?
>>
>> The loop at the start of schedule_ebb adjusts the head and tail
>> of the insns to be scheduled so that it skips leading and trailing
>> notes and debug insns.
>>
>> This patch adjusts that loop to also skip trailing jump_insn instances
>> that are table jumps, so that we don't attempt to move such table jumps.
>>
>> This fixes the ICE, but I'm not very familiar with this part of the
>> compiler - so I have two questions:
>>
>> (1) What does GCC mean by "ebb" in this context?
>>
>> My understanding is that the normal definition of an "extended basic
>> block" (e.g. Muchnick's book, pp. 175-177) is that it's a maximal
>> grouping of basic blocks where only one BB in each group has multiple
>> in-edges and all other BBs in the group have a single in-edge (and thus
>> e.g. there's a tree-like structure of BBs within each EBB).
>>
>> From what I can tell, schedule_ebbs is iterating over BBs, looking for
>> runs of BBs joined by next_bb that are connected by fallthrough edges
>> and don't have labels (and aren't flagged with BB_DISABLE_SCHEDULE).
>> It uses this run of BBs to generate a run of instructions within the
>> NEXT_INSN/PREV_INSN doubly-linked list, which it passes as "head"
>> and "tail" to schedule_ebb.
>>
>> This sounds like it will be a group of basic blocks with single in-edges
>> internally, but it isn't a *maximal* group of such BBs - but perhaps
>> it's "maximal" in the sense of what the NEXT_INSN/PREV_INSN
>> representation can cope with?

You are right, it's not a tree structure in the sense of classical EBBs but
rather a trace.  There used to be code to also perform tail duplication in
order to have better traces for scheduling, but the corresponding option was
deprecated back in 2010.

>> There (presumably) can't be a fallthrough edge after a table jump, so
>> a table jump could only ever be at the end of such a chain, never in
>> the middle.
>>
>> (2) Is it OK to omit "tail" from consideration here, from a dataflow
>> and insn-dependency point-of-view?  Presumably the scheduler is written
>> to ensure that data used by subsequent basic blocks will still be
>> available after the insns within an "EBB" are reordered, so presumably
>> any data uses *within* the jump_insn are still going to be available -
>> but, as I said, I'm not very familiar with this part of the code.
>> (I think I'm also assuming that the jump_insn can't clobber data, just
>> the PC)

For that, I'm not sure.  Your patch will leave the tablejump unscheduled at
all, i.e. any fields like INSN_TICK would be unfilled and thus the later
passes like bundling on ia64 will not work.  Maybe we can just stop
tablejumps from moving within their block, Alexander?

Andrey

>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>>
>> OK for trunk?
>>
>> gcc/ChangeLog:
>> 	PR rtl-optimization/88423
>> 	* sched-ebb.c (schedule_ebb): Don't move the jump_insn for a table
>> 	jump.
>>
>> gcc/testsuite/ChangeLog:
>> 	PR rtl-optimization/88423
>> 	* gcc.c-torture/compile/pr88423.c: New test.
>> ---
>>  gcc/sched-ebb.c                               | 4
>>  gcc/testsuite/gcc.c-torture/compile/pr88423.c | 5 +
>>  2 files changed, 9 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr88423.c
>>
>> diff --git a/gcc/sched-ebb.c b/gcc/sched-ebb.c
>> index d459e09..1fe0b76 100644
>> --- a/gcc/sched-ebb.c
>> +++ b/gcc/sched-ebb.c
>> @@ -485,6 +485,10 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail,
>> bool modulo_scheduling)
>>  	tail = PREV_INSN (tail);
>>        else if (LABEL_P (head))
>>  	head = NEXT_INSN (head);
>> +      else if (tablejump_p (tail, NULL,
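The "trace, not a tree" point above can be illustrated with a few lines of C.
This is a toy sketch under invented names (struct toy_bb is not GCC's
basic_block): a scheduling run extends only while the next block is entered
solely through a fallthrough edge from its predecessor, which is why a
tablejump can only ever sit at the end of such a run.

```c
#include <assert.h>

/* Hypothetical block descriptor: number of incoming edges, and whether the
   textually previous block falls through into this one.  */
struct toy_bb
{
  int n_preds;
  int fallthru_from_prev;
};

/* Length of the scheduling "trace" starting at blocks[start]: keep
   extending while the next block has exactly one in-edge and that edge is
   a fallthrough from the previous block, mirroring the run-of-BBs
   selection described for schedule_ebbs.  */
static int
trace_len (const struct toy_bb *blocks, int n, int start)
{
  int len = 1;
  for (int i = start + 1; i < n; i++)
    {
      if (!blocks[i].fallthru_from_prev || blocks[i].n_preds != 1)
	break;
      len++;
    }
  return len;
}
```

A block with two in-edges (a join point) terminates the trace even when a
fallthrough edge exists, so the run is a linear trace rather than a
tree-shaped classical EBB.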
Fix PR 85423 (Re: Fix PRs 80463, 83972, 83480)
Hello, So this PR shows that I have incorrectly mirrored the conditional from sched-deps.c that creates the dependence from a debug insn on the previous insn (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80463#c3 for the hunk). Thus we have incorrectly discarded some legitimate debug-debug dependencies. The straightforward fix works for all of four PRs, tested on x86-64. I have put the test in gcc.dg though it requires -march=nano. Do you want me to create an extra machine-dependent test? Best, Andrey 2018-04-23 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/85423 * sel-sched-ir.c (has_dependence_note_mem_dep): Only discard dependencies to debug insns when previous insn is non-debug. * gcc.dg/pr85423.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index ee970522890..85ff5bd3eb4 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3308,7 +3308,7 @@ has_dependence_note_dep (insn_t pro, ds_t ds ATTRIBUTE_UNUSED) that a bookkeeping copy should be movable as the original insn. Detect that here and allow that movement if we allowed it before in the first place. */ - if (DEBUG_INSN_P (real_con) + if (DEBUG_INSN_P (real_con) && !DEBUG_INSN_P (real_pro) && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con)) return; diff --git a/gcc/testsuite/gcc.dg/pr85423.c b/gcc/testsuite/gcc.dg/pr85423.c new file mode 100644 index 000..21d4a2eb4b9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr85423.c @@ -0,0 +1,26 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fselective-scheduling2 -fvar-tracking-assignments -fno-guess-branch-probability -fno-peephole2 -fno-ssa-phiopt -fno-tree-pre --param max-jump-thread-duplication-stmts=8 -w" } */ + +int vn, xm; + +void +i1 (int); + +void +mb (int *ap, int ev) +{ + while (vn < 1) +{ + i1 (vn); + + ev += *ap && ++vn; + + while (xm < 1) +++xm; + + if (*ap == 0) +*ap = ev; + + ++vn; +} +}
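The corrected condition can be modeled in isolation.  This is only a sketch
with hypothetical names (the real check lives in sel-sched-ir.c and uses
DEBUG_INSN_P and INSN_UID): a dependence on a debug consumer may be ignored
only when the producer is itself a non-debug insn immediately preceding it.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the fixed predicate.  Before the patch the pro_is_debug test
   was missing, so legitimate debug->debug dependencies were wrongly
   discarded as well.  */
static bool
may_discard_dep (bool pro_is_debug, bool con_is_debug, bool con_follows_pro)
{
  return con_is_debug && !pro_is_debug && con_follows_pro;
}
```

With the extra test, a debug producer never loses its dependence on a debug
consumer, which is exactly the case the four PRs exercised.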
Add test from PR 83852 (was Re: Fix PR 83962)
On 09.04.2018 12:16, Andrey Belevantsev wrote:
> On 06.04.2018 18:59, Alexander Monakov wrote:
>> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>>
>>> Hello,
>>>
>>> This issue is about the correct order in which we need to call the
>>> routines that fix up the control flow for us.
>>
>> OK with formatting both in the new comment and the Changelog fixed.
>
> Thanks, fixed that in rev. 259229.

I've found out that this patch also fixes PR 83852, so I've committed the
test from that PR as obvious after verifying that it works on a ppc cross
compiler and on x86-64.

Andrey

Index: gcc.dg/pr83852.c
===================================================================
*** gcc.dg/pr83852.c	(revision 0)
--- gcc.dg/pr83852.c	(revision 259373)
***************
*** 0 ****
--- 1,33 ----
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-std=gnu99 -O2 -fselective-scheduling -fno-if-conversion -fno-tree-dse -w" } */
+ long long int uo;
+ unsigned int vt;
+ 
+ void
+ r5 (long long int h8, long long int pu)
+ {
+   short int wj;
+   long long int *mh = h8;
+ 
+   for (wj = 0; wj < 3; ++wj)
+     {
+       int oq;
+       long long int ns, xf;
+ 
+       h8 += 2;
+       oq = !!h8 && !!wj;
+       ++uo;
+       vt ^= oq + uo;
+       ns = !!uo && !!vt;
+       xf = (h8 != 0) ? mh : 1;
+       pu += ns < xf;
+     }
+ 
+   for (pu = 0; pu < 1; ++pu)
+     {
+       int *sc;
+ 
+       sc = (int *)
+       *sc = 0;
+     }
+ }
Index: ChangeLog
===================================================================
*** ChangeLog	(revision 259372)
--- ChangeLog	(revision 259373)
***************
*** 1,3 ****
--- 1,8 ----
+ 2018-04-13  Andrey Belevantsev  <a...@ispras.ru>
+ 
+ 	PR rtl-optimization/83852
+ 	* gcc.dg/pr83852.c: New testcase.
+ 
  2018-04-13  Andreas Krebbel  <kreb...@linux.ibm.com>
  
  	PR testsuite/85326
Re: [PATCH] sel-sched: run cleanup_cfg just before loop_optimizer_init (PR 84659)
On 12.04.2018 0:55, Alexander Monakov wrote:
> As noted in PR 85354, we cannot simply invoke cleanup_cfg after dominators
> are computed, because they may become invalid but neither freed nor
> recomputed, so this trips checking in flow_loops_find.
>
> We can move cleanup_cfg earlier (and run it for all sel-sched invocations,
> not only when pipelining).

OK.  Sorry, I should have noticed that before; our ia64 tester also misses
the libraries required for graphite.

Best,
Andrey

> Bootstrapped/regtested on x86_64 and ppc64 (my previous testing missed
> this issue: the testcase requires graphite, but libisl wasn't present).
>
> 	PR rtl-optimization/85354
> 	* sel-sched-ir.c (sel_init_pipelining): Move cleanup_cfg call...
> 	* sel-sched.c (sel_global_init): ... here.
>
> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
> index 50a7daafba6..ee970522890 100644
> --- a/gcc/sel-sched-ir.c
> +++ b/gcc/sel-sched-ir.c
> @@ -30,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgrtl.h"
>  #include "cfganal.h"
>  #include "cfgbuild.h"
> -#include "cfgcleanup.h"
>  #include "insn-config.h"
>  #include "insn-attr.h"
>  #include "recog.h"
> @@ -6122,9 +6121,6 @@ make_regions_from_loop_nest (struct loop *loop)
>  void
>  sel_init_pipelining (void)
>  {
> -  /* Remove empty blocks: their presence can break assumptions elsewhere,
> -     e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
> -  cleanup_cfg (0);
>    /* Collect loop information to be used in outer loops pipelining.  */
>    loop_optimizer_init (LOOPS_HAVE_PREHEADERS
> 		       | LOOPS_HAVE_FALLTHRU_PREHEADERS
> diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
> index cd29df35666..59762964c6e 100644
> --- a/gcc/sel-sched.c
> +++ b/gcc/sel-sched.c
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm_p.h"
>  #include "regs.h"
>  #include "cfgbuild.h"
> +#include "cfgcleanup.h"
>  #include "insn-config.h"
>  #include "insn-attr.h"
>  #include "params.h"
> @@ -7661,6 +7662,10 @@ sel_sched_region (int rgn)
>  static void
>  sel_global_init (void)
>  {
> +  /* Remove empty blocks: their presence can break assumptions elsewhere,
> +     e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
> +  cleanup_cfg (0);
> +
>    calculate_dominance_info (CDI_DOMINATORS);
>    alloc_sched_pools ();
Re: [PATCH] sched-deps: respect deps->readonly in macro-fusion (PR 84566)
On 10.04.2018 13:40, Alexander Monakov wrote:
> Hi,
>
> this fixes a simple "regression" under the qsort_chk umbrella: sched-deps
> analysis has the deps->readonly flag, but the macro-fusion code does not
> respect it and mutates instructions.  This breaks an assumption in
> sel_rank_for_schedule and manifests as a qsort checking error.
>
> Since sched_macro_fuse_insns is only called to set SCHED_GROUP_P on
> suitable insns, guard the call with !deps->readonly.
>
> Bootstrapped/regtested on x86_64 with sel-sched active and
> --with-cpu=sandybridge to exercise the macro-fusion code, and verified on
> an aarch64 cross-compiler that the failing testcase given in the PR is
> fixed.
>
> OK to apply?

Fine with me, but you need a scheduler maintainer's approval.

Andrey

> Thanks.
> Alexander
>
> 	PR rtl-optimization/84566
> 	* sched-deps.c (sched_analyze_insn): Check deps->readonly when
> 	invoking sched_macro_fuse_insns.
>
> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index 9a5cbebea40..120b5f0ddc1 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -2897,7 +2897,8 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn)
>  	      && code == SET);
>
>    /* Group compare and branch insns for macro-fusion.  */
> -  if (targetm.sched.macro_fusion_p
> +  if (!deps->readonly
> +      && targetm.sched.macro_fusion_p
>        && targetm.sched.macro_fusion_p ())
>      sched_macro_fuse_insns (insn);
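The invariant the patch restores is easy to state in miniature: an
analysis-only dependence context must never mutate the insn stream.  A toy
sketch with invented names (this is not GCC's deps_desc; the flag stands in
for SCHED_GROUP_P):

```c
#include <assert.h>

/* Returns the possibly-updated SCHED_GROUP_P-like flag.  Mirroring the
   fixed guard, the flag may only be set when the deps context is not
   readonly, i.e. when the pass is allowed to change instructions.  */
static int
toy_fuse (int deps_readonly, int fusion_possible, int sched_group_p)
{
  if (!deps_readonly && fusion_possible)
    sched_group_p = 1;
  return sched_group_p;
}
```

Checking readonly first keeps a purely analytical invocation (such as the
one sel-sched performs) side-effect free, which is what sel_rank_for_schedule
relies on.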
Re: [PATCH] sched-rgn: run add_branch_dependencies for sel-sched (PR 84301)
On 10.04.2018 14:09, Alexander Monakov wrote:
> Hi,
>
> The add_branch_dependencies function is fairly unusual in that it creates
> dependence edges "out of thin air" for all sorts of instructions preceding
> the BB end.  I think that is really unfortunate (explicit barriers in RTL
> would be more natural), but I've already complained about that in the PR.
>
> The bug/regression is that this function was not run for sel-sched, but
> the testcase uncovers that moving a USE away from the return insn can
> break assumptions in the mode-switching code.
>
> Solve this by running the first part of add_branch_dependencies, where it
> sets CANT_MOVE flags on immovable non-branch insns.
>
> Bootstrapped/regtested on x86_64 with sel-sched active.  OK to apply?

Looks fine to me, but I cannot approve -- maybe Vladimir can take a look?

Andrey

> Alexander
>
>
> 	PR target/84301
> 	* sched-rgn.c (add_branch_dependences): Move sel_sched_p check
> 	here...
> 	(compute_block_dependences): ... from here.
>
> 	* gcc.target/i386/pr84301.c: New test.
>
> diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
> index 8c3a740b70e..3c67fccb9b1 100644
> --- a/gcc/sched-rgn.c
> +++ b/gcc/sched-rgn.c
> @@ -2497,6 +2497,11 @@ add_branch_dependences (rtx_insn *head, rtx_insn *tail)
>        while (insn != head && DEBUG_INSN_P (insn));
>      }
>
> +  /* Selective scheduling handles control dependencies by itself, and
> +     CANT_MOVE flags ensure that other insns will be kept in place.  */
> +  if (sel_sched_p ())
> +    return;
> +
>    /* Make sure these insns are scheduled last in their block.  */
>    insn = last;
>    if (insn != 0)
> @@ -2725,9 +2730,7 @@ compute_block_dependences (int bb)
>
>    sched_analyze (&tmp_deps, head, tail);
>
> -  /* Selective scheduling handles control dependencies by itself.  */
> -  if (!sel_sched_p ())
> -    add_branch_dependences (head, tail);
> +  add_branch_dependences (head, tail);
>
>    if (current_nr_blocks > 1)
>      propagate_deps (bb, &tmp_deps);
> diff --git a/gcc/testsuite/gcc.target/i386/pr84301.c b/gcc/testsuite/gcc.target/i386/pr84301.c
> index e69de29bb2d..f1708b8ea6c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr84301.c
> +++ b/gcc/testsuite/gcc.target/i386/pr84301.c
> @@ -0,0 +1,15 @@
> +/* PR target/84301 */
> +/* { dg-do compile } */
> +/* { dg-options "-march=bdver1 -O1 -fexpensive-optimizations -fschedule-insns -fselective-scheduling -fno-dce -fno-tree-dce --param max-pending-list-length=0 --param selsched-max-lookahead=2" } */
> +
> +int lr;
> +long int xl;
> +
> +int
> +v4 (void)
> +{
> +  int mp;
> +
> +  ++xl;
> +  mp = (lr - xl) > 1;
> +}
Re: [PATCH] sel-sched: run cleanup_cfg just before loop_optimizer_init (PR 84659)
Hello,

On 10.04.2018 17:40, Alexander Monakov wrote:
> Hi,
>
> We have this code in sel-sched.c sel_region_init():
>
> 6918   /* Init correct liveness sets on each instruction of a single-block loop.
> 6919      This is the only situation when we can't update liveness when calling
> 6920      compute_live for the first insn of the loop.  */
> 6921   if (current_loop_nest)
> 6922     {
> 6923       int header =
> 6924 	(sel_is_loop_preheader_p (BASIC_BLOCK_FOR_FN (cfun, BB_TO_BLOCK (0)))
> 6925 	 ? 1
> 6926 	 : 0);
> 6927
> 6928       if (current_nr_blocks == header + 1)
> 6929 	update_liveness_on_insn
> 6930 	  (sel_bb_head (BASIC_BLOCK_FOR_FN (cfun, BB_TO_BLOCK (header))));
> 6931     }
>
> It appears it does not account for the presence of empty BBs between the
> preheader and the "actual header" BB hosting the first insn of the loop.
> The testcase in the PR provides an example where we don't run
> update_liveness_on_insn here, but then remove the empty BB and recurse
> endlessly in compute_live_after_bb.
>
> This patch solves this in a brute-but-straightforward manner by invoking
> cleanup_cfg just before loop_optimizer_init.
>
> Bootstrapped/regtested on x86_64 together with the other two patches and
> also checked on a powerpc cross-compiler that the testcase is fixed.  OK
> to apply?

OK.  A comment before the cleanup_cfg call would not hurt, but do as you
wish.

Thank you,
Andrey

> Thanks.
> Alexander
>
> 	PR rtl-optimization/85659
> 	* sel-sched-ir.c (sel_init_pipelining): Invoke cleanup_cfg.
>
> 	* gcc.dg/pr84659.c: New test.
>
>
> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
> index ee970522890..aef380a7e80 100644
> --- a/gcc/sel-sched-ir.c
> +++ b/gcc/sel-sched-ir.c
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgrtl.h"
>  #include "cfganal.h"
>  #include "cfgbuild.h"
> +#include "cfgcleanup.h"
>  #include "insn-config.h"
>  #include "insn-attr.h"
>  #include "recog.h"
> @@ -6121,6 +6122,7 @@ make_regions_from_loop_nest (struct loop *loop)
>  void
>  sel_init_pipelining (void)
>  {
> +  cleanup_cfg (0);
>    /* Collect loop information to be used in outer loops pipelining.  */
>    loop_optimizer_init (LOOPS_HAVE_PREHEADERS
> 		       | LOOPS_HAVE_FALLTHRU_PREHEADERS
> diff --git a/gcc/testsuite/gcc.dg/pr84659.c b/gcc/testsuite/gcc.dg/pr84659.c
> index e69de29bb2d..94c885f3869 100644
> --- a/gcc/testsuite/gcc.dg/pr84659.c
> +++ b/gcc/testsuite/gcc.dg/pr84659.c
> @@ -0,0 +1,19 @@
> +/* PR rtl-optimization/84659 */
> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-O2 -fselective-scheduling -fsel-sched-pipelining -fno-split-wide-types -fno-strict-aliasing -fno-tree-dce" } */
> +
> +void
> +jk (int **lq, int *k4, long long int qv, int od)
> +{
> +  while (**lq < 1)
> +    {
> +      int uo;
> +
> +      uo = ((od == 0) ? qv : *k4) != 1;
> +      ++**lq;
> +    }
> +
> +  for (;;)
> +    {
> +    }
> +}
Re: Fix PRs 80463, 83972, 83480
On 09.04.2018 21:23, Jakub Jelinek wrote:
> On Mon, Apr 09, 2018 at 01:30:12PM +0300, Andrey Belevantsev wrote:
>> I think that should be fine, that is, as long as the insn moved up
>> through all those debug insns, the copy will do that as well.  It's that
>> problematic conditional in sched-deps.c that we should take care of.
>>
>> I've reworded the comment and committed the attached patch.  Thanks for
>> your help.
>
> The C++ testcase FAILs everywhere:
> FAIL: g++.dg/pr80463.C  -std=gnu++98 (test for excess errors)
> Excess errors:
> cc1plus: warning: var-tracking-assignments changes selective scheduling
>
> The other testcases in the patch used -w, probably to disable the same
> warning, so I've committed the following as obvious to trunk after
> regtesting it on x86_64-linux and i686-linux:

Thank you very much, I missed this when committing.  Sorry for the noise.

Andrey

> 2018-04-09  Jakub Jelinek  <ja...@redhat.com>
>
> 	PR rtl-optimization/80463
> 	* g++.dg/pr80463.C: Add -w to dg-options.
>
> --- gcc/testsuite/g++.dg/pr80463.C.jj	2018-04-09 20:15:47.226631780 +0200
> +++ gcc/testsuite/g++.dg/pr80463.C	2018-04-09 20:19:43.783616136 +0200
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
> -/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments" } */
> +/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments -w" } */
>
>  int *a;
>  int b, c;
>
>
> 	Jakub
Re: Fix PRs 80463, 83972, 83480
On 06.04.2018 19:18, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>
>> Hello,
>>
>> In these PRs we deal with the dependencies we are forced to make between
>> a debug insn and its previous insn (unless at a bb head).  In the latter
>> case, if such an insn has been moved, we fix up its movement so it
>> aligns with the sel-sched invariants.  We also carefully adjust seqnos
>> in the case we had a single debug insn left in the block.
>
> This is OK with overlong lines fixed and the new comment reworded for
> clarity (see below).
>
>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  <a...@ispras.ru>
>>
>> 	PR rtl-optimization/80463
>> 	PR rtl-optimization/83972
>> 	PR rtl-optimization/83480
>>
>> 	* sel-sched-ir.c (has_dependence_note_mem_dep): Take into account the
>> 	correct producer for the insn.
>> 	(tidy_control_flow): Fixup seqnos in case of debug insns.
>>
>> 	* gcc.dg/pr80463.c: New test.
>> 	* g++.dg/pr80463.C: Likewise.
>> 	* gcc.dg/pr83972.c: Likewise.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -3293,11 +3293,22 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED,
>>
>>  /* Note a dependence.  */
>>  static void
>> -has_dependence_note_dep (insn_t pro ATTRIBUTE_UNUSED,
>> +has_dependence_note_dep (insn_t pro,
>>  			 ds_t ds ATTRIBUTE_UNUSED)
>>  {
>> -  if (!sched_insns_conditions_mutex_p (has_dependence_data.pro,
>> -				       VINSN_INSN_RTX (has_dependence_data.con)))
>> +  insn_t real_pro = has_dependence_data.pro;
>> +  insn_t real_con = VINSN_INSN_RTX (has_dependence_data.con);
>> +
>> +  /* We do not allow for debug insns to move through others unless they
>> +     are at the start of a bb.  Such insns may create bookkeeping copies
>> +     that would not be able to move up breaking sel-sched invariants.
>
> I have trouble parsing this; it seems a word is accidentally missing
> between "move up" and "breaking".  Also the "such" is a bit ambiguous.
>
>> +     Detect that here and allow that movement if we allowed it before
>> +     in the first place.  */
>> +  if (DEBUG_INSN_P (real_con)
>> +      && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con))
>> +    return;
>
> Should we be concerned about debug insns appearing in sequence here?
> E.g. if pro and real_con are not consecutive, but all insns in between
> are debug insns?

I think that should be fine, that is, as long as the insn moved up through
all those debug insns, the copy will do that as well.  It's that problematic
conditional in sched-deps.c that we should take care of.

I've reworded the comment and committed the attached patch.  Thanks for your
help.

Andrey

>> +
>> +  if (!sched_insns_conditions_mutex_p (real_pro, real_con))
>>      {
>>        ds_t *dsp = &has_dependence_data.has_dep_p[has_dependence_data.where];
>>
>> @@ -3890,6 +3905,19 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
>>
>>    gcc_assert (EDGE_SUCC (xbb->prev_bb, 0)->flags & EDGE_FALLTHRU);
>>
>> +  /* We could have skipped some debug insns which did not get removed
>> +     with the block, and the seqnos could become incorrect.  Fix them up
>> +     here.  */
>> +  if (MAY_HAVE_DEBUG_INSNS
>> +      && (sel_bb_head (xbb) != first || sel_bb_end (xbb) != last))
>> +    {
>> +      if (!sel_bb_empty_p (xbb->prev_bb))
>> +	{
>> +	  int prev_seqno = INSN_SEQNO (sel_bb_end (xbb->prev_bb));
>> +	  if (prev_seqno > INSN_SEQNO (sel_bb_head (xbb)))
>> +	    for (insn_t insn = sel_bb_head (xbb); insn != first;
>> +		 insn = NEXT_INSN (insn))
>> +	      INSN_SEQNO (insn) = prev_seqno + 1;
>> +	}
>> +    }
>> +
>>    /* It can turn out that after removing unused jump, basic block
>>       that contained that jump, becomes empty too.  In such case
>>       remove it too.  */
>> diff --git a/gcc/testsuite/g++.dg/pr80463.C b/gcc/testsuite/g++.dg/pr80463.C
>> new file mode 100644
>> index 000..5614c28ca45
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/pr80463.C
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-g -fselective-scheduling
Re: Fix PR 83913
On 06.04.2018 19:10, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>
>> Hello,
>>
>> This issue ended up being fixed in a way different from the one
>> described in the PR.  We do not want to walk away from the invariant
>> "zero SCHED_TIMES -- insn is not scheduled" even for bookkeeping copies
>> (testing showed it trips over asserts designed to catch this).  Rather,
>> we choose to merge exprs in such a way that the larger sched-times wins.
>
> My understanding is this is not a "complete" solution to the problem, and
> a chance for a similar blowup on some other testcase remains.  Still,
> avoiding picking the minimum sched-times value should be a good
> mitigation.

Well, it's not much different from any other situation where we pose a limit
on pipelining with the sched-times values.  At least for now I can't think
of something better.  Adjusted the comment as per your suggestion and
committed the attached patch.

Andrey

> Please consider adding a comment that the average sched-times value is
> taken as a compromise to thwart "endless" pipelining of
> bookkeeping-producing insns available anywhere vs. pipelining of useful
> insns, or something like that?
>
> OK with that considered/added.
>
>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  <a...@ispras.ru>
>>
>> 	PR rtl-optimization/83913
>>
>> 	* sel-sched-ir.c (merge_expr_data): Choose the middle between two
>> 	different sched-times when merging exprs.
>>
>> 	* gcc.dg/pr83913.c: New test.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -1837,8 +1837,9 @@ merge_expr_data (expr_t to, expr_t from, insn_t split_point)
>>    if (EXPR_PRIORITY (to) < EXPR_PRIORITY (from))
>>      EXPR_PRIORITY (to) = EXPR_PRIORITY (from);
>>
>> -  if (EXPR_SCHED_TIMES (to) > EXPR_SCHED_TIMES (from))
>> -    EXPR_SCHED_TIMES (to) = EXPR_SCHED_TIMES (from);
>> +  if (EXPR_SCHED_TIMES (to) != EXPR_SCHED_TIMES (from))
>> +    EXPR_SCHED_TIMES (to) = ((EXPR_SCHED_TIMES (from) + EXPR_SCHED_TIMES (to)
>> +			      + 1) / 2);
>>
>>    if (EXPR_ORIG_BB_INDEX (to) != EXPR_ORIG_BB_INDEX (from))
>>      EXPR_ORIG_BB_INDEX (to) = 0;
>> @@ -3293,11 +3294,22 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED,
>>
>>  /* Note a dependence.  */
>>  static void
>> diff --git a/gcc/testsuite/gcc.dg/pr83913.c b/gcc/testsuite/gcc.dg/pr83913.c
>> new file mode 100644
>> index 000..c898d71a261
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr83913.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-O2 -funroll-all-loops -fselective-scheduling -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-forward-propagate -fno-rerun-cse-after-loop -fno-web" } */
>> +
>> +int jo, z4;
>> +
>> +int
>> +be (long unsigned int l7, int nt)
>> +{
>> +  int en;
>> +
>> +  jo = l7;
>> +  for (en = 0; en < 24; ++en)
>> +    {
>> +      jo = (jo / z4) * (!!jo >= ((!!nt) & 2));
>> +      if (jo == 0)
>> +	nt = 0;
>> +      else
>> +	{
>> +	  nt = z4;
>> +	  ++z4;
>> +	  nt = (long unsigned int) nt == (l7 + 1);
>> +	}
>> +    }
>> +
>> +  return nt;
>> +}

Index: gcc/ChangeLog
===================================================================
*** gcc/ChangeLog	(revision 259229)
--- gcc/ChangeLog	(revision 259230)
***************
*** 1,5 ****
--- 1,12 ----
  2018-04-09  Andrey Belevantsev  <a...@ispras.ru>
  
+ 	PR rtl-optimization/83913
+ 
+ 	* sel-sched-ir.c (merge_expr_data): Choose the middle between two
+ 	different sched-times when merging exprs.
+ 
+ 2018-04-09  Andrey Belevantsev  <a...@ispras.ru>
+ 
  	PR rtl-optimization/83962
  
  	* sel-sched-ir.c (tidy_control_flow): Correct the order in which we
  	call
Index: gcc/testsuite/gcc.dg/pr83913.c
===================================================================
*** gcc/testsuite/gcc.dg/pr83913.c	(revision 0)
--- gcc/testsuite/gcc.dg/pr83913.c	(revision 259230)
***************
*** 0 ****
--- 1,26 ----
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O2 -funroll-all-loops -fselective-scheduling -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-for
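The arithmetic of the new merge rule is easy to check in isolation.  A
minimal C sketch (plain ints, not GCC's expr_t) of the rounded-up midpoint
the patch installs in merge_expr_data:

```c
#include <assert.h>

/* Rounded-up midpoint of two EXPR_SCHED_TIMES counts.  Merging a pipelined
   expr with a never-scheduled bookkeeping copy no longer resets the count
   to the minimum (zero), which is what allowed "endless" pipelining.  */
static int
merged_sched_times (int to, int from)
{
  return (from + to + 1) / 2;
}
```

With the old rule, merging counts 3 and 0 gave 0 and the pipelining limit
never triggered; with the midpoint it gives 2, so the count still advances.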
Re: Fix PR 83962
On 06.04.2018 18:59, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>
>> Hello,
>>
>> This issue is about the correct order in which we need to call the
>> routines that fix up the control flow for us.
>
> OK with formatting both in the new comment and the Changelog fixed.

Thanks, fixed that in rev. 259229.

Andrey

>> Best,
>> Andrey
>>
>> 2018-04-03  Andrey Belevantsev  <a...@ispras.ru>
>>
>> 	PR rtl-optimization/83962
>>
>> 	* sel-sched-ir.c (tidy_control_flow): Correct the order in which we
>> 	call tidy_fallthru_edge and tidy_control_flow.
>>
>> 	* gcc.dg/pr83962.c: New test.
>>
>> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
>> index a965d2ec42f..f6de96a7f3d 100644
>> --- a/gcc/sel-sched-ir.c
>> +++ b/gcc/sel-sched-ir.c
>> @@ -3839,9 +3839,13 @@ tidy_control_flow (basic_block xbb, bool full_tidying)
>>        && INSN_SCHED_TIMES (BB_END (xbb)) == 0
>>        && !IN_CURRENT_FENCE_P (BB_END (xbb)))
>>      {
>> -      if (sel_remove_insn (BB_END (xbb), false, false))
>> -	return true;
>> +      /* We used to call sel_remove_insn here that can trigger
>> +	 tidy_control_flow before we fix up the fallthru edge.  Correct
>> +	 that ordering by explicitly doing the latter before the former.  */
>> +      clear_expr (INSN_EXPR (BB_END (xbb)));
>>        tidy_fallthru_edge (EDGE_SUCC (xbb, 0));
>> +      if (tidy_control_flow (xbb, false))
>> +	return true;
>>      }
>>
>>    first = sel_bb_head (xbb);
>> diff --git a/gcc/testsuite/gcc.dg/pr83962.c b/gcc/testsuite/gcc.dg/pr83962.c
>> new file mode 100644
>> index 000..0547e218715
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr83962.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
>> +/* { dg-options "-std=gnu99 -O1 -fselective-scheduling2 -fschedule-insns2 -fcse-follow-jumps -fno-ssa-phiopt -fno-guess-branch-probability" } */
>> +unsigned int ca;
>> +
>> +void
>> +v6 (long long unsigned int as, int p9)
>> +{
>> +  while (p9 < 1)
>> +    as = (as != ca) || (as > 1);
>> +}
Re: Fix PR 83530
On 06.04.2018 18:55, Alexander Monakov wrote:
> On Tue, 3 Apr 2018, Andrey Belevantsev wrote:
>
>> Hello,
>>
>> This issue is when we cannot correctly make insn tick data for the
>> unscheduled code (but processed by the modulo scheduler).  Fixed by
>> closely following the usual scheduling process as described in the PR.
>
> This is ok with the following nit-picks fixed.

Thanks, I've committed the attached.

Andrey

>> 2018-04-03  Andrey Belevantsev  <a...@ispras.ru>
>>
>> 	PR rtl-optimization/83530
>>
>> 	* sel-sched.c (force_next_insn): New global variable.
>> 	(remove_insn_for_debug): When force_next_insn is true, also leave only
>> 	next insn in the ready list.
>> 	(sel_sched_region): When the region wasn't scheduled, make another pass
>> 	over it with force_next_insn set to 1.
>
> Overlong lines.
>
>> 	* gcc.dg/pr8350.c: New test.
>
> Typo in test name.
>
>> --- a/gcc/sel-sched.c
>> +++ b/gcc/sel-sched.c
>> @@ -5004,12 +5004,16 @@ remove_temp_moveop_nops (bool full_tidying)
>>     distinguishing between bookkeeping copies and original insns.  */
>>  static int max_uid_before_move_op = 0;
>>
>> +/* When true, we're always scheduling next insn on the already scheduled code
>> +   to get the right insn data for the following bundling or other passes.  */
>> +static int force_next_insn = 0;
>> +
>>  /* Remove from AV_VLIW_P all instructions but next when debug counter
>>     tells us so.  Next instruction is fetched from BNDS.  */
>>  static void
>>  remove_insns_for_debug (blist_t bnds, av_set_t *av_vliw_p)
>>  {
>> -  if (! dbg_cnt (sel_sched_insn_cnt))
>> +  if (! dbg_cnt (sel_sched_insn_cnt) || force_next_insn)
>>      /* Leave only the next insn in av_vliw.  */
>>      {
>>        av_set_iterator av_it;
>> @@ -7642,7 +7646,13 @@ sel_sched_region (int rgn)
>>      sel_sched_region_1 ();
>>    else
>>      /* Force initialization of INSN_SCHED_CYCLEs for correct bundling.  */
>
> I believe this comment needs updating.
>
> Please also consider moving both assignments of reset_sched_cycles_p to
> after the if-else statement, just before the sel_region_finish call.
>
>> -    reset_sched_cycles_p = true;
>> +    {
>> +      reset_sched_cycles_p = false;
>> +      pipelining_p = false;
>> +      force_next_insn = 1;
>> +      sel_sched_region_1 ();
>> +      force_next_insn = 0;
>> +    }
>>
>>    sel_region_finish (reset_sched_cycles_p);
>>  }

Index: gcc/ChangeLog
===================================================================
*** gcc/ChangeLog	(revision 259227)
--- gcc/ChangeLog	(revision 259228)
***************
*** 1,3 ****
--- 1,13 ----
+ 2018-04-09  Andrey Belevantsev  <a...@ispras.ru>
+ 
+ 	PR rtl-optimization/83530
+ 
+ 	* sel-sched.c (force_next_insn): New global variable.
+ 	(remove_insn_for_debug): When force_next_insn is true, also leave only
+ 	next insn in the ready list.
+ 	(sel_sched_region): When the region wasn't scheduled, make another pass
+ 	over it with force_next_insn set to 1.
+ 
  2018-04-08  Monk Chiang  <sh.chian...@gmail.com>
  
  	* config.gcc (nds32le-*-*, nds32be-*-*): Add nds32/nds32_intrinsic.h
Index: gcc/testsuite/gcc.dg/pr83530.c
===================================================================
*** gcc/testsuite/gcc.dg/pr83530.c	(revision 0)
--- gcc/testsuite/gcc.dg/pr83530.c	(revision 259228)
***************
*** 0 ****
--- 1,15 ----
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O2 -fmodulo-sched -fselective-scheduling2" } */
+ int vm, z0;
+ short int mz;
+ 
+ int
+ ny (void)
+ {
+   int ch;
+ 
+   for (ch = 0; ch < 6; ++ch)
+     vm += ch / vm;
+ 
+   return z0 + mz;
+ }
Index: gcc/testsuite/ChangeLog
===================================================================
*** gcc/testsuite/ChangeLog	(revision 259227)
--- gcc/testsuite/ChangeLog	(revision 259228)
***************
*** 1,3 ****
--- 1,8 ----
+ 2018-04-09  Andrey Belevantsev  <a...@ispras.ru>
+ 
+ 	PR rtl-optimization/83530
+ 	* gcc.dg/pr83530.c: New test.
+ 
  2018-04-07  Thomas Koenig  <tkoe...@gcc.gnu.org>
  
  	PR middle-end/82976
Index: gcc/sel-sched.c
===================================================================
*** gcc/sel-sched.c	(revision 259227)
--- gcc/sel-sched.c	(revision 259228)
*** remove_temp_moveop_nops (bool full_tidyi
*** 5004,5015 ****
     distinguishing between bookkeeping copies and original insns.  */
  static int max_uid_before_move_op = 0;
  
  /* Remove from AV_VLIW_P
Re: Fix PR 83913
On 03.04.2018 19:02, Alexander Monakov wrote: > On Tue, 3 Apr 2018, Andrey Belevantsev wrote: > >> Hello, >> >> This issue ended up being fixed the way different from described in the PR. >> We do not want to walk away from the invariant "zero SCHED_TIMES -- insn >> is not scheduled" even for bookkeeping copies (testing showed it trips over >> asserts designed to catch this). Rather we choose merging exprs in the way >> the larger sched-times wins. > > ... but the Changelog and the actual patch take the average rather than the > maximum sched-time? :) I believe either way would be acceptable, but please > clarify the intent. Sorry, the average is the intent. Just to have a bit more of pipelining chances. Andrey > > Alexander >
Fix PRs 80463, 83972, 83480
Hello, In these PRs we deal with the dependencies we are forced to make between a debug insn and its previous insn (unless it is a bb head). In the latter case, if such an insn has been moved, we fix up its movement so it aligns with the sel-sched invariants. We also carefully adjust seqnos in the case we had a single debug insn left in the block. Best, Andrey 2018-04-03 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/80463 PR rtl-optimization/83972 PR rtl-optimization/83480 * sel-sched-ir.c (has_dependence_note_mem_dep): Take into account the correct producer for the insn. (tidy_control_flow): Fixup seqnos in case of debug insns. * gcc.dg/pr80463.c: New test. * g++.dg/pr80463.C: Likewise. * gcc.dg/pr83972.c: Likewise. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index a965d2ec42f..f6de96a7f3d 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3293,11 +3293,22 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED, /* Note a dependence. */ static void -has_dependence_note_dep (insn_t pro ATTRIBUTE_UNUSED, +has_dependence_note_dep (insn_t pro, ds_t ds ATTRIBUTE_UNUSED) { - if (!sched_insns_conditions_mutex_p (has_dependence_data.pro, - VINSN_INSN_RTX (has_dependence_data.con))) + insn_t real_pro = has_dependence_data.pro; + insn_t real_con = VINSN_INSN_RTX (has_dependence_data.con); + + /* We do not allow for debug insns to move through others unless they + are at the start of bb. Such insns may create bookkeeping copies + that would not be able to move up breaking sel-sched invariants. + Detect that here and allow that movement if we allowed it before + in the first place.
*/ + if (DEBUG_INSN_P (real_con) + && INSN_UID (NEXT_INSN (pro)) == INSN_UID (real_con)) +return; + + if (!sched_insns_conditions_mutex_p (real_pro, real_con)) { ds_t *dsp = &has_dependence_data.has_dep_p[has_dependence_data.where]; @@ -3890,6 +3905,19 @@ tidy_control_flow (basic_block xbb, bool full_tidying) gcc_assert (EDGE_SUCC (xbb->prev_bb, 0)->flags & EDGE_FALLTHRU); + /* We could have skipped some debug insns which did not get removed with the block, + and the seqnos could become incorrect. Fix them up here. */ + if (MAY_HAVE_DEBUG_INSNS && (sel_bb_head (xbb) != first || sel_bb_end (xbb) != last)) + { + if (!sel_bb_empty_p (xbb->prev_bb)) + { + int prev_seqno = INSN_SEQNO (sel_bb_end (xbb->prev_bb)); + if (prev_seqno > INSN_SEQNO (sel_bb_head (xbb))) + for (insn_t insn = sel_bb_head (xbb); insn != first; insn = NEXT_INSN (insn)) + INSN_SEQNO (insn) = prev_seqno + 1; + } + } + /* It can turn out that after removing unused jump, basic block that contained that jump, becomes empty too. In such case remove it too.
*/ diff --git a/gcc/testsuite/g++.dg/pr80463.C b/gcc/testsuite/g++.dg/pr80463.C new file mode 100644 index 000..5614c28ca45 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr80463.C @@ -0,0 +1,20 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-g -fselective-scheduling2 -O2 -fvar-tracking-assignments" } */ + +int *a; +int b, c; +void +d () +{ + for (int e; c; e++) +switch (e) + { + case 0: + a[e] = 1; + case 1: + b = 2; + break; + default: + a[e] = 3; + } +} diff --git a/gcc/testsuite/gcc.dg/pr80463.c b/gcc/testsuite/gcc.dg/pr80463.c new file mode 100644 index 000..cebf2fef1f3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr80463.c @@ -0,0 +1,54 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-g -O2 -fvar-tracking-assignments -fselective-scheduling2 -ftree-loop-vectorize -fnon-call-exceptions -fno-tree-vrp -fno-gcse-lm -fno-tree-loop-im -fno-reorder-blocks-and-partition -fno-reorder-blocks -fno-move-loop-invariants -w" } */ + +short int t2; +int cd, aa, ft; + +void +dh (void) +{ + int qs = 0; + + if (t2 < 1) +{ + int bq = 0; + + while (bq < 1) +{ +} + + while (t2 < 1) +{ + if (t2 == 0) +{ + bq = 0; + cd = !!cd; +} + else +{ + bq = 1; + cd = bq > qs; +} + + t2 += cd; + bq = (t2 / qs) == bq; + + if (aa != ft) +{ + qs %= 0; + while (bq != 0) +{ + ro: + ; +} +} + + ++t2; +} + + ia: + goto ro; +} + + goto ia; +} diff --git a/gcc/testsuite/gcc.dg/pr83972.c b/gcc/testsuite/gcc.dg/pr83972.c new file mode 100644 index 000..b8de42cef0a ---
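The seqno fixup in the tidy_control_flow hunk above renumbers the leading insns of a block when the predecessor block now ends with a larger seqno. A simplified stand-alone sketch of that renumbering, with a plain array standing in for the insn stream (the function name is hypothetical, not from the GCC sources):

```c
/* Sketch of the seqno fixup: when the predecessor block now ends with
   a seqno larger than the seqno at our block head, renumber the
   leading insns to prev_end_seqno + 1 so seqnos stay non-decreasing
   along the block chain, as in the tidy_control_flow hunk above.  */
static void
fixup_seqnos (int *seqnos, int n, int prev_end_seqno)
{
  /* Only fix up when the invariant is actually violated.  */
  if (n > 0 && prev_end_seqno > seqnos[0])
    for (int i = 0; i < n; i++)
      seqnos[i] = prev_end_seqno + 1;
}
```

As in the patch, all of the skipped insns receive the same new seqno, one past the predecessor's end.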
Fix PR 83913
Hello, This issue ended up being fixed in a way different from the one described in the PR. We do not want to walk away from the invariant "zero SCHED_TIMES -- insn is not scheduled" even for bookkeeping copies (testing showed it trips over asserts designed to catch this). Rather, we choose to merge exprs in such a way that the larger sched-times wins. Best, Andrey 2018-04-03 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/83913 * sel-sched-ir.c (merge_expr_data): Choose the middle between two different sched-times when merging exprs. * gcc.dg/pr83913.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index a965d2ec42f..f6de96a7f3d 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -1837,8 +1837,9 @@ merge_expr_data (expr_t to, expr_t from, insn_t split_point) if (EXPR_PRIORITY (to) < EXPR_PRIORITY (from)) EXPR_PRIORITY (to) = EXPR_PRIORITY (from); - if (EXPR_SCHED_TIMES (to) > EXPR_SCHED_TIMES (from)) -EXPR_SCHED_TIMES (to) = EXPR_SCHED_TIMES (from); + if (EXPR_SCHED_TIMES (to) != EXPR_SCHED_TIMES (from)) +EXPR_SCHED_TIMES (to) = ((EXPR_SCHED_TIMES (from) + EXPR_SCHED_TIMES (to) + + 1) / 2); if (EXPR_ORIG_BB_INDEX (to) != EXPR_ORIG_BB_INDEX (from)) EXPR_ORIG_BB_INDEX (to) = 0; @@ -3293,11 +3294,22 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED, /* Note a dependence.
*/ static void diff --git a/gcc/testsuite/gcc.dg/pr83913.c b/gcc/testsuite/gcc.dg/pr83913.c new file mode 100644 index 000..c898d71a261 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr83913.c @@ -0,0 +1,26 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -funroll-all-loops -fselective-scheduling -fsel-sched-pipelining -fschedule-insns -fno-dce -fno-forward-propagate -fno-rerun-cse-after-loop -fno-web" } */ + +int jo, z4; + +int +be (long unsigned int l7, int nt) +{ + int en; + + jo = l7; + for (en = 0; en < 24; ++en) +{ + jo = (jo / z4) * (!!jo >= ((!!nt) & 2)); + if (jo == 0) +nt = 0; + else +{ + nt = z4; + ++z4; + nt = (long unsigned int) nt == (l7 + 1); +} +} + + return nt; +}
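The merge_expr_data hunk above replaces "take the minimum" with the midpoint of the two sched-times counts, rounding up. A minimal stand-alone sketch of that arithmetic (the helper name is hypothetical, not from the GCC sources):

```c
/* Midpoint of two sched-times counts, rounding up: mirrors the
   (from + to + 1) / 2 expression in the merge_expr_data hunk above.
   Rounding up guarantees that merging a scheduled expr (times > 0)
   with an unscheduled one (times == 0) never yields zero, preserving
   the invariant "zero SCHED_TIMES -- insn is not scheduled".  */
static int
merged_sched_times (int to_times, int from_times)
{
  if (to_times == from_times)
    return to_times;
  return (from_times + to_times + 1) / 2;
}
```

Note how merging sched-times 0 and 1 yields 1, so the merged expr still counts as scheduled.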
Fix PR 83962
Hello, This issue is about the correct order in which we need to call the routines that fix up the control flow for us. Best, Andrey 2018-04-03 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/83962 * sel-sched-ir.c (tidy_control_flow): Correct the order in which we call tidy_fallthru_edge and tidy_control_flow. * gcc.dg/pr83962.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index a965d2ec42f..f6de96a7f3d 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3839,9 +3839,13 @@ tidy_control_flow (basic_block xbb, bool full_tidying) && INSN_SCHED_TIMES (BB_END (xbb)) == 0 && !IN_CURRENT_FENCE_P (BB_END (xbb))) { - if (sel_remove_insn (BB_END (xbb), false, false)) -return true; + /* We used to call sel_remove_insn here that can trigger tidy_control_flow + before we fix up the fallthru edge. Correct that ordering by +explicitly doing the latter before the former. */ + clear_expr (INSN_EXPR (BB_END (xbb))); tidy_fallthru_edge (EDGE_SUCC (xbb, 0)); + if (tidy_control_flow (xbb, false)) + return true; } first = sel_bb_head (xbb); diff --git a/gcc/testsuite/gcc.dg/pr83962.c b/gcc/testsuite/gcc.dg/pr83962.c new file mode 100644 index 000..0547e218715 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr83962.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-std=gnu99 -O1 -fselective-scheduling2 -fschedule-insns2 -fcse-follow-jumps -fno-ssa-phiopt -fno-guess-branch-probability" } */ +unsigned int ca; + +void +v6 (long long unsigned int as, int p9) +{ + while (p9 < 1) +as = (as != ca) || (as > 1); +}
Fix PR 83530
Hello, This issue arises when we cannot correctly compute insn tick data for the unscheduled code (which was nevertheless processed by the modulo scheduler). Fixed by closely following the usual scheduling process as described in the PR. Best, Andrey 2018-04-03 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/83530 * sel-sched.c (force_next_insn): New global variable. (remove_insn_for_debug): When force_next_insn is true, also leave only next insn in the ready list. (sel_sched_region): When the region wasn't scheduled, make another pass over it with force_next_insn set to 1. * gcc.dg/pr8350.c: New test. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index 76092f9587a..fca2b69c5ee 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -5004,12 +5004,16 @@ remove_temp_moveop_nops (bool full_tidying) distinguishing between bookkeeping copies and original insns. */ static int max_uid_before_move_op = 0; +/* When true, we're always scheduling next insn on the already scheduled code + to get the right insn data for the following bundling or other passes. */ +static int force_next_insn = 0; + /* Remove from AV_VLIW_P all instructions but next when debug counter tells us so. Next instruction is fetched from BNDS. */ static void remove_insns_for_debug (blist_t bnds, av_set_t *av_vliw_p) { - if (! dbg_cnt (sel_sched_insn_cnt)) + if (! dbg_cnt (sel_sched_insn_cnt) || force_next_insn) /* Leave only the next insn in av_vliw. */ { av_set_iterator av_it; @@ -7642,7 +7646,13 @@ sel_sched_region (int rgn) sel_sched_region_1 (); else /* Force initialization of INSN_SCHED_CYCLEs for correct bundling.
*/ -reset_sched_cycles_p = true; +{ + reset_sched_cycles_p = false; + pipelining_p = false; + force_next_insn = 1; + sel_sched_region_1 (); + force_next_insn = 0; +} sel_region_finish (reset_sched_cycles_p); } diff --git a/gcc/testsuite/gcc.dg/pr83530.c b/gcc/testsuite/gcc.dg/pr83530.c new file mode 100644 index 000..f4d8927de92 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr83530.c @@ -0,0 +1,15 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fmodulo-sched -fselective-scheduling2" } */ +int vm, z0; +short int mz; + +int +ny (void) +{ + int ch; + + for (ch = 0; ch < 6; ++ch) +vm += ch / vm; + + return z0 + mz; +}
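The force_next_insn mechanism above reuses the debug-counter path: when the flag is set, remove_insns_for_debug keeps only the chosen next insn in the ready set, so the extra pass replays the already-made scheduling decisions insn by insn while recomputing the tick data. A simplified stand-alone sketch of that filtering (plain ints stand in for insns; the function name is hypothetical):

```c
/* Keep only the chosen next insn in the ready set, mimicking what
   remove_insns_for_debug does when force_next_insn is set: every
   candidate except NEXT is dropped, so scheduling proceeds in the
   original insn order.  Returns the new size of the set.  */
static int
keep_only_next (int *ready, int n, int next)
{
  int kept = 0;
  for (int i = 0; i < n; i++)
    if (ready[i] == next)
      ready[kept++] = ready[i];
  return kept;  /* 1 if NEXT was present, else 0.  */
}
```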
Selective scheduling fixes
Hello, I will post patches to various recent selective scheduling PRs as replies to this mail. The patches have been tested both on x86-64 (default languages) and ia64 (c, c++); for the ppc issues I've checked with the ppc cross compiler. I will also run the ia64 bootstrap with sel-sched enabled, but it will take more time. Best, Andrey
Re: [PATCH] sel-sched: fix zero-usefulness case in sel_rank_for_schedule (PR 83513)
On 25.12.2017 19:47, Alexander Monakov wrote: Hello, we need the following follow-up fix for priority comparison in sel_rank_for_schedule as demonstrated by PR 83513. Checked on x86_64 by running a bootstrap and also checking for no regressions in make -k check-gcc RUNTESTFLAGS="--target_board=unix/-fselective-scheduling/-fschedule-insns" OK to apply? Yes. Andrey PR rtl-optimization/83513 * sel-sched.c (sel_rank_for_schedule): Order by non-zero usefulness before priority comparison. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index c1be0136551..be3813717ba 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -3396,17 +3396,22 @@ sel_rank_for_schedule (const void *x, const void *y) else if (control_flow_insn_p (tmp2_insn) && !control_flow_insn_p (tmp_insn)) return 1; + /* Prefer an expr with non-zero usefulness. */ + int u1 = EXPR_USEFULNESS (tmp), u2 = EXPR_USEFULNESS (tmp2); + + if (u1 == 0) +{ + if (u2 == 0) +u1 = u2 = 1; + else +return 1; +} + else if (u2 == 0) +return -1; + /* Prefer an expr with greater priority. */ - if (EXPR_USEFULNESS (tmp) != 0 || EXPR_USEFULNESS (tmp2) != 0) -{ - int p2 = EXPR_PRIORITY (tmp2) + EXPR_PRIORITY_ADJ (tmp2), - p1 = EXPR_PRIORITY (tmp) + EXPR_PRIORITY_ADJ (tmp); - - val = p2 * EXPR_USEFULNESS (tmp2) - p1 * EXPR_USEFULNESS (tmp); -} - else -val = EXPR_PRIORITY (tmp2) - EXPR_PRIORITY (tmp) - + EXPR_PRIORITY_ADJ (tmp2) - EXPR_PRIORITY_ADJ (tmp); + val = (u2 * (EXPR_PRIORITY (tmp2) + EXPR_PRIORITY_ADJ (tmp2)) + - u1 * (EXPR_PRIORITY (tmp) + EXPR_PRIORITY_ADJ (tmp))); if (val) return val;
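The reordered comparison above can be sketched stand-alone as follows; the struct and field names are simplified stand-ins for the EXPR_* accessors, with EXPR_PRIORITY_ADJ folded into the priority field:

```c
/* Sketch of the patched sel_rank_for_schedule ordering: an expr with
   non-zero usefulness always ranks before one with zero usefulness;
   otherwise usefulness-weighted priorities are compared.  A negative
   result means X ranks first.  */
struct expr { int usefulness; int priority; };

static int
rank (const struct expr *x, const struct expr *y)
{
  int u1 = x->usefulness, u2 = y->usefulness;

  if (u1 == 0)
    {
      if (u2 == 0)
        u1 = u2 = 1;  /* Both zero: fall back to plain priorities.  */
      else
        return 1;     /* Y is useful, X is not: Y first.  */
    }
  else if (u2 == 0)
    return -1;        /* X is useful, Y is not: X first.  */

  return u2 * y->priority - u1 * x->priority;
}
```

This avoids the pre-patch behavior where a zero-usefulness expr could zero out the weighted priority product and win the comparison by accident.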
Re: [PATCH 6/9] sel-sched: Don't mess with register restores
Hello, On 01.08.2016 4:42, Segher Boessenkool wrote: > If selective scheduling copies register restores it confuses dwarf2cfi. > > 2016-06-07 Segher Boessenkool> > * sel-sched-ir.c (init_global_and_expr_for_insn): Don't copy > instructions with a REG_CFA_RESTORE note. OK from sel-sched POV. Best, Andrey > --- > gcc/sel-sched-ir.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c > index 83f813a..4a3984a 100644 > --- a/gcc/sel-sched-ir.c > +++ b/gcc/sel-sched-ir.c > @@ -3015,6 +3015,7 @@ init_global_and_expr_for_insn (insn_t insn) >/* TRAP_IF though have an INSN code is control_flow_insn_p (). */ >|| control_flow_insn_p (insn) >|| volatile_insn_p (PATTERN (insn)) > + || find_reg_note (insn, REG_CFA_RESTORE, NULL) >|| (targetm.cannot_copy_insn_p >&& targetm.cannot_copy_insn_p (insn))) > force_unique_p = true; >
Re: Various selective scheduling fixes
Hi Christophe, On 01.04.2016 10:33, Christophe Lyon wrote: On 31 March 2016 at 16:43, Andrey Belevantsev <a...@ispras.ru> wrote: Hello, On 14.03.2016 12:10, Andrey Belevantsev wrote: Hello, In this thread I will be posting the patches for the fixed selective scheduling PRs (except the one that was already kindly checked in by Jeff). The patches were tested both on x86-64 and ia64 with the following combination: 1) the usual bootstrap/regtest, which only utilizes sel-sched on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the bootstrap/regtest with the second scheduler forced to sel-sched; 3) both schedulers forced to sel-sched. In all cases everything seemed to be fine. Three of the PRs are regressions, the other two showed different errors across the variety of releases tested by submitters; I think all of them are appropriate at this stage -- they do not touch anything outside of selective scheduling except the first patch where a piece of code from sched-deps.c needs to be refactored into a function to be called from sel-sched.c. I've backported all regression PRs to gcc-5-branch after testing there again with selective scheduling force enabled: PRs 64411, 0, 69032, 69102. The first one was not marked as a regression as such but the test for PR 70292, which is duplicate, works for me on gcc 5.1 thus making it a regression, too. 
Hi, The backport for pr69102 shows that the new testcase fails to compile (ICE) when GCC is configured as: --target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm --with-cpu=cortex-a15 --with-fpu=neon-vfpv4 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c: In function 'foo': /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1: internal compiler error: Segmentation fault 0xa64d15 crash_signal /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383 0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int) /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752 0xa31cd2 invoke_dfa_lookahead_guard /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212 0xa31cd2 find_best_expr /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415 0xa343fb fill_insns /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570 0xa343fb schedule_on_fences /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395 0xa36010 sel_sched_region_2 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533 0xa36f2a sel_sched_region_1 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575 0xa36f2a sel_sched_region(int) /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676 0xa37589 run_selective_scheduling() /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752 0xa14aed rest_of_handle_sched2 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647 0xa14aed execute /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791 See http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt Can you have a look? That's because A15 is the only place which enables autopref_multipass_dfa_lookahead_guard as the DFA lookahead guard hook. But autoprefetch modeling doesn't work for selective scheduling, it uses haifa structures that are not kept up to date during sel-sched. 
So this is not supposed to work as soon as the param value for prefetcher lookahead depth is positive. The following patch works for me. Could you check it with your testing? If it works fine for you, I would install the patch both for trunk and gcc-5. It would be great to force sel-sched to be enabled, too. I could do that but I don't have the hardware or cross-arm target tools at the moment. * haifa-sched.c (autopref_multipass_dfa_lookahead_guard): Disable for selective scheduler. Best, Andrey Christophe Andrey Andrey diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index ad2450b..c790830 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -5691,6 +5691,10 @@ autopref_multipass_dfa_lookahead_guard (rtx_insn *insn1, int ready_index) { int r = 0; + /* Autoprefetcher modeling is not supported by selective scheduler. */ + if (sel_sched_p ()) +return 0; + if (PARAM_VALUE (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH) <= 0) return 0;
Re: [PATCH][PR rtl-optimization/69307] Handle hard registers in modes that span more than one register properly
Hello, On 12.03.2016 20:13, Jeff Law wrote: As Andrey outlined in the PR, selective-scheduling was missing a check & handling of hard registers in modes that span more than one hard reg. This caused an incorrect register selection during renaming. I verified removing the printf call from the test would not compromise the test. Then I did a normal x86 bootstrap & regression test with the patch. Of course that's essentially useless, so I also did another bootstrap and regression test with -fselective-scheduling in BOOT_CFLAGS with and without this patch. In both cases there were no regressions. I'm installing Andrey's patch on the trunk. I'm not sure this is worth addressing in gcc-5. I've looked at the patch again and as it fixes general code and has a regression marker I've included it in the bunch of other PRs that were backported to gcc-5. I forgot you were hesitant putting it to gcc-5 though :) So I can revert it from the branch if you want me to. Andrey Jeff
Re: Various selective scheduling fixes
Hello, On 14.03.2016 12:10, Andrey Belevantsev wrote: Hello, In this thread I will be posting the patches for the fixed selective scheduling PRs (except the one that was already kindly checked in by Jeff). The patches were tested both on x86-64 and ia64 with the following combination: 1) the usual bootstrap/regtest, which only utilizes sel-sched on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the bootstrap/regtest with the second scheduler forced to sel-sched; 3) both schedulers forced to sel-sched. In all cases everything seemed to be fine. Three of the PRs are regressions, the other two showed different errors across the variety of releases tested by submitters; I think all of them are appropriate at this stage -- they do not touch anything outside of selective scheduling except the first patch where a piece of code from sched-deps.c needs to be refactored into a function to be called from sel-sched.c. I've backported all regression PRs to gcc-5-branch after testing there again with selective scheduling force enabled: PRs 64411, 0, 69032, 69102. The first one was not marked as a regression as such but the test for PR 70292, which is a duplicate, works for me on gcc 5.1 thus making it a regression, too. Andrey Andrey
[committed] Add patch for PR 70292
Hello, PR 70292 turned out to be a duplicate of PR 64411, which is already fixed on trunk, but still the testcase from the PR is worth adding. It can be tortured instead of being a target test, and also this PR showed that 64411 is a regression and should be backported to gcc-5-branch. Committed the test to trunk. I don't think we need it for gcc-5 though. Andrey Index: gcc/testsuite/gcc.dg/pr70292.c === *** gcc/testsuite/gcc.dg/pr70292.c (revision 0) --- gcc/testsuite/gcc.dg/pr70292.c (revision 234627) *** *** 0 --- 1,12 + /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ + /* { dg-options "-O1 -fno-inline -fno-dce -fschedule-insns -fselective-scheduling -fno-tree-dce" } */ + + void bar() {} + + int t106_1mul(unsigned int x, unsigned int y) { + int r; + if (__builtin_mul_overflow(x, y, &r)) { + bar(); + } + return r; + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 234626) --- gcc/testsuite/ChangeLog (revision 234627) *** *** 1,3 --- 1,8 + 2016-03-31 Andrey Belevantsev <a...@ispras.ru> + + PR target/70292 + * gcc.c-torture/pr70292.c: New test. + 2016-03-31 Marek Polacek <pola...@redhat.com> PR c/70297
Re: [PATCH][PR rtl-optimization/69307] Handle hard registers in modes that span more than one register properly
On 14.03.2016 21:39, Jeff Law wrote: On 03/14/2016 03:56 AM, Andrey Belevantsev wrote: Thank you for checking this in. I've also tested this patch in a similar way (forcing selective scheduling for 2nd and both schedulers) both on x86-64 and ia64. I've posted the patches for the remaining sel-sched PRs just now -- it took some time bringing our Itaniums back to life. No problem. I found it trolling the P4/P5 regression list. It was the only one that I could wrap my head around easily that night. I forgot I had a test from BZ for this PR, it wasn't attached to the patch so it didn't get committed. I've committed it now as rev. 234360. Andrey Thanks for following-up on the others. Hopefully between Alexander, Bernd and myself we can get them reviewed and work towards getting those BZs resolved. jeff Index: gcc/testsuite/gcc.target/arm/pr69307.c === *** gcc/testsuite/gcc.target/arm/pr69307.c (revision 0) --- gcc/testsuite/gcc.target/arm/pr69307.c (revision 234360) *** *** 0 --- 1,34 + /* { dg-do run } */ + /* { dg-options "-O2 -fselective-scheduling -mtune=arm1136j-s" } */ + + typedef unsigned char uint8_t; + typedef unsigned short int uint16_t; + typedef unsigned int uint32_t; + typedef unsigned long long int uint64_t; + typedef uint8_t u8; + typedef uint16_t u16; + typedef uint32_t u32; + typedef uint64_t u64; + u64 __attribute__((noinline, noclone)) + foo(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0, u8 u8_1, u16 u16_1, u32 u32_1, u64 u64_1, u8 u8_2, u16 u16_2, u32 u32_2, u64 u64_2, u8 u8_3, u16 u16_3, u32 u32_3, u64 u64_3) + { + u8 *p8_2 = &u8_2; + u16 *p16_2 = &u16_2; + u8 *p8_3 = &u8_3; + u64 *p64_3 = &u64_3; + p8_2 = &u8_3; + *p8_3 -= *p64_3; + *p8_2 = (u64)*p8_2 % ((u64)*p8_2 | 3); + u8_2 = (u64)u8_2 / ((u64)*p16_2 | 1); + u16_0 = (u64)u16_0 % ((u64)*p8_2 | 3); + return u8_0 + u16_0 + u32_0 + u64_0 + u8_1 + u16_1 + u32_1 + u64_1 + u8_2 + u16_2 + u32_2 + u64_2 + u8_3 + u16_3 + u32_3 + u64_3; + } + int main() + { + u64 x = 0; + x += foo(3llu, 6llu, 15llu, 28llu, 5llu, 11llu, 20llu,
44llu, 7llu, 10llu, 20llu, 55llu, 0llu, 9llu, 17llu, 48llu); + __builtin_printf("%02x%02x%02x%02x%02x%02x%02x%02x\n", (unsigned)((x >> 0) & 0xff), (unsigned)((x >> 8) & 0xff), (unsigned)((x >> 16) & 0xff), (unsigned)((x >> 24) & 0xff), (unsigned)((x >> 32) & 0xff), (unsigned)((x >> 40) & 0xff), (unsigned)((x >> 48) & 0xff), (unsigned)((x >> 56) & 0xff)); + if (x != 0x1f3) + __builtin_abort(); + return 0; + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 234359) --- gcc/testsuite/ChangeLog (revision 234360) *** *** 1,5 --- 1,10 2016-03-21 Andrey Belevantsev <a...@ispras.ru> + PR rtl-optimization/69307 + * gcc.target/arm/pr69307.c: New test. + + 2016-03-21 Andrey Belevantsev <a...@ispras.ru> + PR rtl-optimization/69102 * gcc.c-torture/compile/pr69102.c: New test.
Re: [02/05] Fix PR 63384
On 15.03.2016 20:44, Alexander Monakov wrote: On Tue, 15 Mar 2016, Marek Polacek wrote: This test fails for me due to cc1plus: warning: var-tracking-assignments changes selective scheduling Thanks for the heads-up Marek, and sorry for the trouble. Like I said in the adjacent reply, the warning is expected (I didn't realize the testsuite would notice that, though). I think the right fix is to simply add "-w" to dg-options, and while we are at it, we should probably change -fvta-toggle to just -fvta as well (because VTA is active either way, right?). Yes, the -fvta should work. Sorry for the breakage, I guess I've misread the compare-tests output when also checking the run with forced sel-sched enabled. I can take care of the test tomorrow morning or you can do it now. Best, Andrey Andrey? Thanks. Alexander
Re: [05/05] Fix PR 69102
Hello, On 14.03.2016 12:52, Andrey Belevantsev wrote: Hello, The problem here is readonly dependence contexts in selective scheduler. We're trying to cache the effect of initializing a dependence context with remembering that context and setting a readonly bit on it. When first moving the insn 43 with REG_ARGS_SIZE note through the insn 3 (a simple eax set) we also set the last_args_size field of the context. Later, when we make a copy of insn 43 and try to move it again through insn 3, we take the cached dependency context and notice the (fake) dep with last_args_size insn, which is the old insn 43. Then the assert saying that we should be able to lift the bookkeeping copy up the same way as we did with the original insn breaks. Fixed by the attached patch that makes us notice only deps with the current producer insn. Ok for trunk? We've discussed the patch with Alexander a bit and decided to take a different approach. The core issue here is not handling the new last_args_size field in the readonly contexts. In general, the readonly context approach, when analyzing an insn with a readonly context, would create the necessary dependencies with all of the last_ fields but refrain from modifying those fields. The reason is we need to capture the effect of only the single producer in the readonly context. Failing to do so may update the last_ fields with the effect of subsequent analyses and produce fake dependencies with the insns that got into those fields instead of dependencies with the current producer. So the right fix here is to guard setting of the last_args_size field with the !deps->readonly test as it is done elsewhere in sched-deps.c. In stage 1 we will also want to add asserts in the sel-sched dependency hooks (where I had placed early returns in the previous version of the patch) checking that the dependency is always created with the current producer, so that such cases will be caught sooner.
The new patch bootstrapped and tested on x86-64 with the selective scheduler force-enabled with no regressions. It needs review from a maintainer outside of sel-sched as we touch the sched-deps.c file. Ok for trunk? The test is the same as in the previous patch. Andrey 2016-03-15 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * sched-deps.c (sched_analyze_insn): Do not set last_args_size field when we have a readonly dependency context. gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * sel-sched.c (has_dependence_note_dep): Only take into account dependencies produced by the current producer insn. (has_dependence_note_mem_dep): Likewise. testsuite/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * gcc.c-torture/compile/pr69102.c: New test. Best, Andrey diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 3d4a1d5..77ffcd0 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -3495,7 +3495,8 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn) { if (deps->last_args_size) add_dependence (insn, deps->last_args_size, REG_DEP_OUTPUT); - deps->last_args_size = insn; + if (!deps->readonly) + deps->last_args_size = insn; } }
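The discipline described above (a cached readonly context may be consulted to create dependencies, but its state must never be updated) can be sketched with a toy context holding a single last_args_size-like field; the struct and function names are hypothetical, not the sched-deps.c ones:

```c
/* Toy model of the readonly dependence context discipline: a cached
   (readonly) context may be used to find the producer a dependence
   should be created with, but must not record a new producer, or
   later queries against the cache would see stale producers instead
   of the current one.  */
struct dep_context
{
  int readonly;        /* Set once the context is cached.  */
  int last_args_size;  /* UID of the recorded producer, 0 if none.  */
};

/* Return the insn UID a dependence should be created with (0 if
   none); record INSN as the new producer only on writable contexts,
   mirroring the !deps->readonly guard in the patch above.  */
static int
note_args_size (struct dep_context *ctx, int insn)
{
  int dep_with = ctx->last_args_size;
  if (!ctx->readonly)
    ctx->last_args_size = insn;
  return dep_with;
}
```

Without the readonly guard, analyzing a second producer against the cached context would overwrite the field, which is exactly the stale-insn-43 situation the mail describes.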
Re: [01/05] Fix PR 64411
Hello, On 14.03.2016 19:45, Bernd Schmidt wrote: On 03/14/2016 05:23 PM, Alexander Monakov wrote: On Mon, 14 Mar 2016, Andrey Belevantsev wrote: In this case, we get an inconsistency between the sched-deps interface, saying we can't move an insn writing the si register through a vector insn, and the liveness analysis, saying we can. The latter doesn't take into account implicit_reg_pending_clobbers set calculated in sched-deps before register allocation. The solution is to reflect this set in our insn data (sets/uses/clobbers). Ok for trunk? One nit; the prototype of the new function: extern void get_implicit_reg_pending_clobbers (rtx_insn *, HARD_REG_SET *); has source operand on the left, destination on the right; it's probably nicer to swap them around. OK as far as selective scheduler changes go, but this also needs a general scheduler maintainer ack for the sched-deps.c change. Vladimir, can you have a look? Needs better documentation of the new function's arguments (as per general requirements for such things), but otherwise that part is ok (either arg order). The sel-sched parts should also have proper function comments however, and here: +{ + SET_REGNO_REG_SET (IDATA_REG_SETS (id), regno); +} we don't use braces around single statements. I've incorporated both yours and Alexander's comments and committed the patch as rev. 234216. Andrey Bernd
Re: [PATCH][PR rtl-optimization/69307] Handle hard registers in modes that span more than one register properly
Hello Jeff, On 12.03.2016 20:13, Jeff Law wrote: As Andrey outlined in the PR, selective-scheduling was missing a check & handling of hard registers in modes that span more than one hard reg. This caused an incorrect register selection during renaming. I verified removing the printf call from the test would not compromise the test. Then I did a normal x86 bootstrap & regression test with the patch. Of course that's essentially useless, so I also did another bootstrap and regression test with -fselective-scheduling in BOOT_CFLAGS with and without this patch. In both cases there were no regressions. Thank you for checking this in. I've also tested this patch in the similar way (forcing selective scheduling for 2nd and both schedulers) both on x86-64 and ia64. I've posted the patches for remaining sel-sched PRs just now -- it took some time bringing our Itaniums back to life. Andrey I'm installing Andrey's patch on the trunk. I'm not sure this is worth addressing in gcc-5. Jeff
[05/05] Fix PR 69102
Hello, The problem here is readonly dependence contexts in selective scheduler. We're trying to cache the effect of initializing a dependence context with remembering that context and setting a readonly bit on it. When first moving the insn 43 with REG_ARGS_SIZE note through the insn 3 (a simple eax set) we also set the last_args_size field of the context. Later, when we make a copy of insn 43 and try to move it again through insn 3, we take the cached dependency context and notice the (fake) dep with last_args_size insn, which is the old insn 43. Then the assert saying that we should be able to lift the bookkeeping copy up the same way as we did with the original insn breaks. Fixed by the attached patch that makes us notice only deps with the current producer insn. Ok for trunk? gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * sel-sched.c (has_dependence_note_dep): Only take into account dependencies produced by the current producer insn. (has_dependence_note_mem_dep): Likewise. testsuite/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69102 * gcc.c-torture/compile/pr69102.c: New test. Best, Andrey diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index c1a9e55..b4aa933 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3277,9 +3277,14 @@ has_dependence_note_reg_use (int regno) static void has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED, rtx pending_mem ATTRIBUTE_UNUSED, - insn_t pending_insn ATTRIBUTE_UNUSED, + insn_t pending_insn, ds_t ds ATTRIBUTE_UNUSED) { + /* We're only interested in dependencies with the current producer. + We might get other insns that were saved in dependence context + as last_* or pending_* fields. 
*/ + if (INSN_UID (pending_insn) != INSN_UID (has_dependence_data.pro)) +    return; if (!sched_insns_conditions_mutex_p (has_dependence_data.pro, VINSN_INSN_RTX (has_dependence_data.con))) { @@ -3291,9 +3296,14 @@ has_dependence_note_mem_dep (rtx mem ATTRIBUTE_UNUSED, /* Note a dependence. */ static void -has_dependence_note_dep (insn_t pro ATTRIBUTE_UNUSED, +has_dependence_note_dep (insn_t pro, ds_t ds ATTRIBUTE_UNUSED) { + /* We're only interested in dependencies with the current producer. + We might get other insns that were saved in dependence context + as last_* or pending_* fields. */ + if (INSN_UID (pro) != INSN_UID (has_dependence_data.pro)) +    return; if (!sched_insns_conditions_mutex_p (has_dependence_data.pro, VINSN_INSN_RTX (has_dependence_data.con))) { diff --git a/gcc/testsuite/gcc.c-torture/compile/pr69102.c b/gcc/testsuite/gcc.c-torture/compile/pr69102.c new file mode 100644 index 000..b1328ca --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr69102.c @@ -0,0 +1,21 @@ +/* { dg-options "-Og -fPIC -fschedule-insns2 -fselective-scheduling2 -fno-tree-fre --param=max-sched-extend-regions-iters=10" } */ +void bar (unsigned int); + +void +foo (void) +{ + char buf[1] = { 3 }; + const char *p = buf; + const char **q = &p; + unsigned int ch; + switch (**q) +{ +case 1: ch = 5; break; +case 2: ch = 4; break; +case 3: ch = 3; break; +case 4: ch = 2; break; +case 5: ch = 1; break; +default: ch = 0; break; +} + bar (ch); +}
[04/05] Fix PR 69032
Hello, We fail to find the proper seqno for the fresh bookkeeping copy in this PR. The problem is that in get_seqno_by_preds we are iterating over bb from the given insn backwards up to the first bb insn. We skip the initial insn when iterating over bb, yet we should take seqno from it. The code in question originally didn't include bb head when iterating, and was patched to do so in 2011. The patch was wrong and instead of including bb head managed to exclude the original insn itself. By reading the original and patched code I've convinced myself that the right fix will be to do what the patch intended and include both the initial insn and the bb head in the iteration. Ok for trunk? gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69032 * sel-sched-ir.c (get_seqno_by_preds): Include both tmp and head when looping backwards over basic block insns. testsuite/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/69032 * gcc.dg/pr69032.c: New test. Best, Andrey diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index ec59280..c1a9e55 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -4103,11 +4103,14 @@ get_seqno_by_preds (rtx_insn *insn) insn_t *preds; int n, i, seqno; - while (tmp != head) + /* Loop backwards from insn to head including both. */ + while (1) { - tmp = PREV_INSN (tmp); if (INSN_P (tmp)) return INSN_SEQNO (tmp); + if (tmp == head) + break; + tmp = PREV_INSN (tmp); } cfg_preds (bb, &preds, &n); diff --git a/gcc/testsuite/gcc.dg/pr69032.c b/gcc/testsuite/gcc.dg/pr69032.c new file mode 100644 index 000..e0925cd --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr69032.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fsched-pressure -fsel-sched-pipelining -fselective-scheduling" } */ + +void foo (long long i) +{ + while (i != -1) + { + ++i; + __asm__ (""); + } +}
[03/05] Fix PR 66660
Hello, We speculate an insn in the PR but we do not make a check for it though we should. The thing that broke this was the fix for PR 45472. In that PR, we have moved a volatile insn too far up because we failed to merge the bits describing its volatility when we have processed a control flow split. The code to propagate the insn pattern with the insn merging was added for when the volatility of the two insns from both split branches differs. However, the volatility of the speculated insn and its original also differ: the original insn may trap while the speculated version may not. Thus, we replace a speculative pattern with the original one per the PR 45472 fix for no reason. The patch for this problem just limits the original fix for PR 45472 to apply to non-speculative insns only. There is no test as it is not so easy to construct one -- we could count the number of speculation checks in the resulting assembly, but there is no way to force speculation to happen. Ok for trunk? gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR target/66660 * sel-sched-ir.c (merge_expr): Do not propagate trap bits into speculative insns. Best, Andrey commit 53ef39496acc26cc0021555e403068e93343aa20 Author: Andrey Belevantsev <a...@ispras.ru> Date: Wed Jan 27 17:20:27 2016 +0300 Fix pr66660: do not propagate trap bits into speculative insns diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index e181cb9..ec59280 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -1871,12 +1871,12 @@ merge_expr (expr_t to, expr_t from, insn_t split_point) /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. This will provide us with consistent speculative bits and speculative patterns inside expr. */ - if ((EXPR_SPEC_DONE_DS (from) != 0 - && EXPR_SPEC_DONE_DS (to) == 0) - /* Do likewise for volatile insns, so that we always retain - the may_trap_p bit on the resulting expression. */ - || (VINSN_MAY_TRAP_P (EXPR_VINSN (from)) - && !VINSN_MAY_TRAP_P (EXPR_VINSN (to)))) + if (EXPR_SPEC_DONE_DS (to) == 0 + && (EXPR_SPEC_DONE_DS (from) != 0 + /* Do likewise for volatile insns, so that we always retain + the may_trap_p bit on the resulting expression. */ + || (VINSN_MAY_TRAP_P (EXPR_VINSN (from)) + && !VINSN_MAY_TRAP_P (EXPR_VINSN (to))))) change_vinsn_in_expr (to, EXPR_VINSN (from)); merge_expr_data (to, from, split_point);
[02/05] Fix PR 63384
Hello, Here we're looping because we decrease the counter of the insns we still can issue on a DEBUG_INSN thus rendering the counter negative. The fix is to not count debug insns in the corresponding code. The selective scheduling is known to spoil the result of var tracking, but still it is not the reason to hang in there. The toggle option used in the test seems to be the equivalent of just enabling var-tracking-assignments which should lead to the same situation; however, if specified as is, var-tracking-assignments will be disabled by the toplev.c:1460 code. Maybe we also need the same treatment for flag_var_tracking_assignments_toggle. Ok for trunk? gcc/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/63384 * sel-sched.c (invoke_aftermath_hooks): Do not decrease issue_more on DEBUG_INSN_P insns. testsuite/ 2016-03-14 Andrey Belevantsev <a...@ispras.ru> PR rtl-optimization/63384 * testsuite/g++.dg/pr63384.C: New test. Best, Andrey diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index c798935..893a3e5 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -4249,7 +4249,8 @@ invoke_aftermath_hooks (fence_t fence, rtx_insn *best_insn, int issue_more) issue_more); memcpy (FENCE_STATE (fence), curr_state, dfa_state_size); } - else if (GET_CODE (PATTERN (best_insn)) != USE + else if (! 
DEBUG_INSN_P (best_insn) + && GET_CODE (PATTERN (best_insn)) != USE && GET_CODE (PATTERN (best_insn)) != CLOBBER) issue_more--; diff --git a/gcc/testsuite/g++.dg/pr63384.C b/gcc/testsuite/g++.dg/pr63384.C new file mode 100644 index 000..b4e0784 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr63384.C @@ -0,0 +1,12 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fselective-scheduling2 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fsel-sched-reschedule-pipelined -fvar-tracking-assignments-toggle -ftree-vectorize" } */ + +template T **make_test_matrix() { + T **data = new T *; + for (int i = 0; i < 1000; i++) +; +} + +template void test() { T **c = make_test_matrix(); } + +main() { test(); }
[01/05] Fix PR 64411
Hello, In this case, we get an inconsistency between the sched-deps interface, saying we can't move an insn writing the si register through a vector insn, and the liveness analysis, saying we can. The latter doesn't take into account implicit_reg_pending_clobbers set calculated in sched-deps before register allocation. The solution is to reflect this set in our insn data (sets/uses/clobbers). Ok for trunk? gcc/ 2016-01-15 Andrey Belevantsev <a...@ispras.ru> PR target/64411 * sched-deps.c (get_implicit_reg_pending_clobbers): New function, factored out from ... (sched_analyze_insn): ... here. * sched-int.h (get_implicit_reg_pending_clobbers): Declare it. * sel-sched-ir.c (setup_id_implicit_regs): New function, use get_implicit_reg_pending_clobbers in it. (setup_id_reg_sets): Use setup_id_implicit_regs. (deps_init_id): Ditto. testsuite/ 2016-01-15 Andrey Belevantsev <a...@ispras.ru> PR target/64411 * gcc.target/i386/pr64411.C: New test. Best, Andrey diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 4961dfb..3d4a1d5 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -2860,6 +2860,17 @@ sched_macro_fuse_insns (rtx_insn *insn) } +/* Get the implicit reg pending clobbers for INSN. */ +void +get_implicit_reg_pending_clobbers (rtx_insn *insn, HARD_REG_SET *temp) +{ + extract_insn (insn); + preprocess_constraints (insn); + alternative_mask preferred = get_preferred_alternatives (insn); + ira_implicitly_set_insn_hard_regs (temp, preferred); + AND_COMPL_HARD_REG_SET (*temp, ira_no_alloc_regs); +} + /* Analyze an INSN with pattern X to find all dependencies. */ static void sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn) @@ -2872,12 +2883,7 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn) if (! 
reload_completed) { HARD_REG_SET temp; - - extract_insn (insn); - preprocess_constraints (insn); - alternative_mask prefrred = get_preferred_alternatives (insn); - ira_implicitly_set_insn_hard_regs (&temp, prefrred); - AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); + get_implicit_reg_pending_clobbers (insn, &temp); IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp); } diff --git a/gcc/sched-int.h b/gcc/sched-int.h index 378c3aa..d0e2c0e 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -1351,6 +1351,7 @@ extern void finish_deps_global (void); extern void deps_analyze_insn (struct deps_desc *, rtx_insn *); extern void remove_from_deps (struct deps_desc *, rtx_insn *); extern void init_insn_reg_pressure_info (rtx_insn *); +extern void get_implicit_reg_pending_clobbers (rtx_insn *, HARD_REG_SET *); extern dw_t get_dep_weak (ds_t, ds_t); extern ds_t set_dep_weak (ds_t, ds_t, dw_t); diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index d6c86b8..e181cb9 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -2650,6 +2650,24 @@ maybe_downgrade_id_to_use (idata_t id, insn_t insn) IDATA_TYPE (id) = USE; } +/* Setup implicit register clobbers calculated by sched-deps before reload. */ +static void +setup_id_implicit_regs (idata_t id, insn_t insn) +{ + if (reload_completed) +    return; + + HARD_REG_SET temp; + unsigned regno; + hard_reg_set_iterator hrsi; + + get_implicit_reg_pending_clobbers (insn, &temp); + EXECUTE_IF_SET_IN_HARD_REG_SET (temp, 0, regno, hrsi) +    { + SET_REGNO_REG_SET (IDATA_REG_SETS (id), regno); +    } +} + /* Setup register sets describing INSN in ID. */ static void setup_id_reg_sets (idata_t id, insn_t insn) @@ -2704,6 +2722,9 @@ setup_id_reg_sets (idata_t id, insn_t insn) } } + /* Also get implicit reg clobbers from sched-deps. */ + setup_id_implicit_regs (id, insn); + return_regset_to_pool (tmp); } @@ -2735,20 +2756,18 @@ deps_init_id (idata_t id, insn_t insn, bool force_unique_p) deps_init_id_data.force_use_p = false; init_deps (dc, false); - memcpy (&deps_init_id_sched_deps_info, &const_deps_init_id_sched_deps_info, sizeof (deps_init_id_sched_deps_info)); - if (spec_info != NULL) deps_init_id_sched_deps_info.generate_spec_deps = 1; - sched_deps_info = &deps_init_id_sched_deps_info; deps_analyze_insn (dc, insn); + /* Implicit reg clobbers received from sched-deps separately. */ + setup_id_implicit_regs (id, insn); free_deps (dc); - deps_init_id_data.id = NULL; } diff --git a/gcc/testsuite/gcc.target/i386/pr64411.C b/gcc/testsuite/gcc.target/i386/pr64411.C new file mode 100644 index 000..55858fb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr64411.C @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-Os -mcmodel=medium -fPIC -fschedule-insns -fselective-scheduling" } */ + +typedef __SIZE_TYPE__ size_t; + +extern "C" long strtol () + { return 0; } + +static struct { + void *sp[2]; +} info; + +union S813 +{ + void * c[5]; +} +s813; + +S813 a81
Various selective scheduling fixes
Hello, In this thread I will be posting the patches for the fixed selective scheduling PRs (except the one that was already kindly checked in by Jeff). The patches were tested both on x86-64 and ia64 with the following combination: 1) the usual bootstrap/regtest, which only utilizes sel-sched on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the bootstrap/regtest with the second scheduler forced to sel-sched; 3) both schedulers forced to sel-sched. In all cases everything seemed to be fine. Three of the PRs are regressions, the other two showed different errors across the variety of releases tested by submitters; I think all of them are appropriate at this stage -- they do not touch anything outside of selective scheduling except the first patch where a piece of code from sched-deps.c needs to be refactored into a function to be called from sel-sched.c. Andrey
Re: [PATCH 1/5] Fix asymmetric comparison functions
On 17.12.2015 15:13, Yury Gribov wrote: On 12/17/2015 02:58 PM, Andrey Belevantsev wrote: Hello, On 17.12.2015 11:58, Yury Gribov wrote: Some obvious symmetry fixes. Cc-ing * Andrey (Belevantsev) for bb_top_order_comparator Here, as Jakub mentioned, we assume that the argument addresses will never be equal, The problem is that this is not guaranteed. Well, if the consensus is that this is indeed the case, you're free to change both places as you suggest. Yours, Andrey thus that would always be different basic blocks (the comparator is used for providing a custom sort over loop body bbs) and you don't need a return 0 there. You can put there gcc_unreachable instead as in ... * Andrew (MacLeod) for compare_case_labels * Andrew (Pinski) for resort_field_decl_cmp * Diego for pair_cmp * Geoff for resort_method_name_cmp * Jakub for compare_case_labels * Jason for method_name_cmp * Richard for insert_phi_nodes_compare_var_infos, compare_case_labels * Steven for cmp_v_in_regset_pool ... this case -- here gcc_unreachable () marks that we're sorting pool pointers and their values are always different. Please do not remove it. Same here. /Yury
Re: [PATCH 1/5] Fix asymmetric comparison functions
Hello, On 17.12.2015 11:58, Yury Gribov wrote: Some obvious symmetry fixes. Cc-ing * Andrey (Belevantsev) for bb_top_order_comparator Here, as Jakub mentioned, we assume that the argument addresses will never be equal, thus that would always be different basic blocks (the comparator is used for providing a custom sort over loop body bbs) and you don't need a return 0 there. You can put there gcc_unreachable instead as in ... * Andrew (MacLeod) for compare_case_labels * Andrew (Pinski) for resort_field_decl_cmp * Diego for pair_cmp * Geoff for resort_method_name_cmp * Jakub for compare_case_labels * Jason for method_name_cmp * Richard for insert_phi_nodes_compare_var_infos, compare_case_labels * Steven for cmp_v_in_regset_pool ... this case -- here gcc_unreachable () marks that we're sorting pool pointers and their values are always different. Please do not remove it. Yours, Andrey /Yury
Re: [PATCH] Fix -Wlogical-not-parentheses warning in sel-sched-ir.c (PR c/61271)
On 26.08.2014 18:03, Marek Polacek wrote: Another wrongdoing detected by -Wlogical-not-parentheses. From my reading of the code it seems that simple != was meant here. Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. Heck, we traced this code back to the May 2007 commit :) Andrey 2014-08-26 Marek Polacek pola...@redhat.com PR c/61271 * sel-sched-ir.c (make_regions_from_the_rest): Fix condition. diff --git gcc/sel-sched-ir.c gcc/sel-sched-ir.c index c36658f..dd777fa 100644 --- gcc/sel-sched-ir.c +++ gcc/sel-sched-ir.c @@ -6185,7 +6185,7 @@ make_regions_from_the_rest (void) FOR_EACH_BB_FN (bb, cfun) { - if (bb->loop_father && !bb->loop_father->num == 0 + if (bb->loop_father && bb->loop_father->num != 0 && !(bb->flags & BB_IRREDUCIBLE_LOOP)) loop_hdr[bb->index] = bb->loop_father->num; } Marek
[PATCH] Fix PR 60901
Hello, This ICE comes from the ix86_dependencies_evaluation_hook code assumption that any scheduling region will be connected. This assumption is not correct in case of the outer loops pipelining of the selective scheduler as explained in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60901#c3. Trying to add dependencies between insns from the different scheduling regions results in a segfault within the dependency analyzer code. The fix is to adjust the code to account for the situation when basic block's predecessors do not belong to the same scheduling region. Bootstrapped and tested on x86-64, OK for trunk? Branches? The fix is low risk as the additional test should always be true for the regular scheduler. Yours, Andrey gcc/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * config/i386/i386.c (ix86_dependencies_evaluation_hook): Check that bb predecessor belongs to the same scheduling region. Adjust comment. testsuite/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * gcc.dg/pr60901.c: New test. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 99f0657..0274288 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -26254,13 +26254,17 @@ ix86_dependencies_evaluation_hook (rtx head, rtx tail) { edge e; edge_iterator ei; - /* Assume that region is SCC, i.e. all immediate predecessors of non-head block are in the same region. */ + + /* Regions are SCCs with the exception of selective scheduling with pipelining of outer blocks enabled. So also check that immediate predecessors of a non-head block are in the same region. */ FOR_EACH_EDGE (e, ei, bb->preds) { /* Avoid creating of loop-carried dependencies through - using topological odering in region. */ - if (BLOCK_TO_BB (bb->index) < BLOCK_TO_BB (e->src->index)) + using topological ordering in the region. */ + if (rgn == CONTAINING_RGN (e->src->index) + && BLOCK_TO_BB (bb->index) < BLOCK_TO_BB (e->src->index)) add_dependee_for_func_arg (first_arg, e->src); } } diff --git a/gcc/testsuite/gcc.dg/pr60901.c b/gcc/testsuite/gcc.dg/pr60901.c new file mode 100644 index 000..1350f16 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr60901.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options "-O -fselective-scheduling -fschedule-insns -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fno-tree-dominator-opts" } */ + +extern int n; +extern void bar (void); +extern int baz (int); + +void +foo (void) +{ + int i, j; + for (j = 0; j < n; j++) +{ + for (i = 1; i < j; i++) + bar (); + baz (0); +} +}
[PATCH] Fix PR 60866
Hello, As described in the PR in more details, this ICE is about the scheduler not being able to initialize its data structures for the new unconditional jump created when redirecting an edge and simplifying control flow. It so happens that the new jump is the only unscheduled instruction being left so we need to pass some of its data from the old jump. The difficulty is to support only this particular situation to retain the existing checking code and to pass down the required information to the point where it is needed. I assume that we are able to use default parameters as I haven't heard otherwise. Bootstrapped and tested on x86-64, OK for trunk and branches? Branches should also be safe as we're fixing only very specific corner case (hey, we regressed four releases and nobody noticed until now). Yours, Andrey gcc/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60866 * sel-sched-ir (sel_init_new_insn): New parameter old_seqno. Default it to -1. Pass it down to init_simplejump_data. (init_simplejump_data): New parameter old_seqno. Pass it down to get_seqno_for_a_jump. (get_seqno_for_a_jump): New parameter old_seqno. Use it for initializing new jump seqno as a last resort. Add comment. (sel_redirect_edge_and_branch): Save old seqno of the conditional jump and pass it down to sel_init_new_insn. (sel_redirect_edge_and_branch_force): Likewise. testsuite/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60866 * gcc.dg/pr60866.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index 868083b..3cba326 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -162,7 +162,7 @@ static void create_initial_data_sets (basic_block); static void free_av_set (basic_block); static void invalidate_av_set (basic_block); static void extend_insn_data (void); -static void sel_init_new_insn (insn_t, int); +static void sel_init_new_insn (insn_t, int, int = -1); static void finish_insns (void); /* Various list functions. 
*/ @@ -4007,9 +4007,10 @@ get_seqno_by_succs (rtx insn) return seqno; } -/* Compute seqno for INSN by its preds or succs. */ +/* Compute seqno for INSN by its preds or succs. Use OLD_SEQNO to compute + seqno in corner cases. */ static int -get_seqno_for_a_jump (insn_t insn) +get_seqno_for_a_jump (insn_t insn, int old_seqno) { int seqno; @@ -4065,8 +4066,16 @@ get_seqno_for_a_jump (insn_t insn) if (seqno < 0) seqno = get_seqno_by_succs (insn); - gcc_assert (seqno >= 0); + if (seqno < 0) +    { + /* The only case where this could be here legally is that the only + unscheduled insn was a conditional jump that got removed and turned + into this unconditional one. Initialize from the old seqno + of that jump passed down to here. */ + seqno = old_seqno; +    } + gcc_assert (seqno >= 0); return seqno; } @@ -4246,22 +4255,24 @@ init_insn_data (insn_t insn) } /* This is used to initialize spurious jumps generated by - sel_redirect_edge (). */ + sel_redirect_edge (). OLD_SEQNO is used for initializing seqnos + in corner cases within get_seqno_for_a_jump. */ static void -init_simplejump_data (insn_t insn) +init_simplejump_data (insn_t insn, int old_seqno) { init_expr (INSN_EXPR (insn), vinsn_create (insn, false), 0, REG_BR_PROB_BASE, 0, 0, 0, 0, 0, 0, vNULL, true, false, false, false, true); - INSN_SEQNO (insn) = get_seqno_for_a_jump (insn); + INSN_SEQNO (insn) = get_seqno_for_a_jump (insn, old_seqno); init_first_time_insn_data (insn); } /* Perform deferred initialization of insns. This is used to process - a new jump that may be created by redirect_edge. */ -void -sel_init_new_insn (insn_t insn, int flags) + a new jump that may be created by redirect_edge. OLD_SEQNO is used + for initializing simplejumps in init_simplejump_data. */ +static void +sel_init_new_insn (insn_t insn, int flags, int old_seqno) { /* We create data structures for bb when the first insn is emitted in it. 
*/ if (INSN_P (insn) @@ -4288,7 +4299,7 @@ sel_init_new_insn (insn_t insn, int flags) if (flags INSN_INIT_TODO_SIMPLEJUMP) { extend_insn_data (); - init_simplejump_data (insn); + init_simplejump_data (insn, old_seqno); } gcc_assert (CONTAINING_RGN (BLOCK_NUM (insn)) @@ -5575,14 +5586,14 @@ sel_merge_blocks (basic_block a, basic_block b) } /* A wrapper for redirect_edge_and_branch_force, which also initializes - data structures for possibly created bb and insns. Returns the newly - added bb or NULL, when a bb was not needed. */ + data structures for possibly created bb and insns. */ void sel_redirect_edge_and_branch_force (edge e, basic_block to) { basic_block jump_bb, src, orig_dest = e-dest; int prev_max_uid; rtx jump; + int old_seqno = -1; /* This function is now used only
Re: [PATCH] Fix PR 60901
On 14.05.2014 13:09, Uros Bizjak wrote: On Wed, May 14, 2014 at 10:57 AM, Andrey Belevantsev a...@ispras.ru wrote: This ICE comes from the ix86_dependencies_evaluation_hook code assumption that any scheduling region will be connected. This assumption is not correct in case of the outer loops pipelining of the selective scheduler as explained in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60901#c3. Trying to add dependencies between insns from the different scheduling regions results in a segfault within the dependency analyzer code. The fix is to adjust the code to account for the situation when basic block's predecessors do not belong to the same scheduling region. Bootstrapped and tested on x86-64, OK for trunk? Branches? The fix is low risk as the additional test should always be true for the regular scheduler. I don't know all scheduler details, so your opinion counts there. Let's put this fix to mainline first and after a week without problems, backport it to all release branches. gcc/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * config/i386/i386.c (ix86_dependencies_evaluation_hook): Check that bb predecessor belongs to the same scheduling region. Adjust comment. testsuite/ 2014-05-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60901 * gcc.dg/pr60901.c: New test. +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options -O -fselective-scheduling -fschedule-insns -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fno-tree-dominator-opts } */ + As this is clearly a target bug, let's put the test in gcc.target/i386 directory. You can remove target selector, and the test will run for 64bit and 32bit targets automatically. Thanks for noticing, I forgot about this. Updated the test and committed in 210414. Andrey
Re: [PATCH] Fix PR 60268
On 25.02.2014 13:14, Andreas Schwab wrote: Andrey Belevantsev a...@ispras.ru writes: Fixed by placing the initialization properly at the end of sched_rgn_init and also moving the check for sched_pressure != NONE outside of the if statement in schedule_region as discussed in the PR trail with Jakub. Bootstrapped and tested on x86-64, ok? This breaks m68k: $ gcc/xgcc -Bgcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -flive-range-shrinkage -c -o pr60268.o ../gcc/testsuite/gcc.c-torture/compile/pr60268.c ../gcc/testsuite/gcc.c-torture/compile/pr60268.c: In function ‘f’: ../gcc/testsuite/gcc.c-torture/compile/pr60268.c:6:1: internal compiler error: in m68k_sched_issue_rate, at config/m68k/m68k.c:5978 The patch itself has nothing to do with the ICE, probably it means that -flive-range-shrinkage was never tried without tuning on m68k, because in m68k.c there is 630 /* Setup scheduling options. */ 631 if (TUNE_CFV1) 632 m68k_sched_cpu = CPU_CFV1; 633 else if (TUNE_CFV2) 634 m68k_sched_cpu = CPU_CFV2; 635 else if (TUNE_CFV3) 636 m68k_sched_cpu = CPU_CFV3; 637 else if (TUNE_CFV4) 638 m68k_sched_cpu = CPU_CFV4; 639 else 640 { 641 m68k_sched_cpu = CPU_UNKNOWN; 642 flag_schedule_insns = 0; 643 flag_schedule_insns_after_reload = 0; 644 flag_modulo_sched = 0; 645 } And on line 641 m68k_sched_cpu set to CPU_UNKNOWN is causing the ICE. 
I guess you need to turn the live range shrinkage off in that piece of code, the below patch fixes the ICE for me: diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c index f20d071..ea1bcd4 100644 --- a/gcc/config/m68k/m68k.c +++ b/gcc/config/m68k/m68k.c @@ -642,6 +642,7 @@ m68k_option_override (void) flag_schedule_insns = 0; flag_schedule_insns_after_reload = 0; flag_modulo_sched = 0; + flag_live_range_shrinkage = 0; } if (m68k_sched_cpu != CPU_UNKNOWN) Yours, Andrey 0xbabc8b m68k_sched_issue_rate ../../gcc/config/m68k/m68k.c:5978 0xc3d9dc sched_init() ../../gcc/haifa-sched.c:6657 0xc3eecf haifa_sched_init() ../../gcc/haifa-sched.c:6719 0x8e807c schedule_insns ../../gcc/sched-rgn.c:3407 0x8e87cb schedule_insns ../../gcc/sched-rgn.c:3401 0x8e87cb rest_of_handle_live_range_shrinkage ../../gcc/sched-rgn.c:3614 0x8e87cb execute ../../gcc/sched-rgn.c:3704 Andreas.
[PATCH, committed] Fix PR 60292
Hello, This patch fixes the ICE on assert recently introduced by me and by itself is a continuation of PR 57422. When resetting the target availability bit, we do not always recompute it as this means renaming support and it is expensive. Thus, we cannot rely on the recomputation to check that the fence insn will always get the bit right. So the patch avoids resetting the bit for the fence insn. Bootstrapped and tested on x86-64 with selective scheduling enabled and on ia64, Dominique has kindly tested on darwin. Committed after offline discussion with Alexander. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 208108) --- gcc/ChangeLog (revision 208109) *** *** 1,3 --- 1,9 + 2014-02-25 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/60292 + * sel-sched.c (fill_vec_av_set): Do not reset target availability + bit for the fence instruction. + 2014-02-24 Alangi Derick alangider...@gmail.com * calls.h: Fix typo in comment. Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 208108) --- gcc/sel-sched.c (revision 208109) *** fill_vec_av_set (av_set_t av, blist_t bn *** 3823,3829 /* If insn was already scheduled on the current fence, set TARGET_AVAILABLE to -1 no matter what expr's attribute says. */ ! if (vinsn_vec_has_expr_p (vec_target_unavailable_vinsns, expr)) target_available = -1; /* If the availability of the EXPR is invalidated by the insertion of --- 3823,3830 /* If insn was already scheduled on the current fence, set TARGET_AVAILABLE to -1 no matter what expr's attribute says. */ ! if (vinsn_vec_has_expr_p (vec_target_unavailable_vinsns, expr) ! && !fence_insn_p) target_available = -1; /* If the availability of the EXPR is invalidated by the insertion of
Print out final scheduling time for sel-sched
Hello, While looking at PR 60086 I noticed that we do not print the total scheduling time for the region in the selective scheduler as we do for the Haifa scheduler. This is helpful for PR analysis and needs only a tiny adjustment, so I went ahead and installed the following. Bootstrapped and tested with the patch for PR 60292. The patch does not actually fix 60086, it just helps with analyzing it, so there is no mention of the PR in the changelog. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 208109) --- gcc/ChangeLog (revision 208110) *** *** 1,4 --- 1,11 2014-02-25 Andrey Belevantsev a...@ispras.ru + + * sel-sched.c (calculate_new_fences): New parameter ptime. + Calculate it as a maximum over all fence cycles. + (sel_sched_region_2): Adjust the call to calculate_new_fences. + Print the final schedule timing when sched_verbose. + + 2014-02-25 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60292 * sel-sched.c (fill_vec_av_set): Do not reset target availability Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 208109) --- gcc/sel-sched.c (revision 208110) *** find_min_max_seqno (flist_t fences, int *** 7467,7478 } } ! /* Calculate new fences from FENCES. */ static flist_t ! calculate_new_fences (flist_t fences, int orig_max_seqno) { flist_t old_fences = fences; struct flist_tail_def _new_fences, *new_fences = _new_fences; flist_tail_init (new_fences); for (; fences; fences = FLIST_NEXT (fences)) --- 7467,7479 } } ! /* Calculate new fences from FENCES. Write the current time to PTIME. */ static flist_t ! 
calculate_new_fences (flist_t fences, int orig_max_seqno, int *ptime) { flist_t old_fences = fences; struct flist_tail_def _new_fences, *new_fences = _new_fences; + int max_time = 0; flist_tail_init (new_fences); for (; fences; fences = FLIST_NEXT (fences)) *** calculate_new_fences (flist_t fences, in *** 7501,7509 --- 7502,7512 } else extract_new_fences_from (fences, new_fences, orig_max_seqno); + max_time = MAX (max_time, FENCE_CYCLE (fence)); } flist_clear (old_fences); + *ptime = max_time; return FLIST_TAIL_HEAD (new_fences); } *** static void *** 7558,7563 --- 7561,7567 sel_sched_region_2 (int orig_max_seqno) { int highest_seqno_in_use = orig_max_seqno; + int max_time = 0; stat_bookkeeping_copies = 0; stat_insns_needed_bookkeeping = 0; *** sel_sched_region_2 (int orig_max_seqno) *** 7573,7591 find_min_max_seqno (fences, min_seqno, max_seqno); schedule_on_fences (fences, max_seqno, scheduled_insns_tailp); ! fences = calculate_new_fences (fences, orig_max_seqno); highest_seqno_in_use = update_seqnos_and_stage (min_seqno, max_seqno, highest_seqno_in_use, scheduled_insns); } if (sched_verbose >= 1) ! sel_print ("Scheduled %d bookkeeping copies, %d insns needed " ! "bookkeeping, %d insns renamed, %d insns substituted\n", ! stat_bookkeeping_copies, ! stat_insns_needed_bookkeeping, ! stat_renamed_scheduled, ! stat_substitutions_total); } /* Schedule a region. When pipelining, search for possibly never scheduled --- 7577,7598 find_min_max_seqno (fences, min_seqno, max_seqno); schedule_on_fences (fences, max_seqno, scheduled_insns_tailp); ! fences = calculate_new_fences (fences, orig_max_seqno, &max_time); highest_seqno_in_use = update_seqnos_and_stage (min_seqno, max_seqno, highest_seqno_in_use, scheduled_insns); } if (sched_verbose >= 1) ! { ! sel_print ("Total scheduling time: %d cycles\n", max_time); ! 
sel_print ("Scheduled %d bookkeeping copies, %d insns needed " ! "bookkeeping, %d insns renamed, %d insns substituted\n", ! stat_bookkeeping_copies, ! stat_insns_needed_bookkeeping, ! stat_renamed_scheduled, ! stat_substitutions_total); ! } } /* Schedule a region. When pipelining, search for possibly never scheduled
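The timing computation in the patch above is simply a running maximum of `FENCE_CYCLE` over the fence list. A stand-alone sketch with hypothetical types (`struct fence` here is not GCC's fence structure, just an illustrative singly linked list):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: walk a list of fences and remember the
   largest cycle seen, mirroring the MAX (max_time, FENCE_CYCLE (fence))
   accumulation inside calculate_new_fences.  */
struct fence
{
  int cycle;            /* cycle this fence has reached */
  struct fence *next;   /* next fence in the list */
};

static int
max_fence_cycle (const struct fence *f)
{
  int max_time = 0;
  for (; f != NULL; f = f->next)
    if (f->cycle > max_time)
      max_time = f->cycle;
  return max_time;
}
```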
[PATCH] Fix PR 60268
Hello, While fixing PR 58960 I forgot about single-block regions, placing the initialization of the new nr_regions_initial variable in the wrong place. Thus for single-block regions we ended up with nr_regions = 1 and nr_regions_initial = 0 and effectively turned off sched-pressure immediately. No worries for the usual scheduling path, but with -flive-range-shrinkage we broke an assert that sched-pressure is in a specific mode. Fixed by placing the initialization properly at the end of sched_rgn_init and also moving the check for sched_pressure != NONE outside of the if statement in schedule_region, as discussed in the PR trail with Jakub. Bootstrapped and tested on x86-64, ok? Andrey gcc/ 2014-02-21 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60268 * sched-rgn.c (haifa_find_rgns): Move the nr_regions_initial init to ... (sched_rgn_init) ... here. (schedule_region): Check for SCHED_PRESSURE_NONE earlier. testsuite/ 2014-02-21 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/60268 * gcc.c-torture/compile/pr60268.c: New test. diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c index 0573b6a..dc6fa16 100644 --- a/gcc/sched-rgn.c +++ b/gcc/sched-rgn.c @@ -1067,7 +1067,6 @@ haifa_find_rgns (void) BLOCK_TO_BB (bb->index) = 0; } - nr_regions_initial = nr_regions; free (max_hdr); free (degree); free (stack); @@ -2997,10 +2996,10 @@ schedule_region (int rgn) /* Do not support register pressure sensitive scheduling for the new regions as we don't update the liveness info for them. 
*/ - if (rgn >= nr_regions_initial) + if (sched_pressure != SCHED_PRESSURE_NONE + && rgn >= nr_regions_initial) { - if (sched_pressure != SCHED_PRESSURE_NONE) - free_global_sched_pressure_data (); + free_global_sched_pressure_data (); sched_pressure = SCHED_PRESSURE_NONE; } @@ -3166,6 +3165,7 @@ sched_rgn_init (bool single_blocks_p) RGN_BLOCKS (nr_regions) = (RGN_BLOCKS (nr_regions - 1) + RGN_NR_BLOCKS (nr_regions - 1)); + nr_regions_initial = nr_regions; } /* Free data structures for region scheduling. */ diff --git a/gcc/testsuite/gcc.c-torture/compile/pr60268.c b/gcc/testsuite/gcc.c-torture/compile/pr60268.c new file mode 100644 index 000..c3a6f94 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr60268.c @@ -0,0 +1,6 @@ +/* { dg-options "-flive-range-shrinkage" } */ +void f() +{ + int i = 0; + void *p = 0; +}
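The reordered condition in the patch above can be isolated as a small predicate. This is a hedged sketch with invented names (`disable_pressure_p`, a two-value enum standing in for GCC's sched_pressure modes), not the actual GCC code: pressure-sensitive scheduling is dropped only for regions created after the initial region computation, and only when it is actually enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's sched_pressure setting.  */
enum sched_pressure_mode { PRESSURE_NONE, PRESSURE_WEIGHTED };

/* Sketch of the PR 60268 check: disable pressure scheduling only for
   newly generated regions (rgn >= nr_regions_initial) and only when
   pressure scheduling is on at all, so the SCHED_PRESSURE_NONE case
   never reaches the "disable" path.  */
static bool
disable_pressure_p (enum sched_pressure_mode mode, int rgn,
                    int nr_regions_initial)
{
  return mode != PRESSURE_NONE && rgn >= nr_regions_initial;
}
```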
Re: [PATCH] Fix PR 58960
On 30.01.2014 9:42, Andrey Belevantsev wrote: Hello, Ping. As detailed in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58960#c6, we fail to use the DF liveness info in the register pressure sensitive scheduling for the new blocks as we do not properly compute it in this case. The patch fixes this by avoiding to use the sched-pressure for the new regions, as currently these are only ia64 recovery blocks and supposed to be cold. In the case we'd get other cases of the new blocks, this may be reconsidered. The other options of computing the DF info sketched at the above link do not seem plausible for this stage. Bootstrapped and tested on ia64, also tested by Andreas Schwab on ia64 (see PR log). OK for trunk? Andrey 2013-01-30 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/58960 * haifa-sched.c (alloc_global_sched_pressure_data): New, factored out from ... (sched_init) ... here. (free_global_sched_pressure_data): New, factored out from ... (sched_finish): ... here. * sched-int.h (free_global_sched_pressure_data): Declare. * sched-rgn.c (nr_regions_initial): New static global. (haifa_find_rgns): Initialize it. (schedule_region): Disable sched-pressure for the newly generated regions.
Re: PATCH: PR tree-optimization/60024: global-buffer-overflow in init_regs_for_mode
Hello, On 01.02.2014 19:16, H.J. Lu wrote: Hi, init_regs_for_mode accesses global_regs before checking if regno is OK for a mode, which leads to buffer overflow. This patch moves the HARD_REGNO_MODE_OK check before checking global_regs. There is no functional change. OK for trunk? This is fine, you'd need to change tree-optimization/60024 to rtl-optimization/60024 in the ChangeLog. Thanks for doing this. Yours, Andrey Thanks. H.J. --- 2014-02-01 H.J. Lu hongjiu...@intel.com PR tree-optimization/60024 * sel-sched.c (init_regs_for_mode): Check if mode is OK first. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index ad4a0aa..7b1a183 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -1116,9 +1116,16 @@ init_regs_for_mode (enum machine_mode mode) for (cur_reg = 0; cur_reg < FIRST_PSEUDO_REGISTER; cur_reg++) { - int nregs = hard_regno_nregs[cur_reg][mode]; + int nregs; int i; + /* See whether it accepts all modes that occur in + original insns. */ + if (! HARD_REGNO_MODE_OK (cur_reg, mode)) + continue; + + nregs = hard_regno_nregs[cur_reg][mode]; + for (i = nregs - 1; i >= 0; --i) if (fixed_regs[cur_reg + i] || global_regs[cur_reg + i] @@ -1140,11 +1147,6 @@ init_regs_for_mode (enum machine_mode mode) if (i >= 0) continue; - /* See whether it accepts all modes that occur in - original insns. */ - if (! HARD_REGNO_MODE_OK (cur_reg, mode)) - continue; - if (HARD_REGNO_CALL_PART_CLOBBERED (cur_reg, mode)) SET_HARD_REG_BIT (sel_hrd.regs_for_call_clobbered[mode], cur_reg);
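The overflow pattern the patch above removes is the classic "index with a mode-dependent count before validating the mode". A self-contained sketch, with made-up stand-ins (`mode_ok` for HARD_REGNO_MODE_OK, `regs_needed` for hard_regno_nregs, a tiny fixed-size array): validating first guarantees the loop bound can no longer run past the array.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 8

/* Stand-ins for HARD_REGNO_MODE_OK and hard_regno_nregs: registers
   6 and 7 do not accept the mode, and an invalid combination would
   claim 4 hard registers -- indexing 7..10 out of an 8-entry array.  */
static bool mode_ok (int reg)     { return reg < 6; }
static int  regs_needed (int reg) { return reg < 6 ? 1 : 4; }

/* Sketch of the PR 60024 fix: check the register/mode combination
   *before* using the register count to index the fixed-size array.  */
static bool
reg_usable_p (int reg, const bool fixed[NUM_REGS])
{
  if (!mode_ok (reg))          /* bail out before any array access */
    return false;
  for (int i = regs_needed (reg) - 1; i >= 0; --i)
    if (fixed[reg + i])        /* now guaranteed to stay in bounds */
      return false;
  return true;
}
```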
Re: [PATCH] Fix PR 57662
Hello, On 14.08.2013 17:12, Andrey Belevantsev wrote: Hello, As noted in Bugzilla, the problem in this PR is the rare case of removing the unconditional jump that is used for traversing CFG when simplifying control flow. We rescan the remaining successors when we know the flow below the current point has changed, and when we have removed the jump we are currently on, we segfault. The fix is just to break out of the successor iterator loop, as in the unconditional jump case we have already visited the only successor. I've recently found another test case for this PR which exhibits the segfault in the other spot of the CFG traversal driver -- we try to mark the basic block as visited when it is already gone. Fixed with the below patch, the test was reduced as much as multidelta could. Bootstrapped and tested on ia64, committed to trunk. Yours, Andrey gcc/ PR rtl-optimization/57662 * sel-sched.c (code_motion_path_driver): Do not mark already not existing blocks in the visiting bitmap. testsuite/ PR rtl-optimization/57662 * g++.dg/pr57662.C: New test. Bootstrapped and tested on x86-64, ok for trunk? If yes, Alexander, could you please commit this for me? Andrey 2013-08-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/57662 gcc/ * sel-sched.c (code_motion_process_successors): When the current insn is removed after the recursive traversal, break from the loop. Add comments and debug printouts. testsuite/ * gcc.dg/pr57662.c: New test. Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 207420) --- gcc/ChangeLog (revision 207422) *** *** 1,5 --- 1,11 2014-02-03 Andrey Belevantsev a...@ispras.ru + PR rtl-optimization/57662 + * sel-sched.c (code_motion_path_driver): Do not mark already not + existing blocks in the visiting bitmap. + + 2014-02-03 Andrey Belevantsev a...@ispras.ru + * sel-sched-ir.c (sel_gen_insn_from_expr_after): Reset INSN_DELETED_P on the insn being emitted. 
Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 207420) --- gcc/testsuite/ChangeLog (revision 207422) *** *** 1,3 --- 1,8 + 2014-02-03 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/57662 + * g++.dg/pr57662.C: New test. + 2014-02-02 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/vmx/3b-15.c: Remove special handling for little endian. Index: gcc/testsuite/g++.dg/pr57662.C === *** gcc/testsuite/g++.dg/pr57662.C (revision 0) --- gcc/testsuite/g++.dg/pr57662.C (revision 207422) *** *** 0 --- 1,339 + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options "-O2 -fselective-scheduling2 -fsel-sched-pipelining" } */ + + extern "C" { + typedef struct _IO_FILE FILE; + extern int putc(int __c, FILE * __stream); + extern int strcmp(__const char *__s1, __const char *__s2) throw() + __attribute__ ((__pure__)) __attribute__ ((__nonnull__(1, 2))); + } typedef union tree_node *tree; + struct gcc_options { + int x_flag_openmp; + }; + extern struct gcc_options global_options; + struct ht_identifier { + const unsigned char *str; + }; + enum cpp_ttype { + CPP_SEMICOLON, CPP_NAME + }; + struct vl_embed { + }; + struct va_heap { + }; + struct va_gc { + typedef vl_embed default_layout; + }; + template <typename T, typename A = va_heap, typename L = + typename A::default_layout> struct vec { + }; + enum tree_code { + ERROR_MARK, + IDENTIFIER_NODE, + OMP_SIMD, + CILK_SIMD, + MAX_TREE_CODES + }; + struct tree_identifier { + struct ht_identifier + id; + }; + union tree_node { + struct tree_identifier + identifier; + }; + inline tree + tree_check(tree __t, const char *__f, int __l, const char *__g, tree_code __c) + { + } + + extern tree chainon(tree, tree); + extern vec<tree, va_gc> *make_tree_vector(void); + typedef unsigned long omp_clause_mask; + enum c_omp_clause_split { + C_OMP_CLAUSE_SPLIT_TARGET = 0, C_OMP_CLAUSE_SPLIT_COUNT + }; + typedef struct cxx_saved_binding { + tree attributes; + } cp_decl_specifier_seq; + typedef enum 
pragma_kind { + PRAGMA_NONE = 0, PRAGMA_OMP_DECLARE_REDUCTION, PRAGMA_OMP_TARGET + } pragma_kind; + typedef enum pragma_omp_clause { + PRAGMA_OMP_CLAUSE_NONE = + 0, PRAGMA_OMP_CLAUSE_DEVICE, PRAGMA_OMP_CLAUSE_IF, + PRAGMA_OMP_CLAUSE_MAP + } pragma_omp_clause; + typedef struct cp_token { + enum cpp_ttype type:8; + union cp_token_value { + tree value; + } u; + } cp_token; + typedef struct cp_token *cp_token_position; + typedef struct cp_lexer { + cp_token_position next_token; + bool debugging_p; + cp_lexer *lexer; + } cp_parser; + static FILE *cp_lexer_debug_stream; + static inline bool cp_lexer_debugging_p(cp_lexer * lexer) + { + return lexer
[PATCH] Reset INSN_DELETED_P when reusing insns in the selective scheduler
Hello, While testing PR 58960 on ia64, I've enabled selective scheduling for bootstrap and found two small issues. One is the other instance of PR 57662 and another one is a miscompile that happens because we emit an insn with INSN_DELETED_P flag set; the insn then obviously does not get output in the assembly file. The reason for this is that we've cached this insn in the transformation cache and reused it after it was deleted from the insn stream. The patch resetting the flag qualifies as obvious, but I've discussed it additionally with Alexander, and will be committing it shortly. (The bootstrap fail is actually a regression, also I don't see how to make a (small) test case.) Steven, we've been adding insns through add_insn_after and since you've touched the code last, I wanted to ask you two things. First, do you want an assert there that we do not add INSN_DELETED_P insns (like we have for the after parameter in add_insn_after_nobb?) I have added one and it didn't fail on the x86-64 bootstrap, which is no wonder since this one is directly used by reorg and sel-sched only. Second, the comment for add_insn_after about the BB parameter does not apply -- the code immediately does 3871 void 3872 add_insn_after (rtx insn, rtx after, basic_block bb) 3873 { 3874 add_insn_after_nobb (insn, after); 3875 if (!BARRIER_P (after) 3876 && !BARRIER_P (insn) 3877 && (bb = BLOCK_FOR_INSN (after))) so the bb parameter gets overwritten. Should that be fixed in line with add_insn_before code? Yours, Andrey 2013-01-31 Andrey Belevantsev a...@ispras.ru * sel-sched-ir.c (sel_gen_insn_from_expr_after): Reset INSN_DELETED_P on the insn being emitted. Index: sel-sched-ir.c === *** sel-sched-ir.c (revision 207299) --- sel-sched-ir.c (working copy) *** sel_gen_insn_from_expr_after (expr_t exp *** 1398,1403 --- 1398,1408 emit_expr = set_insn_init (expr, vinsn ? 
vinsn : EXPR_VINSN (expr), seqno); insn = EXPR_INSN_RTX (emit_expr); + + /* The insn may come from the transformation cache, which may hold already + deleted insns, so mark it as not deleted. */ + INSN_DELETED_P (insn) = 0; + add_insn_after (insn, after, BLOCK_FOR_INSN (insn)); flags = INSN_INIT_TODO_SSID;
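The bug the patch above fixes is a general hazard of object caches: a cached object can carry stale state from its previous life. A hedged sketch with invented types (`struct insn` here is a toy singly linked stream, not GCC's rtx): clear the "deleted" mark before splicing the reused object back in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy insn stream node; "deleted" plays the role of INSN_DELETED_P.  */
struct insn
{
  int uid;
  bool deleted;
  struct insn *next;
};

/* Sketch of the fix: an insn fetched from a transformation cache may
   still be marked deleted from its earlier removal, so un-delete it
   before re-inserting it into the stream.  */
static void
emit_after (struct insn *insn, struct insn *after)
{
  insn->deleted = false;      /* may come from the cache: reset the mark */
  insn->next = after->next;   /* splice into the stream */
  after->next = insn;
}
```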
[PATCH] Fix PR 58960
Hello, As detailed in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58960#c6, we fail to use the DF liveness info in the register pressure sensitive scheduling for the new blocks as we do not properly compute it in this case. The patch fixes this by avoiding to use the sched-pressure for the new regions, as currently these are only ia64 recovery blocks and supposed to be cold. In the case we'd get other cases of the new blocks, this may be reconsidered. The other options of computing the DF info sketched at the above link do not seem plausible for this stage. Bootstrapped and tested on ia64, also tested by Andreas Schwab on ia64 (see PR log). OK for trunk? Andrey 2013-01-30 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/58960 * haifa-sched.c (alloc_global_sched_pressure_data): New, factored out from ... (sched_init) ... here. (free_global_sched_pressure_data): New, factored out from ... (sched_finish): ... here. * sched-int.h (free_global_sched_pressure_data): Declare. * sched-rgn.c (nr_regions_initial): New static global. (haifa_find_rgns): Initialize it. (schedule_region): Disable sched-pressure for the newly generated regions. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 4d51984..45398bf 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -6553,6 +6553,53 @@ setup_sched_dump (void) ? stderr : dump_file); } +/* Allocate data for register pressure sensitive scheduling. */ +static void +alloc_global_sched_pressure_data (void) +{ + if (sched_pressure != SCHED_PRESSURE_NONE) +{ + int i, max_regno = max_reg_num (); + + if (sched_dump != NULL) + /* We need info about pseudos for rtl dumps about pseudo + classes and costs. */ + regstat_init_n_sets_and_refs (); + ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL); + sched_regno_pressure_class + = (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class)); + for (i = 0; i < max_regno; i++) + sched_regno_pressure_class[i] + = (i < FIRST_PSEUDO_REGISTER + ? 
ira_pressure_class_translate[REGNO_REG_CLASS (i)] + : ira_pressure_class_translate[reg_allocno_class (i)]); + curr_reg_live = BITMAP_ALLOC (NULL); + if (sched_pressure == SCHED_PRESSURE_WEIGHTED) + { + saved_reg_live = BITMAP_ALLOC (NULL); + region_ref_regs = BITMAP_ALLOC (NULL); + } +} +} + +/* Free data for register pressure sensitive scheduling. */ +void +free_global_sched_pressure_data (void) +{ + if (sched_pressure != SCHED_PRESSURE_NONE) +{ + if (regstat_n_sets_and_refs != NULL) + regstat_free_n_sets_and_refs (); + if (sched_pressure == SCHED_PRESSURE_WEIGHTED) + { + BITMAP_FREE (region_ref_regs); + BITMAP_FREE (saved_reg_live); + } + BITMAP_FREE (curr_reg_live); + free (sched_regno_pressure_class); +} +} + /* Initialize some global state for the scheduler. This function works with the common data shared between all the schedulers. It is called from the scheduler specific initialization routine. */ @@ -6656,29 +6703,7 @@ sched_init (void) if (targetm.sched.init_global) targetm.sched.init_global (sched_dump, sched_verbose, get_max_uid () + 1); - if (sched_pressure != SCHED_PRESSURE_NONE) -{ - int i, max_regno = max_reg_num (); - - if (sched_dump != NULL) - /* We need info about pseudos for rtl dumps about pseudo - classes and costs. */ - regstat_init_n_sets_and_refs (); - ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL); - sched_regno_pressure_class - = (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class)); - for (i = 0; i < max_regno; i++) - sched_regno_pressure_class[i] - = (i < FIRST_PSEUDO_REGISTER - ? 
ira_pressure_class_translate[REGNO_REG_CLASS (i)] - : ira_pressure_class_translate[reg_allocno_class (i)]); - curr_reg_live = BITMAP_ALLOC (NULL); - if (sched_pressure == SCHED_PRESSURE_WEIGHTED) - { - saved_reg_live = BITMAP_ALLOC (NULL); - region_ref_regs = BITMAP_ALLOC (NULL); - } -} + alloc_global_sched_pressure_data (); curr_state = xmalloc (dfa_state_size); } @@ -6777,18 +6802,7 @@ void sched_finish (void) { haifa_finish_h_i_d (); - if (sched_pressure != SCHED_PRESSURE_NONE) -{ - if (regstat_n_sets_and_refs != NULL) - regstat_free_n_sets_and_refs (); - if (sched_pressure == SCHED_PRESSURE_WEIGHTED) - { - BITMAP_FREE (region_ref_regs); - BITMAP_FREE (saved_reg_live); - } - BITMAP_FREE (curr_reg_live); - free (sched_regno_pressure_class); -} + free_global_sched_pressure_data (); free (curr_state); if (targetm.sched.finish_global) diff --git a/gcc/sched-int.h b/gcc/sched-int.h index 3b1106f..b5481b9 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -1337,6 +1337,7 @@ extern void debug_ds (ds_t); extern void initialize_live_range_shrinkage (void); extern void finish_live_range_shrinkage (void
Re: [PATCH, committed] Fix PR 57422
On 23.12.2013 10:52, Andrey Belevantsev wrote: Hello, As described in the PR, the ICE reason was the typo made when introducing calls to add_hard_reg_set. Fixed by the first attached patch, bootstrapped and tested on both ia64 and x86_64, committed as obvious. The test case is very sensitive to the scheduler decisions (e.g. it didn't fail on trunk but only on the revision reported for me), so instead of adding the test I have put in the code two asserts checking that we can always schedule the fence instruction as is. This hunk was tested together with the first but committed separately. The first patch can be safely committed to 4.8, the second can stay on trunk only. Jakub, will it be fine with you? Now the first hunk is also committed to 4.8 and 4.7. Andrey
Re: [PATCH, committed] Fix PR 57422
On 23.12.2013 16:24, H.J. Lu wrote: On Sun, Dec 22, 2013 at 10:52 PM, Andrey Belevantsev a...@ispras.ru wrote: Hello, As described in the PR, the ICE reason was the typo made when introducing calls to add_hard_reg_set. Fixed by the first attached patch, bootstrapped and tested on both ia64 and x86_64, committed as obvious. The test case is very sensitive to the scheduler decisions (e.g. it didn't fail on trunk but only on the revision reported for me), so instead of adding the test I have put in the code two asserts checking that we can always schedule the fence instruction as is. This hunk was tested together with the first but committed separately. Testcase is very small. Why not add it? Frankly, I think that the chances of this test uncovering similar issues in the future are very small. It needs lots of options to make it trigger, and even then only a specific revision. The chance of triggering the asserts I added on other code is much higher. In the past, I have also avoided adding tests that require 5+ options to trigger the issue, adding only those that have some hope of more or less reliably reproducing the required issue. The best solution of course is having an infrastructure to test the specific scheduler decisions, which we don't have. You are welcome to add the test if you feel so strongly about us needing it. Andrey
[PATCH, committed] Fix PR 57422
Hello, As described in the PR, the ICE reason was the typo made when introducing calls to add_hard_reg_set. Fixed by the first attached patch, bootstrapped and tested on both ia64 and x86_64, committed as obvious. The test case is very sensitive to the scheduler decisions (e.g. it didn't fail on trunk but only on the revision reported for me), so instead of adding the test I have put in the code two asserts checking that we can always schedule the fence instruction as is. This hunk was tested together with the first but committed separately. The first patch can be safely committed to 4.8, the second can stay on trunk only. Jakub, will it be fine with you? Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 206172) --- gcc/ChangeLog (revision 206173) *** *** 1,3 --- 1,9 + 2013-12-23 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/57422 + * sel-sched.c (mark_unavailable_hard_regs): Fix typo when calling + add_to_hard_reg_set. + 2013-12-20 Sharad Singhai sing...@google.com * Makefile.in: Add optinfo.texi. Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 206172) --- gcc/sel-sched.c (revision 206173) *** mark_unavailable_hard_regs (def_t def, s *** 1253,1259 if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) add_to_hard_reg_set (reg_rename_p->unavailable_hard_regs, ! Pmode, HARD_FRAME_POINTER_IS_FRAME_POINTER); } #ifdef STACK_REGS --- 1253,1259 if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) add_to_hard_reg_set (reg_rename_p->unavailable_hard_regs, ! Pmode, HARD_FRAME_POINTER_REGNUM); } #ifdef STACK_REGS Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 206173) --- gcc/ChangeLog (revision 206174) *** *** 1,6 --- 1,12 2013-12-23 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/57422 + * sel-sched.c (fill_vec_av_set): Assert that the fence insn + can always be scheduled in its current form. + + 2013-12-23 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/57422 * sel-sched.c (mark_unavailable_hard_regs): Fix typo when calling add_to_hard_reg_set. 
Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 206173) --- gcc/sel-sched.c (revision 206174) *** fill_vec_av_set (av_set_t av, blist_t bn *** 3801,3806 --- 3801,3807 signed char target_available; bool is_orig_reg_p = true; int need_cycles, new_prio; + bool fence_insn_p = INSN_UID (insn) == INSN_UID (FENCE_INSN (fence)); /* Don't allow any insns other than from SCHED_GROUP if we have one. */ if (FENCE_SCHED_NEXT (fence) && insn != FENCE_SCHED_NEXT (fence)) *** fill_vec_av_set (av_set_t av, blist_t bn *** 3855,3863 if (sched_verbose >= 4) sel_print ("Expr %d has no suitable target register\n", INSN_UID (insn)); ! continue; } /* Filter expressions that need to be renamed or speculated when pipelining, because compensating register copies or speculation checks are likely to be placed near the beginning of the loop, --- 3856,3871 if (sched_verbose >= 4) sel_print ("Expr %d has no suitable target register\n", INSN_UID (insn)); ! ! /* A fence insn should not get here. */ ! gcc_assert (!fence_insn_p); ! continue; } + /* At this point a fence insn should always be available. */ + gcc_assert (!fence_insn_p + || INSN_UID (FENCE_INSN (fence)) == INSN_UID (EXPR_INSN_RTX (expr))); + /* Filter expressions that need to be renamed or speculated when pipelining, because compensating register copies or speculation checks are likely to be placed near the beginning of the loop,
[PATCH] Fix PR 57662
Hello, As noted in Bugzilla, the problem in this PR is the rare case of removing the unconditional jump that is used for traversing CFG when simplifying control flow. We rescan the remaining successors when we know the flow below the current point has changed, and when we have removed the jump we are currently on, we segfault. The fix is just to break out of the successor iterator loop, as in the unconditional jump case we have already visited the only successor. Bootstrapped and tested on x86-64, ok for trunk? If yes, Alexander, could you please commit this for me? Andrey 2013-08-14 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/57662 gcc/ * sel-sched.c (code_motion_process_successors): When the current insn is removed after the recursive traversal, break from the loop. Add comments and debug printouts. testsuite/ * gcc.dg/pr57662.c: New test. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index fb9386f..a7b8897 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -6424,10 +6424,23 @@ code_motion_process_successors (insn_t insn, av_set_t orig_ops, res = b; /* We have simplified the control flow below this point. In this case, - the iterator becomes invalid. We need to try again. + the iterator becomes invalid. We need to try again. + If we have removed the insn itself, it could be only an + unconditional jump. Thus, do not rescan but break immediately -- + we have already visited the only successor block. 
*/ + if (!BLOCK_FOR_INSN (insn)) + { + if (sched_verbose >= 6) + sel_print ("Not doing rescan: already visited the only successor " + "of block %d\n", old_index); + break; + } if (BLOCK_FOR_INSN (insn)->index != old_index || EDGE_COUNT (bb->succs) != old_succs) { + if (sched_verbose >= 6) + sel_print ("Rescan: control flow simplified below insn %d, block %d\n", + INSN_UID (insn), BLOCK_FOR_INSN (insn)->index); insn = sel_bb_end (BLOCK_FOR_INSN (insn)); goto rescan; } diff --git a/gcc/testsuite/gcc.dg/pr57662.c b/gcc/testsuite/gcc.dg/pr57662.c new file mode 100644 index 000..7af8455 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr57662.c @@ -0,0 +1,47 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options "-O -fno-guess-branch-probability -fpeel-loops -freorder-blocks-and-partition -fschedule-insns2 -fsel-sched-pipelining -fselective-scheduling2 -ftree-pre" } */ + +struct intC +{ + short x; + short y; +}; + +void Get(void); + +int size_x; + +struct +{ + int *depot_table; + struct intC *ti; + int size; +} dummy; + +static inline int +GetRotatedTileFromOffset (int *a, struct intC tidc) +{ + if (!*a) +Get (); + switch (*a) +{ +case 0: + return (tidc.y size_x) + tidc.x; +case 1: + return tidc.y + (dummy.size - tidc.x) * size_x; +case 2: + return tidc.x + (dummy.size - tidc.y) * size_x; +case 3: + return (dummy.size - tidc.x); +} + return 0; +} + +int +GetHangarNum (int *a, int i) +{ + while (dummy.size) +if (GetRotatedTileFromOffset (a, dummy.ti[i])) + return *dummy.depot_table; + return 0; +}
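The control-flow hazard the patch above guards against is a form of iterator invalidation: the element the loop iterates from can be deleted by the work done inside the loop. A hedged, self-contained sketch with invented types (`struct succ`, an `insn_alive` flag standing in for `BLOCK_FOR_INSN (insn)` becoming NULL), not GCC's traversal code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy successor list; removes_insn models a recursive call that
   simplifies the CFG and deletes the insn we iterate from.  */
struct succ
{
  struct succ *next;
  bool removes_insn;
};

/* Sketch of the PR 57662 fix: after processing each successor, check
   whether the insn we were iterating from still exists; if not, break
   out instead of consulting the now-stale iterator.  */
static int
process_successors (struct succ *s, bool *insn_alive)
{
  int visited = 0;
  for (; s != NULL; s = s->next)
    {
      visited++;
      if (s->removes_insn)
        *insn_alive = false;  /* simplification deleted the insn */
      if (!*insn_alive)
        break;                /* its only successor is already done */
    }
  return visited;
}
```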
[PATCH] Fix PRs 56957 and 57105
Hello, After Steven's changes tightening the add/remove insn interface, we found a problem in the selective scheduling when we incorrectly determined whether an insn should be removed (and a fresh one emitted) or just moved to a scheduling point. Fixed by just comparing INSN_UIDs of the insn below and the chosen expr above and abandoning other too clever code for checking whether an insn was changed or extra insns were emitted. Some extra cleanups are possible after this patch, but I'd wait a bit to see if there is any fallout first. Bootstrapped and tested on ia64 and x86-64, committed after offline approval from Alexander. Andrey 2013-04-30 Andrey Belevantsev a...@ispras.ru gcc: PR rtl-optimization/56957 PR rtl-optimization/57105 * sel-sched.c (move_op_orig_expr_found): Remove insn_emitted variable. Use just INSN_UID for determining whether an insn should be only disconnected from the insn stream. * sel-sched-ir.h (EXPR_WAS_CHANGED): Remove. gcc/testsuite: PR rtl-optimization/57105 * gcc.dg/pr57105.c: New test. Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 198448) --- gcc/ChangeLog (revision 198449) *** *** 1,3 --- 1,13 + 2013-04-30 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/56957 + PR rtl-optimization/57105 + + * sel-sched.c (move_op_orig_expr_found): Remove insn_emitted + variable. Use just INSN_UID for determining whether an insn + should be only disconnected from the insn stream. + * sel-sched-ir.h (EXPR_WAS_CHANGED): Remove. 
+ 2013-04-30 Jakub Jelinek ja...@redhat.com PR tree-optimization/57104 Index: gcc/testsuite/gcc.dg/pr57105.c === *** gcc/testsuite/gcc.dg/pr57105.c (revision 0) --- gcc/testsuite/gcc.dg/pr57105.c (revision 198449) *** *** 0 --- 1,18 + /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ + /* { dg-options "-Os -fselective-scheduling2 -g" } */ + int bar (int); + int *baz (int *); + + void + foo (int a) + { + while (bar (0)) + { + int *c = baz (0); + if (a) + { + int i = *baz (c); + } + bar (*baz (c)); + } + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 198448) --- gcc/testsuite/ChangeLog (revision 198449) *** *** 1,3 --- 1,8 + 2013-04-30 Andrey Belevantsev a...@ispras.ru + + PR rtl-optimization/57105 + * gcc.dg/pr57105.c: New test. + 2013-04-30 Jakub Jelinek ja...@redhat.com PR tree-optimization/57104 Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 198448) --- gcc/sel-sched.c (revision 198449) *** move_op_orig_expr_found (insn_t insn, ex *** 6051,6064 cmpd_local_params_p lparams ATTRIBUTE_UNUSED, void *static_params) { ! bool only_disconnect, insn_emitted; moveop_static_params_p params = (moveop_static_params_p) static_params; copy_expr_onside (params->c_expr, INSN_EXPR (insn)); track_scheduled_insns_and_blocks (insn); ! insn_emitted = handle_emitting_transformations (insn, expr, params); ! only_disconnect = (params->uid == INSN_UID (insn) ! && ! insn_emitted && ! EXPR_WAS_CHANGED (expr)); /* Mark that we've disconnected an insn. */ if (only_disconnect) --- 6051,6063 cmpd_local_params_p lparams ATTRIBUTE_UNUSED, void *static_params) { ! bool only_disconnect; moveop_static_params_p params = (moveop_static_params_p) static_params; copy_expr_onside (params->c_expr, INSN_EXPR (insn)); track_scheduled_insns_and_blocks (insn); ! handle_emitting_transformations (insn, expr, params); ! only_disconnect = params->uid == INSN_UID (insn); /* Mark that we've disconnected an insn. 
*/ if (only_disconnect) Index: gcc/sel-sched-ir.h === *** gcc/sel-sched-ir.h (revision 198448) --- gcc/sel-sched-ir.h (revision 198449) *** typedef expr_def *expr_t; *** 191,198 #define EXPR_WAS_RENAMED(EXPR) ((EXPR)-was_renamed) #define EXPR_CANT_MOVE(EXPR) ((EXPR)-cant_move) - #define EXPR_WAS_CHANGED(EXPR) (EXPR_HISTORY_OF_CHANGES (EXPR).length () 0) - /* Insn definition for list of original insns in find_used_regs. */ struct _def { --- 191,196
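The patch above replaces the EXPR_WAS_CHANGED/insn_emitted bookkeeping with a single INSN_UID comparison. A minimal sketch of the resulting decision rule, with illustrative names rather than the GCC API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched move_op_orig_expr_found logic: the insn
   found at the original point is merely disconnected from the insn stream
   (rather than removed and regenerated) only when its UID is exactly the UID
   recorded for the chosen expression.  Any insn that was transformed or
   freshly emitted along the way necessarily has a different UID.  */
static bool
only_disconnect_p (int chosen_expr_uid, int found_insn_uid)
{
  return chosen_expr_uid == found_insn_uid;
}
```

The point of the cleanup is that this one comparison subsumes the previous checks for emitted or changed insns.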
Re: Fix PR 56077
On 01.04.2013 12:38, Andrey Belevantsev wrote: On 22.02.2013 17:30, Andrey Belevantsev wrote: Hello, As found by Jakub and explained in the PR audit trail by Alexander, this patch fixes the selective scheduler merge glitch of 2008 that added the unnecessary JUMP_P check to the flush_pending_lists call. I have removed the check and expanded the binary negation for clarity. The patch was tested on x86-64, ia64, and ppc64 to be safe. The patch should be conservatively safe at this stage as it adds more flushes and thus more dependencies to the scheduler. The original test is fixed, but I don't know how to add the test checking assembly insns order to the testsuite. OK for trunk? Andrey 2012-02-22 Alexander Monakov amona...@ispras.ru Andrey Belevantsev a...@ispras.ru PR middle-end/56077 * sched-deps.c (sched_analyze_insn): When reg_pending_barrier, flush pending lists also on non-jumps. Now backported to 4.7 and 4.6. Leonid has reported that this patch breaks the kernel build on mipsel-linux. I have quickly analyzed the issue. The patch removes the JUMP_P check when flushing pending lists at the reg_pending_barrier, restoring the 2008 year behavior. But at that time we didn't have debug insns. Now with the removed check we flush pending lists also on debug insns, which results in freeing pending_read_mems but not in freeing pending_read_insns, as in add_dependence_list_and_free we explicitly forbid freeing the dependency list for debug insns (that was for fixing PR 44013). This is inconsistent, and when proceeding with analysis we immediately hit 2451 t = canon_rtx (t); 2452 pending = deps-pending_read_insns; 2453 pending_mem = deps-pending_read_mems; 2454 while (pending) 2455 { 2456 if (read_dependence (XEXP (pending_mem, 0), t) and the line 2456 segfaults because pending is not NULL, but pending mem is NULL. I am testing the revert of this backport for 4.6 and will commit it in about an hour or so. 
However, I am surprised we don't hit this either on 4.7, 4.8 or trunk. Some flush_pending_lists calls are protected from debug insns as they check CALL_P or JUMP_P, but not all of them. It looks like flush_pending_lists should not be called on debug insns at all. And indeed, the attached patch fixes Leonid's test case. Jakub, you don't happen to remember any changes in this area that could hide the problem for 4.7 and later? Andrey Index: gcc/sched-deps.c === *** gcc/sched-deps.c (revision 197492) --- gcc/sched-deps.c (working copy) *** sched_analyze_insn (struct deps_desc *de *** 3044,3050 /* Don't flush pending lists on speculative checks for selective scheduling. */ ! if (!sel_sched_p () || !sel_insn_is_speculation_check (insn)) flush_pending_lists (deps, insn, true, true); if (!deps->readonly) --- 3044,3050 /* Don't flush pending lists on speculative checks for selective scheduling. */ ! if (NONDEBUG_INSN_P (insn) && (!sel_sched_p () || !sel_insn_is_speculation_check (insn))) flush_pending_lists (deps, insn, true, true); if (!deps->readonly)
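The crash mechanism described above comes down to a broken pairing invariant. A toy model, with illustrative names standing in for the sched-deps structures:

```c
#include <assert.h>
#include <stddef.h>

/* sched-deps keeps two lists that are walked in lockstep
   (pending_read_insns / pending_read_mems).  When a flush on a debug insn
   frees the mems list but add_dependence_list_and_free refuses to free the
   insns list, the next lockstep walk finds a non-NULL insns node paired with
   a NULL mems node and segfaults.  */
struct node { struct node *next; };

static size_t
list_length (struct node *p)
{
  size_t n = 0;
  for (; p != NULL; p = p->next)
    n++;
  return n;
}

/* The invariant the fix restores: flush both lists or neither.  */
static int
lists_consistent_p (struct node *insns, struct node *mems)
{
  return list_length (insns) == list_length (mems);
}
```

With the NONDEBUG_INSN_P guard, debug insns never reach the flush, so the two lists can only be freed together.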
Re: Fix PR 56077
On 05.04.2013 14:10, Olivier Hainque wrote: On Apr 5, 2013, at 10:13 , Eric Botcazou ebotca...@adacore.com wrote: We do have regressions on the 4.7 branch in the scheduler (CCed Olivier who has more information). Right: we do see a SEGV while compiling the attached monitor.i (preprocessed output from a qemu tree) with -O2 -g. ./cc1 -m32 -O2 -g -quiet monitor.i .../monitor.c: In function ‘memory_dump’: .../monitor.c:1109:1: internal compiler error: Segmentation fault As already mentioned upthread, this is triggered by a call to flush_pending_lists with a DEBUG_INSN. We get into: if (for_write) { add_dependence_list_and_free (deps, insn, &deps->pending_read_insns, 1, REG_DEP_ANTI); if (!deps->readonly) { free_EXPR_LIST_list (&deps->pending_read_mems); deps->pending_read_list_length = 0; } } add_dependence_list_and_free doesn't free *LISTP when operating on DEBUG_INSNs, so we end up with pending_read_mems freed but pending_read_insns not freed. This was cured on mainline by: Author: mkuvyrkov Date: Mon Aug 27 22:11:48 2012 + * sched-deps.c (add_dependence_list_and_free): Simplify. (flush_pending_list_and_free): Fix a hack that was fixing a hack. Free lists when add_dependence_list_and_free doesn't free them. (svn+ssh://gcc.gnu.org/svn/gcc/trunk@190733) http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01625.html I don't know whether backporting this would be better than reverting the offending change as just done on 4.7. I'd say for 4.6 the best way is to revert. PR 56077 is not that important, and this 4.6 release will be the last one. For 4.7, we can additionally backport Maxim's patch or revert this one. I'm fine with both options, but I'll test the 4.7 backport too, just to be ready for that. Andrey
Re: [PATCH, committed] Fix PR 45472
On 27.02.2013 13:03, Andrey Belevantsev wrote: Hello, For this volatile-related issue (no volatile bits on volatile fields of a non-volatile struct) AFAIU there is an agreement of fixing the front-ends if needed, but anyways the patch to the selective scheduler is required that properly merges expressions so that the may_trap_p bit is preserved. So the following patch was successfully tested on ia64 and x86-64, approved by Alexander offline, committed to trunk. The patch needs backport to other branches in about two weeks. Andrey PR middle-end/45472 gcc/ * sel-sched-ir.c (merge_expr): Also change vinsn of merged expr when the may_trap_p bit of the exprs being merged differs. Reorder tests for speculativeness in the logical and operator. testsuite/ * gcc.dg/45472.c: New test. Now backported to 4.7 and 4.6 with Jakub's patch for the sel-sched-ir.c memory leak added. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 197298) --- gcc/ChangeLog (revision 197299) *** *** 1,6 --- 1,25 2013-04-01 Andrey Belevantsev a...@ispras.ru Backport from mainline + 2013-02-27 Andrey Belevantsev a...@ispras.ru + + PR middle-end/45472 + + * sel-sched-ir.c (merge_expr): Also change vinsn of merged expr + when the may_trap_p bit of the exprs being merged differs. + Reorder tests for speculativeness in the logical and operator. + + Backport from mainline +2013-03-05 Jakub Jelinek ja...@redhat.com + + PR middle-end/56461 + * sel-sched-ir.c (free_sched_pools): Release + succs_info_pool.stack[succs_info_pool.max_top] vectors too + if succs_info_pool.max_top isn't -1. 
+ + 2013-04-01 Andrey Belevantsev a...@ispras.ru + + Backport from mainline 2012-02-19 Andrey Belevantsev a...@ispras.ru PR middle-end/55889 Index: gcc/testsuite/gcc.dg/pr45472.c === *** gcc/testsuite/gcc.dg/pr45472.c (revision 0) --- gcc/testsuite/gcc.dg/pr45472.c (revision 197299) *** *** 0 --- 1,63 + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options -O -fschedule-insns2 -fselective-scheduling2 } */ + + struct S + { + volatile long vl; + int i; + }; + struct S s1, s2; + + void + foo (int j, int c) + { + int i; + for (i = 0; i = j; i++) + { + if (c) + s2.vl += s1.vl; + s1 = s2; + } + } + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options -O -fschedule-insns2 -fselective-scheduling2 } */ + + struct S + { + volatile long vl; + int i; + }; + struct S s1, s2; + + void + foo (int j, int c) + { + int i; + for (i = 0; i = j; i++) + { + if (c) + s2.vl += s1.vl; + s1 = s2; + } + } + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options -O -fschedule-insns2 -fselective-scheduling2 } */ + + struct S + { + volatile long vl; + int i; + }; + struct S s1, s2; + + void + foo (int j, int c) + { + int i; + for (i = 0; i = j; i++) + { + if (c) + s2.vl += s1.vl; + s1 = s2; + } + } Index: gcc/testsuite/ChangeLog === *** gcc/testsuite/ChangeLog (revision 197298) --- gcc/testsuite/ChangeLog (revision 197299) *** *** 1,3 --- 1,11 + 2013-04-01 Andrey Belevantsev a...@ispras.ru + + Backport from mainline + 2013-02-27 Andrey Belevantsev a...@ispras.ru + + PR middle-end/45472 + * gcc.dg/pr45472.c: New test. + 2013-03-26 Richard Biener rguent...@suse.de Backport from mainline Index: gcc/sel-sched-ir.c === *** gcc/sel-sched-ir.c (revision 197298) --- gcc/sel-sched-ir.c (revision 197299) *** merge_expr (expr_t to, expr_t from, insn *** 1862,1869 /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. 
This will provide us with consistent speculative bits and speculative patterns inside expr. */ ! if (EXPR_SPEC_DONE_DS (to) == 0 !EXPR_SPEC_DONE_DS (from) != 0) change_vinsn_in_expr (to, EXPR_VINSN (from)); merge_expr_data (to, from, split_point); --- 1862,1873 /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. This will provide us with consistent speculative bits and speculative patterns inside expr. */ ! if ((EXPR_SPEC_DONE_DS (from) != 0 ! EXPR_SPEC_DONE_DS (to) == 0) ! /* Do likewise for volatile insns, so that we always retain !the may_trap_p bit on the resulting expression
Re: Fix PR 56077
On 22.02.2013 17:30, Andrey Belevantsev wrote: Hello, As found by Jakub and explained in the PR audit trail by Alexander, this patch fixes the selective scheduler merge glitch of 2008 that added the unnecessary JUMP_P check to the flush_pending_lists call. I have removed the check and expanded the binary negation for clarity. The patch was tested on x86-64, ia64, and ppc64 to be safe. The patch should be conservatively safe at this stage as it adds more flushes and thus more dependencies to the scheduler. The original test is fixed, but I don't know how to add the test checking assembly insns order to the testsuite. OK for trunk? Andrey 2012-02-22 Alexander Monakov amona...@ispras.ru Andrey Belevantsev a...@ispras.ru PR middle-end/56077 * sched-deps.c (sched_analyze_insn): When reg_pending_barrier, flush pending lists also on non-jumps. Now backported to 4.7 and 4.6. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 197296) --- gcc/ChangeLog (revision 197297) *** *** 1,3 --- 1,13 + 2013-04-01 Andrey Belevantsev a...@ispras.ru + + Backport from mainline + 2013-02-25 Andrey Belevantsev a...@ispras.ru + Alexander Monakov amona...@ispras.ru + + PR middle-end/56077 + * sched-deps.c (sched_analyze_insn): When reg_pending_barrier, + flush pending lists also on non-jumps. Adjust comment. + 2013-03-30 Gerald Pfeifer ger...@pfeifer.com * doc/invoke.texi (AVR Options): Tweak link for AVR-LibC user manual. Index: gcc/sched-deps.c === *** gcc/sched-deps.c(revision 197296) --- gcc/sched-deps.c(revision 197297) *** sched_analyze_insn (struct deps_desc *de *** 3262,3270 SET_REGNO_REG_SET (deps-reg_last_in_use, i); } ! /* Flush pending lists on jumps, but not on speculative checks. */ ! if (JUMP_P (insn) !(sel_sched_p () ! sel_insn_is_speculation_check (insn))) flush_pending_lists (deps, insn, true, true); reg_pending_barrier = NOT_A_BARRIER; --- 3262,3270 SET_REGNO_REG_SET (deps-reg_last_in_use, i); } ! 
/* Don't flush pending lists on speculative checks for !selective scheduling. */ ! if (!sel_sched_p () || !sel_insn_is_speculation_check (insn)) flush_pending_lists (deps, insn, true, true); reg_pending_barrier = NOT_A_BARRIER;
Re: [PATCH] Fix PR 55889
On 19.02.2013 17:13, Andrey Belevantsev wrote: Hello, As we discussed in the PR, the problem here is that the selective scheduler does not expect that renaming a hard register to a pseudo would result in extra dependencies. The dependencies come from implicit clobbers code of sched-deps.c, so I've made a minimal patch that checks only for that case using the same function from IRA and avoids it. The patch fixes the test case on AIX as reported by David and also bootstraps and tests fine on x86-64 and ia64, ok for trunk? The second patch is restoring debug printing in the scheduler that was accidentally broken, possibly by Steven's cleanups (now a pattern is expected where an insn was previously). I will commit is as obvious separately. Andrey 2012-02-19 Andrey Belevantsev a...@ispras.ru PR middle-end/55889 * sel-sched.c: Include ira.h. (implicit_clobber_conflict_p): New function. (moveup_expr): Use it. * Makefile.in (sel-sched.o): Depend on ira.h. Now backported this to 4.7 and 4.6. Andrey Index: gcc/ChangeLog === *** gcc/ChangeLog (revision 197297) --- gcc/ChangeLog (revision 197298) *** *** 1,6 --- 1,18 2013-04-01 Andrey Belevantsev a...@ispras.ru Backport from mainline + 2012-02-19 Andrey Belevantsev a...@ispras.ru + + PR middle-end/55889 + + * sel-sched.c: Include ira.h. + (implicit_clobber_conflict_p): New function. + (moveup_expr): Use it. + * Makefile.in (sel-sched.o): Depend on ira.h. + + 2013-04-01 Andrey Belevantsev a...@ispras.ru + + Backport from mainline 2013-02-25 Andrey Belevantsev a...@ispras.ru Alexander Monakov amona...@ispras.ru Index: gcc/sel-sched.c === *** gcc/sel-sched.c (revision 197297) --- gcc/sel-sched.c (revision 197298) *** along with GCC; see the file COPYING3. 
*** 45,50 --- 45,51 #include rtlhooks-def.h #include output.h #include emit-rtl.h + #include ira.h #ifdef INSN_SCHEDULING #include sel-sched-ir.h *** moving_insn_creates_bookkeeping_block_p *** 2113,2118 --- 2114,2174 return TRUE; } + /* Return true when the conflict with newly created implicit clobbers +between EXPR and THROUGH_INSN is found because of renaming. */ + static bool + implicit_clobber_conflict_p (insn_t through_insn, expr_t expr) + { + HARD_REG_SET temp; + rtx insn, reg, rhs, pat; + hard_reg_set_iterator hrsi; + unsigned regno; + bool valid; + + /* Make a new pseudo register. */ + reg = gen_reg_rtx (GET_MODE (EXPR_LHS (expr))); + max_regno = max_reg_num (); + maybe_extend_reg_info_p (); + + /* Validate a change and bail out early. */ + insn = EXPR_INSN_RTX (expr); + validate_change (insn, SET_DEST (PATTERN (insn)), reg, true); + valid = verify_changes (0); + cancel_changes (0); + if (!valid) + { + if (sched_verbose = 6) + sel_print (implicit clobbers failed validation, ); + return true; + } + + /* Make a new insn with it. */ + rhs = copy_rtx (VINSN_RHS (EXPR_VINSN (expr))); + pat = gen_rtx_SET (VOIDmode, reg, rhs); + start_sequence (); + insn = emit_insn (pat); + end_sequence (); + + /* Calculate implicit clobbers. */ + extract_insn (insn); + preprocess_constraints (); + ira_implicitly_set_insn_hard_regs (temp); + AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); + + /* If any implicit clobber registers intersect with regular ones in + through_insn, we have a dependency and thus bail out. */ + EXECUTE_IF_SET_IN_HARD_REG_SET (temp, 0, regno, hrsi) + { + vinsn_t vi = INSN_VINSN (through_insn); + if (bitmap_bit_p (VINSN_REG_SETS (vi), regno) + || bitmap_bit_p (VINSN_REG_CLOBBERS (vi), regno) + || bitmap_bit_p (VINSN_REG_USES (vi), regno)) + return true; + } + + return false; + } + /* Modifies EXPR so it can be moved through the THROUGH_INSN, performing necessary transformations. Record the type of transformation made in PTRANS_TYPE, when it is not NULL. 
When INSIDE_INSN_GROUP, *** moveup_expr (expr_t expr, insn_t through *** 2245,2250 --- 2301,2317 if (!enable_schedule_as_rhs_p || !EXPR_SEPARABLE_P (expr)) return MOVEUP_EXPR_NULL; + /* When renaming a hard register to a pseudo before reload, extra +dependencies can occur from the implicit clobbers of the insn. +Filter out such cases here. */ + if (!reload_completed REG_P (EXPR_LHS (expr)) + HARD_REGISTER_P (EXPR_LHS (expr)) + implicit_clobber_conflict_p (through_insn, expr)) + { + if (sched_verbose = 6) + sel_print (implicit clobbers conflict detected, ); + return MOVEUP_EXPR_NULL
Re: [PATCH] Fix sel-sched memory leak (PR middle-end/56461)
Hi Jakub, On 2013-03-05 01:20, Jakub Jelinek wrote: Hi! alloc_succs_info creates vectors up to and including succs_info_pool.max_top, so without the following fix we leak the last set of 3 vectors. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? This is fine, thanks a lot. I can take care of backporting this to the older branches with the other sel-sched patches. Andrey 2013-03-04 Jakub Jelinek ja...@redhat.com PR middle-end/56461 * sel-sched-ir.c (free_sched_pools): Release succs_info_pool.stack[succs_info_pool.max_top] vectors too if succs_info_pool.max_top isn't -1. --- gcc/sel-sched-ir.c.jj 2013-02-27 14:59:51.0 +0100 +++ gcc/sel-sched-ir.c 2013-03-04 17:47:34.705686637 +0100 @@ -5020,7 +5020,7 @@ free_sched_pools (void) free_alloc_pool (sched_lists_pool); gcc_assert (succs_info_pool.top == -1); - for (i = 0; i < succs_info_pool.max_top; i++) + for (i = 0; i <= succs_info_pool.max_top; i++) { succs_info_pool.stack[i].succs_ok.release (); succs_info_pool.stack[i].succs_other.release (); Jakub
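The leak is a classic inclusive-bound off-by-one: vectors exist for every index up to and including max_top, so the release loop must use `<=`. A small self-contained illustration (freed[] stands in for the per-index release calls; POOL_SIZE is an arbitrary bound for the sketch):

```c
#include <assert.h>

#define POOL_SIZE 4

/* Model of the corrected free_sched_pools loop: release indices
   0 .. max_top inclusive.  With max_top == -1 (nothing allocated),
   the body never runs.  Returns how many slots were released.  */
static int
count_released (int max_top)
{
  int freed[POOL_SIZE] = { 0 };
  for (int i = 0; i <= max_top; i++)   /* the patch: <= rather than < */
    freed[i] = 1;
  int n = 0;
  for (int i = 0; i < POOL_SIZE; i++)
    n += freed[i];
  return n;
}
```

With the old `<` bound, count_released(2) would be 2 and the vector at index max_top would leak, which is exactly the reported leak of the last set of vectors.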
[PATCH, committed] Fix PR 45472
Hello, For this volatile-related issue (no volatile bits on volatile fields of a non-volatile struct) AFAIU there is an agreement of fixing the front-ends if needed, but anyways the patch to the selective scheduler is required that properly merges expressions so that the may_trap_p bit is preserved. So the following patch was successfully tested on ia64 and x86-64, approved by Alexander offline, committed to trunk. The patch needs backport to other branches in about two weeks. Andrey PR middle-end/45472 gcc/ * sel-sched-ir.c (merge_expr): Also change vinsn of merged expr when the may_trap_p bit of the exprs being merged differs. Reorder tests for speculativeness in the logical and operator. testsuite/ * gcc.dg/45472.c: New test. Index: gcc/testsuite/gcc.dg/pr45472.c === *** gcc/testsuite/gcc.dg/pr45472.c (revision 0) --- gcc/testsuite/gcc.dg/pr45472.c (revision 196308) *** *** 0 --- 1,21 + /* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ + /* { dg-options -O -fschedule-insns2 -fselective-scheduling2 } */ + + struct S + { + volatile long vl; + int i; + }; + struct S s1, s2; + + void + foo (int j, int c) + { + int i; + for (i = 0; i = j; i++) + { + if (c) + s2.vl += s1.vl; + s1 = s2; + } + } Index: gcc/sel-sched-ir.c === *** gcc/sel-sched-ir.c (revision 196307) --- gcc/sel-sched-ir.c (revision 196308) *** merge_expr (expr_t to, expr_t from, insn *** 1866,1873 /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. This will provide us with consistent speculative bits and speculative patterns inside expr. */ ! if (EXPR_SPEC_DONE_DS (to) == 0 !EXPR_SPEC_DONE_DS (from) != 0) change_vinsn_in_expr (to, EXPR_VINSN (from)); merge_expr_data (to, from, split_point); --- 1866,1877 /* Make sure that speculative pattern is propagated into exprs that have non-speculative one. This will provide us with consistent speculative bits and speculative patterns inside expr. */ ! if ((EXPR_SPEC_DONE_DS (from) != 0 ! 
EXPR_SPEC_DONE_DS (to) == 0) ! /* Do likewise for volatile insns, so that we always retain ! the may_trap_p bit on the resulting expression. */ ! || (VINSN_MAY_TRAP_P (EXPR_VINSN (from)) ! && !VINSN_MAY_TRAP_P (EXPR_VINSN (to)))) change_vinsn_in_expr (to, EXPR_VINSN (from)); merge_expr_data (to, from, split_point);
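The merge rule above can be sketched in plain C to show why the may_trap_p bit survives the merge. These structs are illustrative stand-ins, not the sel-sched-ir types:

```c
#include <assert.h>
#include <stdbool.h>

struct vinsn { bool may_trap_p; };
struct expr  { struct vinsn *vinsn; int spec_done_ds; };

/* Model of the patched merge_expr: keep the vinsn that carries either a
   speculative pattern or the may_trap_p bit, so the merged expression
   stays conservative.  */
static void
merge_expr_model (struct expr *to, struct expr *from)
{
  if ((from->spec_done_ds != 0 && to->spec_done_ds == 0)
      /* Do likewise for volatile insns: never lose may_trap_p.  */
      || (from->vinsn->may_trap_p && !to->vinsn->may_trap_p))
    to->vinsn = from->vinsn;
}
```

Before the fix only the speculative-pattern clause existed, so merging a trapping (volatile) expression into a non-trapping one silently dropped the trap bit, which is the PR 45472 failure mode.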
Fix PR 56077
Hello, As found by Jakub and explained in the PR audit trail by Alexander, this patch fixes the selective scheduler merge glitch of 2008 that added the unnecessary JUMP_P check to the flush_pending_lists call. I have removed the check and expanded the binary negation for clarity. The patch was tested on x86-64, ia64, and ppc64 to be safe. The patch should be conservatively safe at this stage as it adds more flushes and thus more dependencies to the scheduler. The original test is fixed, but I don't know how to add the test checking assembly insns order to the testsuite. OK for trunk? Andrey 2012-02-22 Alexander Monakov amona...@ispras.ru Andrey Belevantsev a...@ispras.ru PR middle-end/56077 * sched-deps.c (sched_analyze_insn): When reg_pending_barrier, flush pending lists also on non-jumps. Index: gcc/sched-deps.c === *** gcc/sched-deps.c (revision 196136) --- gcc/sched-deps.c (working copy) *** sched_analyze_insn (struct deps_desc *de *** 3318,3325 } /* Flush pending lists on jumps, but not on speculative checks. */ ! if (JUMP_P (insn) && !(sel_sched_p () && sel_insn_is_speculation_check (insn))) flush_pending_lists (deps, insn, true, true); reg_pending_barrier = NOT_A_BARRIER; --- 3318,3324 } /* Flush pending lists on jumps, but not on speculative checks. */ ! if (!sel_sched_p () || !sel_insn_is_speculation_check (insn)) flush_pending_lists (deps, insn, true, true); reg_pending_barrier = NOT_A_BARRIER;
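The expanded negation in the patch gives a simple truth table: with a pending register barrier, pending lists are flushed for every insn except a selective-scheduling speculation check. A sketch with booleans standing in for the sel_sched_p / sel_insn_is_speculation_check predicates:

```c
#include <assert.h>
#include <stdbool.h>

/* The fixed barrier condition from sched_analyze_insn, after removing the
   JUMP_P check and expanding !(a && b) into !a || !b for clarity.  */
static bool
must_flush_p (bool sel_sched_active, bool is_spec_check)
{
  return !sel_sched_active || !is_spec_check;
}
```

Only the (sel-sched, speculation-check) combination skips the flush; everything else, jump or not, now flushes, which is why the patch can only add dependencies.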
[PATCH] Fix PR 55889
Hello, As we discussed in the PR, the problem here is that the selective scheduler does not expect that renaming a hard register to a pseudo would result in extra dependencies. The dependencies come from implicit clobbers code of sched-deps.c, so I've made a minimal patch that checks only for that case using the same function from IRA and avoids it. The patch fixes the test case on AIX as reported by David and also bootstraps and tests fine on x86-64 and ia64, ok for trunk? The second patch is restoring debug printing in the scheduler that was accidentally broken, possibly by Steven's cleanups (now a pattern is expected where an insn was previously). I will commit is as obvious separately. Andrey 2012-02-19 Andrey Belevantsev a...@ispras.ru PR middle-end/55889 * sel-sched.c: Include ira.h. (implicit_clobber_conflict_p): New function. (moveup_expr): Use it. * Makefile.in (sel-sched.o): Depend on ira.h. And the second patch is: 2012-02-19 Andrey Belevantsev a...@ispras.ru * sel-sched-dump.c (dump_insn_rtx_flags): Explicitly set DUMP_INSN_RTX_UID. (dump_insn_rtx_1): Pass PATTERN (insn) to str_pattern_slim. 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index f3bb168..557ab08 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3338,7 +3338,7 @@ sel-sched.o : sel-sched.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(FUNCTION_H) $(INSN_ATTR_H) $(RECOG_H) $(EXCEPT_H) $(PARAMS_H) \ $(TM_P_H) output.h $(TARGET_H) $(TREE_PASS_H) \ $(SCHED_INT_H) $(GGC_H) $(TREE_H) langhooks.h rtlhooks-def.h \ - $(SEL_SCHED_IR_H) $(SEL_SCHED_DUMP_H) sel-sched.h $(DBGCNT_H) $(EMIT_RTL_H) + $(SEL_SCHED_IR_H) $(SEL_SCHED_DUMP_H) sel-sched.h $(DBGCNT_H) $(EMIT_RTL_H) ira.h sel-sched-dump.o : sel-sched-dump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(RTL_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h \ $(FUNCTION_H) $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) $(RECOG_H) $(EXCEPT_H) $(PARAMS_H) \ diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index 6f60d70..2944489 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see #include langhooks.h #include rtlhooks-def.h #include emit-rtl.h +#include ira.h #ifdef INSN_SCHEDULING #include sel-sched-ir.h @@ -2101,6 +2102,61 @@ moving_insn_creates_bookkeeping_block_p (insn_t insn, return TRUE; } +/* Return true when the conflict with newly created implicit clobbers + between EXPR and THROUGH_INSN is found because of renaming. */ +static bool +implicit_clobber_conflict_p (insn_t through_insn, expr_t expr) +{ + HARD_REG_SET temp; + rtx insn, reg, rhs, pat; + hard_reg_set_iterator hrsi; + unsigned regno; + bool valid; + + /* Make a new pseudo register. */ + reg = gen_reg_rtx (GET_MODE (EXPR_LHS (expr))); + max_regno = max_reg_num (); + maybe_extend_reg_info_p (); + + /* Validate a change and bail out early. 
*/ + insn = EXPR_INSN_RTX (expr); + validate_change (insn, SET_DEST (PATTERN (insn)), reg, 1); + valid = verify_changes (0); + cancel_changes (0); + if (!valid) +{ + if (sched_verbose = 6) + sel_print (implicit clobbers failed validation, ); + return true; +} + + /* Make a new insn with it. */ + rhs = copy_rtx (VINSN_RHS (EXPR_VINSN (expr))); + pat = gen_rtx_SET (VOIDmode, reg, rhs); + start_sequence (); + insn = emit_insn (pat); + end_sequence (); + + /* Calculate implicit clobbers. */ + extract_insn (insn); + preprocess_constraints (); + ira_implicitly_set_insn_hard_regs (temp); + AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs); + + /* If any implicit clobber registers intersect with regular ones in + through_insn, we have a dependency and thus bail out. */ + EXECUTE_IF_SET_IN_HARD_REG_SET (temp, 0, regno, hrsi) +{ + vinsn_t vi = INSN_VINSN (through_insn); + if (bitmap_bit_p (VINSN_REG_SETS (vi), regno) + || bitmap_bit_p (VINSN_REG_CLOBBERS (vi), regno) + || bitmap_bit_p (VINSN_REG_USES (vi), regno)) + return true; +} + + return false; +} + /* Modifies EXPR so it can be moved through the THROUGH_INSN, performing necessary transformations. Record the type of transformation made in PTRANS_TYPE, when it is not NULL. When INSIDE_INSN_GROUP, @@ -2233,6 +2289,17 @@ moveup_expr (expr_t expr, insn_t through_insn, bool inside_insn_group, if (!enable_schedule_as_rhs_p || !EXPR_SEPARABLE_P (expr)) return MOVEUP_EXPR_NULL; + /* When renaming a hard register to a pseudo before reload, extra + dependencies can occur from the implicit clobbers of the insn. + Filter out such cases here. */ + if (!reload_completed REG_P (EXPR_LHS (expr)) + HARD_REGISTER_P (EXPR_LHS (expr)) + implicit_clobber_conflict_p (through_insn, expr)) + { + if (sched_verbose = 6) + sel_print (implicit clobbers conflict detected, ); + return MOVEUP_EXPR_NULL; + } EXPR_TARGET_AVAILABLE (expr
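The core of implicit_clobber_conflict_p is a set intersection: after renaming the destination to a pseudo, the insn may pick up implicit hard-register clobbers, and the move must be refused if any of them collides with a register the through-insn touches. A bitmask sketch of that final check (64-bit masks stand in for HARD_REG_SET; not the real API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the conflict test at the end of implicit_clobber_conflict_p:
   the renamed insn's implicit clobbers must be disjoint from the registers
   set, clobbered, or used by the insn we want to move through.  */
static bool
implicit_clobber_conflict_model (uint64_t implicit_clobbers,
                                 uint64_t through_sets,
                                 uint64_t through_clobbers,
                                 uint64_t through_uses)
{
  return (implicit_clobbers
          & (through_sets | through_clobbers | through_uses)) != 0;
}
```

In the real function the implicit clobber set additionally has ira_no_alloc_regs masked out first, so fixed registers never count as conflicts.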
Re: [PATCH] PR 54472
On 30.10.2012 12:09, Andrey Belevantsev wrote: Hello, This PR is due to the selective scheduling missing the dependencies with implicit_sets. From the sched-deps.c code it looks like implicit sets generate anti dependencies with either sets, uses or clobbers, so that's that I've done with the below patch. Vlad, as it looks you've added implicit sets, does the above conclusion look correct? I will commit the patch then after bootstrapping and testing will complete. Now committed after offline discussion with Vlad and Alexander. Andrey
[PATCH] PR 54472
Hello, This PR is due to the selective scheduling missing the dependencies with implicit_sets. From the sched-deps.c code it looks like implicit sets generate anti dependencies with either sets, uses or clobbers, so that's what I've done with the below patch. Vlad, as it looks you've added implicit sets, does the above conclusion look correct? I will then commit the patch after bootstrapping and testing complete. Yours, Andrey 2012-10-30 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/54472 * sel-sched-ir.c (has_dependence_note_reg_set): Handle implicit sets. (has_dependence_note_reg_clobber, has_dependence_note_reg_use): Likewise. 2012-10-30 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/54472 * gcc.dg/pr54472.c: New test. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index 2a7a170..220568a 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3185,7 +3185,7 @@ has_dependence_note_reg_set (int regno) || reg_last->clobbers != NULL) *dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT; - if (reg_last->uses) + if (reg_last->uses || reg_last->implicit_sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI; } } @@ -3205,7 +3205,7 @@ has_dependence_note_reg_clobber (int regno) if (reg_last->sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT; - if (reg_last->uses) + if (reg_last->uses || reg_last->implicit_sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI; } } @@ -3225,7 +3225,7 @@ has_dependence_note_reg_use (int regno) if (reg_last->sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_TRUE; - if (reg_last->clobbers) + if (reg_last->clobbers || reg_last->implicit_sets) *dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI; /* Merge BE_IN_SPEC bits into *DSP when the dependency producer diff --git a/gcc/testsuite/gcc.dg/pr54472.c b/gcc/testsuite/gcc.dg/pr54472.c new file mode 100644 index 000..9395203 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr54472.c @@ -0,0 +1,9 @@ +/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */ +/* { dg-options -O -fschedule-insns -fselective-scheduling } */ + +int main ()
+{ + int a[3][3][3]; + __builtin_memset (a, 0, sizeof a); + return 0; +}
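The rule the PR 54472 patch implements, that an implicit set behaves like a use and therefore raises an anti dependence against later sets, uses, and clobbers, can be modelled in a few lines. The flag values are illustrative stand-ins for the DEP_* bits, not the GCC definitions:

```c
#include <assert.h>

#define DEP_ANTI   1u
#define DEP_OUTPUT 2u

/* Sketch of the patched has_dependence_note_reg_set: a new set of REGNO gets
   an output dependence from earlier sets/clobbers, and, after the fix, an
   anti dependence from earlier uses OR implicit sets.  */
static unsigned
deps_for_new_set (int had_uses, int had_implicit_sets, int had_sets)
{
  unsigned ds = 0;
  if (had_sets)
    ds |= DEP_OUTPUT;
  if (had_uses || had_implicit_sets)   /* the patched condition */
    ds |= DEP_ANTI;
  return ds;
}
```

Before the fix the `had_implicit_sets` operand was missing, so the selective scheduler saw no dependence at all between an implicit set and a following real set, the miscompilation behind the PR.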
Re: Fix PR 53701
On 16.10.2012 11:50, Andrey Belevantsev wrote: The below is the port of this patch to 4.7, took longer than expected but still. Will commit after retesting on x86-64 (testing on ia64 is already fine) and with the fix for PR 53975. Now the same patch is also committed to 4.6 after more wait and testing. Andrey
Re: [PATCH] PR 53975
Hello, On 31.07.2012 15:08, Andrey Belevantsev wrote: Hello, This PR is about wrong speculation of an insn that doesn't support storing NaT bits done by the selective scheduler (more details in the PR audit trail). The reason for this is the wrong one-liner patch committed last year, the fix is to revert that patch and to clarify the comment before the patched code. Bootstrapped and tested on ia64, approved by Alexander offline. No test as I don't know how to check whether an insn got moved through another insn. Finally I've got to porting this to 4.7. The patch was successfully tested on ia64, I'll commit this after double checking on x86-64. The patch should be committed with the fix for PR 53701, which I will do right after this one. Andrey 2012-10-16 Andrey Belevantsev a...@ispras.ru Backport from mainline 2012-07-31 Andrey Belevantsev a...@ispras.ru PR target/53975 * sel-sched-ir.c (has_dependence_note_reg_use): Clarify comment. Revert 2011-08-04 Sergey Grechanik mouseent...@ispras.ru * sel-sched-ir.c (has_dependence_note_reg_use): Call ds_full_merge only if producer writes to the register given by regno. Index: gcc/sel-sched-ir.c === *** gcc/sel-sched-ir.c (revision 192488) --- gcc/sel-sched-ir.c (working copy) *** has_dependence_note_reg_use (int regno) *** 3224,3230 if (reg_last-clobbers) *dsp = (*dsp ~SPECULATIVE) | DEP_ANTI; ! /* Handle BE_IN_SPEC. */ if (reg_last-uses) { ds_t pro_spec_checked_ds; --- 3224,3234 if (reg_last-clobbers) *dsp = (*dsp ~SPECULATIVE) | DEP_ANTI; ! /* Merge BE_IN_SPEC bits into *DSP when the dependency producer !is actually a check insn. We need to do this for any register !read-read dependency with the check unless we track properly !all registers written by BE_IN_SPEC-speculated insns, as !we don't have explicit dependence lists. See PR 53975. 
*/ if (reg_last-uses) { ds_t pro_spec_checked_ds; *** has_dependence_note_reg_use (int regno) *** 3232,3240 pro_spec_checked_ds = INSN_SPEC_CHECKED_DS (has_dependence_data.pro); pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds); ! if (pro_spec_checked_ds != 0 ! bitmap_bit_p (INSN_REG_SETS (has_dependence_data.pro), regno)) ! /* Merge BE_IN_SPEC bits into *DSP. */ *dsp = ds_full_merge (*dsp, pro_spec_checked_ds, NULL_RTX, NULL_RTX); } --- 3236,3242 pro_spec_checked_ds = INSN_SPEC_CHECKED_DS (has_dependence_data.pro); pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds); ! if (pro_spec_checked_ds != 0) *dsp = ds_full_merge (*dsp, pro_spec_checked_ds, NULL_RTX, NULL_RTX); }
Re: Fix PR 53701
Hello, On 09.08.2012 17:19, Alexander Monakov wrote: On Thu, 9 Aug 2012, Andrey Belevantsev wrote: Hello, The problem in question is uncovered by the recent speculation patch, it is in the handling of expressions blocked by bookkeeping. Those are expressions that become unavailable due to the newly created bookkeeping copies. In the original algorithm the supported insns and transformations cannot lead to this result, but when handling non-separable insns or creating speculative checks that unpredictably block certain insns the situation can arise. We just filter out all such expressions from the final availability set for correctness. The PR happens because the expression being filtered out can be transformed while being moved up, thus we need to look up not only its exact pattern but also all its previous forms saved in its history of changes. The patch does exactly that, I also clarified the comments w.r.t. this situation. Bootstrapped and tested on ia64 and x86-64, the PR testcase is minimized, too. OK for trunk? Also need to backport this to 4.7 with PR 53975, say on the next week. This is OK. The below is the port of this patch to 4.7, took longer than expected but still. Will commit after retesting on x86-64 (testing on ia64 is already fine) and with the fix for PR 53975. Andrey 2012-10-16 Andrey Belevantsev a...@ispras.ru Backport from mainline 2012-08-09 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/53701 * sel-sched.c (vinsn_vec_has_expr_p): Clarify function comment. Process not only expr's vinsns but all old vinsns from expr's history of changes. (update_and_record_unavailable_insns): Clarify comment. testsuite: 2012-10-16 Andrey Belevantsev a...@ispras.ru Backport from mainline 2012-08-09 Andrey Belevantsev a...@ispras.ru PR rtl-optimization/53701 * gcc.dg/pr53701.c: New test. 
Index: gcc/testsuite/gcc.dg/pr53701.c
===================================================================
*** gcc/testsuite/gcc.dg/pr53701.c	(revision 0)
--- gcc/testsuite/gcc.dg/pr53701.c	(revision 0)
***************
*** 0 ****
--- 1,59 ----
+ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+ /* { dg-options "-O3 -fselective-scheduling2 -fsel-sched-pipelining" } */
+ typedef unsigned short int uint16_t;
+ typedef unsigned long int uintptr_t;
+ typedef struct GFX_VTABLE
+ {
+   int color_depth;
+   unsigned char *line[];
+ }
+ BITMAP;
+ extern int _drawing_mode;
+ extern BITMAP *_drawing_pattern;
+ extern int _drawing_y_anchor;
+ extern unsigned int _drawing_x_mask;
+ extern unsigned int _drawing_y_mask;
+ extern uintptr_t bmp_write_line (BITMAP *, int);
+ void
+ _linear_hline15 (BITMAP * dst, int dx1, int dy, int dx2, int color)
+ {
+   int w;
+   if (_drawing_mode == 0)
+     {
+       int x, curw;
+       unsigned short *sline =
+ 	(unsigned short *) (_drawing_pattern->
+ 			    line[((dy) -
+ 				  _drawing_y_anchor) & _drawing_y_mask]);
+       unsigned short *s;
+       unsigned short *d =
+ 	((unsigned short *) (bmp_write_line (dst, dy)) + (dx1));
+       s = ((unsigned short *) (sline) + (x));
+       if (_drawing_mode == 2)
+ 	{
+ 	}
+       else if (_drawing_mode == 3)
+ 	{
+ 	  do
+ 	    {
+ 	      w -= curw;
+ 	      do
+ 		{
+ 		  unsigned long c = (*(s));
+ 		  if (!((unsigned long) (c) == 0x7C1F))
+ 		    {
+ 		      (*((uint16_t *) ((uintptr_t) (d))) = ((color)));
+ 		    }
+ 		  ((s)++);
+ 		}
+ 	      while (--curw > 0);
+ 	      s = sline;
+ 	      curw =
+ 		(((w) < ((int) _drawing_x_mask +
+ 			 1)) ? (w) : ((int) _drawing_x_mask + 1));
+ 	    }
+ 	  while (curw > 0);
+ 	}
+     }
+ }
Index: gcc/sel-sched.c
===================================================================
*** gcc/sel-sched.c	(revision 192488)
--- gcc/sel-sched.c	(working copy)
*************** process_use_exprs (av_set_t *av_ptr)
*** 3567,3595 ****
  	return NULL;
      }
  
! /* Lookup EXPR in VINSN_VEC and return TRUE if found.  */
  static bool
  vinsn_vec_has_expr_p (vinsn_vec_t vinsn_vec, expr_t expr)
  {
!   vinsn_t vinsn;
    int n;
  
!   FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
!     if (VINSN_SEPARABLE_P (vinsn))
!       {
! 	if (vinsn_equal_p (vinsn, EXPR_VINSN (expr)))
! 	  return true;
!       }
!     else
!       {
! 	/* For non-separable instructions, the blocking insn can have
! 	   another pattern due to substitution, and we can't choose
! 	   different register as in the above case.  Check all registers
! 	   being written instead.  */
! 	if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
! 				VINSN_REG_SETS (EXPR_VINSN (expr))))
! 	  return true;
!       }
  
    return false;
  }
--- 3567,3607 ----
  	return NULL;
      }
  
! /* Lookup EXPR in VINSN_VEC and return
Re: [PATCH] Fix sel-sched ICE with asm goto (PR rtl-optimization/54455)
Hello,

On 05.09.2012 19:38, Jakub Jelinek wrote:

Hi!  As discussed in the PR, sel-sched doesn't handle correctly tidying of empty blocks if a fallthru predecessor ends with asm goto that has some labels on the empty block in addition to the fallthru edge.  cfgrtl.c can handle that, so this patch just gives up on it on the sel-sched side.  The testcase is new since the patch in the PR, tested with unpatched and patched gcc (fails vs. works).  Bootstrapped/regtested on x86_64-linux and i686-linux (as usually, with rtl checking).  Ok for trunk?

OK.  Sorry for being slower on this than you.  Btw, the max-sched-extend-regions-iters param can be limited to 2 in the test, I've checked that my build still segfaults with it.  (I've mentioned once to Zdenek in some PR comment that this param's value of 3-4 covers most of the weird cfgs, so the larger values should not be tried in his opt searches.)

Andrey

2012-09-05  Jakub Jelinek  <ja...@redhat.com>

	PR rtl-optimization/54455
	* sel-sched-ir.c (maybe_tidy_empty_bb): Give up if previous fallthru
	bb ends up with asm goto referencing bb's label.

	* gcc.dg/54455.c: New test.

--- gcc/sel-sched-ir.c.jj	2012-08-15 10:55:30.0 +0200
+++ gcc/sel-sched-ir.c	2012-09-03 09:56:59.352233243 +0200
@@ -3686,6 +3686,22 @@ maybe_tidy_empty_bb (basic_block bb)
   FOR_EACH_EDGE (e, ei, bb->preds)
     if (e->flags & EDGE_COMPLEX)
       return false;
+    else if (e->flags & EDGE_FALLTHRU)
+      {
+	rtx note;
+	/* If prev bb ends with asm goto, see if any of the
+	   ASM_OPERANDS_LABELs don't point to the fallthru
+	   label.  Do not attempt to redirect it in that case.  */
+	if (JUMP_P (BB_END (e->src))
+	    && (note = extract_asm_operands (PATTERN (BB_END (e->src)))))
+	  {
+	    int i, n = ASM_OPERANDS_LABEL_LENGTH (note);
+
+	    for (i = 0; i < n; ++i)
+	      if (XEXP (ASM_OPERANDS_LABEL (note, i), 0) == BB_HEAD (bb))
+		return false;
+	  }
+      }
 
   free_data_sets (bb);
--- gcc/testsuite/gcc.dg/54455.c.jj	2012-06-15 19:53:34.312404791 +0200
+++ gcc/testsuite/gcc.dg/54455.c	2012-09-05 15:05:02.328728962 +0200
@@ -0,0 +1,25 @@
+/* PR rtl-optimization/54455 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fschedule-insns -fselective-scheduling --param max-sched-extend-regions-iters=8" } */
+
+extern void fn1 (void), fn2 (void);
+
+static inline __attribute__((always_inline)) int
+foo (int *x, long y)
+{
+  asm goto ("" : : "r" (x), "r" (y) : "memory" : lab);
+  return 0;
+lab:
+  return 1;
+}
+
+void
+bar (int *x)
+{
+  if (foo (x, 23))
+    fn1 ();
+  else
+    fn2 ();
+
+  foo (x, 2);
+}

	Jakub
Re: Scheduler: Save state at the end of a block
On 13.08.2012 14:32, Bernd Schmidt wrote: This is a small patch for sched-rgn that attempts to save DFA state at the end of a basic block and re-use it in successor blocks. This was a customer-requested optimization; I've not seen it make much of a difference in any macro benchmarks. FWIW, this was definitely helpful for sel-sched on ia64, as far as I recall, and likewise on some of the smaller tests. Andrey Bootstrapped and tested on x86_64-linux and also tested on c6x-elf. OK? Bernd
Fix PR 53701
Hello,

The problem in question is uncovered by the recent speculation patch, it is in the handling of expressions blocked by bookkeeping.  Those are expressions that become unavailable due to the newly created bookkeeping copies.  In the original algorithm the supported insns and transformations cannot lead to this result, but when handling non-separable insns or creating speculative checks that unpredictably block certain insns the situation can arise.  We just filter out all such expressions from the final availability set for correctness.

The PR happens because the expression being filtered out can be transformed while being moved up, thus we need to look up not only its exact pattern but also all its previous forms saved in its history of changes.  The patch does exactly that, I also clarified the comments w.r.t. this situation.

Bootstrapped and tested on ia64 and x86-64, the PR testcase is minimized, too.  OK for trunk?  Also need to backport this to 4.7 with PR 53975, say on the next week.

Yours,
Andrey

gcc:

2012-08-09  Andrey Belevantsev  <a...@ispras.ru>

	PR rtl-optimization/53701
	* sel-sched.c (vinsn_vec_has_expr_p): Clarify function comment.
	Process not only expr's vinsns but all old vinsns from expr's
	history of changes.
	(update_and_record_unavailable_insns): Clarify comment.

testsuite:

2012-08-09  Andrey Belevantsev  <a...@ispras.ru>

	PR rtl-optimization/53701
	* gcc.dg/pr53701.c: New test.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 3099b92..f0c6eaf 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -3564,29 +3564,41 @@ process_use_exprs (av_set_t *av_ptr)
   return NULL;
 }
 
-/* Lookup EXPR in VINSN_VEC and return TRUE if found.  */
+/* Lookup EXPR in VINSN_VEC and return TRUE if found.  Also check patterns from
+   EXPR's history of changes.
 */
 static bool
 vinsn_vec_has_expr_p (vinsn_vec_t vinsn_vec, expr_t expr)
 {
-  vinsn_t vinsn;
+  vinsn_t vinsn, expr_vinsn;
   int n;
+  unsigned i;
 
-  FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
-    if (VINSN_SEPARABLE_P (vinsn))
-      {
-	if (vinsn_equal_p (vinsn, EXPR_VINSN (expr)))
-	  return true;
-      }
-    else
-      {
-	/* For non-separable instructions, the blocking insn can have
-	   another pattern due to substitution, and we can't choose
-	   different register as in the above case.  Check all registers
-	   being written instead.  */
-	if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
-				VINSN_REG_SETS (EXPR_VINSN (expr))))
-	  return true;
-      }
+  /* Start with checking expr itself and then proceed with all the old forms
+     of expr taken from its history vector.  */
+  for (i = 0, expr_vinsn = EXPR_VINSN (expr);
+       expr_vinsn;
+       expr_vinsn = (i < VEC_length (expr_history_def,
+				     EXPR_HISTORY_OF_CHANGES (expr))
+		     ? VEC_index (expr_history_def,
+				  EXPR_HISTORY_OF_CHANGES (expr),
+				  i++)->old_expr_vinsn
+		     : NULL))
+    FOR_EACH_VEC_ELT (vinsn_t, vinsn_vec, n, vinsn)
+      if (VINSN_SEPARABLE_P (vinsn))
+	{
+	  if (vinsn_equal_p (vinsn, expr_vinsn))
+	    return true;
+	}
+      else
+	{
+	  /* For non-separable instructions, the blocking insn can have
+	     another pattern due to substitution, and we can't choose
+	     different register as in the above case.  Check all registers
+	     being written instead.  */
+	  if (bitmap_intersect_p (VINSN_REG_SETS (vinsn),
+				  VINSN_REG_SETS (expr_vinsn)))
+	    return true;
+	}
 
   return false;
 }
@@ -5694,8 +5706,8 @@ update_and_record_unavailable_insns (basic_block book_block)
 	      || EXPR_TARGET_AVAILABLE (new_expr)
 		 != EXPR_TARGET_AVAILABLE (cur_expr))
 	    /* Unfortunately, the below code could be also fired up on
-	       separable insns.
-	       FIXME: add an example of how this could happen.  */
+	       separable insns, e.g. when moving insns through the new
+	       speculation check as in PR 53701.  */
 	    vinsn_vec_add (&vec_bookkeeping_blocked_vinsns, cur_expr);
 	}
diff --git a/gcc/testsuite/gcc.dg/pr53701.c b/gcc/testsuite/gcc.dg/pr53701.c
new file mode 100644
index 000..2c85223
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr53701.c
@@ -0,0 +1,59 @@
+/* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O3 -fselective-scheduling2 -fsel-sched-pipelining" } */
+typedef unsigned short int uint16_t;
+typedef unsigned long int uintptr_t;
+typedef struct GFX_VTABLE
+{
+  int color_depth;
+  unsigned char *line[];
+}
+BITMAP;
+extern int _drawing_mode;
+extern BITMAP *_drawing_pattern;
+extern int _drawing_y_anchor;
+extern unsigned int _drawing_x_mask;
+extern unsigned int _drawing_y_mask;
+extern uintptr_t bmp_write_line (BITMAP *, int);
+ void
+_linear_hline15 (BITMAP * dst, int dx1, int dy, int dx2, int color)
+{
+  int w;
+  if (_drawing_mode == 0)
+    {
+      int x, curw;
+      unsigned
Re: [patch] one more sched-vis fix
Hi Steven,

On 03.08.2012 12:02, Steven Bosscher wrote:

Hello, I also need to handle SEQUENCE, of course, or slim dumping fails in targets with branch delay slots.  But who knew SEQUENCE can also appear as a REG_NOTE value?!  The cfgrtl.c fix is purely cosmetic and obvious.  Bootstrapped & tested on powerpc64-unknown-linux-gnu.  Committed as obvious.

You can even close PR 47489 as fixed then :)

Andrey

Ciao!
Steven

	* sched-vis.c (print_pattern): Handle SEQUENCE also.
	* cfgrtl.c (print_rtl_with_bb): Do not print a newline between insns.

Index: sched-vis.c
===================================================================
--- sched-vis.c	(revision 190016)
+++ sched-vis.c	(working copy)
@@ -610,8 +610,19 @@ print_pattern (char *buf, const_rtx x, int verbose
 	}
       break;
     case SEQUENCE:
-      /* Should never see SEQUENCE codes until after reorg.  */
-      gcc_unreachable ();
+      {
+	int i;
+
+	sprintf (t1, "sequence{");
+	for (i = 0; i < XVECLEN (x, 0); i++)
+	  {
+	    print_pattern (t2, XVECEXP (x, 0, i), verbose);
+	    sprintf (t3, "%s%s;", t1, t2);
+	    strcpy (t1, t3);
+	  }
+	sprintf (buf, "%s}", t1);
+      }
+      break;
     case ASM_INPUT:
       sprintf (buf, "asm {%s}", XSTR (x, 0));
       break;
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 190112)
+++ cfgrtl.c	(working copy)
@@ -1958,10 +1958,9 @@ print_rtl_with_bb (FILE *outf, const_rtx rtx_first
 	      dump_bb_info (outf, bb, 0, dump_flags | TDF_COMMENT, false, true);
 	      if (df && (flags & TDF_DETAILS))
 		df_dump_bottom (bb, outf);
+	      putc ('\n', outf);
 	    }
 	}
-
-      putc ('\n', outf);
     }
 
   free (start);
[PATCH] PR 53975
Hello,

This PR is about the selective scheduler wrongly speculating an insn that doesn't support storing NaT bits (more details in the PR audit trail).  The reason for this is the wrong one-liner patch committed last year; the fix is to revert that patch and to clarify the comment before the patched code.  Bootstrapped and tested on ia64, approved by Alexander offline.  No test as I don't know how to check whether an insn got moved through another insn.

Andrey

	PR target/53975
	* sel-sched-ir.c (has_dependence_note_reg_use): Clarify comment.

	Revert
	2011-08-04  Sergey Grechanik  <mouseent...@ispras.ru>

	* sel-sched-ir.c (has_dependence_note_reg_use): Call ds_full_merge
	only if producer writes to the register given by regno.

Index: gcc/sel-sched-ir.c
===================================================================
*** gcc/sel-sched-ir.c	(revision 190004)
--- gcc/sel-sched-ir.c	(revision 190005)
*************** has_dependence_note_reg_use (int regno)
*** 3228,3234 ****
        if (reg_last->clobbers)
  	*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
  
!       /* Handle BE_IN_SPEC.  */
        if (reg_last->uses)
  	{
  	  ds_t pro_spec_checked_ds;
--- 3228,3238 ----
        if (reg_last->clobbers)
  	*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
  
!       /* Merge BE_IN_SPEC bits into *DSP when the dependency producer
! 	 is actually a check insn.  We need to do this for any register
! 	 read-read dependency with the check unless we track properly
! 	 all registers written by BE_IN_SPEC-speculated insns, as
! 	 we don't have explicit dependence lists.  See PR 53975.  */
        if (reg_last->uses)
  	{
  	  ds_t pro_spec_checked_ds;
*************** has_dependence_note_reg_use (int regno)
*** 3236,3244 ****
  	  pro_spec_checked_ds = INSN_SPEC_CHECKED_DS (has_dependence_data.pro);
  	  pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds);
  
! 	  if (pro_spec_checked_ds != 0
! 	      && bitmap_bit_p (INSN_REG_SETS (has_dependence_data.pro), regno))
! 	    /* Merge BE_IN_SPEC bits into *DSP.  */
  	    *dsp = ds_full_merge (*dsp, pro_spec_checked_ds,
  				  NULL_RTX, NULL_RTX);
  	}
--- 3240,3246 ----
  	  pro_spec_checked_ds = INSN_SPEC_CHECKED_DS (has_dependence_data.pro);
  	  pro_spec_checked_ds = ds_get_max_dep_weak (pro_spec_checked_ds);
  
! 	  if (pro_spec_checked_ds != 0)
  	    *dsp = ds_full_merge (*dsp, pro_spec_checked_ds,
  				  NULL_RTX, NULL_RTX);
  	}
Re: [patch] un-#ifdef GATHER_STATISTICS
On 24.07.2012 21:13, Steven Bosscher wrote:
>> AFAIR the qsort is just for getting a stable ordering over two pools to find the leaked regsets afterwards, the type of ordering doesn't matter.
>
> What matters is that the compare function gives a reliable result.  You can't subtract pointers like that for qsort.  After consulting the experts on IRC, I'm going with a fix along the lines of return (x == y ? 0 : (x < y ? -1 : 1));.

Yeah, I agree the code was dubious, so thanks for uncovering this.  If you have already tested the patch, consider it preapproved; otherwise I can fix it this week.

>> Anyways, how come this is related to your patch?  We don't use statistics in sel-sched...  Something got miscompiled?
>
> No, just allocated slightly differently.  A bitmap_head is one pointer bigger than before.  I'm unsure how that causes this problem, though.  I suspect you would have seen the same failure with GATHER_STATISTICS enabled.

Still interesting to know why your first patch fixed the issue in the first place.  I will try taking a look at the generated code in my spare time.

Yours,
Andrey

> Ciao!
> Steven
Re: [patch] un-#ifdef GATHER_STATISTICS
Hello,

On 24.07.2012 18:56, Steven Bosscher wrote:
> On Tue, Jul 24, 2012 at 4:37 PM, Steven Bosscher <stevenb@gmail.com> wrote:
>> On Tue, Jul 24, 2012 at 3:08 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> This patch (r189803) regressed a bunch of tests on x86_64 [1], [2].
>>>
>>> [1] http://gcc.gnu.org/ml/gcc-testresults/2012-07/msg02066.html
>>> [2] http://gcc.gnu.org/ml/gcc-regression/2012-07/msg00177.html
>>
>> These are all selective-scheduler test cases.  It looks like qsort is being used incorrectly.
>
> This seems to fix it for me, but I don't understand why.  Apparently, a pointer subtraction doesn't result in a signed value??  In any case, the sort on those arrays wasn't correct.  Comments?
>
> Index: sel-sched-ir.c
> ===================================================================
> --- sel-sched-ir.c	(revision 189808)
> +++ sel-sched-ir.c	(working copy)
> @@ -954,7 +954,9 @@ return_regset_to_pool (regset rs)
>  static int
>  cmp_v_in_regset_pool (const void *x, const void *xx)
>  {
> -  return *((const regset *) x) - *((const regset *) xx);
> +  ptrdiff_t d = (ptrdiff_t) *((const regset *) x);
> +  ptrdiff_t dd = (ptrdiff_t) *((const regset *) xx);
> +  return d - dd;

AFAIR the qsort is just for getting a stable ordering over two pools to find the leaked regsets afterwards, the type of ordering doesn't matter.

Anyways, how come this is related to your patch?  We don't use statistics in sel-sched...  Something got miscompiled?

Andrey

>  }
>  #endif
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 18:22, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev <a...@ispras.ru>:

On 12.04.2012 17:54, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev <a...@ispras.ru>:

On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamya...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther <richard.guent...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amona...@ispras.ru> wrote:

Can atom execute two IMUL in parallel?  Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall.  In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")
;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too.  The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions.

Why does that not happen without your patch?  Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering.
That said, it is easy to imagine the cases where this patch will make things worse rather than better. That surprises me. What is so specific about this reordering? I mean that the scheduler does things like sort the ready list according to a number of heuristics and to the insn priority, then choose the insn that would allow the maximum of ready insns to be issued on the current cycle. The heuristics are rather general. The scheduler does not do things like if some random insn is ready, then choose some other random insn from the ready list and schedule it (which is what the patch does). This is what scheduler hooks are for, to account for some target-specific heuristic. The problem is that this particular implementation looks somewhat like an overkill and also motivated by a single benchmark. Testing on a wider set of benchmarks and checking compile-time hit would make the motivation more convincing. Yeah, and especially looking _why_ the generic heuristics are not working and if they could be improved. After all the issue seems to be properly modeled in the DFA. It is, but the DFA only models the actual stalls that you get when an imul is scheduled. What you need here is to increase priority for ready insns that have imuls in their critical paths, and those imuls should be close enough to fill in the gap (actual consumers in the case of the patch). The lookahead done in max_issue can help with this dynamic priority change in general, but it considers just the ready insns, not their immediate consumers. You need to make the deeper lookahead if you want to do this in a target independent way -- no-no from compile time POV. 
A number of already existing hooks can help:

- sched_reorder, used in this patch, just sorts the ready list in any way the target wishes;
- adjust_priority/adjust_cost help to boost or to lower priority in such subtle cases;
- is_costly_dependence (now rs6000 only) can guide the early queue insn removal and its addition to the ready list;
- dfa_lookahead_guard (now ia64 only) can ban some insns from being issued on the current try.

Using sched_reorder is existing practice, like in s390, spu, mips, etc.  Usually though an implementation again only considers the actual ready insns and not the successors.  I'd say that trying more tests would help, or other hooks can be tried, too, but after all this is Atom specific, so it's the decision of i386 maintainers.

Andrey

Richard.

Igor, why not try different subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list?  E.g. increase the producer's priority.  The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when you have more than one imul in the ready list?  Don't you want to bump the priority of the other imul found?

Andrey

And MD allows us only prefer scheduling of successive IMUL instructions, i.e. if IMUL was just scheduled and ready list
Re: [PATCH] Fix PR 52203 and 52715
Hello,

On 07.03.2012 15:46, Alexander Monakov wrote:

On Wed, 7 Mar 2012, Andrey Belevantsev wrote:

Hello,

This PR is again about insns that are recog'ed as >= 0 but do not change the processor state.  As explained in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52203#c8, I've tried experimenting with an attribute marking those empty insns in MD files and asserting that all other insns do have reservations.  As this doesn't seem to be interesting, I give up on the idea, and the below patch makes sel-sched do exactly what the Haifa scheduler does, i.e. not count such insns against issue_rate when modelling clock cycles.

But of course I wrongly microoptimized the decision of whether an insn is empty, as shown in PR 52715, so the right fix is to check the emptiness right before issuing the insn.  Thus, the following patch is really needed (tested on ia64 and x86 again), OK?  The new hunk is the last one in the patch.

Andrey

2012-04-13  Andrey Belevantsev  <a...@ispras.ru>

	PR rtl-optimization/52203
	PR rtl-optimization/52715

	Revert the 2012-03-07 fix for PR 52203.
	* sel-sched.c (reset_sched_cycles_in_current_ebb): Check that
	the insn does not modify DFA right before issuing, adjust
	issue_rate accordingly.

Tested on ia64 and x86-64, OK for trunk?  No testcase again because of the amount of flags needed.

Andrey

2012-03-07  Andrey Belevantsev  <a...@ispras.ru>

	PR rtl-optimization/52203
	* sel-sched.c (estimate_insn_cost): New parameter pempty.  Adjust
	all callers to pass NULL except ...
	(reset_sched_cycles_in_current_ebb): ... here, save the value
	in new variable 'empty'.  Increase issue_rate only for non-empty
	insns.

This is OK.  Thanks.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 4e13230..ce38fa0 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -4265,10 +4265,9 @@ invoke_aftermath_hooks (fence_t fence, rtx best_insn, int issue_more)
   return issue_more;
 }
 
-/* Estimate the cost of issuing INSN on DFA state STATE.  Write to PEMPTY
-   true when INSN does not change the processor state.  */
+/* Estimate the cost of issuing INSN on DFA state STATE.  */
 static int
-estimate_insn_cost (rtx insn, state_t state, bool *pempty)
+estimate_insn_cost (rtx insn, state_t state)
 {
   static state_t temp = NULL;
   int cost;
@@ -4278,8 +4277,6 @@ estimate_insn_cost (rtx insn, state_t state, bool *pempty)
   memcpy (temp, state, dfa_state_size);
   cost = state_transition (temp, insn);
-  if (pempty)
-    *pempty = (memcmp (temp, state, dfa_state_size) == 0);
 
   if (cost < 0)
     return 0;
@@ -4310,7 +4307,7 @@ get_expr_cost (expr_t expr, fence_t fence)
 	  return 0;
 	}
   else
-    return estimate_insn_cost (insn, FENCE_STATE (fence), NULL);
+    return estimate_insn_cost (insn, FENCE_STATE (fence));
 }
 
 /* Find the best insn for scheduling, either via max_issue or just take
@@ -7023,7 +7020,7 @@ reset_sched_cycles_in_current_ebb (void)
     {
       int cost, haifa_cost;
       int sort_p;
-      bool asm_p, real_insn, after_stall, all_issued, empty;
+      bool asm_p, real_insn, after_stall, all_issued;
       int clock;
 
       if (!INSN_P (insn))
@@ -7050,7 +7047,7 @@ reset_sched_cycles_in_current_ebb (void)
 	  haifa_cost = 0;
 	}
       else
-	haifa_cost = estimate_insn_cost (insn, curr_state, &empty);
+	haifa_cost = estimate_insn_cost (insn, curr_state);
 
       /* Stall for whatever cycles we've stalled before.  */
       after_stall = 0;
@@ -7084,7 +7081,7 @@ reset_sched_cycles_in_current_ebb (void)
 	      if (!after_stall
 		  && real_insn
 		  && haifa_cost > 0
-		  && estimate_insn_cost (insn, curr_state, NULL) == 0)
+		  && estimate_insn_cost (insn, curr_state) == 0)
 		break;
 
 	      /* When the data dependency stall is longer than the DFA stall,
@@ -7096,7 +7093,7 @@ reset_sched_cycles_in_current_ebb (void)
 	      if ((after_stall || all_issued)
 		  && real_insn
 		  && haifa_cost == 0)
-		haifa_cost = estimate_insn_cost (insn, curr_state, NULL);
+		haifa_cost = estimate_insn_cost (insn, curr_state);
 	    }
 
 	  haifa_clock += i;
@@ -7127,8 +7124,14 @@ reset_sched_cycles_in_current_ebb (void)
       if (real_insn)
 	{
+	  static state_t temp = NULL;
+
+	  if (!temp)
+	    temp = xmalloc (dfa_state_size);
+	  memcpy (temp, curr_state, dfa_state_size);
+
 	  cost = state_transition (curr_state, insn);
-	  if (!empty)
+	  if (memcmp (temp, curr_state, dfa_state_size))
 	    issued_insns++;
 
 	  if (sched_verbose >= 2)
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 13.04.2012 14:18, Igor Zamyatin wrote:

On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev <a...@ispras.ru> wrote:

On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamya...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther <richard.guent...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amona...@ispras.ru> wrote:

Can atom execute two IMUL in parallel?  Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall.  In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")
;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too.  The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions.

Why does that not happen without your patch?  Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering.  That said, it is easy to imagine the cases where this patch will make things worse rather than better.
Igor, why not try different subtler mechanisms like adjust_priority, which is get called when an insn is added to the ready list? E.g. increase the producer's priority. The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found? Could you provide some examples when this patch would harm the performance? I thought of the cases when the other ready insns can fill up the hole and that would be more beneficial because e.g. they would be on more critical paths than the producer of your second imul. I don't know enough of Atom to give an example -- maybe some long divisions? Sched_reorder was chosen since it is used in other ports and looks most suitable for such case, e.g. it provides access to the whole ready list. BTW, just increasing producer's priority seems to be more risky in performance sense - we can incorrectly start delaying some instructions. Yes, but exactly because of the above example you can start incorrectly delaying other insns, too, as you force the insn to be the first in the list. While bumping priority still leaves the scheduler sorting heuristics in place and actually lowers that risk. Thought ready list doesn't contain DEBUG_INSN... Is it so? If it contains them - this could be added easily It does, but I'm not sure the sched_reorder hook gets them or they are immediately removed -- I saw similar checks in one of the targets' hooks. Anyways, my main thought was that it is better to test on more benchmarks to alleviate the above concerns, so as long as the i386 maintainers are happy, I don't see major problems here. A good idea could be to generalize the patch to handle other long latency insns as second consumers, not only imuls, if this is relevant for Atom. 
Andrey The case with more than one imul in the ready list wasn't considered just because the goal was to catch the particular case when there is a risk to get the following picture: imul-producer-imul which is less effective than producer-imul-imul for Atom Andrey And MD allows us only prefer scheduling of successive IMUL instructions, i.e. If IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling. at least from my very limited guessing of what the above does. So, did you analyze why the scheduler does not properly schedule things for you? Richard. From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps. Alexander Thanks, Igor
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin <izamya...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther <richard.guent...@gmail.com> wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov <amona...@ispras.ru> wrote:

Can atom execute two IMUL in parallel?  Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycles latency instruction can be issued after IMUL without a stall.  In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")
;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: If we are going to schedule IMUL instruction (it is on the top of ready list) we try to find out producer of other (independent) IMUL instruction that is in ready list too.  The goal is try to schedule such a producer to get another IMUL in ready list and get scheduling of 2 successive IMUL instructions.

Why does that not happen without your patch?  Does it never happen without your patch or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering.  That said, it is easy to imagine the cases where this patch will make things worse rather than better.

Igor, why not try different subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list?  E.g. increase the producer's priority.  The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when you have more than one imul in the ready list?  Don't you want to bump the priority of the other imul found?

Andrey

And MD allows us only prefer scheduling of successive IMUL instructions, i.e. if IMUL was just scheduled and ready list contains another IMUL instruction then it will be chosen as the best candidate for scheduling.

at least from my very limited guessing of what the above does.  So, did you analyze why the scheduler does not properly schedule things for you?

Richard.

From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do.  Hope that helps.

Alexander
Re: [PATCH] Atom: Scheduler improvements for better imul placement
On 12.04.2012 17:54, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev a...@ispras.ru:

On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther richard.guent...@gmail.com wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote:

Can atom execute two IMUL in parallel? Or what exactly is the pipeline behavior?

As I understand from Intel's optimization reference manual, the behavior is as follows: if the instruction immediately following IMUL has shorter latency, execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a 4-or-more cycle latency instruction can be issued after IMUL without a stall. In other words, IMUL is pipelined with respect to other long-latency instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")
;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

The main idea is quite simple: if we are going to schedule an IMUL instruction (it is on the top of the ready list), we try to find the producer of another (independent) IMUL instruction that is in the ready list too. The goal is to schedule such a producer to get another IMUL into the ready list and get scheduling of two successive IMUL instructions.

Why does that not happen without your patch? Does it never happen without your patch, or does it merely not happen for one EEMBC benchmark (can you provide a testcase?)?

It does not happen because the scheduler by itself does not do such specific reordering. That said, it is easy to imagine cases where this patch will make things worse rather than better.

That surprises me. What is so specific about this reordering?

I mean that the scheduler does things like sorting the ready list according to a number of heuristics and to the insn priority, then choosing the insn that would allow the maximum of ready insns to be issued on the current cycle. The heuristics are rather general. The scheduler does not do things like "if some random insn is ready, then choose some other random insn from the ready list and schedule it" (which is what the patch does). This is what scheduler hooks are for, to account for some target-specific heuristic. The problem is that this particular implementation looks somewhat like overkill and is also motivated by a single benchmark. Testing on a wider set of benchmarks and checking the compile-time hit would make the motivation more convincing.

Andrey

Igor, why not try different, subtler mechanisms like adjust_priority, which gets called when an insn is added to the ready list? E.g. increase the producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P. Also, why bail out when you have more than one imul in the ready list? Don't you want to bump the priority of the other imul found?

Andrey

And the MD allows us only to prefer scheduling of successive IMUL instructions, i.e. if an IMUL was just scheduled and the ready list contains another IMUL instruction, then it will be chosen as the best candidate for scheduling, at least from my very limited guessing of what the above does.

So, did you analyze why the scheduler does not properly schedule things for you?

Richard.

From reading the patch, I could not understand the link between pipeline behavior and what the patch appears to do. Hope that helps.

Alexander
Re: [SMS] Support new loop pattern
Hello Ayal,

First of all, thanks for your feedback. Now to your questions:

On 31.03.2012 3:20, Ayal Zaks wrote:

Roman, Andrey, sorry for the delayed response. It would indeed be good to have SMS apply to more loop patterns, still within the realm of *countable* loops. SMS was originally designed to handle doloops, with a specific pattern controlling the loop, easily identified and separable from the loop's body. The newly proposed change to support new loop patterns is pretty invasive and sizable, taking place entirely within modulo-sched.c. The main issue I've been considering is whether it would be possible instead to transform the new loop patterns we want SMS to handle into doloops (potentially introducing additional induction variables to feed other uses), and then feed the resulting loop into SMS as is? In other words, could you fold it into doloop.c? And if so, will doing so introduce significant overheads?

Let me perhaps explain better. The patch itself is one core patch (this thread) adding the new functionality for detecting more complex loop patterns, plus three fixes to SMS found while working on the main patch (the fixes are in the mails pinged at the very end of this message). The three fixes are worthwhile to commit separately anyway; they are split out from the main patch for this purpose, so I would suggest considering them in any case.

For the main patch, its core is as small as we could get it. It stays with countable loops, as in the cases where we could get overflow behavior or infinite loops we bail out early. We handle only the case of simple same-step affine counters. The main reason why we add support to SMS and not to the doloop pass is that when we do not pipeline a loop newly transformed to the doloop form, this loop actually slows down on platforms not having a true doloop pattern. One has to undo the doloop form and get back to the original loop form to avoid this, which seems rather strange.

Also, the separate decrement insn that changes the induction variable is better scheduled separately to get a more precise schedule. And yes, I believe that making an extra induction variable just to have the control part without uses in the loop core will be unnecessary overhead. Thus, I believe that if we do want SMS to handle more complex loops, then it is inevitable that SMS itself will be somewhat more complex.

I would welcome your suggestions to make the patch clearer. One way I see is that the function for getting the condition of the new loop form can be moved to the generic RTL loop code, given the agreement of other RTL maintainers. Also, some new helpers can be introduced for handling these specific loop forms. But it seems that the distinction between doloop/non-doloop loops has to stay in the code.

Yours, Andrey

2012/3/29 Andrey Belevantsev a...@ispras.ru:

Hello, I'd like to ping again those SMS patches once we're back to Stage 1. Ayal, maybe it would remove some burden for you if you'd review the general SMS functionality of those patches, and we'd ask RTL folks to look at the pieces related to RTL pattern matching and generation?

It definitely would ... especially in light of the above issue. Thanks (for your patches, patience, pings..), Ayal.

Yours, Andrey

On 10.02.2012 16:15, Roman Zhuykov wrote:

Ping. Ayal, please review this patch and these three patches too:
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html

--
Roman Zhuykov zhr...@ispras.ru
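The trade-off discussed above (transforming a generic counted loop into doloop form by adding a control-only induction variable) can be sketched in plain code. This is an illustrative Python model, not GCC output; the loop bodies are invented for the example.

```python
# Two equivalent loop forms. The "generic" form tests an affine counter
# that is also used inside the body; the "doloop" form adds a separate
# count-down variable so the exit test is a bare decrement-and-branch,
# leaving the original variable intact for the body's uses. The extra
# variable is exactly the overhead the thread worries about on targets
# without a hardware doloop pattern.

def generic_form(n):
    out = []
    i = 0
    while i < n:          # exit test uses the body-visible counter
        out.append(i * 2)
        i += 1
    return out

def doloop_form(n):
    out = []
    i = 0
    count = n             # extra induction variable, control only
    while count > 0:      # decrement-and-branch pattern
        out.append(i * 2)
        i += 1
        count -= 1
    return out
```

Both forms compute the same sequence; the difference is only which variable the exit test consumes, which is what makes the doloop form separable for SMS but a pure cost when the loop is not pipelined.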
Re: [PATCH, RTL] Fix PR 51106
On 03.04.2012 13:36, Jakub Jelinek wrote:

On Mon, Apr 02, 2012 at 06:56:25PM +0400, Andrey Belevantsev wrote:

After Richi's RTL generation related cleanups went in, the extra cleanup_cfg call was added, so we are no longer lucky to have the proper fallthru edge in this case. The PR trail has the patch to purge_dead_edges making it consider this situation, but I still prefer the below patch that fixes only the invalid asm case. The reason is that I think it unlikely that after initial RTL expansion (of which the instantiate virtual regs pass seems to be a part) we will get the problematic situation. However, I'm happy to test the PR trail patch, too.

I don't like blindly changing an edge into FALLTHRU; generally the edge could be abnormal, or EH, etc., and making that fallthru is not a good idea. I'd prefer if, wherever the fallthru edge is removed, the other normal edge(s) are adjusted.

Well, as I mentioned in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51106#c18, the removal happens in try_optimize_cfg around line 2617, in the code that deals with removing a trivially empty block (in the PR case, the successor of the asm goto block is trivially empty). After that we have an asm goto acting as an unconditional jump, with a single successor edge and no fallthru bit, which seems perfectly fine. We get into trouble when we actually realize that the asm is bogus; thus I've tried to fix it up at this point.

The options that we briefly discussed on IRC with Richi are as follows:

- Fix up try_optimize_cfg in the case of asm gotos, but it is not clear to me how we would do this: we don't yet distinguish between good and bad asm gotos at this point;

- Fix up function.c as I did, but make it more robust. Either handle more cases with strange edges (EH seems also possible after introducing the throw attribute for asms), or just remove the asm and insert an unconditional jump pointing to the place where the asm was, retaining all the flags;

- Fix up purge_dead_edges, as I did initially in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51106#c16. When we find the situation of no jump at the end of a block and no fallthru edge, assert this happens only with single_succ_p and actually make this edge fallthru. (Probably also watch out whether we need to make it fake or whatever.) Or, as Richi tried, just accept the situation of having no successors here, as in: no fallthru edge on entry to purge_dead_edges and no jump means no successors, period. I think that just nobody deleted unconditional jumps with delete_insn_and_edges previously, otherwise I don't understand why this did not trigger.

Thoughts?

Andrey

Jakub
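The repair being debated (when a block's trailing jump is deleted, its lone remaining successor edge must become a fallthru edge) can be shown on a toy CFG. This is a minimal Python model with an invented block/edge representation, not the actual rtl CFG API.

```python
# Minimal CFG model of the situation discussed: after deleting a jump
# that acted as an unconditional branch, its block is left with one
# successor edge that lacks the FALLTHRU flag. The repair mirrors the
# purge_dead_edges option: if the block now ends without a jump and has
# exactly one successor, that edge must become a fallthru edge.

FALLTHRU = 0x1

class Edge:
    def __init__(self, dest, flags=0):
        self.dest, self.flags = dest, flags

class Block:
    def __init__(self, name):
        self.name, self.succs, self.ends_in_jump = name, [], False

def delete_trailing_jump(bb):
    bb.ends_in_jump = False
    # The jump is gone; fix up the remaining edge as discussed above.
    if len(bb.succs) == 1:
        bb.succs[0].flags |= FALLTHRU

a, b = Block("a"), Block("b")
a.succs = [Edge(b)]          # non-fallthru edge left behind by the jump
a.ends_in_jump = True
delete_trailing_jump(a)
```

The point of the model is only the invariant: a block that does not end in a jump must reach its single successor by falling through, so the edge flags have to say so.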
Re: [PATCH, RTL] Fix PR 51106
Hello,

On 19.01.2012 18:40, Jakub Jelinek wrote:

On Thu, Jan 19, 2012 at 06:13:58PM +0400, Andrey Belevantsev wrote:

2012-01-19  Andrey Belevantsev  a...@ispras.ru

	PR target/51106
	* function.c (instantiate_virtual_regs_in_insn): Use
	delete_insn_and_edges when removing a wrong asm insn.

This is ok for trunk.

After Richi's RTL generation related cleanups went in, the extra cleanup_cfg call was added, so we are no longer lucky to have the proper fallthru edge in this case. The PR trail has the patch to purge_dead_edges making it consider this situation, but I still prefer the below patch that fixes only the invalid asm case. The reason is that I think it unlikely that after initial RTL expansion (of which the instantiate virtual regs pass seems to be a part) we will get the problematic situation. However, I'm happy to test the PR trail patch, too.

Tested fine on x86-64, ok for trunk?

Andrey

2012-04-02  Andrey Belevantsev  a...@ispras.ru

	PR target/51106
	PR middle-end/52650
	* function.c (instantiate_virtual_regs_in_insn): Make sure to set
	the proper fallthru bits when removing a wrong jump asm.

diff --git a/gcc/function.c b/gcc/function.c
index 3e903ef..a2638bb 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -1730,6 +1730,15 @@ instantiate_virtual_regs_in_insn (rtx insn)
       if (!check_asm_operands (PATTERN (insn)))
 	{
 	  error_for_asm (insn, "impossible constraint in %<asm%>");
+	  if (JUMP_P (insn))
+	    {
+	      basic_block bb = BLOCK_FOR_INSN (insn);
+	      edge e;
+
+	      if (single_succ_p (bb)
+		  && !((e = single_succ_edge (bb))->flags & EDGE_FALLTHRU))
+		e->flags |= EDGE_FALLTHRU;
+	    }
 	  delete_insn_and_edges (insn);
 	}
     }
Re: [SMS] Support new loop pattern
Hello,

I'd like to ping again those SMS patches once we're back to Stage 1. Ayal, maybe it would remove some burden for you if you'd review the general SMS functionality of those patches, and we'd ask RTL folks to look at the pieces related to RTL pattern matching and generation?

Yours, Andrey

On 10.02.2012 16:15, Roman Zhuykov wrote:

Ping. Ayal, please review this patch and these three patches too:
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html

--
Roman Zhuykov zhr...@ispras.ru
[PATCH] Fix PR 52203
Hello,

This PR is again about insns that are recog'ed as >= 0 but do not change the processor state. As explained in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52203#c8, I've tried experimenting with an attribute marking those empty insns in MD files and asserting that all other insns do have reservations. As this doesn't seem to be interesting, I give up on that idea, and the below patch makes sel-sched do exactly what the Haifa scheduler does, i.e. does not count such insns against issue_rate when modelling clock cycles.

Tested on ia64 and x86-64, OK for trunk? No testcase again because of the amount of flags needed.

Andrey

2012-03-07  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/52203
	* sel-sched.c (estimate_insn_cost): New parameter pempty.  Adjust
	all callers to pass NULL except ...
	(reset_sched_cycles_in_current_ebb): ... here, save the value
	in new variable 'empty'.  Increase issue_rate only for non-empty insns.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 2af01ae..2829f60 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -4265,9 +4265,10 @@ invoke_aftermath_hooks (fence_t fence, rtx best_insn, int issue_more)
   return issue_more;
 }
 
-/* Estimate the cost of issuing INSN on DFA state STATE.  */
+/* Estimate the cost of issuing INSN on DFA state STATE.  Write to PEMPTY
+   true when INSN does not change the processor state.  */
 static int
-estimate_insn_cost (rtx insn, state_t state)
+estimate_insn_cost (rtx insn, state_t state, bool *pempty)
 {
   static state_t temp = NULL;
   int cost;
@@ -4277,6 +4278,8 @@ estimate_insn_cost (rtx insn, state_t state)
   memcpy (temp, state, dfa_state_size);
   cost = state_transition (temp, insn);
+  if (pempty)
+    *pempty = (memcmp (temp, state, dfa_state_size) == 0);
 
   if (cost < 0)
     return 0;
@@ -4307,7 +4310,7 @@ get_expr_cost (expr_t expr, fence_t fence)
       return 0;
     }
   else
-    return estimate_insn_cost (insn, FENCE_STATE (fence));
+    return estimate_insn_cost (insn, FENCE_STATE (fence), NULL);
 }
 
 /* Find the best insn for scheduling, either via max_issue or just take
@@ -7020,7 +7023,7 @@ reset_sched_cycles_in_current_ebb (void)
     {
       int cost, haifa_cost;
       int sort_p;
-      bool asm_p, real_insn, after_stall, all_issued;
+      bool asm_p, real_insn, after_stall, all_issued, empty;
       int clock;
 
       if (!INSN_P (insn))
@@ -7047,7 +7050,7 @@ reset_sched_cycles_in_current_ebb (void)
	  haifa_cost = 0;
	}
       else
-	haifa_cost = estimate_insn_cost (insn, curr_state);
+	haifa_cost = estimate_insn_cost (insn, curr_state, &empty);
 
       /* Stall for whatever cycles we've stalled before.  */
       after_stall = 0;
@@ -7081,7 +7084,7 @@ reset_sched_cycles_in_current_ebb (void)
	      if (!after_stall
		  && real_insn
		  && haifa_cost > 0
-		  && estimate_insn_cost (insn, curr_state) == 0)
+		  && estimate_insn_cost (insn, curr_state, NULL) == 0)
		break;
 
	      /* When the data dependency stall is longer than the DFA stall,
@@ -7093,7 +7096,7 @@ reset_sched_cycles_in_current_ebb (void)
	      if ((after_stall || all_issued)
		  && real_insn
		  && haifa_cost == 0)
-		haifa_cost = estimate_insn_cost (insn, curr_state);
+		haifa_cost = estimate_insn_cost (insn, curr_state, NULL);
	    }
 
	  haifa_clock += i;
@@ -7125,7 +7128,8 @@ reset_sched_cycles_in_current_ebb (void)
       if (real_insn)
	{
	  cost = state_transition (curr_state, insn);
-	  issued_insns++;
+	  if (!empty)
+	    issued_insns++;
 
	  if (sched_verbose >= 2)
	    {
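The accounting the patch fixes (state-preserving "empty" insns must not count against issue_rate) can be illustrated with a small self-contained model. This is toy Python, not GCC code; the state representation and the insn stream are invented for the example.

```python
# Toy cycle model for the fix above: insns whose DFA transition leaves
# the automaton state unchanged ("empty" insns) must not be counted
# against issue_rate, otherwise the model closes cycles too early.
# State is modelled as a tuple; empty insns map the state to itself,
# and the equality check plays the role of the memcmp in the patch.

def state_transition(state, insn):
    if insn["empty"]:
        return state                  # processor state unchanged
    return state + (insn["name"],)    # any state change will do here

def count_cycles(insns, issue_rate):
    state, issued, cycles = (), 0, 1
    for insn in insns:
        new_state = state_transition(state, insn)
        empty = new_state == state    # detect a state-preserving insn
        state = new_state
        if not empty:
            issued += 1
        if issued == issue_rate:      # cycle is full: advance the clock
            issued, state, cycles = 0, (), cycles + 1
    return cycles

insns = [{"name": "r1", "empty": False}, {"name": "r2", "empty": False},
         {"name": "n1", "empty": True},  {"name": "n2", "empty": True},
         {"name": "r3", "empty": False}, {"name": "r4", "empty": False}]
cycles = count_cycles(insns, issue_rate=2)
```

With issue_rate 2, the four real insns take two full cycles and the two empty insns in between add none; counting the empty insns would have inflated the cycle count by one.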
[PATCH] [4.7?] Fix PR 52250
Hello,

The PR is about selective scheduling not trying hard enough to find a neighbor block to stick the note list from an empty block being removed. Fixed by a) trying harder and b) making sure that if we fail to find the right place, we just drop the notes instead of asserting. Tested on x86-64 and ia64, approved by Alexander offline. No test, as it has quite an insane number of flags that would likely stop reproducing the core issue in a few weeks.

Richard, Jakub, I can commit this to trunk today or wait until the release. The patch is very safe and so can be committed in case we will need an RC2, if you want one bug less in a release, but I would feel equally happy to postpone it until 4.7.1 as the problematic situation is quite rare (and probably requires pipelining outer loops to trigger, which is disabled with ia64/-O3).

Yours, Andrey

2012-03-05  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/52250
	* sel-sched-ir.c (maybe_tidy_empty_bb): Try harder to find a bb
	to put note list into.  Unconditionally call move_bb_info.
	(move_bb_info): Do not assert the blocks being in the same region,
	just drop the note list if they are not.

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index a93cd68..c53d2e1 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3658,7 +3658,7 @@ sel_recompute_toporder (void)
 static bool
 maybe_tidy_empty_bb (basic_block bb)
 {
-  basic_block succ_bb, pred_bb;
+  basic_block succ_bb, pred_bb, note_bb;
   VEC (basic_block, heap) *dom_bbs;
   edge e;
   edge_iterator ei;
@@ -3697,6 +3697,17 @@ maybe_tidy_empty_bb (basic_block bb)
   pred_bb = NULL;
   dom_bbs = NULL;
 
+  /* Save a pred/succ from the current region to attach the notes to.  */
+  note_bb = NULL;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (in_current_region_p (e->src))
+      {
+	note_bb = e->src;
+	break;
+      }
+  if (note_bb == NULL)
+    note_bb = succ_bb;
+
   /* Redirect all non-fallthru edges to the next bb.  */
   while (rescan_p)
     {
@@ -3746,10 +3757,8 @@ maybe_tidy_empty_bb (basic_block bb)
   else
     {
       /* This is a block without fallthru predecessor.  Just delete it.  */
-      gcc_assert (pred_bb != NULL);
-
-      if (in_current_region_p (pred_bb))
-	move_bb_info (pred_bb, bb);
+      gcc_assert (note_bb);
+      move_bb_info (note_bb, bb);
       remove_empty_bb (bb, true);
     }
 
@@ -5231,10 +5240,9 @@ sel_remove_bb (basic_block bb, bool remove_from_cfg_p)
 static void
 move_bb_info (basic_block merge_bb, basic_block empty_bb)
 {
-  gcc_assert (in_current_region_p (merge_bb));
-
-  concat_note_lists (BB_NOTE_LIST (empty_bb),
-		     &BB_NOTE_LIST (merge_bb));
+  if (in_current_region_p (merge_bb))
+    concat_note_lists (BB_NOTE_LIST (empty_bb),
+		       &BB_NOTE_LIST (merge_bb));
 
   BB_NOTE_LIST (empty_bb) = NULL_RTX;
 }
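The two-part fix above (prefer an in-region predecessor for the notes, fall back to the successor, and silently drop the notes if the chosen block is out of region) can be sketched abstractly. This is an illustrative Python model; the block names and the dict-based note lists are invented for the example.

```python
# Sketch of the note destination selection from the fix above: prefer a
# predecessor inside the current region to receive the empty block's
# notes, fall back to the successor, and drop the notes (instead of
# asserting) when the chosen block is outside the region.

def pick_note_bb(preds, succ_bb, in_region):
    for p in preds:
        if in_region(p):
            return p
    return succ_bb

def move_notes(notes, merge_bb, in_region, note_lists):
    # Unconditional call, as in the patch; the region check moved here.
    if in_region(merge_bb):
        note_lists[merge_bb] = note_lists.get(merge_bb, []) + notes
    # Otherwise the notes are simply dropped.

region = {"p2", "s"}
in_region = lambda b: b in region
note_lists = {}
chosen = pick_note_bb(["p1", "p2"], "s", in_region)   # p1 out of region
move_notes(["loop-note"], chosen, in_region, note_lists)
```

The key behavioral change mirrored here is that an out-of-region destination no longer trips an assertion; the notes just disappear, which is safe because they only matter within the region.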
Re: [PATCH] Fix PR 51505
On 30.01.2012 11:38, Paolo Bonzini wrote:

On 01/29/2012 04:09 PM, Eric Botcazou wrote:

As discussed in Bugzilla, this is the patch implementing Paolo's suggestion of killing REG_EQUAL/REG_EQUIV notes from df_kill_notes.

The code assumes there is at most one such note per insn. That's wrong though and wreaks havoc during reload, e.g.:

(insn 169 60 62 4 (set (reg:TF 158)
        (mem/c:TF (plus:SI (reg/f:SI 101 %sfp)
                (const_int -16 [0xfff0])) [3 S16 A64])) 960513-1.c:13 97 {*movtf_insn_sp32}
     (expr_list:REG_EQUIV (mem/c:TF (plus:SI (reg/f:SI 101 %sfp)
                (const_int -16 [0xfff0])) [3 S16 A64])
        (expr_list:REG_EQUAL (mult:TF (reg/v:TF 110 [ d ])
                (reg:TF 154))
            (nil))))

because the REG_EQUIV note disappears behind reload's back and it isn't prepared for that. This is the cause of the following regression on SPARC:

FAIL: gcc.c-torture/execute/960513-1.c execution, -Os

As well as:

FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O2
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O3 -g
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -Os
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O2 -flto -flto-partition=none
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -O2 -flto

for the exact same reason. Does this help?

That would fix the problem of multiple notes per insn (as we wanted to do initially), but I didn't understand whether this is the real problem or the problem is reload not being happy with disappearing notes. Also, I can't reproduce it with a cross to sparc64-linux: when I put a breakpoint on the code removing the notes, I find only the similarly looking insn 148, which gets removed by the df_analyze call at the start of IRA. Though I see the fail in the SPARC test results on the ML, so I guess I'm missing something...

Andrey

Paolo
Re: [PATCH] Fix PR 51505
On 30.01.2012 17:47, Paolo Bonzini wrote:

On 01/30/2012 09:44 AM, Andrey Belevantsev wrote:

Does this help?

That would fix the problem of multiple notes per insn (as we wanted to do initially), but I didn't understand whether this is the real problem or the problem is reload not being happy with disappearing notes. Also, I can't reproduce it with a cross to sparc64-linux: when I put a breakpoint on the code removing the notes, I find only the similarly looking insn 148, which gets removed by the df_analyze call at the start of IRA. Though I see the fail in the SPARC test results on the ML, so I guess I'm missing something...

The REG_EQUAL note can go, but the REG_EQUIV note should not (by definition: they are valid throughout the entire function). In fact, we could just as well apply the loop to REG_EQUAL notes only, but that would have been a bit too clever and more risky.

Eric, Paolo, thanks for the explanations!

Andrey
Re: [PATCH] Fix PR 51389
On 25.01.2012 18:21, Richard Guenther wrote:

2012/1/25 Andrey Belevantsev a...@ispras.ru:

... Sure, I've just had the impression that the datarefs vector is of no use without the dependencies themselves. The possibility of making find_data_references_in_loop static also kind of hints in this direction. Anyways, I don't have any problem with fixing compute_all_dependences instead.

I'd prefer that.

The below tests fine and passes the test case. compute_all_dependences now returns bool and has the hunk from compute_data_dependences_for_loop for creating a single don't-know datadep. The only caller that needs to be fixed is compute_data_dependences_for_bb; others look fine.

Tree prefetching has its own knob for the same goal, PREFETCH_MAX_MEM_REFS_PER_LOOP, which is 200. Do we want to remove it, to change it to the new parameter (might be too big), or to leave it as is? (Interestingly enough, with data deps fixed, the most time on the test case is taken by RTL PRE, 27%.)

Andrey

2012-01-27  Andrey Belevantsev  a...@ispras.ru

	PR middle-end/51389
	* Makefile.in (tree-data-ref.o): Depend on $(PARAMS_H).
	* tree-data-ref.h (find_data_references_in_loop): Remove declaration.
	* tree-data-ref.c (find_data_references_in_loop): Make static.
	(compute_all_dependences): Change return type to bool.  Bail out
	for too many datarefs in a loop.  Move the hunk resetting the data
	dependences vector from ...
	(compute_data_dependences_for_loop): ... here.  Account for
	compute_all_dependences returning false.
	(compute_data_dependences_for_bb): Likewise.
	* params.def (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS): New param.
	* doc/invoke.texi (loop-max-datarefs-for-datadeps): Document it.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f450d3e..43295aa 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2598,7 +2598,7 @@ tree-scalar-evolution.o : tree-scalar-evolution.c $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_PASS_H) $(PARAMS_H) gt-tree-scalar-evolution.h
 tree-data-ref.o : tree-data-ref.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    gimple-pretty-print.h $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
-   $(TREE_PASS_H) langhooks.h tree-affine.h
+   $(TREE_PASS_H) langhooks.h tree-affine.h $(PARAMS_H)
 sese.o : sese.c sese.h $(CONFIG_H) $(SYSTEM_H) coretypes.h tree-pretty-print.h \
    $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) tree-pass.h value-prof.h
 graphite.o : graphite.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(DIAGNOSTIC_CORE_H) \
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e3d3789..8a916ef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9115,6 +9115,13 @@ with more basic blocks than this parameter won't have loop invariant
 motion optimization performed on them.  The default value of the
 parameter is 1000 for -O1 and 1 for -O2 and above.
 
+@item loop-max-datarefs-for-datadeps
+Building data dependencies is expensive for very large loops.  This
+parameter limits the number of data references in loops that are
+considered for data dependence analysis.  These large loops will not
+be handled then by the optimizations using loop data dependencies.
+The default value is 1000.
+
 @item max-vartrack-size
 Sets a maximum number of hash table slots to use during variable
 tracking dataflow analysis of any function.  If this limit is exceeded
diff --git a/gcc/params.def b/gcc/params.def
index 239b684..d7cdd7b 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -820,6 +820,12 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
	  "maximum number of basic blocks per function to be analyzed by Graphite",
	  100, 0, 0)
 
+/* Avoid data dependence analysis on very large loops.  */
+DEFPARAM (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS,
+	  "loop-max-datarefs-for-datadeps",
+	  "Maximum number of datarefs in loop for building loop data dependencies",
+	  1000, 0, 0)
+
 /* Avoid doing loop invariant motion on very large loops.  */
 DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 09929c7..b8dfa31 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "langhooks.h"
 #include "tree-affine.h"
+#include "params.h"
 
 static struct datadep_stats
 {
@@ -4134,9 +4135,10 @@ compute_affine_dependence (struct data_dependence_relation *ddr,
 /* Compute in DEPENDENCE_RELATIONS the data dependence graph for all the
    data references in DATAREFS, in the LOOP_NEST.  When
    COMPUTE_SELF_AND_RR is FALSE, don't compute read-read and self
-   relations.  */
+   relations.  Return true when successful, i.e. data references number
+   is small enough to be handled.  */
 
-void
+bool
 compute_all_dependences (VEC (data_reference_p, heap) *datarefs,
			 VEC (ddr_p, heap) **dependence_relations,
			 VEC (loop_p, heap) *loop_nest,
@@ -4146,6 +4148,18
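Richard's point that the quadratic cost lives in compute_all_dependences, not in finding the references, is easy to see in a toy model. The sketch below is illustrative Python, not GCC code; the pair-walk stands in for the real dependence computation and the default value is taken from the patch.

```python
# The cost this patch limits: dependence computation examines every
# pair of data references, so it is quadratic in their number, while
# collecting the references themselves is only linear. The sketch
# returns None (a "don't know" result) when the count exceeds the
# param, mirroring compute_all_dependences returning bool above.

LOOP_MAX_DATAREFS_FOR_DATADEPS = 1000   # default from the patch

def compute_all_dependences(datarefs,
                            max_refs=LOOP_MAX_DATAREFS_FOR_DATADEPS):
    if len(datarefs) > max_refs:
        return None                     # caller treats this as "don't know"
    deps = []
    for i, a in enumerate(datarefs):    # the quadratic pair walk
        for b in datarefs[i + 1:]:
            deps.append((a, b))         # real code analyzes the pair here
    return deps
```

For n references the walk produces n*(n-1)/2 pairs, which is why a cap of 1000 references bounds the work at roughly half a million pair analyses instead of hundreds of millions for the 20k-dataref loop in the PR.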
[PATCH] Fix PR 51389
Hello,

In this PR data dependence analysis goes wild by trying to process more than 20k datarefs, so the patch limits the number of datarefs per loop we handle to 1000 via a param. On the way, find_data_references_in_loop is made static and predcom/parloops are fixed for compute_data_dependences_for_loop returning false.

Bootstrapped and tested on x86_64-linux, no testcase as it is really a memory hog. Strictly speaking, this could be a regression only from the time when -O3 didn't have predcom/vectorization turned on by default, so -- ok for trunk or 4.8? Branches?

Andrey

2012-01-25  Andrey Belevantsev  a...@ispras.ru

	PR middle-end/51389
	* Makefile.in (tree-data-ref.o): Depend on $(PARAMS_H).
	* tree-data-ref.c (find_data_references_in_loop): Make static.
	Bail out for too many datarefs in a loop.
	* tree-data-ref.h (find_data_references_in_loop): Remove declaration.
	* tree-predcom.c (tree_predictive_commoning_loop): Bail out when
	compute_data_dependences_for_loop returns false.
	* tree-parloops.c (loop_parallel_p): Likewise.
	* params.def (PARAM_DATADEPS_MAX_DATAREFS_IN_LOOP): New param.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f450d3e..43295aa 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2598,7 +2598,7 @@ tree-scalar-evolution.o : tree-scalar-evolution.c $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_PASS_H) $(PARAMS_H) gt-tree-scalar-evolution.h
 tree-data-ref.o : tree-data-ref.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    gimple-pretty-print.h $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
-   $(TREE_PASS_H) langhooks.h tree-affine.h
+   $(TREE_PASS_H) langhooks.h tree-affine.h $(PARAMS_H)
 sese.o : sese.c sese.h $(CONFIG_H) $(SYSTEM_H) coretypes.h tree-pretty-print.h \
    $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) tree-pass.h value-prof.h
 graphite.o : graphite.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(DIAGNOSTIC_CORE_H) \
diff --git a/gcc/params.def b/gcc/params.def
index 239b684..e50fa26 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -820,6 +820,12 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
	  "maximum number of basic blocks per function to be analyzed by Graphite",
	  100, 0, 0)
 
+/* Avoid data dependence analysis on very large loops.  */
+DEFPARAM (PARAM_DATADEPS_MAX_DATAREFS_IN_LOOP,
+	  "datadeps-max-datarefs-in-loop",
+	  "Maximum number of datarefs in loop for loop data dependence analysis",
+	  1000, 0, 0)
+
 /* Avoid doing loop invariant motion on very large loops.  */
 DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a0d86ec..5a7659a 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "langhooks.h"
 #include "tree-affine.h"
+#include "params.h"
 
 static struct datadep_stats
 {
@@ -4338,7 +4339,7 @@ find_data_references_in_bb (struct loop *loop, basic_block bb,
    TODO: This function should be made smarter so that it can handle address
    arithmetic as if they were array accesses, etc.  */
 
-tree
+static tree
 find_data_references_in_loop (struct loop *loop,
			      VEC (data_reference_p, heap) **datarefs)
 {
@@ -4351,7 +4352,9 @@ find_data_references_in_loop (struct loop *loop,
     {
       bb = bbs[i];
 
-      if (find_data_references_in_bb (loop, bb, datarefs) == chrec_dont_know)
+      if (find_data_references_in_bb (loop, bb, datarefs) == chrec_dont_know
+	  || ((int) VEC_length (data_reference_p, *datarefs)
+	      > PARAM_VALUE (PARAM_DATADEPS_MAX_DATAREFS_IN_LOOP)))
	{
	  free (bbs);
	  return chrec_dont_know;
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 0f12962..66258ab 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -394,8 +394,6 @@ extern bool compute_data_dependences_for_loop (struct loop *, bool,
 extern bool compute_data_dependences_for_bb (basic_block, bool,
					     VEC (data_reference_p, heap) **,
					     VEC (ddr_p, heap) **);
-extern tree find_data_references_in_loop (struct loop *,
-					  VEC (data_reference_p, heap) **);
 extern void print_direction_vector (FILE *, lambda_vector, int);
 extern void print_dir_vectors (FILE *, VEC (lambda_vector, heap) *, int);
 extern void print_dist_vectors (FILE *, VEC (lambda_vector, heap) *, int);
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 339ddcc..c5b7655 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -387,8 +387,14 @@ loop_parallel_p (struct loop *loop, struct obstack * parloop_obstack)
   datarefs = VEC_alloc (data_reference_p, heap, 10);
   dependence_relations = VEC_alloc (ddr_p, heap, 10 * 10);
   loop_nest = VEC_alloc (loop_p, heap, 3);
-  compute_data_dependences_for_loop (loop, true, loop_nest
[PATCH] Fix PR 48374
Hello,

This patch fixes another problem with sel-sched not being happy having bbs with zero successors. Bootstrapped and tested on x86_64/linux. Again, this is not a regression as __builtin_unreachable did not exist before sel-sched, but the patch is very safe and obvious. It is my fault not committing this earlier (the patch in the audit trail was created in April 2011). What do release managers think?

Andrey

gcc/
2012-01-25  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/48374
	* sel-sched-ir.h (get_all_loop_exits): Check for zero successors.

testsuite/
2012-01-25  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/48374
	* gcc.dg/pr48374.c: New test.

diff --git a/gcc/sel-sched-ir.h b/gcc/sel-sched-ir.h
index c8f8be6..ede08e4 100644
--- a/gcc/sel-sched-ir.h
+++ b/gcc/sel-sched-ir.h
@@ -1119,7 +1119,8 @@ get_all_loop_exits (basic_block bb)
   /* If bb is empty, and we're skipping to loop exits, then
      consider bb as a possible gate to the inner loop now.  */
   while (sel_bb_empty_or_nop_p (bb)
-	 && in_current_region_p (bb))
+	 && in_current_region_p (bb)
+	 && EDGE_COUNT (bb->succs) > 0)
     {
       bb = single_succ (bb);
 
diff --git a/gcc/testsuite/gcc.dg/pr48374.c b/gcc/testsuite/gcc.dg/pr48374.c
new file mode 100644
index 000..24826d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr48374.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
+/* { dg-options "-O -fschedule-insns2 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops -fselective-scheduling2 --param max-sched-extend-regions-iters=2" } */
+
+void foo (int y)
+{
+  switch (y)
+    {
+    case 3:
+    case 5:
+    case 7:
+    case 11:
+      break;
+    default:
+      __builtin_unreachable ();
+    }
+}
+
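The guard added by the patch (stop the empty-block walk before calling single_succ on a successor-less block) can be modelled abstractly. The sketch is illustrative Python with an invented graph representation, not the GCC CFG API.

```python
# Model of the guard above: when skipping through empty blocks while
# looking for loop exits, a block created for __builtin_unreachable
# has no successors at all, so taking single_succ() would crash. The
# walk must stop as soon as a block has zero successor edges.

def skip_empty_blocks(bb, is_empty, in_region, succs):
    while is_empty(bb) and in_region(bb) and len(succs(bb)) > 0:
        bb = succs(bb)[0]     # single_succ in the real code
    return bb

graph = {"a": ["b"], "b": []}   # "b" ends in __builtin_unreachable
last = skip_empty_blocks("a",
                         is_empty=lambda b: True,
                         in_region=lambda b: True,
                         succs=lambda b: graph[b])
```

Without the `len(succs(bb)) > 0` conjunct the second iteration would index into an empty successor list, which is the abstract analogue of the ICE the PR reports.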
Re: [PATCH] Fix PR 51389
On 25.01.2012 16:38, Richard Guenther wrote:
> 2012/1/25 Andrey Belevantsev <a...@ispras.ru>:
>> Hello,
>>
>> In this PR data dependence analysis goes wild by trying to process
>> >20k datarefs, so the patch limits the number of datarefs per loop we
>> handle to 1000 via a param.  On the way find_data_references_in_loop
>> is made static and predcom/parloops are fixed for
>> compute_data_dependences_for_loop returning false.
>>
>> Bootstrapped and tested on x86_64-linux, no testcase as it is really a
>> memory hog.  Strictly speaking, this could be a regression only from
>> the time when -O3 didn't have predcom/vectorization turned on by
>> default, so -- ok for trunk or 4.8?  Branches?
>
> You limit the number of data references we find - but that isn't really
> the scalability issue (it's linear in the number of stmts).  It's
> really compute_all_dependences that hits the quadraticness, thus _that_
> function should fail instead (I realize it doesn't return any
> success/failure state yet, but the number of callers is limited).

Sure, I've just had the impression that the datarefs vector is of no use
without the dependencies themselves.  The possibility of making
find_data_references_in_loop static also kind of hints in this
direction.  Anyways, I don't have any problem with fixing
compute_all_dependences instead.

> Btw, new params need documentation in doc/invoke.texi.

Bah, I convinced myself params.def was the only documentation :-)

> The tree-predcom.c and tree-parloops.c changes are ok for trunk anyway.

I will commit them separately.

Andrey
[PATCH, RTL] Fix PR 51106
Hello,

As discussed with Jakub in the PR, the problem is that a jump asm goto
insn is removed without purging the dead edges.  Bootstrapped and tested
on x86-64.  I had to use dg-prune-output to remove the error message and
to make the test pass, as we are only checking for an ICE.

OK for trunk/4.6/4.5?

Andrey

gcc/
2012-01-19  Andrey Belevantsev  a...@ispras.ru

	PR target/51106
	* function.c (instantiate_virtual_regs_in_insn): Use
	delete_insn_and_edges when removing a wrong asm insn.

testsuite/
2012-01-19  Jakub Jelinek  ja...@redhat.com

	PR target/51106
	* gcc.c-torture/compile/pr51106.c: New test.

diff --git a/gcc/function.c b/gcc/function.c
index fcb79f5..94e51f4 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -1737,7 +1737,7 @@ instantiate_virtual_regs_in_insn (rtx insn)
       if (!check_asm_operands (PATTERN (insn)))
         {
           error_for_asm (insn, "impossible constraint in %<asm%>");
-          delete_insn (insn);
+          delete_insn_and_edges (insn);
         }
     }
   else
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr51106.c b/gcc/testsuite/gcc.c-torture/compile/pr51106.c
new file mode 100644
index 000..2f1c51d
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr51106.c
@@ -0,0 +1,19 @@
+/* { dg-options "-w" } */
+/* { dg-prune-output "impossible constraint" } */
+int
+foo (int x)
+{
+  asm goto ("" : : "i" (x) : : lab);
+  return 1;
+lab:
+  return 0;
+}
+
+int
+bar (int x)
+{
+  asm goto ("" : : "i" (x) : : lab);
+  __builtin_unreachable ();
+lab:
+  return 0;
+}
[PATCH] Fix PR 51505
Hello,

As discussed in Bugzilla, this is the patch implementing Paolo's
suggestion of killing REG_EQUAL/REG_EQUIV notes from df_kill_notes.
The code assumes there is at most one such note per insn.

Bootstrapped and tested on x86-64, ok for trunk?

Andrey

gcc:
2012-01-18  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/51505
	* df-problems.c (df_kill_notes): New parameter live.  Update
	comment.  Remove REG_EQUAL/REG_EQUIV notes referring to dead
	registers.
	(df_note_bb_compute): Update the call to df_kill_notes.

testsuite:
2012-01-18  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/51505
	* gcc.dg/pr51505.c: New test.

diff --git a/gcc/df-problems.c b/gcc/df-problems.c
index 8928454..f9b0bc1 100644
--- a/gcc/df-problems.c
+++ b/gcc/df-problems.c
@@ -2748,10 +2748,12 @@ df_ignore_stack_reg (int regno ATTRIBUTE_UNUSED)

 /* Remove all of the REG_DEAD or REG_UNUSED notes from INSN and add
-   them to OLD_DEAD_NOTES and OLD_UNUSED_NOTES.  */
+   them to OLD_DEAD_NOTES and OLD_UNUSED_NOTES.  Remove also
+   REG_EQUAL/REG_EQUIV notes referring to dead pseudos using LIVE
+   as the bitmap of currently live registers.  */

 static void
-df_kill_notes (rtx insn)
+df_kill_notes (rtx insn, bitmap live)
 {
   rtx *pprev = &REG_NOTES (insn);
   rtx link = *pprev;
@@ -2798,6 +2800,45 @@ df_kill_notes (rtx insn)
         }
         break;

+      case REG_EQUAL:
+      case REG_EQUIV:
+        {
+          /* Remove the notes that refer to dead registers.  As we have at most
+             one REG_EQUAL/EQUIV note, all of EQ_USES will refer to this note
+             so we need to purge the complete EQ_USES vector when removing
+             the note using df_notes_rescan.  */
+          df_ref *use_rec;
+          bool deleted = false;
+
+          for (use_rec = DF_INSN_EQ_USES (insn); *use_rec; use_rec++)
+            {
+              df_ref use = *use_rec;
+              if (DF_REF_REGNO (use) > FIRST_PSEUDO_REGISTER
+                  && (DF_REF_FLAGS (use) & DF_REF_IN_NOTE)
+                  && ! bitmap_bit_p (live, DF_REF_REGNO (use)))
+                {
+                  deleted = true;
+                  break;
+                }
+            }
+          if (deleted)
+            {
+              rtx next;
+#ifdef REG_DEAD_DEBUGGING
+              df_print_note ("deleting: ", insn, link);
+#endif
+              next = XEXP (link, 1);
+              free_EXPR_LIST_node (link);
+              *pprev = link = next;
+              df_notes_rescan (insn);
+            }
+          else
+            {
+              pprev = &XEXP (link, 1);
+              link = *pprev;
+            }
+          break;
+        }
       default:
         pprev = &XEXP (link, 1);
         link = *pprev;
@@ -3299,7 +3340,7 @@ df_note_bb_compute (unsigned int bb_index,
       debug_insn = DEBUG_INSN_P (insn);

       bitmap_clear (do_not_gen);
-      df_kill_notes (insn);
+      df_kill_notes (insn, live);

       /* Process the defs.  */
       if (CALL_P (insn))
diff --git a/gcc/testsuite/gcc.dg/pr51505.c b/gcc/testsuite/gcc.dg/pr51505.c
new file mode 100644
index 000..dbcd322
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr51505.c
@@ -0,0 +1,19 @@
+/* PR rtl-optimization/51505 */
+/* { dg-do compile } */
+/* { dg-options "-O --param max-cse-insns=1" } */
+struct S
+{
+  char a[256];
+};
+
+int bar (struct S, char[16]);
+
+void foo ()
+{
+  struct S u, s1, s2;
+  char e[256];
+  char i;
+  e[i] = ~s1.a[i] & s2.a[i];
+  if (bar (u, e))
+    __builtin_abort ();
+}
Re: [PATCH] Fix PR 51505
On 18.01.2012 21:28, Paolo Bonzini wrote:
> On 01/18/2012 05:41 PM, Andrey Belevantsev wrote:
>
> Ok, thanks for working on this.

Installed, do you want this for 4.6/4.5?

Andrey
Re: [PATCH] Copy over REG_ARGS_SIZE notes in sel-sched (PR debug/51557)
On 16.12.2011 11:53, Jakub Jelinek wrote:
> On Fri, Dec 16, 2011 at 09:47:52AM +0400, Andrey Belevantsev wrote:
>> Thus, if an insn has any note that prevents it from being cloned, we
>> need to fix the above place.  (If an insn should not even be moved
>> from its block, then CANT_MOVE should be set to 1 in the same
>> function.)  Then create_copy_of_insn_rtx can safely copy all the
>> notes as we already know that only clonable insns get to this
>> function.  The REG_EQUAL/REG_EQUIV notes may be dropped -- I thought
>> these are supposed to be recomputed by df these days, am I wrong?
>
> REG_DEAD/REG_UNUSED notes are recomputed by df, I don't think anything
> recomputes REG_EQUAL/REG_EQUIV.
>
> Eric agrees with you, so the patch is OK.

Thanks for the patch and for enlightening me.  And please feel free to
CC/assign me on any PRs mentioning selective scheduler.

Andrey
Re: [PATCH] Copy over REG_ARGS_SIZE notes in sel-sched (PR debug/51557)
On 16.12.2011 0:42, Jakub Jelinek wrote:
> On Thu, Dec 15, 2011 at 09:37:45AM -0800, Richard Henderson wrote:
>> On 12/15/2011 09:32 AM, Jakub Jelinek wrote:
>>> 	PR debug/51557
>>> 	* sel-sched-ir.c (create_copy_of_insn_rtx): Copy REG_ARGS_SIZE
>>> 	notes.
>>>
>>> 	* gcc.dg/pr51557.c: New test.
>>
>> There are plenty of other notes that could as well need duplication.
>> E.g. REG_INC, REG_FRAME_RELATED_EXPR, REG_CFA_*, REG_EH_REGION.
>
> I wasn't sure if it is safe to duplicate them.  I guess most of them
> are safe, not sure about REG_EQUAL/REG_EQUIV.  I can certainly try to
> copy all the notes, e.g. emit_copy_of_insn_after seems to copy all of
> them but REG_LABEL_OPERAND.

We need to clone only those insns that are safe.  In
sel-sched-ir.c:init_global_and_expr_for_insn we check a number of
conditions that indicate the insn being un-copyable (force_unique_p is
set to true for those; the relevant code starts from line 2954).
Previously I fixed this about a year ago wrt prologue and epilogue
insns, you can check the thread with the
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00013.html message.

Thus, if an insn has any note that prevents it from being cloned, we
need to fix the above place.  (If an insn should not even be moved from
its block, then CANT_MOVE should be set to 1 in the same function.)
Then create_copy_of_insn_rtx can safely copy all the notes as we
already know that only clonable insns get to this function.  The
REG_EQUAL/REG_EQUIV notes may be dropped -- I thought these are
supposed to be recomputed by df these days, am I wrong?

Jakub, I am happy to prepare a patch along these lines and to adjust it
to mark un-copyable the insns with any notes you or Richard think may
be problematic.  It would also be great to have this information
recorded in a general helper in rtlanal.c or somewhere, so we don't
have to guess every time this comes up; what do you think?

Andrey

> 	Jakub