Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling
> On Tue, Feb 14, 2017 at 2:13 PM, Bin.Chengwrote: > > On Tue, Feb 14, 2017 at 1:57 PM, Jan Hubicka wrote: > >>> Thanks, > >>> bin > >>> 2017-02-13 Bin Cheng > >>> > >>> PR tree-optimization/79347 > >>> * tree-vect-loop-manip.c (apply_probability_for_bb): New function. > >>> (vect_do_peeling): Maintain profile counters during peeling. > >>> > >>> gcc/testsuite/ChangeLog > >>> 2017-02-13 Bin Cheng > >>> > >>> PR tree-optimization/79347 > >>> * gcc.dg/vect/pr79347.c: New test. > >> > >>> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c > >>> b/gcc/testsuite/gcc.dg/vect/pr79347.c > >>> new file mode 100644 > >>> index 000..586c638 > >>> --- /dev/null > >>> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c > >>> @@ -0,0 +1,13 @@ > >>> +/* { dg-do compile } */ > >>> +/* { dg-require-effective-target vect_int } */ > >>> +/* { dg-additional-options "-fdump-tree-vect-all" } */ > >>> + > >>> +short *a; > >>> +int c; > >>> +void n(void) > >>> +{ > >>> + for (int i = 0; i >>> +a[i]++; > >>> +} > >> > >> Thanks for fixing the prologue. I think there is still one extra problem > >> in the vectorizer. > >> With the internal vectorized loop I now see: > >> > >> ;; basic block 9, loop depth 1, count 0, freq 956, maybe hot > >> ;; Invalid sum of incoming frequencies 1961, should be 956 > >> ;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED) > >> ;;pred: 10 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) > >> ;;8 [100.0%] (FALLTHRU) > >> # i_18 = PHI > >> # vectp_a.13_66 = PHI > >> # vectp_a.19_75 = PHI > >> # ivtmp_78 = PHI > >> _2 = (long unsigned int) i_18; > >> _3 = _2 * 2; > >> _4 = a.0_1 + _3; > >> vect__5.15_68 = MEM[(short int *)vectp_a.13_66]; > >> _5 = *_4; > >> vect__6.16_69 = VIEW_CONVERT_EXPR >> short>(vect__5.15_68); > >> _6 = (unsigned short) _5; > >> vect__7.17_71 = vect__6.16_69 + vect_cst__70; > >> _7 = _6 + 1; > >> vect__8.18_72 = VIEW_CONVERT_EXPR (vect__7.17_71); > >> _8 = (short int) _7; > >> MEM[(short int *)vectp_a.19_75] = vect__8.18_72; > >> i_14 = i_18 + 1; > >> vectp_a.13_67 = vectp_a.13_66 + 16; > >> vectp_a.19_76 = vectp_a.19_75 + 16; > >> ivtmp_79 = ivtmp_78 + 1; > >> if (ivtmp_79 < bnd.10_59) > >> goto ; [85.00%] > >> else > >> goto ; [15.00%] > >> > >> So it seems that the frequency of the loop itself is unrealistically > >> scaled down. > >> Before vetorizing the frequency is 8500 and predicted number of iterations > >> is > >> 6.6. Now the loop is intereed via BB 8 with frequency 1148, so the loop, > >> by > >> exit probability exits with 15% probability and thus still has 6.6 > >> iterations, > >> but by BB frequencies its body executes fewer times than the preheader. > >> > >> Now this is a fragile area vectirizing loop should scale number of > >> iterations down > >> 8 times. However guessed CFG profiles are always very "flat". Of course > >> if loop iterated 6.6 times at the average vectorizing would not make any > >> sense. > >> Making guessed profiles less flat is unrealistic, because average loop > >> iterates few times, > >> but of course while vectorizing we make additional guess that the > >> vectorizable loops > >> matters and the guessed profile is probably unrealistic. > > That's what I mentioned in the original patch. Vectorizer calls > > scale_loop_profile in > > function vect_transform_loop to scale down loop's frequency regardless > > mismatch > > between loop and preheader/exit basic blocks. In fact, after this > > patch all mismatches > > in vectorizer are introduced by this. I don't see any way to keep > > consistency beween > > vectorized loop and the rest program without visiting whole CFG. So > > shall we skip > > scaling down profile counters for vectorized loop? > > > >> > >> GCC 6 seems however bit more consistent. > >>> +/* Apply probability PROB to basic block BB and its single succ edge. */ > >>> + > >>> +static void > >>> +apply_probability_for_bb (basic_block bb, int prob) > >>> +{ > >>> + bb->frequency = apply_probability (bb->frequency, prob); > >>> + bb->count = apply_probability (bb->count, prob); > >>> + gcc_assert (single_succ_p (bb)); > >>> + single_succ_edge (bb)->count = bb->count; > >>> +} > >>> + > >>> /* Function vect_do_peeling. > >>> > >>> Input: > >>> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > >>> niters, tree nitersm1, > >>> may be preferred. */ > >>>basic_block anchor = loop_preheader_edge (loop)->src; > >>>if (skip_vector) > >>> -split_edge (loop_preheader_edge (loop)); > >>> +{ > >>> + split_edge (loop_preheader_edge (loop)); > >>> + > >>> + /* Due to the order in which we peel
Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling
On Tue, Feb 14, 2017 at 2:13 PM, Bin.Chengwrote: > On Tue, Feb 14, 2017 at 1:57 PM, Jan Hubicka wrote: >>> Thanks, >>> bin >>> 2017-02-13 Bin Cheng >>> >>> PR tree-optimization/79347 >>> * tree-vect-loop-manip.c (apply_probability_for_bb): New function. >>> (vect_do_peeling): Maintain profile counters during peeling. >>> >>> gcc/testsuite/ChangeLog >>> 2017-02-13 Bin Cheng >>> >>> PR tree-optimization/79347 >>> * gcc.dg/vect/pr79347.c: New test. >> >>> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c >>> b/gcc/testsuite/gcc.dg/vect/pr79347.c >>> new file mode 100644 >>> index 000..586c638 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c >>> @@ -0,0 +1,13 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-require-effective-target vect_int } */ >>> +/* { dg-additional-options "-fdump-tree-vect-all" } */ >>> + >>> +short *a; >>> +int c; >>> +void n(void) >>> +{ >>> + for (int i = 0; i >> +a[i]++; >>> +} >> >> Thanks for fixing the prologue. I think there is still one extra problem in >> the vectorizer. >> With the internal vectorized loop I now see: >> >> ;; basic block 9, loop depth 1, count 0, freq 956, maybe hot >> ;; Invalid sum of incoming frequencies 1961, should be 956 >> ;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED) >> ;;pred: 10 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) >> ;;8 [100.0%] (FALLTHRU) >> # i_18 = PHI >> # vectp_a.13_66 = PHI >> # vectp_a.19_75 = PHI >> # ivtmp_78 = PHI >> _2 = (long unsigned int) i_18; >> _3 = _2 * 2; >> _4 = a.0_1 + _3; >> vect__5.15_68 = MEM[(short int *)vectp_a.13_66]; >> _5 = *_4; >> vect__6.16_69 = VIEW_CONVERT_EXPR (vect__5.15_68); >> _6 = (unsigned short) _5; >> vect__7.17_71 = vect__6.16_69 + vect_cst__70; >> _7 = _6 + 1; >> vect__8.18_72 = VIEW_CONVERT_EXPR (vect__7.17_71); >> _8 = (short int) _7; >> MEM[(short int *)vectp_a.19_75] = vect__8.18_72; >> i_14 = i_18 + 1; >> vectp_a.13_67 = vectp_a.13_66 + 16; >> vectp_a.19_76 = vectp_a.19_75 + 16; >> ivtmp_79 = ivtmp_78 + 1; >> if (ivtmp_79 < bnd.10_59) >> goto ; [85.00%] >> else >> goto ; [15.00%] >> >> So it seems that the frequency of the loop itself is unrealistically scaled >> down. >> Before vetorizing the frequency is 8500 and predicted number of iterations is >> 6.6. Now the loop is intereed via BB 8 with frequency 1148, so the loop, by >> exit probability exits with 15% probability and thus still has 6.6 >> iterations, >> but by BB frequencies its body executes fewer times than the preheader. >> >> Now this is a fragile area vectirizing loop should scale number of >> iterations down >> 8 times. However guessed CFG profiles are always very "flat". Of course >> if loop iterated 6.6 times at the average vectorizing would not make any >> sense. >> Making guessed profiles less flat is unrealistic, because average loop >> iterates few times, >> but of course while vectorizing we make additional guess that the >> vectorizable loops >> matters and the guessed profile is probably unrealistic. > That's what I mentioned in the original patch. Vectorizer calls > scale_loop_profile in > function vect_transform_loop to scale down loop's frequency regardless > mismatch > between loop and preheader/exit basic blocks. In fact, after this > patch all mismatches > in vectorizer are introduced by this. I don't see any way to keep > consistency beween > vectorized loop and the rest program without visiting whole CFG. So > shall we skip > scaling down profile counters for vectorized loop? > >> >> GCC 6 seems however bit more consistent. >>> +/* Apply probability PROB to basic block BB and its single succ edge. */ >>> + >>> +static void >>> +apply_probability_for_bb (basic_block bb, int prob) >>> +{ >>> + bb->frequency = apply_probability (bb->frequency, prob); >>> + bb->count = apply_probability (bb->count, prob); >>> + gcc_assert (single_succ_p (bb)); >>> + single_succ_edge (bb)->count = bb->count; >>> +} >>> + >>> /* Function vect_do_peeling. >>> >>> Input: >>> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree >>> niters, tree nitersm1, >>> may be preferred. */ >>>basic_block anchor = loop_preheader_edge (loop)->src; >>>if (skip_vector) >>> -split_edge (loop_preheader_edge (loop)); >>> +{ >>> + split_edge (loop_preheader_edge (loop)); >>> + >>> + /* Due to the order in which we peel prolog and epilog, we first >>> + propagate probability to the whole loop. The purpose is to >>> + avoid adjusting probabilities of both prolog and vector loops >>> + separately. Note in this case, the probability of epilog loop >>> + needs to be
Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling
On 02/14/2017 07:57 AM, Jan Hubicka wrote: > So it seems that the frequency of the loop itself is unrealistically scaled > down. > Before vetorizing the frequency is 8500 and predicted number of iterations is > 6.6. Now the loop is intereed via BB 8 with frequency 1148, so the loop, by > exit probability exits with 15% probability and thus still has 6.6 iterations, > but by BB frequencies its body executes fewer times than the preheader. > > Now this is a fragile area vectirizing loop should scale number of iterations > down > 8 times. However guessed CFG profiles are always very "flat". Of course > if loop iterated 6.6 times at the average vectorizing would not make any > sense. > Making guessed profiles less flat is unrealistic, because average loop > iterates few times, > but of course while vectorizing we make additional guess that the > vectorizable loops > matters and the guessed profile is probably unrealistic. We have the same problem in the RTL loop unroller in that we'll scale the unrolled loop by the unroll factor (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212#c3), which can result in a loop with lower frequency than surrounding code. Problem is compounded if we vectorize the loop and then unroll it. Whatever approach is decided for the case when we have guessed profile should be applied to both vectorizer and RTL loop unroller. -Pat
Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling
On Tue, Feb 14, 2017 at 1:57 PM, Jan Hubickawrote: >> Thanks, >> bin >> 2017-02-13 Bin Cheng >> >> PR tree-optimization/79347 >> * tree-vect-loop-manip.c (apply_probability_for_bb): New function. >> (vect_do_peeling): Maintain profile counters during peeling. >> >> gcc/testsuite/ChangeLog >> 2017-02-13 Bin Cheng >> >> PR tree-optimization/79347 >> * gcc.dg/vect/pr79347.c: New test. > >> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c >> b/gcc/testsuite/gcc.dg/vect/pr79347.c >> new file mode 100644 >> index 000..586c638 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c >> @@ -0,0 +1,13 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-additional-options "-fdump-tree-vect-all" } */ >> + >> +short *a; >> +int c; >> +void n(void) >> +{ >> + for (int i = 0; i > +a[i]++; >> +} > > Thanks for fixing the prologue. I think there is still one extra problem in > the vectorizer. > With the internal vectorized loop I now see: > > ;; basic block 9, loop depth 1, count 0, freq 956, maybe hot > ;; Invalid sum of incoming frequencies 1961, should be 956 > ;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED) > ;;pred: 10 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) > ;;8 [100.0%] (FALLTHRU) > # i_18 = PHI > # vectp_a.13_66 = PHI > # vectp_a.19_75 = PHI > # ivtmp_78 = PHI > _2 = (long unsigned int) i_18; > _3 = _2 * 2; > _4 = a.0_1 + _3; > vect__5.15_68 = MEM[(short int *)vectp_a.13_66]; > _5 = *_4; > vect__6.16_69 = VIEW_CONVERT_EXPR (vect__5.15_68); > _6 = (unsigned short) _5; > vect__7.17_71 = vect__6.16_69 + vect_cst__70; > _7 = _6 + 1; > vect__8.18_72 = VIEW_CONVERT_EXPR (vect__7.17_71); > _8 = (short int) _7; > MEM[(short int *)vectp_a.19_75] = vect__8.18_72; > i_14 = i_18 + 1; > vectp_a.13_67 = vectp_a.13_66 + 16; > vectp_a.19_76 = vectp_a.19_75 + 16; > ivtmp_79 = ivtmp_78 + 1; > if (ivtmp_79 < bnd.10_59) > goto ; [85.00%] > else > goto ; [15.00%] > > So it seems that the frequency of the loop itself is unrealistically scaled > down. > Before vetorizing the frequency is 8500 and predicted number of iterations is > 6.6. Now the loop is intereed via BB 8 with frequency 1148, so the loop, by > exit probability exits with 15% probability and thus still has 6.6 iterations, > but by BB frequencies its body executes fewer times than the preheader. > > Now this is a fragile area vectirizing loop should scale number of iterations > down > 8 times. However guessed CFG profiles are always very "flat". Of course > if loop iterated 6.6 times at the average vectorizing would not make any > sense. > Making guessed profiles less flat is unrealistic, because average loop > iterates few times, > but of course while vectorizing we make additional guess that the > vectorizable loops > matters and the guessed profile is probably unrealistic. That's what I mentioned in the original patch. Vectorizer calls scale_loop_profile in function vect_transform_loop to scale down loop's frequency regardless mismatch between loop and preheader/exit basic blocks. In fact, after this patch all mismatches in vectorizer are introduced by this. I don't see any way to keep consistency beween vectorized loop and the rest program without visiting whole CFG. So shall we skip scaling down profile counters for vectorized loop? > > GCC 6 seems however bit more consistent. >> +/* Apply probability PROB to basic block BB and its single succ edge. */ >> + >> +static void >> +apply_probability_for_bb (basic_block bb, int prob) >> +{ >> + bb->frequency = apply_probability (bb->frequency, prob); >> + bb->count = apply_probability (bb->count, prob); >> + gcc_assert (single_succ_p (bb)); >> + single_succ_edge (bb)->count = bb->count; >> +} >> + >> /* Function vect_do_peeling. >> >> Input: >> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree >> niters, tree nitersm1, >> may be preferred. */ >>basic_block anchor = loop_preheader_edge (loop)->src; >>if (skip_vector) >> -split_edge (loop_preheader_edge (loop)); >> +{ >> + split_edge (loop_preheader_edge (loop)); >> + >> + /* Due to the order in which we peel prolog and epilog, we first >> + propagate probability to the whole loop. The purpose is to >> + avoid adjusting probabilities of both prolog and vector loops >> + separately. Note in this case, the probability of epilog loop >> + needs to be scaled back later. */ >> + basic_block bb_before_loop = loop_preheader_edge (loop)->src; >> + apply_probability_for_bb (bb_before_loop, prob_vector); > Aha, this is the bit I missed while trying to fix it
Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling
> Thanks, > bin > 2017-02-13 Bin Cheng> > PR tree-optimization/79347 > * tree-vect-loop-manip.c (apply_probability_for_bb): New function. > (vect_do_peeling): Maintain profile counters during peeling. > > gcc/testsuite/ChangeLog > 2017-02-13 Bin Cheng > > PR tree-optimization/79347 > * gcc.dg/vect/pr79347.c: New test. > diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c > b/gcc/testsuite/gcc.dg/vect/pr79347.c > new file mode 100644 > index 000..586c638 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "-fdump-tree-vect-all" } */ > + > +short *a; > +int c; > +void n(void) > +{ > + for (int i = 0; i +a[i]++; > +} Thanks for fixing the prologue. I think there is still one extra problem in the vectorizer. With the internal vectorized loop I now see: ;; basic block 9, loop depth 1, count 0, freq 956, maybe hot ;; Invalid sum of incoming frequencies 1961, should be 956 ;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED) ;;pred: 10 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) ;;8 [100.0%] (FALLTHRU) # i_18 = PHI # vectp_a.13_66 = PHI # vectp_a.19_75 = PHI # ivtmp_78 = PHI _2 = (long unsigned int) i_18; _3 = _2 * 2; _4 = a.0_1 + _3; vect__5.15_68 = MEM[(short int *)vectp_a.13_66]; _5 = *_4; vect__6.16_69 = VIEW_CONVERT_EXPR (vect__5.15_68); _6 = (unsigned short) _5; vect__7.17_71 = vect__6.16_69 + vect_cst__70; _7 = _6 + 1; vect__8.18_72 = VIEW_CONVERT_EXPR (vect__7.17_71); _8 = (short int) _7; MEM[(short int *)vectp_a.19_75] = vect__8.18_72; i_14 = i_18 + 1; vectp_a.13_67 = vectp_a.13_66 + 16; vectp_a.19_76 = vectp_a.19_75 + 16; ivtmp_79 = ivtmp_78 + 1; if (ivtmp_79 < bnd.10_59) goto ; [85.00%] else goto ; [15.00%] So it seems that the frequency of the loop itself is unrealistically scaled down. Before vetorizing the frequency is 8500 and predicted number of iterations is 6.6. Now the loop is intereed via BB 8 with frequency 1148, so the loop, by exit probability exits with 15% probability and thus still has 6.6 iterations, but by BB frequencies its body executes fewer times than the preheader. Now this is a fragile area vectirizing loop should scale number of iterations down 8 times. However guessed CFG profiles are always very "flat". Of course if loop iterated 6.6 times at the average vectorizing would not make any sense. Making guessed profiles less flat is unrealistic, because average loop iterates few times, but of course while vectorizing we make additional guess that the vectorizable loops matters and the guessed profile is probably unrealistic. GCC 6 seems however bit more consistent. > +/* Apply probability PROB to basic block BB and its single succ edge. */ > + > +static void > +apply_probability_for_bb (basic_block bb, int prob) > +{ > + bb->frequency = apply_probability (bb->frequency, prob); > + bb->count = apply_probability (bb->count, prob); > + gcc_assert (single_succ_p (bb)); > + single_succ_edge (bb)->count = bb->count; > +} > + > /* Function vect_do_peeling. > > Input: > @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > niters, tree nitersm1, > may be preferred. */ >basic_block anchor = loop_preheader_edge (loop)->src; >if (skip_vector) > -split_edge (loop_preheader_edge (loop)); > +{ > + split_edge (loop_preheader_edge (loop)); > + > + /* Due to the order in which we peel prolog and epilog, we first > + propagate probability to the whole loop. The purpose is to > + avoid adjusting probabilities of both prolog and vector loops > + separately. Note in this case, the probability of epilog loop > + needs to be scaled back later. */ > + basic_block bb_before_loop = loop_preheader_edge (loop)->src; > + apply_probability_for_bb (bb_before_loop, prob_vector); Aha, this is the bit I missed while trying to fix it myself. I scale_bbs_frequencies_int(_before_loop, 1, prob_vector, REG_BR_PROB_BASE) to do this. I plan to revamp API for this next stage1, but lets keep this consistent. Path is OK with this change and ... > + scale_loop_profile (loop, prob_vector, bound); ... please try to check if scaling is really done reasonably. From the above it seems that the vectorized loop is unrealistically scalled down that may prevent further optimization for speed... Thanks for looking into this, Honza
[PATCH PR79347]Maintain profile counter information in vect_do_peeling
Hi, This patch fixes issue reported by PR79347 by calculating/maintaining profile counter information on the fly in vect_do_peeling. Due to the order that we first peel prologue loop, peel epilogue loop, and then add guarding edge skipping prolog+vector loop if niter is small, this patch takes a trick that firstly scales down counters for loop before peeling and scales counters back after adding the aforementioned guarding edge. Otherwise, more work would be needed to calculate counters for prolog and vector loop. After this patch, # of profile counter for tramp3d benchmark is improved from: tramp3d-v4.cpp.157t.ifcvt:296 tramp3d-v4.cpp.158t.vect:1118 tramp3d-v4.cpp.159t.dce6:1118 tramp3d-v4.cpp.160t.pcom:1118 tramp3d-v4.cpp.161t.cunroll:1019 tramp3d-v4.cpp.162t.slp1:1019 tramp3d-v4.cpp.164t.ivopts:1019 tramp3d-v4.cpp.165t.lim4:1019 tramp3d-v4.cpp.166t.loopdone:1007 tramp3d-v4.cpp.167t.no_loop:31 ... tramp3d-v4.cpp.226t.optimized:1009 to: tramp3d-v4.cpp.157t.ifcvt:296 tramp3d-v4.cpp.158t.vect:814 tramp3d-v4.cpp.159t.dce6:814 tramp3d-v4.cpp.160t.pcom:814 tramp3d-v4.cpp.161t.cunroll:723 tramp3d-v4.cpp.162t.slp1:723 tramp3d-v4.cpp.164t.ivopts:723 tramp3d-v4.cpp.165t.lim4:723 tramp3d-v4.cpp.166t.loopdone:711 tramp3d-v4.cpp.167t.no_loop:31 ... tramp3d-v4.cpp.226t.optimized:831 Bootstrap and test on x86_64 and AArch64. Is it OK? BTW, with the patch, vectorizer only introduces mismatches by below code in vect_transform_loop: /* Reduce loop iterations by the vectorization factor. */ scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), expected_iterations / vf); Though it makes sense to scale down according to vect-factor, but it definitely introduces mismatch between vector_loop's frequency and the rest program. I also believe it is not that useful to scale here, especially without profiling information. At least we need to make vector_loop's frequency consistent with the rest program. Thanks, bin 2017-02-13 Bin ChengPR tree-optimization/79347 * tree-vect-loop-manip.c (apply_probability_for_bb): New function. (vect_do_peeling): Maintain profile counters during peeling. gcc/testsuite/ChangeLog 2017-02-13 Bin Cheng PR tree-optimization/79347 * gcc.dg/vect/pr79347.c: New test.diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c b/gcc/testsuite/gcc.dg/vect/pr79347.c new file mode 100644 index 000..586c638 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-fdump-tree-vect-all" } */ + +short *a; +int c; +void n(void) +{ + for (int i = 0; i frequency = apply_probability (bb->frequency, prob); + bb->count = apply_probability (bb->count, prob); + gcc_assert (single_succ_p (bb)); + single_succ_edge (bb)->count = bb->count; +} + /* Function vect_do_peeling. Input: @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, may be preferred. */ basic_block anchor = loop_preheader_edge (loop)->src; if (skip_vector) -split_edge (loop_preheader_edge (loop)); +{ + split_edge (loop_preheader_edge (loop)); + + /* Due to the order in which we peel prolog and epilog, we first +propagate probability to the whole loop. The purpose is to +avoid adjusting probabilities of both prolog and vector loops +separately. Note in this case, the probability of epilog loop +needs to be scaled back later. */ + basic_block bb_before_loop = loop_preheader_edge (loop)->src; + apply_probability_for_bb (bb_before_loop, prob_vector); + scale_loop_profile (loop, prob_vector, bound); +} tree niters_prolog = build_int_cst (type, 0); source_location loop_loc = find_loop_location (loop); @@ -1727,6 +1749,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, guard_cond = fold_build2 (EQ_EXPR, boolean_type_node, niters_prolog, build_int_cst (type, 0)); guard_bb = loop_preheader_edge (prolog)->src; + basic_block bb_after_prolog = loop_preheader_edge (loop)->src; guard_to = split_edge (loop_preheader_edge (loop)); guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,