Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-15 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: Hi, Thanks for all your comments! > Hi! > > On Wed, Apr 15, 2020 at 08:21:03AM +0200, Richard Biener wrote: >> On Wed, Apr 15, 2020 at 3:56 AM Jiufu Guo via Gcc-patches >> wrote: >> > As you may know, we have loop unroll pass in

Re: [RFC] split pseudos during loop unrolling in RTL unroller

2020-04-15 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > Hi Jiufu, > > Just reviewing random things as I see them... > > On Wed, Apr 15, 2020 at 09:56:00AM +0800, Jiufu Guo wrote: >> This patch only supports simple loops: one exit edge with one major basic >> block. > > That is fine for a proof-of-concept, but will need

[RFC] split pseudos during loop unrolling in RTL unroller

2020-04-14 Thread Jiufu Guo via Gcc-patches
Hi, As you may know, we have a loop unroll pass in RTL which was introduced a few years ago and has worked for a long time. Currently, this unroller reuses the pseudos from the original body, so those pseudos are written multiple times. It would be a good idea to create new pseudos for those

Re: [PATCH] rs6000: Check -+0 and NaN for smax/smin generation

2020-03-26 Thread Jiufu Guo via Gcc-patches
Matthias Klose writes: Thanks so much to all of you for paying attention and taking care of this. Matthias and Segher pointed this out; Joseph helped remove this file. Sorry for taking up your extra time on this. Thanks again! > diff --git a/a b/a > new file mode 100644 > index

Re: [PATCH V2] correct COUNT and PROB for unrolled loop

2020-05-19 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: Hi, I'd like to ping this patch for trunk in stage 1. This patch could fix the issue of incorrect COUNT/FREQUENCIES for unrolled loop blocks, and could also help improve the cold/hot issue of the unrolled loops. The patch is also at

Re: [PATCH V2] correct COUNT and PROB for unrolled loop

2020-03-18 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: Hi! I'd like to ping the following patch. As we are near the end of GCC 10 stage 4, I would ask for approval for GCC 11 trunk. Thanks, Jiufu Guo > Hi Honza and all, > > I updated the patch a little as below. Bootstrap and regtest are ok > on powerpc64le. > > Is OK for trunk? > > Thanks

Re: [PATCH] rs6000: Check -+0 and NaN for smax/smin generation

2020-03-19 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: Backported to GCC 9, preapproved by Segher. Thanks, Jiufu > Segher Boessenkool writes: > >> Hi! >> >> On Thu, Mar 05, 2020 at 10:46:58AM +0800, Jiufu Guo wrote: >>> PR93709 mentioned regressions on maxlocval_4.f90 and minlocval_f.f90 which >>> relates to max of '-inf' and

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote: >> I think this is the wrong way to approach this. You're doing too many >> things at once. Try to fix the powerpc regression with the extra >> flag_rtl_unroll_loops, that could be backported. Then

[PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-19 Thread Jiufu Guo via Gcc-patches
Hi, In r10-4525 and r10-4161, the loop unroller was enabled for simple loops at -O2. At the same time, the GIMPLE cunroll was also enabled, although it is not restricted to simple loops. This patch introduces a hook to check if a loop is suitable to unroll completely. The hook can be used to check if a

Re: [PATCH 2/2] rs6000: tune loop size for cunroll at O2

2020-05-19 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: > Hi, > > This patch check the size of a loop to be unrolled/peeled completely, > and set the limits to a number (24). This prevents large loop from > being unrolled, then avoid binary size increasing, and this limit keeps > performance. > > Bootstrap pass on powerpc64le, ok

[PATCH 2/2] rs6000: tune loop size for cunroll at O2

2020-05-19 Thread Jiufu Guo via Gcc-patches
Hi, This patch checks the size of a loop to be unrolled/peeled completely, and sets the limit to a number (24). This prevents large loops from being unrolled, which avoids binary size growth, and this limit keeps performance. Bootstrap passes on powerpc64le, ok for trunk? Jiufu ---

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-21 Thread Jiufu Guo via Gcc-patches
Jan Hubicka writes: >> Segher Boessenkool writes: >> >> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote: >> >> I think this is the wrong way to approach this. You're doing too many >> >> things at once. Try to fix the powerpc regression with the extra >> >>

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-21 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: > Jan Hubicka writes: > >>> Segher Boessenkool writes: >>> >>> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote: >>> >> I think this is the wrong way to approach this. You're doing too many >>> >> things at once. Try to fix the powerpc regression with the

Re: [PATCH] Check calls before loop unrolling

2020-08-31 Thread Jiufu Guo via Gcc-patches
guojiufu writes: Hi, In this patch, the default value of param=max-unrolled-average-calls-x1 is '0', which means that for a loop to be unrolled, there must be no call inside the body. Do I need to set the default to a bigger value (16?) for later tuning? A bigger value will keep the behavior

Re: [PATCH] Check calls before loop unrolling

2020-09-15 Thread Jiufu Guo via Gcc-patches
Hi all, This patch sets the default value to 16 for the parameter max_unrolled_average_calls, which can be used to restrict calls in a loop when unrolling. This default value (16) is a big number which keeps the current behavior for almost all cases. Bootstrap and regtest pass on powerpc64le. Is this ok

Re: [PATCH] Check calls before loop unrolling

2020-08-24 Thread Jiufu Guo via Gcc-patches
On 2020-08-24 19:16, Jan Hubicka wrote: On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches wrote: > > Hi, > > This patch is checking the _average_ number of calls which is the > summary of call numbers multiply the possibility of the call maybe > executed. The _average_ number could be a

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-28 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Wed, May 27, 2020 at 6:36 AM Jiufu Guo wrote: >> >> Segher Boessenkool writes: >> >> > Hi! >> > >> > On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote: >> >> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool >> >> wrote: >> >> > Yes, cunroll does not

Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-25 Thread Jiufu Guo via Gcc-patches
David Edelsohn writes: > On Mon, May 25, 2020 at 1:58 PM Richard Biener > wrote: >> >> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool >> wrote: >> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote: >> >> On Mon, May 25, 2020 at 1:10 PM guojiufu >> >wrote: >> >> Since

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-26 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool > wrote: >> >> On Mon, May 25, 2020 at 02:39:54PM +0200, Richard Biener wrote: >> > On Fri, May 22, 2020 at 6:54 PM Segher Boessenkool >> > wrote: >> > > > The split above allows the "bug" to be fixed (even on the

Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-05-28 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Thu, May 28, 2020 at 10:52 AM guojiufu wrote: >> >> From: Jiufu Guo >> >> Currently GIMPLE complete unroller(cunroll) is checking >> flag_unroll_loops and flag_peel_loops to see if allow size growth. >> Beside affects cunroll, flag_unroll_loops also controls RTL

Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-05-28 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Thu, May 28, 2020 at 4:37 PM Jiufu Guo wrote: >> >> Richard Biener writes: >> >> > On Thu, May 28, 2020 at 10:52 AM guojiufu wrote: >> >> >> >> From: Jiufu Guo >> >> >> >> Currently GIMPLE complete unroller(cunroll) is checking >> >> flag_unroll_loops and

Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-28 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > Hi! > > On Thu, May 28, 2020 at 04:22:16PM +0200, Richard Biener wrote: >> For GIMPLE level transforms I don't think targets have more knowledge >> than the middle-end. > > Yes, certainly. > >> In fact GIMPLE complete unrolling is about >> secondary effects, removing

Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-06-01 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: Hi, I updated the patch just a little accordingly. Thanks! diff --git a/gcc/common.opt b/gcc/common.opt index 4464049fc1f..570e2aa53c8 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2856,6 +2856,10 @@ funroll-all-loops Common Report Var(flag_unroll_all_loops)

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-26 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > Hi! > > On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote: >> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool >> wrote: >> > Yes, cunroll does not have its own option, and that is a problem. But >> > that is easy to fix! Either with an option, or

Re: [PATCH 2/2] rs6000: tune loop size for cunroll at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
"Kewen.Lin" writes: > Hi Jeff, > > on 2020/5/20 11:58 AM, Jiufu Guo via Gcc-patches wrote: >> Hi, >> >> This patch check the size of a loop to be unrolled/peeled completely, >> and set the limits to a number (24). This prevents large loop fro

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches > wrote: >> >> Hi, >> >> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at -O2. >> At the same time, the GIMPLE cunroll is also enabled, while it

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches > wrote: >> >> Hi, >> >> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at -O2. >> At the same time, the GIMPLE cunroll is also enabled, while it

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

2020-05-20 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Wed, May 20, 2020 at 10:27 AM Jiufu Guo wrote: >> >> Richard Biener writes: >> >> > On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches >> > wrote: >> >> >> >> Hi, >> >> >> >>

Re: [PATCH] rs6000: Refine RTL unroll adjust hook

2020-07-07 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: Thanks all! > Hi! > > On Mon, Jul 06, 2020 at 03:13:13PM +0800, guojiufu wrote: >> For very small loops (< 6 insns), it would be fine to unroll 4 >> times to use cache line better. Like below loops: >> `while (i) a[--i] = NULL; while (p < e) *d++ = *p++;` > > Yes,
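For readers unfamiliar with what "unroll 4 times" means for the copy loop quoted above, here is a source-level sketch. It is only an illustration: the rs6000 hook under discussion works on RTL, not on C source, and the function names `copy_rolled`, `copy_unrolled4` and `copies_agree` are hypothetical, not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* The copy loop from the quoted excerpt, written as a function. */
void copy_rolled(int *d, const int *p, const int *e)
{
    while (p < e)
        *d++ = *p++;
}

/* Hypothetical hand-unrolled version: four copies per trip through the
   main loop, then a remainder loop for the last 0..3 elements. */
void copy_unrolled4(int *d, const int *p, const int *e)
{
    while (e - p >= 4) {
        d[0] = p[0];
        d[1] = p[1];
        d[2] = p[2];
        d[3] = p[3];
        d += 4;
        p += 4;
    }
    while (p < e)
        *d++ = *p++;
}

/* Returns 1 if both versions copy the first n (n <= 16) elements
   of a fixed source array identically, 0 otherwise. */
int copies_agree(size_t n)
{
    int src[16], a[16] = {0}, b[16] = {0};

    if (n > 16)
        return 0;
    for (size_t i = 0; i < 16; i++)
        src[i] = (int)(i * 7 + 1);
    copy_rolled(a, src, src + n);
    copy_unrolled4(b, src, src + n);
    return memcmp(a, b, sizeof a) == 0;
}
```

The unrolled form executes one loop-closing branch per four elements instead of one per element, which is the kind of saving the excerpt has in mind for very small loop bodies.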

Re: [PATCH] rs6000: Refine RTL unroll adjust hook

2020-07-07 Thread Jiufu Guo via Gcc-patches
will schmidt writes: Thanks! > On Mon, 2020-07-06 at 15:13 +0800, guojiufu via Gcc-patches wrote: > > Hi, > > Assorted comments below. thanks :-) > >> For very small loops (< 6 insns), it would be fine to unroll 4 >> times to use cache line better. Like below loops: >> `while (i) a[--i] =

Re: [PATCH] rs6000: Refine RTL unroll adjust hook

2020-07-09 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: Hi, > On Wed, Jul 08, 2020 at 11:39:56AM +0800, Jiufu Guo wrote: >> Segher Boessenkool writes: >> > I am not happy about what is considered "a complex loop" here. >> For early exit, which may cause and *next* unrolled iterations may be >> not executed, then unroll

Re: [PATCH V2] PING^2 correct COUNT and PROB for unrolled loop

2020-07-10 Thread Jiufu Guo via Gcc-patches
Hi Martin, Martin Liška writes: > On 7/10/20 4:14 AM, Jiufu Guo wrote: >> Thanks so much for your time and kindly help!!! > > And I run your patch on SPEC2006 with: > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549728.html > > Doing that I see just few changes: > > diff -qr

Re: [PATCH] rs6000: Refine RTL unroll adjust hook

2020-07-10 Thread Jiufu Guo via Gcc-patches
Hi, Segher Boessenkool writes: > Hi Jiufu, > > On Thu, Jul 09, 2020 at 04:01:38PM +0800, Jiufu Guo wrote: >> Segher Boessenkool writes: >> >> But for each single condition, loop unrolling may still be helpful. >> >> While, if these conditions are all occur in a loop, it would be more >> >>

Re: [PATCH] rs6000: Refine RTL unroll adjust hook

2020-07-10 Thread Jiufu Guo via Gcc-patches
Hi Segher, Thanks a lot for your time and helpful comments! Segher Boessenkool writes: > Hi Jiufu, > > On Thu, Jul 09, 2020 at 04:01:38PM +0800, Jiufu Guo wrote: >> Segher Boessenkool writes: ... > If the generic code decides to unroll big loops with calls *and* jumps, > there is a big

Re: [PATCH V2] PING^2 correct COUNT and PROB for unrolled loop

2020-07-01 Thread Jiufu Guo via Gcc-patches
/gcc-patches/2020-02/msg00927.html > > BR, > Jiufu Guo > >> Jiufu Guo via Gcc-patches writes: >> >> Hi, >> >> I would like to reping this, hope to get approval for this patch. >> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html

Re: [PATCH V2] PING^2 correct COUNT and PROB for unrolled loop

2020-07-09 Thread Jiufu Guo via Gcc-patches
*b, long int n) +{ + long int i; + + for (i = 0; i < n; i++) +a[i] = *b; +} + +/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */ + -- 2.7.4 Thanks! Jiufu Guo. Martin Liška writes: > On 7/2/20 4:35 AM, Jiufu Guo via Gcc-patc

[PATCH V2] PING^2 correct COUNT and PROB for unrolled loop

2020-06-17 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: Gentle ping. https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html BR, Jiufu Guo > Jiufu Guo via Gcc-patches writes: > > Hi, > > I would like to reping this, hope to get approval for this patch. > https://gcc.gnu.org/legacy-ml/gcc-patches/202

Re: [PATCH V2 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-06-07 Thread Jiufu Guo via Gcc-patches
On 2020-06-05 01:53, Segher Boessenkool wrote: On Thu, Jun 04, 2020 at 08:46:23AM +0200, Richard Biener wrote: On Thu, Jun 4, 2020 at 5:34 AM Jiufu Guo wrote: > Patch is updated a little according to comments. > Please see if this is ok to commit. OK with a proper ChangeLog after bootstrap /

Re: [PATCH V2 2/2] rs6000: allow cunroll to grow size according to -funroll-loop or -fpeel-loops

2020-06-03 Thread Jiufu Guo via Gcc-patches
guojiufu writes: > From: Jiufu Guo > > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -4567,7 +4567,12 @@ rs6000_option_override_internal (bool global_init_p) > unroll_only_small_loops = 0; > if (!global_options_set.x_flag_rename_registers) >

Re: [PATCH V2 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-06-03 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: Hi, Patch is updated a little according to comments. Please see if this is ok to commit. diff --git a/gcc/common.opt b/gcc/common.opt index 4464049fc1f..570e2aa53c8 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2856,6 +2856,10 @@ funroll-all-loops Common Report

Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-06-02 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Tue, Jun 2, 2020 at 4:10 AM Jiufu Guo wrote: >> >> Jiufu Guo writes: >> >> Hi, >> >> I updated the patch just a little accordingly. Thanks! >> >> diff --git a/gcc/common.opt b/gcc/common.opt >> index 4464049fc1f..570e2aa53c8 100644 >> --- a/gcc/common.opt >> +++

[PATCH V2] PING^ correct COUNT and PROB for unrolled loop

2020-06-02 Thread Jiufu Guo via Gcc-patches
Jiufu Guo via Gcc-patches writes: Hi, I would like to reping this, hope to get approval for this patch. https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html BR, Jiufu Guo > Jiufu Guo writes: > > Hi, > > I'd like to ping this patch for trunk on stage 1. > >

Re: [PATCH 1/2] correct BB frequencies after loop changed

2020-12-03 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: > Jeff Law writes: > >> On 11/18/20 12:28 AM, Richard Biener wrote: >>> On Tue, 17 Nov 2020, Jeff Law wrote: >>> Minor questions for Jan and Richi embedded below... On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote: > When investigating the issue from

Re: [PATCH 1/2] correct BB frequencies after loop changed

2020-12-03 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: > Jiufu Guo writes: > >> Jeff Law writes: >> >>> On 11/18/20 12:28 AM, Richard Biener wrote: On Tue, 17 Nov 2020, Jeff Law wrote: > Minor questions for Jan and Richi embedded below... > > On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote: >> When

Re: [PATCH 1/2] correct BB frequencies after loop changed

2020-11-23 Thread Jiufu Guo via Gcc-patches
Jeff Law writes: > On 11/18/20 12:28 AM, Richard Biener wrote: >> On Tue, 17 Nov 2020, Jeff Law wrote: >> >>> Minor questions for Jan and Richi embedded below... >>> >>> On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote: When investigating the issue from

Re: [PATCH V2] Clean up loop-closed PHIs after loop finalize

2020-11-11 Thread Jiufu Guo via Gcc-patches
Thanks a lot for the suggestions from previous mails. The patch was updated accordingly. This updated patch propagates loop-closed PHIs out after loop_optimizer_finalize under a newly introduced flag. In some cases, cleaning up loop-closed PHIs would save effort in optimization passes

Re: [PATCH 2/2] reset edge probibility and BB-count for peeled/unrolled loop

2020-11-11 Thread Jiufu Guo via Gcc-patches
guojiufu writes: Hi Honza, all, Just want to ping this for review. Original messages: [PATCH 2/2] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html [PATCH 1/2] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html Thanks, Jiufu Guo. > Hi, > PR68212 mentioned that

Re: [PATCH V2] Clean up loop-closed PHIs after loop finalize

2020-11-16 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Wed, 11 Nov 2020, Jiufu Guo wrote: > >> >> Thanks a lot for the suggestion from previous mails. >> The patch was updated accordingly. >> >> This updated patch propagates loop-closed PHIs them out after >> loop_optimizer_finalize under a new introduced flag. At

Re: [PATCH V2] Clean up loop-closed PHIs after loop finalize

2020-11-16 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: > Richard Biener writes: > >> On Wed, 11 Nov 2020, Jiufu Guo wrote: >> >>> >>> Thanks a lot for the suggestion from previous mails. >>> The patch was updated accordingly. >>> >>> This updated patch propagates loop-closed PHIs them out after >>> loop_optimizer_finalize under

Re: [PATCH V2] Clean up loop-closed PHIs after loop finalize

2020-11-16 Thread Jiufu Guo via Gcc-patches
On 2020-11-16 17:35, Richard Biener wrote: On Mon, Nov 16, 2020 at 10:26 AM Jiufu Guo wrote: Jiufu Guo writes: > Richard Biener writes: > >> On Wed, 11 Nov 2020, Jiufu Guo wrote: >> >>> >>> Thanks a lot for the suggestion from previous mails. >>> The patch was updated accordingly. >>> >>>

Re: [PATCH V2] Clean up loop-closed PHIs after loop finalize

2020-11-17 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: > On 2020-11-16 17:35, Richard Biener wrote: >> On Mon, Nov 16, 2020 at 10:26 AM Jiufu Guo >> wrote: >>> >>> Jiufu Guo writes: >>> >>> > Richard Biener writes: >>> > >>> >> On Wed, 11 Nov 2020, Jiufu Guo wrote: >>> >> .. >>> + >>> + /* Check dominator info before get

Re: [PATCH] Clean up loop-closed PHIs at loopdone pass

2020-11-05 Thread Jiufu Guo via Gcc-patches
On 2020-11-05 21:43, Richard Biener wrote: Hi Richard, Thanks for your comments and suggestions! On Thu, Nov 5, 2020 at 2:19 PM guojiufu via Gcc-patches wrote: In PR87473, there are discussions about loop-closed PHIs which are generated for loop optimization passes. It would be helpful to

[PATCH] go/100537 - Bootstrap-O3 and bootstrap-debug fail

2021-05-13 Thread Jiufu Guo via Gcc-patches
As discussed in the PR, Richard described how to figure out which VAR was not marked TREE_ADDRESSABLE and thus caused this failure. It is address_expression which builds the addr_expr (build_fold_addr_expr_loc) but does not set TREE_ADDRESSABLE. I drafted this patch with reference to the comments from

[PATCH V2] Split loop for NE condition.

2021-05-16 Thread Jiufu Guo via Gcc-patches
When there is the possibility that overflow/wrap may happen on the loop index, a few optimizations would not happen. For example code: foo (int *a, int *b, unsigned k, unsigned n) { while (++k != n) a[k] = b[k] + 1; } For this code, if "k > n", k would wrap. If "k < n" at the beginning, it
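The idea of splitting such a loop at the wrap point can be sketched by hand. This is an illustration only and not the transformation the patch implements: the index type is narrowed to `uint8_t` (an assumption made here so that wrap-around is cheap to exercise), and `foo_orig`, `foo_split` and `split_matches` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Same shape as the loop in the excerpt, with an 8-bit index so that
   wrapping happens at 255 instead of UINT_MAX. */
void foo_orig(int *a, const int *b, uint8_t k, uint8_t n)
{
    while (++k != n)
        a[k] = b[k] + 1;
}

/* Hand-split version: each piece has an index that only counts
   upward and never wraps. */
void foo_split(int *a, const int *b, uint8_t k, uint8_t n)
{
    unsigned i = (unsigned)k + 1;   /* first index visited, not yet wrapped */

    if (k >= n) {
        /* The original index would run up to 255 and then wrap to 0. */
        for (; i <= UINT8_MAX; i++)
            a[i] = b[i] + 1;
        i = 0;
    }
    /* Non-wrapping tail: plain "i < n" loop. */
    for (; i < n; i++)
        a[i] = b[i] + 1;
}

/* Returns 1 if both versions write the same values for the given k, n. */
int split_matches(uint8_t k, uint8_t n)
{
    int b[256], a1[256] = {0}, a2[256] = {0};

    for (int i = 0; i < 256; i++)
        b[i] = 3 * i;
    foo_orig(a1, b, k, n);
    foo_split(a2, b, k, n);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Once the loop is split this way, each piece has a simple "less than" exit test with a known trip count, which is the property later loop optimizations tend to need.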

[PATCH V3] Split loop for NE condition.

2021-06-04 Thread Jiufu Guo via Gcc-patches
Update the patch since v2: . Check index and bound from gcond before checking if wrap. . Update test case, and add an executable case. . Refine code comments. . Enhance the checking for i++/++i in the loop header. . Enhance code to handle equal condition on exit Bootstrap and regtest pass on

[PATCH v2] Analyze niter for until-wrap condition [PR101145]

2021-07-07 Thread Jiufu Guo via Gcc-patches
Changes since v1: * Update assumptions for niter, add more test cases check * Use widest_int/wide_int instead mpz to do +-/ * Move some early check for quick return For code like: unsigned foo(unsigned val, unsigned start) { unsigned cnt = 0; for (unsigned i = start; i > val; ++i) cnt++;

[PATCH] Analyze niter for until-wrap condition [PR101145]

2021-06-30 Thread Jiufu Guo via Gcc-patches
For code like: unsigned foo(unsigned val, unsigned start) { unsigned cnt = 0; for (unsigned i = start; i > val; ++i) cnt++; return cnt; } The number of iterations should be about UINT_MAX - start. There is function adjust_cond_for_loop_until_wrap which handles similar work for const
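As a small check of the iteration count mentioned above, the loop can be compared against a closed form. This is a sketch under an assumption made here: working the loop through by hand gives exactly `UINT_MAX - start + 1` iterations when `start > val` and zero otherwise, which agrees with the "about UINT_MAX - start" in the excerpt. The name `foo_closed_form` is hypothetical.

```c
#include <limits.h>

/* The loop from the excerpt: i counts upward until it wraps to 0,
   at which point "i > val" becomes false for any unsigned val. */
unsigned foo(unsigned val, unsigned start)
{
    unsigned cnt = 0;
    for (unsigned i = start; i > val; ++i)
        cnt++;
    return cnt;
}

/* Hypothetical closed form for the same count. */
unsigned foo_closed_form(unsigned val, unsigned start)
{
    if (start <= val)
        return 0;                   /* loop body never runs */
    return UINT_MAX - start + 1u;   /* start .. UINT_MAX inclusive */
}
```

Unsigned wrap-around is well defined in C, so the loop terminates for every input; with `start` close to `UINT_MAX` it is cheap to run both versions and compare them.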

[PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-08 Thread Jiufu Guo via Gcc-patches
Currently, the doloop.xx variable uses the same type as niter, which may be shorter than the word size. In some cases, it may be better to use a word-size type. For example, on some 64-bit systems, a subreg may be used to access a 32-bit niter. Using a 64-bit type would avoid the subreg if the value can be

[PATCH] split loop for NE condition.

2021-04-29 Thread Jiufu Guo via Gcc-patches
When there is the possibility that overflow may happen on the loop index, a few optimizations would not happen. For example code: foo (int *a, int *b, unsigned k, unsigned n) { while (++k != n) a[k] = b[k] + 1; } For this code, if "k > n", overflow may happen. If "k < n" at the beginning, it

Re: [PATCH] Set bound/cmp/control for until wrap loop.

2021-08-31 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: On Tue, 31 Aug 2021, guojiufu wrote: On 2021-08-30 20:02, Richard Biener wrote: > On Mon, 30 Aug 2021, guojiufu wrote: > >> On 2021-08-30 14:15, Jiufu Guo wrote: >> > Hi, >> > >> > In patch r12-3136, niter->control, niter->bound and >> > niter->cmp are >> > derived

Re: [PATCH] Set bound/cmp/control for until wrap loop.

2021-08-31 Thread Jiufu Guo via Gcc-patches
On 2021/9/1 11:30 AM, Jiufu Guo via Gcc-patches wrote: Richard Biener writes: On Tue, 31 Aug 2021, guojiufu wrote: On 2021-08-30 20:02, Richard Biener wrote: > On Mon, 30 Aug 2021, guojiufu wrote: > >> On 2021-08-30 14:15, Jiufu Guo wrote: >> > Hi, >> > >>

[PATCH] Set bound/cmp/control for until wrap loop.

2021-08-30 Thread Jiufu Guo via Gcc-patches
Hi, In patch r12-3136, niter->control, niter->bound and niter->cmp are derived from number_of_iterations_lt. However, for the 'until wrap' condition, the calculation in number_of_iterations_lt does not match the requirements of their definitions or the requirements in determine_exit_conditions. This patch

Re: Ping: [PATCH v2] Analyze niter for until-wrap condition [PR101145]

2021-08-16 Thread Jiufu Guo via Gcc-patches
"Bin.Cheng" writes: On Wed, Aug 4, 2021 at 10:42 AM guojiufu wrote: Hi, cut... >> @@ -0,0 +1,63 @@ >> +TYPE __attribute__ ((noinline)) >> +foo_sign (int *__restrict__ a, int *__restrict__ b, TYPE l, >> TYPE n) >> +{ >> + for (l = L_BASE; n < l; l += C) >> +*a++ = *b++ + 1; >> +

Re: Ping: [PATCH v2] Analyze niter for until-wrap condition [PR101145]

2021-08-16 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: "Bin.Cheng" writes: On Wed, Aug 4, 2021 at 10:42 AM guojiufu wrote: Hi, cut... >> @@ -0,0 +1,63 @@ >> +TYPE __attribute__ ((noinline)) >> +foo_sign (int *__restrict__ a, int *__restrict__ b, TYPE >> l, >> TYPE n) >> +{ >> + for (l = L_BASE; n < l; l += C) >> +

Re: [PATCH] Set bound/cmp/control for until wrap loop.

2021-09-02 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Tue, 31 Aug 2021, guojiufu wrote: > >> On 2021-08-30 20:02, Richard Biener wrote: >> > On Mon, 30 Aug 2021, guojiufu wrote: >> > >> >> On 2021-08-30 14:15, Jiufu Guo wrote: >> >> > Hi, >> >> > >> >> > In patch r12-3136, niter->control, niter->bound and niter->cmp are

[PATCH] avoid transform at run until wrap comparesion

2021-09-02 Thread Jiufu Guo via Gcc-patches
When transforming {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step} to {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}, there would be an error if 'iv0.step - iv1.step' is negative, which means the loop runs until wrap/overflow. For example: {1, +, 1} <= {4, +, 3} => {1, +, -2} <= {4, +, 0} This
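The example pair above can be simulated directly to see why the combined form is not equivalent. This is only a sketch: it evaluates both forms in 8-bit unsigned arithmetic (an assumption made here so that the wrap is reached quickly), and `count_two_ivs` and `count_combined` are hypothetical names, not functions from GCC.

```c
#include <stdint.h>

/* {1, +, 1} <= {4, +, 3}: both IVs advance; the loop ends only when
   the faster one (iv1) wraps around past 255. */
unsigned count_two_ivs(void)
{
    uint8_t iv0 = 1, iv1 = 4;
    unsigned n = 0;

    while (iv0 <= iv1) {
        n++;
        iv0 = (uint8_t)(iv0 + 1);
        iv1 = (uint8_t)(iv1 + 3);
    }
    return n;
}

/* {1, +, -2} <= {4, +, 0}: the combined step 1 - 3 = -2 is negative,
   so iv0 wraps below zero on the first update. */
unsigned count_combined(void)
{
    uint8_t iv0 = 1;
    const uint8_t bound = 4;
    unsigned n = 0;

    while (iv0 <= bound) {
        n++;
        iv0 = (uint8_t)(iv0 - 2);
    }
    return n;
}
```

Under these 8-bit assumptions the two-IV loop runs 84 times while the combined form runs once, so the rewritten condition no longer describes the same loop.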

[PATCH V2] Set bound/cmp/control for until wrap loop.

2021-09-02 Thread Jiufu Guo via Gcc-patches
Changes since V1: * Add more test cases * Add comment for exit-condition transform * Remove duplicate setting of niter->control This patch resets niter->control, niter->bound and niter->cmp in number_of_iterations_until_wrap. Bootstrap and test pass on ppc64 and x86, and pass the test cases in

Re: [PATCH] avoid transform at run until wrap comparesion

2021-09-02 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Thu, 2 Sep 2021, Jiufu Guo wrote: > >> When transform >> {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step} >> to >> {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0} >> >> There would be error if 'iv0.step - iv1.step' in negative, >> for which means run until

Re: [PATCH V2] Set bound/cmp/control for until wrap loop.

2021-09-15 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: I may want to have a gentle ping on this. https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578680.html BR, Jiufu > Changes on V1: > * Add more test case > * Add comment for exit-condition transform > * Removing duplicate setting on niter->control > > This patch reset

[PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-13 Thread Jiufu Guo via Gcc-patches
Major changes from v1: * Add target hook to query preferred doloop mode. * Recompute doloop iv base from niter under preferred mode. Currently, doloop.xx variable is using the type as niter which may be shorter than word size. For some cases, it would be better to use word size type. For example,

Re: [PATCH V2] Use preferred mode for doloop iv [PR61837].

2021-07-15 Thread Jiufu Guo via Gcc-patches
Iain Sandoe writes: On 15 Jul 2021, at 06:09, guojiufu via Gcc-patches wrote: On 2021-07-15 02:04, Segher Boessenkool wrote: +@deftypefn {Target Hook} machine_mode TARGET_PREFERRED_DOLOOP_MODE (machine_mode @var{mode}) +This hook takes a @var{mode} which is the original mode of doloop

[PATCH V3] Use preferred mode for doloop IV [PR61837]

2021-07-15 Thread Jiufu Guo via Gcc-patches
Refine code for V2 according to review comments: * Use if check instead assert, and refine assert * Use better RE check for test case, e.g. (?n)/(?p) * Use better wording for target.def Currently, doloop.xx variable is using the type as niter which may be shorter than word size. For some

[PATCH] Use fold_build2 instead fold_binary for TRUTH_AND

2021-10-19 Thread Jiufu Guo via Gcc-patches
In tree_simplify_using_condition_1, there is code which should implement the logic "op0 || op1"/"op0 && op1". When creating expressions for TRUTH_OR_EXPR and TRUTH_AND_EXPR, fold_build2 should be used instead of fold_binary, which always returns NULL_TREE for this kind of expr. Bootstrap and regtest pass on ppc

[RFC] Overflow check in simplifying exit cond comparing two IVs.

2021-10-18 Thread Jiufu Guo via Gcc-patches
With reference the discussions in: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578672.html Base on the patches in above discussion, we may draft a patch to

[PATCH] disable aggressive_loop_optimizations until niter ready

2021-12-21 Thread Jiufu Guo via Gcc-patches
Hi, Normally, estimate_numbers_of_iterations gets/calculates niter first, and then invokes infer_loop_bounds_from_undefined. In some cases, however, after a few call stacks, estimate_numbers_of_iterations is invoked before niter is ready (e.g. before number_of_latch_executions returns). e.g.

Re: [PATCH] Check number of iterations for test cases pr101145

2021-11-02 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Mon, 1 Nov 2021, Jiufu Guo wrote: > >> PR101145 is supporting if the number of iterations can be calculated >> for the 'until wrap' condition. Current test cases are checking if >> the loop can be vectorized, if a loop can be vectorized then the number >> of

Re: [PATCH] Check number of iterations for test cases pr101145

2021-11-03 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Wed, 3 Nov 2021, Jiufu Guo wrote: > >> Richard Biener writes: >> >> > On Mon, 1 Nov 2021, Jiufu Guo wrote: >> > >> >> PR101145 is supporting if the number of iterations can be calculated >> >> for the 'until wrap' condition. Current test cases are checking if >> >>

Re: [PATCH] Check number of iterations for test cases pr101145

2021-11-04 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Wed, 3 Nov 2021, Jiufu Guo wrote: > >> Richard Biener writes: >> >> > On Mon, 1 Nov 2021, Jiufu Guo wrote: >> > >> >> PR101145 is supporting if the number of iterations can be calculated >> >> for the 'until wrap' condition. Current test cases are checking if >> >>

[PATCH] Check number of iterations for test cases pr101145

2021-10-31 Thread Jiufu Guo via Gcc-patches
PR101145 is about supporting calculation of the number of iterations for the 'until wrap' condition. The current test cases check whether the loop can be vectorized; if a loop can be vectorized, then the number of iterations is known. It would be better, however, to check the loop's number of

Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.

2021-12-09 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: > Richard Biener writes: > >> On Mon, 18 Oct 2021, Jiufu Guo wrote: >> >>> With reference the discussions in: >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html >>>

Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.

2021-12-16 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: > Jiufu Guo writes: > >> Richard Biener writes: >> >>> On Mon, 18 Oct 2021, Jiufu Guo wrote: >>> With reference the discussions in: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html

Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.

2021-12-08 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Mon, 18 Oct 2021, Jiufu Guo wrote: > >> With reference the discussions in: >> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html >> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html >>

Re: [PATCH V3] Use preferred mode for doloop IV [PR61837]

2021-07-26 Thread Jiufu Guo via Gcc-patches
Jeff Law writes: On 7/15/2021 4:08 AM, Jiufu Guo via Gcc-patches wrote: Refine code for V2 according to review comments: * Use if check instead assert, and refine assert * Use better RE check for test case, e.g. (?n)/(?p) * Use better wording for target.def Currently, doloop.xx variable

Re: [PATCH] disable aggressive_loop_optimizations until niter ready

2022-01-13 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Thu, 13 Jan 2022, guojiufu wrote: > >> On 2022-01-03 22:30, Richard Biener wrote: >> > On Wed, 22 Dec 2021, Jiufu Guo wrote: >> > >> >> Hi, >> >> ... >> >> >> >> Bootstrap and regtest pass on ppc64* and x86_64. Is this ok for trunk? >> > >> > So this is a

[PATCH 2/2] Add assumption combining iv

2022-01-12 Thread Jiufu Guo via Gcc-patches
This is the second patch for combining two IVs. When one IV is chasing another one, to make it safe, we should check whether there is wrap/overflow for either IV. With the assumption computed by this patch, the number of iterations can be calculated even if the no_overflow flag is not updated for
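
As a rough illustration of one IV chasing another, the sketch below models a hypothetical loop `for (i = i0, j = j0; i < j; i += si, j += sj)` on unsigned counters. It is not the patch's algorithm: the function names, the 8-bit width, and the simple "neither final value leaves the range" check are assumptions standing in for the real assumptions the patch computes. It gives up (returns `None`) when the combined step is not positive or when either IV would wrap.

```python
def combined_niter(i0, si, j0, sj, bits=8):
    """Iteration count from the combined step si - sj, or None if not provable here."""
    step = si - sj
    if step <= 0:
        return None  # non-positive combined step: not handled in this sketch
    if i0 >= j0:
        return 0     # exit condition already false on entry
    n = -((i0 - j0) // step)  # ceil((j0 - i0) / step)
    max_val = (1 << bits) - 1
    for start, s in ((i0, si), (j0, sj)):
        last = start + n * s  # value after n increments, without wrapping
        if not 0 <= last <= max_val:
            return None       # this IV would wrap, so the formula is unsafe
    return n


def simulate(i0, si, j0, sj, bits=8, limit=10_000):
    """Run the loop with real wrap-around; `limit` guards against endless loops."""
    mask = (1 << bits) - 1
    i, j, count = i0, j0, 0
    while i < j and count < limit:
        i = (i + si) & mask
        j = (j + sj) & mask
        count += 1
    return count
```

For example, `combined_niter(0, 3, 20, 1)` treats the pair as a single IV closing a gap of 20 at a rate of 2 per iteration, while `combined_niter(0, 3, 250, 2)` is rejected because `i` would pass the 8-bit maximum before catching up.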

[PATCH 1/2] Check negative combined step

2022-01-12 Thread Jiufu Guo via Gcc-patches
Hi, Previously, there was a discussion in: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586460.html I separated it into two patches. This first patch is to avoid a negative step when combining two IVs. The second patch adds more accurate assumptions. This patch passes bootstrap and

Re: [PATCH] disable aggressive_loop_optimizations until niter ready

2022-01-17 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Fri, 14 Jan 2022, Jiufu Guo wrote: > >> Richard Biener writes: >> >> > On Thu, 13 Jan 2022, guojiufu wrote: >> > >> >> On 2022-01-03 22:30, Richard Biener wrote: >> >> > On Wed, 22 Dec 2021, Jiufu Guo wrote: >> >> > >> >> >> Hi, >> >> >> ... >> >> >> >> >> >>

Re: [PATCH] Check if loading const from mem is faster

2022-03-08 Thread Jiufu Guo via Gcc-patches
Jiufu Guo writes: Hi! > Hi Sehger, > > Segher Boessenkool writes: > >> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote: >>> Segher Boessenkool writes: >>> > No. insn_cost is only for correct, existing instructions, not for >>> > made-up nonsense. I created insn_cost precisely to

[PATCH]rs6000: optimize li+rldicr+cmpd==>rotldi+cmpldi for 16bits

2022-03-17 Thread Jiufu Guo via Gcc-patches
When checking eq/neq against a constant which has only 16 bits, the check can be optimized to compare the rotated data instead. This way, building the constant is avoided. As the example in PR103743: For "in == 0x8000LL", this patch generates: rotldi %r3,%r3,16 cmpldi %cr0,%r3,32768
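
The transformation relies on rotation being a bijection on 64-bit values, so `x == C` holds exactly when `rotl(x, n) == rotl(C, n)`. The sketch below checks that property in Python; it is only an illustration, not the rs6000 implementation. The helper names are made up, and the constant `0x8000000000000000` is an assumption, chosen because rotating it left by 16 gives 32768, the immediate in the `cmpldi` above.

```python
MASK64 = (1 << 64) - 1


def rotl64(x, n):
    """Rotate a 64-bit value left by n bits (models what rotldi computes)."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def eq_direct(x, c):
    """Plain comparison, which needs the full 64-bit constant."""
    return x == c


def eq_rotated(x, c, n):
    """Same result, comparing rotated values against a rotated constant."""
    return rotl64(x, n) == rotl64(c, n)


C = 0x8000000000000000        # assumed constant for this illustration
ROTATED_C = rotl64(C, 16)     # small enough to fit a 16-bit unsigned immediate
```

Since `ROTATED_C` fits in a 16-bit unsigned immediate, the comparison needs one rotate plus one compare-immediate rather than a separate sequence to materialize the constant.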

Re: [PATCH] Check if loading const from mem is faster

2022-03-10 Thread Jiufu Guo via Gcc-patches
Hi! Richard Biener writes: > On Thu, 10 Mar 2022, Jiufu Guo wrote: > >> >> Hi! >> >> Richard Biener writes: >> >> > On Wed, 9 Mar 2022, Jiufu Guo wrote: >> > >> >> >> >> Hi! >> >> >> >> Richard Biener writes: >> >> >> >> > On Tue, 8 Mar 2022, Jiufu Guo wrote: >> >> > >> >> >> Jiufu

Re: [PATCH] Check if loading const from mem is faster

2022-03-08 Thread Jiufu Guo via Gcc-patches
Hi! Richard Biener writes: > On Tue, 8 Mar 2022, Jiufu Guo wrote: > >> Jiufu Guo writes: >> >> Hi! >> >> > Hi Sehger, >> > >> > Segher Boessenkool writes: >> > >> >> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote: >> >>> Segher Boessenkool writes: >> >>> > No. insn_cost is

Re: [PATCH] Check if loading const from mem is faster

2022-03-09 Thread Jiufu Guo via Gcc-patches
Hi! Richard Biener writes: > On Wed, 9 Mar 2022, Jiufu Guo wrote: > >> >> Hi! >> >> Richard Biener writes: >> >> > On Tue, 8 Mar 2022, Jiufu Guo wrote: >> > >> >> Jiufu Guo writes: >> >> >> >> Hi! >> >> >> >> > Hi Sehger, >> >> > >> >> > Segher Boessenkool writes: >> >> > >> >> >> On

Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > On Wed, Feb 23, 2022 at 07:32:55PM +0800, guojiufu wrote: >> >We already have TARGET_INSN_COST which you could ask for a cost. >> >Like if we'd have a single_set then just temporarily substitute >> >the RHS with the candidate and cost the insns and compare against >>

Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote: >> I'm assuming we're always dealing with >> >> (set (reg:MODE ..) ) >> >> here and CSE is not substituting into random places of an >> instruction(?). I don't know what 'rtx_cost' should evaluate

Re: [PATCH] Check if loading const from mem is faster

2022-02-23 Thread Jiufu Guo via Gcc-patches
Jiufu Guo via Gcc-patches writes: > Segher Boessenkool writes: > >> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote: >>> I'm assuming we're always dealing with >>> >>> (set (reg:MODE ..) ) >>> >>> here and CSE is

Re: [PATCH] Check if loading const from mem is faster

2022-02-24 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Thu, 24 Feb 2022, Jiufu Guo wrote: > >> Jiufu Guo via Gcc-patches writes: >> >> > Segher Boessenkool writes: >> > >> >> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote: >> >>> I'm ass

Re: [PATCH] Check if loading const from mem is faster

2022-02-28 Thread Jiufu Guo via Gcc-patches
Richard Biener writes: > On Fri, 25 Feb 2022, Jiufu Guo wrote: > >> Richard Biener writes: >> >> > On Fri, 25 Feb 2022, Jiufu Guo wrote: >> > >> >> Richard Biener writes: >> >> >> >> > On Thu, 24 Fe

Re: [PATCH] Check if loading const from mem is faster

2022-02-28 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: > On Thu, Feb 24, 2022 at 09:50:28AM +0100, Richard Biener wrote: >> On Thu, 24 Feb 2022, Jiufu Guo wrote: >> > And another thing as Segher pointed out, CSE is doing too >> > much work. It may be ok to separate the constant handling >> > logic from CSE. >> >> Not

[PATCH] Check if loading const from mem is faster

2022-02-21 Thread Jiufu Guo via Gcc-patches
Hi, For constants, there is some code that checks whether a constant can be put into an instruction as an immediate operand, or whether it is profitable to load it from memory. There are still some places that could be improved for some platforms. This patch could handle PR63281/57836. This patch does not change too much on

Re: [PATCH] Check if loading const from mem is faster

2022-03-01 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool writes: Hi! > Hi! > > On Thu, Feb 24, 2022 at 03:48:54PM +0800, Jiufu Guo wrote: >> Segher Boessenkool writes: >> > That is the problem yes. You need insns to call insn_cost on. You can >> > look in combine.c:combine_validate_cost to see how this can be done; but >> > you
