RE: [PATCH PR96053] Add "#pragma GCC no_reduc_chain"

2020-07-27 Thread zhoukaipeng (A)
Sorry for the late reply!

-Original Message-
From: Richard Biener [mailto:rguent...@suse.de]
Sent: Wednesday, July 22, 2020 3:02 PM

> First of all I think giving users more control over vectorization is 
> good.  Now as for "#pragma GCC no_reduc_chain" I'd like to avoid 
> negatives and terms internal to GCC.  I also would like to see 
> vectorization pragmas to be grouped somehow, also to avoid bit 
> explosion in struct loop.  There's already annot_expr_no_vector_kind 
> and annot_expr_vector_kind both only used by the fortran FE at the 
> moment.  Note ANNOATE_EXPR already allows an extra argument thus only 
> annot_expr_vector_kind should prevail with its argument specifying a 
> bitmask of vectorizer hints.  We'd have an extra enum for those like
> 
> enum annot_vector_subkind {
>   annot_vector_never = 0,
>   annot_vector_auto = 1, // this is the default
>   annot_vector_always = 3,
>   your new flag
> };
> 
> and the user would specify it via
> 
> #pragma GCC vect [(never|always|auto)] [your new flag]

I'll add the code to complete the above modifications.
 
> now, I honestly have a difficulty in suggesting a better name than 
> no_reduc_chain.  Quoting the testcase:
> 
> +double f(double *a, double *b)
> +{
> +  double res1 = 0;
> +  double res0 = 0;
> +#pragma GCC no_reduc_chain
> +  for (int i = 0 ; i < 1000; i+=4) {
> +res0 += a[i] * b[i];
> +res1 += a[i+1] * b[i*1];
> +res0 += a[i+2] * b[i+2];
> +res1 += a[i+3] * b[i+3];
> +  }
> +  return res0 + res1;
> +}
> 
> for your case with IIRC V2DF vectors using reduction chains will 
> result in a vectorization factor of two while with a SLP reduction the 
> vectorization factor is one.

I reconfirmed the situation.  Using reduction chains also result in a 
vectorization factor of one as the same as using reductions.
Related logs are followed:

1.using reduction chains
pr96053.c:6:3: note:   Final SLP tree for instance:
pr96053.c:6:3: note:   node 0x1a4f4ba0 (max_nunits=2, refcnt=2)
pr96053.c:6:3: note:stmt 0 res1_37 = _14 + res1_48;
pr96053.c:6:3: note:stmt 1 res1_39 = _28 + res1_37;
...
pr96053.c:6:3: note:   === vect_make_slp_decision ===
pr96053.c:6:3: note:   Decided to SLP 2 instances. Unrolling factor 1
pr96053.c:6:3: note:   === vect_detect_hybrid_slp ===
pr96053.c:6:3: note:   === vect_update_vf_for_slp ===
pr96053.c:6:3: note:   Loop contains only SLP stmts
pr96053.c:6:3: note:   Updating vectorization factor to 1.
pr96053.c:6:3: note:  vectorization_factor = 1, niters = 250

2.using reductions
pr96053.c:6:3: note:   Final SLP tree for instance:
pr96053.c:6:3: note:   node 0x3a7f2cb0 (max_nunits=2, refcnt=2)
pr96053.c:6:3: note:stmt 0 res0_38 = _21 + res0_36;
pr96053.c:6:3: note:stmt 1 res1_39 = _28 + res1_37;
...
pr96053.c:6:3: note:   === vect_make_slp_decision ===
pr96053.c:6:3: note:   Decided to SLP 1 instances. Unrolling factor 1
pr96053.c:6:3: note:   === vect_detect_hybrid_slp ===
pr96053.c:6:3: note:   === vect_update_vf_for_slp ===
pr96053.c:6:3: note:   Loop contains only SLP stmts
pr96053.c:6:3: note:   Updating vectorization factor to 1.
pr96053.c:6:3: note:  vectorization_factor = 1, niters = 250

> So maybe it is better to give the user control over the vectorization factor?
> That's desirable in other cases where the user wants to force a larger 
> VF to get extra unrolling for example.  For the testcase above you'd 
> use
> 
> #pragma GCC vect vf(1)
> 
> or so (syntax to be discussed).  The side-effect would be that with a 
> reduction chain the VF request cannot be fulfilled but with a SLP 
> reduction it can.  Of course no_reduc_chain is much easier to actually 
> implement in a strict way while specifying VF will likely need to be 
> documented as a hint (with an eventual diagnostic if it wasn't 
> fulfilled)
> 
> Richard/Jakub, any thoughts?
> 
> Thanks,
> Richard.

Thanks,
Kaipeng Zhou


[PATCH PR96053] Add "#pragma GCC no_reduc_chain"

2020-07-21 Thread zhoukaipeng (A)
Hi,

It is the patch to add "#pragma GCC no_reduc_chain" for pr96053.  It only 
completes the front end of C language.

For the testcase, it successfully skipped doing slp by finding sequences from 
reduction chains.  Without "#pragma GCC no_reduc_chain", it will fail to do 
vectorization.

Please help to check if there is any problem.  If there is no problem, I will 
continue to complete the front end of the remaining languages.

Thanks,
Kaipeng Zhou


add-pragma-v1.diff
Description: add-pragma-v1.diff


RE: [PATCH] pass correct parameters to c_parser_do_statement

2020-07-06 Thread zhoukaipeng (A)
Sorry for my mistake.

The previous patch description is incorrect.  A new patch was attached.

Can anyone help me install this patch?

Thanks,
Kaipeng Zhou

> -Original Message-
> From: zhoukaipeng (A)
> Sent: Tuesday, July 7, 2020 11:26 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] pass correct parameters to c_parser_do_statement
> 
> Hi,
> 
> It is a patch to pass correct parameters to c_parser_do_statement.
> 
> Can anyone help me install this patch?
> 
> Thanks,
> Kaipeng Zhou


code-cleanup-v2.diff
Description: code-cleanup-v2.diff


[PATCH] pass correct parameters to c_parser_do_statement

2020-07-06 Thread zhoukaipeng (A)
Hi,

It is a patch to pass correct parameters to c_parser_do_statement.

Can anyone help me install this patch?

Thanks,
Kaipeng Zhou


code-cleanup-v1.diff
Description: code-cleanup-v1.diff


RE: [PATCH PR95854] ICE in find_bswap_or_nop_1 of pass store-merging

2020-06-29 Thread zhoukaipeng (A)
> On Sun, 28 Jun 2020, zhoukaipeng (A) wrote:
> 
> > Hi,
> >
> > Thanks for your good suggestions!
> >
> > This patch was remade and attached.  Does the v2 patch look better?
> >
> > Bootstrap and new testcase tested on aarch64 & x86 Linux platform.
> 
> OK.
> 
> Thanks,
> Richard.

Thanks for reviewing this.  Could you please help install it?

Kaipeng Zhou


RE: [PATCH PR95854] ICE in find_bswap_or_nop_1 of pass store-merging

2020-06-27 Thread zhoukaipeng (A)
Hi,

Thanks for your good suggestions!

This patch was remade and attached.  Does the v2 patch look better?

Bootstrap and new testcase tested on aarch64 & x86 Linux platform.

Kaipeng Zhou


PR95854-v2.diff
Description: PR95854-v2.diff


[PATCH PR95854] ICE in find_bswap_or_nop_1 of pass store-merging

2020-06-24 Thread zhoukaipeng (A)
Hi,

This is a fix for pr95854.

Only add a judgement to make sure operand1 and operand2 are both INTEGER_CST.

Bootstrap and tested on aarch64 Linux platform. No new regression witnessed.

Is it ok to be merged?

Thanks,
Kaipeng Zhou


0001-store-merging-ICE-in-find_bswap_or_nop_1-PR95854.patch
Description: 0001-store-merging-ICE-in-find_bswap_or_nop_1-PR95854.patch


RE: [PATCH PR95199 v2] vect: CSE for bump and offset in strided load/store operations

2020-06-16 Thread zhoukaipeng (A)
Bootstrapped and tested on aarch64-linux-gnu & x86_64-linux-gnu.  Could you 
please help install this patch?

Kaipeng Zhou

> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Tuesday, June 16, 2020 5:49 PM
> To: zhoukaipeng (A) 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org; am...@gcc.gnu.org; rgue...@gcc.gnu.org
> Subject: Re: [PATCH PR95199 v2] vect: CSE for bump and offset in strided
> load/store operations
> 
> On Tue, 16 Jun 2020, zhoukaipeng (A) wrote:
> 
> > Hi,
> >
> > I try to eliminate the common stmts for *vec_offset also.  But I am
> > not sure it is a good way.
> >
> > New patch attached.  Bootstraped and testsuites are being tested.
> 
> Looks good to me and certainly worth if it makes a difference for IV
> calculation (that was the main motivation of the machinery).
> 
> Richard.
> 
> > Kaipeng Zhou
> >
> > > -Original Message-
> > > From: Richard Biener [mailto:rguent...@suse.de]
> > > Sent: Monday, June 15, 2020 2:21 PM
> > > To: zhoukaipeng (A) 
> > > Cc: Richard Sandiford ; gcc-
> > > patc...@gcc.gnu.org; am...@gcc.gnu.org; rgue...@gcc.gnu.org
> > > Subject: RE: [PATCH PR95199] vect: Remove extra variable created for
> > > memory reference
> > >
> > > On Sun, 14 Jun 2020, zhoukaipeng (A) wrote:
> > >
> > > > Hi,
> > > >
> > > > I modified the issue you mentioned.  Bootstrap and tested on
> > > > aarch64
> > > Linux platform again.  No new regression witnessed.
> > > >
> > > > For "*vec_offset", it is indeed a point optimizable.  However, it
> > > > is not able to eliminate it by the same way as "*dataref_bump".
> > > > The cse_and_gimplify_to_preheader will return the cached operand
> > > > if operand_compare::operand_equal_p for the cached one and the
> new
> > > > one returns true.  But it did not work well for "*vec_offset".
> > >
> > > You have to see what is actually in *vec_offset, if there's partly
> > > gimplified but not CSEd stmts in there then that's the issue.
> > >
> > > Richard.
> > >
> > > > Do you have any suggestions for the problem?
> > > >
> > > > Thanks,
> > > > Kaipeng Zhou
> > > >
> > > > > -Original Message-
> > > > > From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> > > > > Sent: Friday, June 12, 2020 3:48 PM
> > > > > To: zhoukaipeng (A) 
> > > > > Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de;
> > > > > am...@gcc.gnu.org; rgue...@gcc.gnu.org
> > > > > Subject: Re: [PATCH PR95199] vect: Remove extra variable created
> > > > > for memory reference
> > > > >
> > > > > "zhoukaipeng (A)"  writes:
> > > > > > Hi,
> > > > > >
> > > > > > This is a fix for pr95199.
> > > > > > In vectorization, vinfo->ivexpr_map is supposed to catch those
> > > > > > "IV base
> > > > > and/or step expressions".  Just call
> > > > > cse_and_gimplify_to_preheader to handle gathering/scattering to
> avoid the extra variable.
> > > > > >
> > > > > > Bootstrap and tested on aarch64/x86_64 Linux platform. No new
> > > > > regression witnessed.
> > > > > >
> > > > > > Is it ok?
> > > > > >
> > > > > > Thanks,
> > > > > > Kaipeng Zhou
> > > > > >
> > > > > > From 12cf9b362576735e4c584c48cd6db3d2b0f2e14b Mon Sep 17
> > > 00:00:00
> > > > > 2001
> > > > > > From: Kaipeng Zhou 
> > > > > > Date: Fri, 12 Jun 2020 00:54:15 +0800
> > > > > > Subject: [PATCH] vect: Remove extra variable created for
> > > > > > memory reference in  loop vectorization.
> > > > > >
> > > > > > Vectorization create two copies of the same IV and IVOPTs does
> > > > > > not perform CSE.  But vinfo->ivexpr_map is supposed to catch
> > > > > > those "IV base and/or step expressions".  Just call
> > > > > > cse_and_gimplify_to_preheader to handle gathering/scattering
> > > > > > to avoid
> > > > > the extra variable.
> > > > > >
> > > > > > 2020-

[PATCH PR95199 v2] vect: CSE for bump and offset in strided load/store operations

2020-06-16 Thread zhoukaipeng (A)
Hi,

I try to eliminate the common stmts for *vec_offset also.  But I am not sure
it is a good way.

New patch attached.  Bootstraped and testsuites are being tested.

Kaipeng Zhou

> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Monday, June 15, 2020 2:21 PM
> To: zhoukaipeng (A) 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org; am...@gcc.gnu.org; rgue...@gcc.gnu.org
> Subject: RE: [PATCH PR95199] vect: Remove extra variable created for
> memory reference
> 
> On Sun, 14 Jun 2020, zhoukaipeng (A) wrote:
> 
> > Hi,
> >
> > I modified the issue you mentioned.  Bootstrap and tested on aarch64
> Linux platform again.  No new regression witnessed.
> >
> > For "*vec_offset", it is indeed a point optimizable.  However, it is
> > not able to eliminate it by the same way as "*dataref_bump".  The
> > cse_and_gimplify_to_preheader will return the cached operand if
> > operand_compare::operand_equal_p for the cached one and the new one
> > returns true.  But it did not work well for "*vec_offset".
> 
> You have to see what is actually in *vec_offset, if there's partly gimplified 
> but
> not CSEd stmts in there then that's the issue.
> 
> Richard.
> 
> > Do you have any suggestions for the problem?
> >
> > Thanks,
> > Kaipeng Zhou
> >
> > > -Original Message-
> > > From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> > > Sent: Friday, June 12, 2020 3:48 PM
> > > To: zhoukaipeng (A) 
> > > Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; am...@gcc.gnu.org;
> > > rgue...@gcc.gnu.org
> > > Subject: Re: [PATCH PR95199] vect: Remove extra variable created for
> > > memory reference
> > >
> > > "zhoukaipeng (A)"  writes:
> > > > Hi,
> > > >
> > > > This is a fix for pr95199.
> > > > In vectorization, vinfo->ivexpr_map is supposed to catch those "IV
> > > > base
> > > and/or step expressions".  Just call cse_and_gimplify_to_preheader
> > > to handle gathering/scattering to avoid the extra variable.
> > > >
> > > > Bootstrap and tested on aarch64/x86_64 Linux platform. No new
> > > regression witnessed.
> > > >
> > > > Is it ok?
> > > >
> > > > Thanks,
> > > > Kaipeng Zhou
> > > >
> > > > From 12cf9b362576735e4c584c48cd6db3d2b0f2e14b Mon Sep 17
> 00:00:00
> > > 2001
> > > > From: Kaipeng Zhou 
> > > > Date: Fri, 12 Jun 2020 00:54:15 +0800
> > > > Subject: [PATCH] vect: Remove extra variable created for memory
> > > > reference in  loop vectorization.
> > > >
> > > > Vectorization create two copies of the same IV and IVOPTs does not
> > > > perform CSE.  But vinfo->ivexpr_map is supposed to catch those "IV
> > > > base and/or step expressions".  Just call
> > > > cse_and_gimplify_to_preheader to handle gathering/scattering to
> > > > avoid
> > > the extra variable.
> > > >
> > > > 2020-06-12  Bin Cheng 
> > > > Kaipeng Zhou  
> > > >
> > > > PR tree-optimization/95199
> > > > * tree-vect-stmts.c: Use CSE map to catch the IV step and 
> > > > eliminate
> > > > extra variable.
> > > >
> > > > 2020-06-12  Bin Cheng 
> > > > Kaipeng Zhou  
> > > >
> > > > PR tree-optimization/95199
> > > > * gcc.dg/vect/pr95199.c: New test.
> > > > ---
> > > >  gcc/ChangeLog   |  7 +++
> > > >  gcc/testsuite/ChangeLog |  6 ++
> > > >  gcc/testsuite/gcc.dg/vect/pr95199.c | 17 +
> > > >  gcc/tree-vect-stmts.c   |  1 +
> > > >  4 files changed, 31 insertions(+)  create mode 100644
> > > > gcc/testsuite/gcc.dg/vect/pr95199.c
> > > >
> > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog index
> > > > c92582df7fe..753d70fc94b 100644
> > > > --- a/gcc/ChangeLog
> > > > +++ b/gcc/ChangeLog
> > > > @@ -1,3 +1,10 @@
> > > > +2020-06-12  Bin Cheng 
> > > > +   Kaipeng Zhou  
> > > > +
> > > > +   PR tree-optimization/95199
> > > > +   * tree-vect-stmts.c: Use CSE map to catch the IV step and 
> > > > eliminate
> > > > +   extra 

RE: [PATCH PR95199] vect: Remove extra variable created for memory reference

2020-06-14 Thread zhoukaipeng (A)
Hi, 

I modified the issue you mentioned.  Bootstrap and tested on aarch64 Linux 
platform again.  No new regression witnessed.

For "*vec_offset", it is indeed a point optimizable.  However, it is not able 
to eliminate it by the same way as "*dataref_bump".  The 
cse_and_gimplify_to_preheader will return the cached operand if 
operand_compare::operand_equal_p for the cached one and the new one returns 
true.  But it did not work well for "*vec_offset".

Do you have any suggestions for the problem?

Thanks,
Kaipeng Zhou

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Friday, June 12, 2020 3:48 PM
> To: zhoukaipeng (A) 
> Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; am...@gcc.gnu.org;
> rgue...@gcc.gnu.org
> Subject: Re: [PATCH PR95199] vect: Remove extra variable created for
> memory reference
> 
> "zhoukaipeng (A)"  writes:
> > Hi,
> >
> > This is a fix for pr95199.
> > In vectorization, vinfo->ivexpr_map is supposed to catch those "IV base
> and/or step expressions".  Just call cse_and_gimplify_to_preheader to
> handle gathering/scattering to avoid the extra variable.
> >
> > Bootstrap and tested on aarch64/x86_64 Linux platform. No new
> regression witnessed.
> >
> > Is it ok?
> >
> > Thanks,
> > Kaipeng Zhou
> >
> > From 12cf9b362576735e4c584c48cd6db3d2b0f2e14b Mon Sep 17 00:00:00
> 2001
> > From: Kaipeng Zhou 
> > Date: Fri, 12 Jun 2020 00:54:15 +0800
> > Subject: [PATCH] vect: Remove extra variable created for memory
> > reference in  loop vectorization.
> >
> > Vectorization create two copies of the same IV and IVOPTs does not
> > perform CSE.  But vinfo->ivexpr_map is supposed to catch those "IV
> > base and/or step expressions".  Just call
> > cse_and_gimplify_to_preheader to handle gathering/scattering to avoid
> the extra variable.
> >
> > 2020-06-12  Bin Cheng 
> > Kaipeng Zhou  
> >
> > PR tree-optimization/95199
> > * tree-vect-stmts.c: Use CSE map to catch the IV step and eliminate
> > extra variable.
> >
> > 2020-06-12  Bin Cheng 
> > Kaipeng Zhou  
> >
> > PR tree-optimization/95199
> > * gcc.dg/vect/pr95199.c: New test.
> > ---
> >  gcc/ChangeLog   |  7 +++
> >  gcc/testsuite/ChangeLog |  6 ++
> >  gcc/testsuite/gcc.dg/vect/pr95199.c | 17 +
> >  gcc/tree-vect-stmts.c   |  1 +
> >  4 files changed, 31 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr95199.c
> >
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog index
> > c92582df7fe..753d70fc94b 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,10 @@
> > +2020-06-12  Bin Cheng 
> > +   Kaipeng Zhou  
> > +
> > +   PR tree-optimization/95199
> > +   * tree-vect-stmts.c: Use CSE map to catch the IV step and eliminate
> > +   extra variable.
> > +
> >  2020-06-08  Tobias Burnus  
> >
> > PR lto/94848
> > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index
> > 60d9ecca3ed..a27dd3fa072 100644
> > --- a/gcc/testsuite/ChangeLog
> > +++ b/gcc/testsuite/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2020-06-12  Bin Cheng 
> > +   Kaipeng Zhou  
> > +
> > +   PR tree-optimization/95199
> > +   * gcc.dg/vect/pr95199.c: New test.
> > +
> >  2020-06-08  Harald Anlauf  
> >
> > PR fortran/95195
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr95199.c
> > b/gcc/testsuite/gcc.dg/vect/pr95199.c
> > new file mode 100644
> > index 000..ec201feaec8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr95199.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile { target aarch64-*-* } } */
> > +/* { dg-options "-O3 -march=armv8.2-a+fp+sve" } */
> > +
> > +void
> > +foo (double *a, double *b, double m, int inc_x, int inc_y) {
> > +  int ix = 0, iy = 0;
> > +  for (int i = 0; i < 1000; ++i)
> > +{
> > +  a[ix] += m * b[iy];
> > +  ix += inc_x;
> > +  iy += inc_y;
> > +}
> > +  return ;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "add" 12 } } */
> 
> For asm tests, it's probably better to put this in aarch64/sve instead.
> The scan-assembler should be a bit more precise though, since just "add"
> could catch filenames like /homd/ladd/gcc.  E.g. maybe it would be clearer to
> make the regexp match

[PATCH PR95199] vect: Remove extra variable created for memory reference

2020-06-12 Thread zhoukaipeng (A)
Hi,

This is a fix for pr95199.
In vectorization, vinfo->ivexpr_map is supposed to catch those "IV base and/or 
step expressions".  Just call cse_and_gimplify_to_preheader to handle 
gathering/scattering to avoid the extra variable.

Bootstrap and tested on aarch64/x86_64 Linux platform. No new regression 
witnessed.

Is it ok?

Thanks,
Kaipeng Zhou


0001-vect-Remove-extra-variable-created-for-memory-refere.patch
Description: 0001-vect-Remove-extra-variable-created-for-memory-refere.patch