RE: [PATCH PR96053] Add "#pragma GCC no_reduc_chain"
Sorry for the late reply! -Original Message- From: Richard Biener [mailto:rguent...@suse.de] Sent: Wednesday, July 22, 2020 3:02 PM > First of all I think giving users more control over vectorization is > good. Now as for "#pragma GCC no_reduc_chain" I'd like to avoid > negatives and terms internal to GCC. I also would like to see > vectorization pragmas to be grouped somehow, also to avoid bit > explosion in struct loop. There's already annot_expr_no_vector_kind > and annot_expr_vector_kind both only used by the fortran FE at the > moment. Note ANNOATE_EXPR already allows an extra argument thus only > annot_expr_vector_kind should prevail with its argument specifying a > bitmask of vectorizer hints. We'd have an extra enum for those like > > enum annot_vector_subkind { > annot_vector_never = 0, > annot_vector_auto = 1, // this is the default > annot_vector_always = 3, > your new flag > }; > > and the user would specify it via > > #pragma GCC vect [(never|always|auto)] [your new flag] I'll add the code to complete the above modifications. > now, I honestly have a difficulty in suggesting a better name than > no_reduc_chain. Quoting the testcase: > > +double f(double *a, double *b) > +{ > + double res1 = 0; > + double res0 = 0; > +#pragma GCC no_reduc_chain > + for (int i = 0 ; i < 1000; i+=4) { > +res0 += a[i] * b[i]; > +res1 += a[i+1] * b[i*1]; > +res0 += a[i+2] * b[i+2]; > +res1 += a[i+3] * b[i+3]; > + } > + return res0 + res1; > +} > > for your case with IIRC V2DF vectors using reduction chains will > result in a vectorization factor of two while with a SLP reduction the > vectorization factor is one. I reconfirmed the situation. Using reduction chains also result in a vectorization factor of one as the same as using reductions. Related logs are followed: 1.using reduction chains pr96053.c:6:3: note: Final SLP tree for instance: pr96053.c:6:3: note: node 0x1a4f4ba0 (max_nunits=2, refcnt=2) pr96053.c:6:3: note:stmt 0 res1_37 = _14 + res1_48; pr96053.c:6:3: note:stmt 1 res1_39 = _28 + res1_37; ... pr96053.c:6:3: note: === vect_make_slp_decision === pr96053.c:6:3: note: Decided to SLP 2 instances. Unrolling factor 1 pr96053.c:6:3: note: === vect_detect_hybrid_slp === pr96053.c:6:3: note: === vect_update_vf_for_slp === pr96053.c:6:3: note: Loop contains only SLP stmts pr96053.c:6:3: note: Updating vectorization factor to 1. pr96053.c:6:3: note: vectorization_factor = 1, niters = 250 2.using reductions pr96053.c:6:3: note: Final SLP tree for instance: pr96053.c:6:3: note: node 0x3a7f2cb0 (max_nunits=2, refcnt=2) pr96053.c:6:3: note:stmt 0 res0_38 = _21 + res0_36; pr96053.c:6:3: note:stmt 1 res1_39 = _28 + res1_37; ... pr96053.c:6:3: note: === vect_make_slp_decision === pr96053.c:6:3: note: Decided to SLP 1 instances. Unrolling factor 1 pr96053.c:6:3: note: === vect_detect_hybrid_slp === pr96053.c:6:3: note: === vect_update_vf_for_slp === pr96053.c:6:3: note: Loop contains only SLP stmts pr96053.c:6:3: note: Updating vectorization factor to 1. pr96053.c:6:3: note: vectorization_factor = 1, niters = 250 > So maybe it is better to give the user control over the vectorization factor? > That's desirable in other cases where the user wants to force a larger > VF to get extra unrolling for example. For the testcase above you'd > use > > #pragma GCC vect vf(1) > > or so (syntax to be discussed). The side-effect would be that with a > reduction chain the VF request cannot be fulfilled but with a SLP > reduction it can. Of course no_reduc_chain is much easier to actually > implement in a strict way while specifying VF will likely need to be > documented as a hint (with an eventual diagnostic if it wasn't > fulfilled) > > Richard/Jakub, any thoughts? > > Thanks, > Richard. Thanks, Kaipeng Zhou
[PATCH PR96053] Add "#pragma GCC no_reduc_chain"
Hi, It is the patch to add "#pragma GCC no_reduc_chain" for pr96053. It only completes the front end of C language. For the testcase, it successfully skipped doing slp by finding sequences from reduction chains. Without "#pragma GCC no_reduc_chain", it will fail to do vectorization. Please help to check if there is any problem. If there is no problem, I will continue to complete the front end of the remaining languages. Thanks, Kaipeng Zhou add-pragma-v1.diff Description: add-pragma-v1.diff
RE: [PATCH] pass correct parameters to c_parser_do_statement
Sorry for my mistake. The previous patch description is incorrect. A new patch was attached. Can anyone help me install this patch? Thanks, Kaipeng Zhou > -Original Message- > From: zhoukaipeng (A) > Sent: Tuesday, July 7, 2020 11:26 AM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH] pass correct parameters to c_parser_do_statement > > Hi, > > It is a patch to pass correct parameters to c_parser_do_statement. > > Can anyone help me install this patch? > > Thanks, > Kaipeng Zhou code-cleanup-v2.diff Description: code-cleanup-v2.diff
[PATCH] pass correct parameters to c_parser_do_statement
Hi, It is a patch to pass correct parameters to c_parser_do_statement. Can anyone help me install this patch? Thanks, Kaipeng Zhou code-cleanup-v1.diff Description: code-cleanup-v1.diff
RE: [PATCH PR95854] ICE in find_bswap_or_nop_1 of pass store-merging
> On Sun, 28 Jun 2020, zhoukaipeng (A) wrote: > > > Hi, > > > > Thanks for your good suggestions! > > > > This patch was remade and attached. Does the v2 patch look better? > > > > Bootstrap and new testcase tested on aarch64 & x86 Linux platform. > > OK. > > Thanks, > Richard. Thanks for reviewing this. Could you please help install it? Kaipeng Zhou
RE: [PATCH PR95854] ICE in find_bswap_or_nop_1 of pass store-merging
Hi, Thanks for your good suggestions! This patch was remade and attached. Does the v2 patch look better? Bootstrap and new testcase tested on aarch64 & x86 Linux platform. Kaipeng Zhou PR95854-v2.diff Description: PR95854-v2.diff
[PATCH PR95854] ICE in find_bswap_or_nop_1 of pass store-merging
Hi, This is a fix for pr95854. Only add a judgement to make sure operand1 and operand2 are both INTEGER_CST. Bootstrap and tested on aarch64 Linux platform. No new regression witnessed. Is it ok to be merged? Thanks, Kaipeng Zhou 0001-store-merging-ICE-in-find_bswap_or_nop_1-PR95854.patch Description: 0001-store-merging-ICE-in-find_bswap_or_nop_1-PR95854.patch
RE: [PATCH PR95199 v2] vect: CSE for bump and offset in strided load/store operations
Bootstrapped and tested on aarch64-linux-gnu & x86_64-linux-gnu. Could you please help install this patch? Kaipeng Zhou > -Original Message- > From: Richard Biener [mailto:rguent...@suse.de] > Sent: Tuesday, June 16, 2020 5:49 PM > To: zhoukaipeng (A) > Cc: Richard Sandiford ; gcc- > patc...@gcc.gnu.org; am...@gcc.gnu.org; rgue...@gcc.gnu.org > Subject: Re: [PATCH PR95199 v2] vect: CSE for bump and offset in strided > load/store operations > > On Tue, 16 Jun 2020, zhoukaipeng (A) wrote: > > > Hi, > > > > I try to eliminate the common stmts for *vec_offset also. But I am > > not sure it is a good way. > > > > New patch attached. Bootstraped and testsuites are being tested. > > Looks good to me and certainly worth if it makes a difference for IV > calculation (that was the main motivation of the machinery). > > Richard. > > > Kaipeng Zhou > > > > > -Original Message- > > > From: Richard Biener [mailto:rguent...@suse.de] > > > Sent: Monday, June 15, 2020 2:21 PM > > > To: zhoukaipeng (A) > > > Cc: Richard Sandiford ; gcc- > > > patc...@gcc.gnu.org; am...@gcc.gnu.org; rgue...@gcc.gnu.org > > > Subject: RE: [PATCH PR95199] vect: Remove extra variable created for > > > memory reference > > > > > > On Sun, 14 Jun 2020, zhoukaipeng (A) wrote: > > > > > > > Hi, > > > > > > > > I modified the issue you mentioned. Bootstrap and tested on > > > > aarch64 > > > Linux platform again. No new regression witnessed. > > > > > > > > For "*vec_offset", it is indeed a point optimizable. However, it > > > > is not able to eliminate it by the same way as "*dataref_bump". > > > > The cse_and_gimplify_to_preheader will return the cached operand > > > > if operand_compare::operand_equal_p for the cached one and the > new > > > > one returns true. But it did not work well for "*vec_offset". > > > > > > You have to see what is actually in *vec_offset, if there's partly > > > gimplified but not CSEd stmts in there then that's the issue. > > > > > > Richard. > > > > > > > Do you have any suggestions for the problem? > > > > > > > > Thanks, > > > > Kaipeng Zhou > > > > > > > > > -Original Message- > > > > > From: Richard Sandiford [mailto:richard.sandif...@arm.com] > > > > > Sent: Friday, June 12, 2020 3:48 PM > > > > > To: zhoukaipeng (A) > > > > > Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; > > > > > am...@gcc.gnu.org; rgue...@gcc.gnu.org > > > > > Subject: Re: [PATCH PR95199] vect: Remove extra variable created > > > > > for memory reference > > > > > > > > > > "zhoukaipeng (A)" writes: > > > > > > Hi, > > > > > > > > > > > > This is a fix for pr95199. > > > > > > In vectorization, vinfo->ivexpr_map is supposed to catch those > > > > > > "IV base > > > > > and/or step expressions". Just call > > > > > cse_and_gimplify_to_preheader to handle gathering/scattering to > avoid the extra variable. > > > > > > > > > > > > Bootstrap and tested on aarch64/x86_64 Linux platform. No new > > > > > regression witnessed. > > > > > > > > > > > > Is it ok? > > > > > > > > > > > > Thanks, > > > > > > Kaipeng Zhou > > > > > > > > > > > > From 12cf9b362576735e4c584c48cd6db3d2b0f2e14b Mon Sep 17 > > > 00:00:00 > > > > > 2001 > > > > > > From: Kaipeng Zhou > > > > > > Date: Fri, 12 Jun 2020 00:54:15 +0800 > > > > > > Subject: [PATCH] vect: Remove extra variable created for > > > > > > memory reference in loop vectorization. > > > > > > > > > > > > Vectorization create two copies of the same IV and IVOPTs does > > > > > > not perform CSE. But vinfo->ivexpr_map is supposed to catch > > > > > > those "IV base and/or step expressions". Just call > > > > > > cse_and_gimplify_to_preheader to handle gathering/scattering > > > > > > to avoid > > > > > the extra variable. > > > > > > > > > > > > 2020-
[PATCH PR95199 v2] vect: CSE for bump and offset in strided load/store operations
Hi, I try to eliminate the common stmts for *vec_offset also. But I am not sure it is a good way. New patch attached. Bootstraped and testsuites are being tested. Kaipeng Zhou > -Original Message- > From: Richard Biener [mailto:rguent...@suse.de] > Sent: Monday, June 15, 2020 2:21 PM > To: zhoukaipeng (A) > Cc: Richard Sandiford ; gcc- > patc...@gcc.gnu.org; am...@gcc.gnu.org; rgue...@gcc.gnu.org > Subject: RE: [PATCH PR95199] vect: Remove extra variable created for > memory reference > > On Sun, 14 Jun 2020, zhoukaipeng (A) wrote: > > > Hi, > > > > I modified the issue you mentioned. Bootstrap and tested on aarch64 > Linux platform again. No new regression witnessed. > > > > For "*vec_offset", it is indeed a point optimizable. However, it is > > not able to eliminate it by the same way as "*dataref_bump". The > > cse_and_gimplify_to_preheader will return the cached operand if > > operand_compare::operand_equal_p for the cached one and the new one > > returns true. But it did not work well for "*vec_offset". > > You have to see what is actually in *vec_offset, if there's partly gimplified > but > not CSEd stmts in there then that's the issue. > > Richard. > > > Do you have any suggestions for the problem? > > > > Thanks, > > Kaipeng Zhou > > > > > -Original Message- > > > From: Richard Sandiford [mailto:richard.sandif...@arm.com] > > > Sent: Friday, June 12, 2020 3:48 PM > > > To: zhoukaipeng (A) > > > Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; am...@gcc.gnu.org; > > > rgue...@gcc.gnu.org > > > Subject: Re: [PATCH PR95199] vect: Remove extra variable created for > > > memory reference > > > > > > "zhoukaipeng (A)" writes: > > > > Hi, > > > > > > > > This is a fix for pr95199. > > > > In vectorization, vinfo->ivexpr_map is supposed to catch those "IV > > > > base > > > and/or step expressions". Just call cse_and_gimplify_to_preheader > > > to handle gathering/scattering to avoid the extra variable. > > > > > > > > Bootstrap and tested on aarch64/x86_64 Linux platform. No new > > > regression witnessed. > > > > > > > > Is it ok? > > > > > > > > Thanks, > > > > Kaipeng Zhou > > > > > > > > From 12cf9b362576735e4c584c48cd6db3d2b0f2e14b Mon Sep 17 > 00:00:00 > > > 2001 > > > > From: Kaipeng Zhou > > > > Date: Fri, 12 Jun 2020 00:54:15 +0800 > > > > Subject: [PATCH] vect: Remove extra variable created for memory > > > > reference in loop vectorization. > > > > > > > > Vectorization create two copies of the same IV and IVOPTs does not > > > > perform CSE. But vinfo->ivexpr_map is supposed to catch those "IV > > > > base and/or step expressions". Just call > > > > cse_and_gimplify_to_preheader to handle gathering/scattering to > > > > avoid > > > the extra variable. > > > > > > > > 2020-06-12 Bin Cheng > > > > Kaipeng Zhou > > > > > > > > PR tree-optimization/95199 > > > > * tree-vect-stmts.c: Use CSE map to catch the IV step and > > > > eliminate > > > > extra variable. > > > > > > > > 2020-06-12 Bin Cheng > > > > Kaipeng Zhou > > > > > > > > PR tree-optimization/95199 > > > > * gcc.dg/vect/pr95199.c: New test. > > > > --- > > > > gcc/ChangeLog | 7 +++ > > > > gcc/testsuite/ChangeLog | 6 ++ > > > > gcc/testsuite/gcc.dg/vect/pr95199.c | 17 + > > > > gcc/tree-vect-stmts.c | 1 + > > > > 4 files changed, 31 insertions(+) create mode 100644 > > > > gcc/testsuite/gcc.dg/vect/pr95199.c > > > > > > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog index > > > > c92582df7fe..753d70fc94b 100644 > > > > --- a/gcc/ChangeLog > > > > +++ b/gcc/ChangeLog > > > > @@ -1,3 +1,10 @@ > > > > +2020-06-12 Bin Cheng > > > > + Kaipeng Zhou > > > > + > > > > + PR tree-optimization/95199 > > > > + * tree-vect-stmts.c: Use CSE map to catch the IV step and > > > > eliminate > > > > + extra
RE: [PATCH PR95199] vect: Remove extra variable created for memory reference
Hi, I modified the issue you mentioned. Bootstrap and tested on aarch64 Linux platform again. No new regression witnessed. For "*vec_offset", it is indeed a point optimizable. However, it is not able to eliminate it by the same way as "*dataref_bump". The cse_and_gimplify_to_preheader will return the cached operand if operand_compare::operand_equal_p for the cached one and the new one returns true. But it did not work well for "*vec_offset". Do you have any suggestions for the problem? Thanks, Kaipeng Zhou > -Original Message- > From: Richard Sandiford [mailto:richard.sandif...@arm.com] > Sent: Friday, June 12, 2020 3:48 PM > To: zhoukaipeng (A) > Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de; am...@gcc.gnu.org; > rgue...@gcc.gnu.org > Subject: Re: [PATCH PR95199] vect: Remove extra variable created for > memory reference > > "zhoukaipeng (A)" writes: > > Hi, > > > > This is a fix for pr95199. > > In vectorization, vinfo->ivexpr_map is supposed to catch those "IV base > and/or step expressions". Just call cse_and_gimplify_to_preheader to > handle gathering/scattering to avoid the extra variable. > > > > Bootstrap and tested on aarch64/x86_64 Linux platform. No new > regression witnessed. > > > > Is it ok? > > > > Thanks, > > Kaipeng Zhou > > > > From 12cf9b362576735e4c584c48cd6db3d2b0f2e14b Mon Sep 17 00:00:00 > 2001 > > From: Kaipeng Zhou > > Date: Fri, 12 Jun 2020 00:54:15 +0800 > > Subject: [PATCH] vect: Remove extra variable created for memory > > reference in loop vectorization. > > > > Vectorization create two copies of the same IV and IVOPTs does not > > perform CSE. But vinfo->ivexpr_map is supposed to catch those "IV > > base and/or step expressions". Just call > > cse_and_gimplify_to_preheader to handle gathering/scattering to avoid > the extra variable. > > > > 2020-06-12 Bin Cheng > > Kaipeng Zhou > > > > PR tree-optimization/95199 > > * tree-vect-stmts.c: Use CSE map to catch the IV step and eliminate > > extra variable. > > > > 2020-06-12 Bin Cheng > > Kaipeng Zhou > > > > PR tree-optimization/95199 > > * gcc.dg/vect/pr95199.c: New test. > > --- > > gcc/ChangeLog | 7 +++ > > gcc/testsuite/ChangeLog | 6 ++ > > gcc/testsuite/gcc.dg/vect/pr95199.c | 17 + > > gcc/tree-vect-stmts.c | 1 + > > 4 files changed, 31 insertions(+) > > create mode 100644 gcc/testsuite/gcc.dg/vect/pr95199.c > > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog index > > c92582df7fe..753d70fc94b 100644 > > --- a/gcc/ChangeLog > > +++ b/gcc/ChangeLog > > @@ -1,3 +1,10 @@ > > +2020-06-12 Bin Cheng > > + Kaipeng Zhou > > + > > + PR tree-optimization/95199 > > + * tree-vect-stmts.c: Use CSE map to catch the IV step and eliminate > > + extra variable. > > + > > 2020-06-08 Tobias Burnus > > > > PR lto/94848 > > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index > > 60d9ecca3ed..a27dd3fa072 100644 > > --- a/gcc/testsuite/ChangeLog > > +++ b/gcc/testsuite/ChangeLog > > @@ -1,3 +1,9 @@ > > +2020-06-12 Bin Cheng > > + Kaipeng Zhou > > + > > + PR tree-optimization/95199 > > + * gcc.dg/vect/pr95199.c: New test. > > + > > 2020-06-08 Harald Anlauf > > > > PR fortran/95195 > > diff --git a/gcc/testsuite/gcc.dg/vect/pr95199.c > > b/gcc/testsuite/gcc.dg/vect/pr95199.c > > new file mode 100644 > > index 000..ec201feaec8 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/pr95199.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile { target aarch64-*-* } } */ > > +/* { dg-options "-O3 -march=armv8.2-a+fp+sve" } */ > > + > > +void > > +foo (double *a, double *b, double m, int inc_x, int inc_y) { > > + int ix = 0, iy = 0; > > + for (int i = 0; i < 1000; ++i) > > +{ > > + a[ix] += m * b[iy]; > > + ix += inc_x; > > + iy += inc_y; > > +} > > + return ; > > +} > > + > > +/* { dg-final { scan-assembler-times "add" 12 } } */ > > For asm tests, it's probably better to put this in aarch64/sve instead. > The scan-assembler should be a bit more precise though, since just "add" > could catch filenames like /homd/ladd/gcc. E.g. maybe it would be clearer to > make the regexp match
[PATCH PR95199] vect: Remove extra variable created for memory reference
Hi, This is a fix for pr95199. In vectorization, vinfo->ivexpr_map is supposed to catch those "IV base and/or step expressions". Just call cse_and_gimplify_to_preheader to handle gathering/scattering to avoid the extra variable. Bootstrap and tested on aarch64/x86_64 Linux platform. No new regression witnessed. Is it ok? Thanks, Kaipeng Zhou 0001-vect-Remove-extra-variable-created-for-memory-refere.patch Description: 0001-vect-Remove-extra-variable-created-for-memory-refere.patch