Re: "match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)
(resent because of mail issues on my end) On Mon, 22 Oct 2018, Thomas Schwinge wrote: I had a quick look at the difference, and a[j][i] remains in this form throughout optimization. If I write instead *((*(a+j))+i) = 0; I get j_10 = tmp_17 / 1025; i_11 = tmp_17 % 1025; _1 = (long unsigned int) j_10; _2 = _1 * 1025; _3 = (sizetype) i_11; _4 = _2 + _3; or for a power of 2 j_10 = tmp_17 >> 10; i_11 = tmp_17 & 1023; _1 = (long unsigned int) j_10; _2 = _1 * 1024; _3 = (sizetype) i_11; _4 = _2 + _3; and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least I think that's true). So there are missing match.pd transformations in addition to whatever scev/ivdep/other work is needed. With a very simplistic "match.pd" rule (not yet any special cases checking etc.): diff --git gcc/match.pd gcc/match.pd index b36d7ccb5dc3..4c23116308da 100644 --- gcc/match.pd +++ gcc/match.pd @@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) { wide_int_to_tree (sizetype, off); }) { swap_p ? @0 : @2; })) { rhs_tree; }) + +/* Given: + + j = in / N_I + i = in % N_I + + ..., fold: + + out = j * N_I + i + + ..., into: + + out = in +*/ + +/* As long as only considering N_I being INTEGER_CST (which are always second + argument?), probably don't need ":c" variants? */ + +(simplify + (plus:c + (mult:c + (trunc_div @0 INTEGER_CST@1) + INTEGER_CST@1) + (trunc_mod @0 INTEGER_CST@1)) + (convert @0)) You should only specify INTEGER_CST@1 on the first occurence, the others can be just @1. (you may be interested in @@1 at some point, but that gets tricky) ..., the original code: int f1(int in) { int j = in / N_I; int i = in % N_I; int out = j * N_I + i; return out; } ... gets simplified from ("div-mod-0.c.027t.objsz1"): f1 (int in) { int out; int i; int j; int _1; int _6; : gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_return <_6> } ... to ("div-mod-0.c.028t.ccp1"): f1 (int in) { int out; int i; int j; int _1; : gimple_assign gimple_assign gimple_assign gimple_return } (The three dead "gimple_assign"s get eliminated later on.) So, that works. However, it doesn't work yet for the original construct that I'd ran into, which looks like this: [...] int i; int j; [...] signed int .offset.5_2; [...] unsigned int .offset.7_23; unsigned int .iter.0_24; unsigned int _25; unsigned int _26; [...] unsigned int .iter.0_32; [...] : # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)> gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign [...] Resolving the "a[j][i] = 123" we'll need to look into later. As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we get: [...] int i; int j; long unsigned int _1; long unsigned int _2; sizetype _3; sizetype _4; sizetype _5; int * _6; [...] signed int .offset.5_8; [...] unsigned int .offset.7_29; unsigned int .iter.0_30; unsigned int _31; unsigned int _32; [...] : # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)> gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign [...] Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30", but "match.pd" doesn't agree yet. Note the many "nop_expr"s here, which I have not yet figured out how to handle, I suppose? I tried some things but couldn't get it to work. Apparently the existing instances of "(match (nop_convert @0)" and "Basic strip-useless-type-conversions / strip_nops" rule also don't handle these; should they? Or, are in fact here the types mixed up too much? "(match (nop_convert @0)" defines a shortcut so some transformations can use nop_convert to detect some specific conversions, but it doesn't do anything by itself. "Basic strip-useless-type-conversions" strips conversions that are *useless*, essentially from a type to the same type. If you want to handle true conversions, you need to do that explicitly, see the many transformations that use convert? convert1? convert2? and specify for which particular conversions the transformation is valid. Finding out the right conditions to detect these conversions is often the most painful part of writing a match.pd transformation. I hope to get some time again soon to continue looking into this, but if anybody got any ideas, I'm all ears. -- Marc Glisse
Re: "match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)
On Mon, Oct 22, 2018 at 6:35 PM Thomas Schwinge wrote: > > Hi! > > Thanks for all your comments already! I continued looked into this for a > bit (but then got interrupted by a higher-priority task). Regarding this > one specifically: > > On Fri, 12 Oct 2018 21:14:11 +0200, Marc Glisse wrote: > > On Fri, 12 Oct 2018, Thomas Schwinge wrote: > > > > > Hmm, and without any OpenACC/OpenMP etc., actually the same problem is > > > also present when running the following code through the vectorizer: > > > > > >for (int tmp = 0; tmp < N_J * N_I; ++tmp) > > > { > > >int j = tmp / N_I; > > >int i = tmp % N_I; > > >a[j][i] = 0; > > > } > > > > > > ... whereas the following variant (obviously) does vectorize: > > > > > >int a[NJ * NI]; > > > > > >for (int tmp = 0; tmp < N_J * N_I; ++tmp) > > > a[tmp] = 0; > > > > I had a quick look at the difference, and a[j][i] remains in this form > > throughout optimization. If I write instead *((*(a+j))+i) = 0; I get > > > >j_10 = tmp_17 / 1025; > >i_11 = tmp_17 % 1025; > >_1 = (long unsigned int) j_10; > >_2 = _1 * 1025; > >_3 = (sizetype) i_11; > >_4 = _2 + _3; > > > > or for a power of 2 > > > >j_10 = tmp_17 >> 10; > >i_11 = tmp_17 & 1023; > >_1 = (long unsigned int) j_10; > >_2 = _1 * 1024; > >_3 = (sizetype) i_11; > >_4 = _2 + _3; > > > > and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least > > I think that's true). > > > > So there are missing match.pd transformations in addition to whatever > > scev/ivdep/other work is needed. > > With a very simplistic "match.pd" rule (not yet any special cases > checking etc.): > > diff --git gcc/match.pd gcc/match.pd > index b36d7ccb5dc3..4c23116308da 100644 > --- gcc/match.pd > +++ gcc/match.pd > @@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > { wide_int_to_tree (sizetype, off); }) > { swap_p ? @0 : @2; })) > { rhs_tree; }) > + > +/* Given: > + > + j = in / N_I > + i = in % N_I > + > + ..., fold: > + > + out = j * N_I + i > + > + ..., into: > + > + out = in > +*/ > + > +/* As long as only considering N_I being INTEGER_CST (which are always second > + argument?), probably don't need ":c" variants? */ > + > +(simplify > + (plus:c > + (mult:c > + (trunc_div @0 INTEGER_CST@1) > + INTEGER_CST@1) > + (trunc_mod @0 INTEGER_CST@1)) > + (convert @0)) > > ..., the original code: > > int f1(int in) > { > int j = in / N_I; > int i = in % N_I; > > int out = j * N_I + i; > > return out; > } > > ... gets simplified from ("div-mod-0.c.027t.objsz1"): > > f1 (int in) > { > int out; > int i; > int j; > int _1; > int _6; > >: > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_return <_6> > > } > > ... to ("div-mod-0.c.028t.ccp1"): > > f1 (int in) > { > int out; > int i; > int j; > int _1; > >: > gimple_assign > gimple_assign > gimple_assign > gimple_return > > } > > (The three dead "gimple_assign"s get eliminated later on.) > > So, that works. > > However, it doesn't work yet for the original construct that I'd ran > into, which looks like this: > > [...] > int i; > int j; > [...] > signed int .offset.5_2; > [...] > unsigned int .offset.7_23; > unsigned int .iter.0_24; > unsigned int _25; > unsigned int _26; > [...] > unsigned int .iter.0_32; > [...] > > : > # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)> > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > [...] > > Resolving the "a[j][i] = 123" we'll need to look into later. > > As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we > get: > > [...] > int i; > int j; > long unsigned int _1; > long unsigned int _2; > sizetype _3; > sizetype _4; > sizetype _5; > int * _6; > [...] > signed int .offset.5_8; > [...] > unsigned int .offset.7_29; > unsigned int .iter.0_30; > unsigned int _31; > unsigned int _32; > [...] > > : > # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)> > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > gimple_assign > [...] > > Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30", > but "match.pd" doesn't agree yet. Note the many "nop_expr"s here, which > I have not yet figured out how to handle, I suppose? I tried
"match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)
Hi! Thanks for all your comments already! I continued looked into this for a bit (but then got interrupted by a higher-priority task). Regarding this one specifically: On Fri, 12 Oct 2018 21:14:11 +0200, Marc Glisse wrote: > On Fri, 12 Oct 2018, Thomas Schwinge wrote: > > > Hmm, and without any OpenACC/OpenMP etc., actually the same problem is > > also present when running the following code through the vectorizer: > > > >for (int tmp = 0; tmp < N_J * N_I; ++tmp) > > { > >int j = tmp / N_I; > >int i = tmp % N_I; > >a[j][i] = 0; > > } > > > > ... whereas the following variant (obviously) does vectorize: > > > >int a[NJ * NI]; > > > >for (int tmp = 0; tmp < N_J * N_I; ++tmp) > > a[tmp] = 0; > > I had a quick look at the difference, and a[j][i] remains in this form > throughout optimization. If I write instead *((*(a+j))+i) = 0; I get > >j_10 = tmp_17 / 1025; >i_11 = tmp_17 % 1025; >_1 = (long unsigned int) j_10; >_2 = _1 * 1025; >_3 = (sizetype) i_11; >_4 = _2 + _3; > > or for a power of 2 > >j_10 = tmp_17 >> 10; >i_11 = tmp_17 & 1023; >_1 = (long unsigned int) j_10; >_2 = _1 * 1024; >_3 = (sizetype) i_11; >_4 = _2 + _3; > > and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least > I think that's true). > > So there are missing match.pd transformations in addition to whatever > scev/ivdep/other work is needed. With a very simplistic "match.pd" rule (not yet any special cases checking etc.): diff --git gcc/match.pd gcc/match.pd index b36d7ccb5dc3..4c23116308da 100644 --- gcc/match.pd +++ gcc/match.pd @@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) { wide_int_to_tree (sizetype, off); }) { swap_p ? @0 : @2; })) { rhs_tree; }) + +/* Given: + + j = in / N_I + i = in % N_I + + ..., fold: + + out = j * N_I + i + + ..., into: + + out = in +*/ + +/* As long as only considering N_I being INTEGER_CST (which are always second + argument?), probably don't need ":c" variants? */ + +(simplify + (plus:c + (mult:c + (trunc_div @0 INTEGER_CST@1) + INTEGER_CST@1) + (trunc_mod @0 INTEGER_CST@1)) + (convert @0)) ..., the original code: int f1(int in) { int j = in / N_I; int i = in % N_I; int out = j * N_I + i; return out; } ... gets simplified from ("div-mod-0.c.027t.objsz1"): f1 (int in) { int out; int i; int j; int _1; int _6; : gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_return <_6> } ... to ("div-mod-0.c.028t.ccp1"): f1 (int in) { int out; int i; int j; int _1; : gimple_assign gimple_assign gimple_assign gimple_return } (The three dead "gimple_assign"s get eliminated later on.) So, that works. However, it doesn't work yet for the original construct that I'd ran into, which looks like this: [...] int i; int j; [...] signed int .offset.5_2; [...] unsigned int .offset.7_23; unsigned int .iter.0_24; unsigned int _25; unsigned int _26; [...] unsigned int .iter.0_32; [...] : # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)> gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign [...] Resolving the "a[j][i] = 123" we'll need to look into later. As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we get: [...] int i; int j; long unsigned int _1; long unsigned int _2; sizetype _3; sizetype _4; sizetype _5; int * _6; [...] signed int .offset.5_8; [...] unsigned int .offset.7_29; unsigned int .iter.0_30; unsigned int _31; unsigned int _32; [...] : # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)> gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign [...] Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30", but "match.pd" doesn't agree yet. Note the many "nop_expr"s here, which I have not yet figured out how to handle, I suppose? I tried some things but couldn't get it to work. Apparently the existing instances of "(match (nop_convert @0)" and "Basic strip-useless-type-conversions / strip_nops" rule also don't handle these; should they? Or, are in fact here the types mixed up too much? I hope to get some time again soon to continue looking into this, but if anybody got any ideas, I'm all ears. Grüße Thomas