Re: [PATCH] Fix PR84512
Hi Eric,

>> So it failed before Tom's original patch.  Please add sparc-solaris
>> to the list of XFAILed targets.
>
> SPARC/Linux is affected too so sparc*-*-* instead.

actually, it's sparc*-*-* && lp64 only.  Done like this after testing on
sparc-sun-solaris2.11 and i386-pc-solaris2.11.

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-03-21  Rainer Orth

	* gcc.dg/tree-ssa/pr84512.c: xfail on 64-bit SPARC.

# HG changeset patch
# Parent  50996d41bbbc78ab2cf0002ba6479559089a2337
xfail gcc.dg/tree-ssa/pr84512.c on 64-bit sparc

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -13,4 +13,4 @@ int foo()
 }
 
 /* Target nvptx xfail due to PR84958.  */
-/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail nvptx*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail { nvptx*-*-* || { sparc*-*-* && lp64 } } } } } */
Re: [PATCH] Fix PR84512
> So it failed before Tom's original patch.  Please add sparc-solaris
> to the list of XFAILed targets.

SPARC/Linux is affected too, so sparc*-*-* instead.

-- 
Eric Botcazou
Re: [PATCH] Fix PR84512
On Tue, 20 Mar 2018, Rainer Orth wrote:

> Hi Tom,
>
> > On 03/19/2018 10:11 AM, Richard Biener wrote:
> [...]
> > Done.
> >
> > Committed at attached.
>
> this caused the test to FAIL on 64-bit (only) sparc-sun-solaris2.11:
>
> FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
>
> where it was UNSUPPORTED before.

So it failed before Tom's original patch.  Please add sparc-solaris
to the list of XFAILed targets.

> The dump has
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=1557, cgraph_uid=0, symbol_order=0)
> [...]
Re: [PATCH] Fix PR84512
Hi Tom,

> On 03/19/2018 10:11 AM, Richard Biener wrote:
>> On Fri, 16 Mar 2018, Tom de Vries wrote:
[...]
>> Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?
>>
>
> Done.
>
> Committed at attached.

this caused the test to FAIL on 64-bit (only) sparc-sun-solaris2.11:

FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"

where it was UNSUPPORTED before.  The dump has

;; Function foo (foo, funcdef_no=0, decl_uid=1557, cgraph_uid=0, symbol_order=0)

foo ()
{
  int res;
  int a[10];
  int _2;
  int _6;
  int _28;
  int _35;
  int _42;
  int _49;
  int _56;
  int _63;
  int _70;
  int _77;

  [local count: 97603132]:
  MEM[(int *)] = { 0, 1 };
  MEM[(int *) + 8B] = { 4, 9 };
  MEM[(int *) + 16B] = { 16, 25 };
  MEM[(int *) + 24B] = { 36, 49 };
  MEM[(int *) + 32B] = { 64, 81 };
  _6 = a[0];
  _28 = a[1];
  res_29 = _6 + _28;
  _35 = a[2];
  res_36 = res_29 + _35;
  _42 = a[3];
  res_43 = res_36 + _42;
  _49 = a[4];
  res_50 = res_43 + _49;
  _56 = a[5];
  res_57 = res_50 + _56;
  _63 = a[6];
  res_64 = res_57 + _63;
  _70 = a[7];
  res_71 = res_64 + _70;
  _77 = a[8];
  res_78 = res_71 + _77;
  _2 = a[9];
  res_11 = _2 + res_78;
  a ={v} {CLOBBER};
  return res_11;
}

	Rainer
Re: [PATCH] Fix PR84512
On 03/19/2018 10:11 AM, Richard Biener wrote:
> On Fri, 16 Mar 2018, Tom de Vries wrote:
[...]
> Ah, yeah... the usual issue then.
>
> Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?
>

Done.

Committed at attached.

Thanks,
- Tom

[testsuite] Add nvptx xfail to pr84512.c

2018-03-19  Tom de Vries

	* gcc.dg/tree-ssa/pr84512.c: Don't require effective target
	vect_int_mult.  Add nvptx xfail for PR84958.

---
 gcc/testsuite/ChangeLog                 | 5 +
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
index 41b6c06..9560160 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-optimized" } */
-/* { dg-require-effective-target vect_int_mult } */
 
 int foo()
 {
@@ -13,4 +12,5 @@ int foo()
   return res;
 }
 
-/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
+/* Target nvptx xfail due to PR84958.  */
+/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail nvptx*-*-* } } } */
Re: [PATCH] Fix PR84512
On Fri, 16 Mar 2018, Tom de Vries wrote:

> On 03/16/2018 12:55 PM, Richard Biener wrote:
> [...]
> FWIW, using this patch we generate optimal code at optimized:
> [...]
> I could file a missing optimization PR for nvptx, but I'm not sure where this
> should be fixed.

Ah, yeah... the usual issue then.

Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?

Thanks,
Richard.
Re: [PATCH] Fix PR84512
On 03/16/2018 12:55 PM, Richard Biener wrote:
> On Fri, 16 Mar 2018, Tom de Vries wrote:
>
>> On 02/27/2018 01:42 PM, Richard Biener wrote:
>>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
>>> [...]
>>> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
>>
>> This fails for nvptx, because it doesn't have the required vector
>> operations.
>> To fix the fail, I've added requiring effective target vect_int_mult.
>
> On targets that do not vectorize you should see the scalar loops unrolled
> instead.  Or do you have only one loop vectorized?

Sort of. Loop vectorization has no effect, and the scalar loops are
completely unrolled. But then SLP vectorization vectorizes the stores.

So at optimized we have:
...
  MEM[(int *)] = { 0, 1 };
  MEM[(int *) + 8B] = { 4, 9 };
  MEM[(int *) + 16B] = { 16, 25 };
  MEM[(int *) + 24B] = { 36, 49 };
  MEM[(int *) + 32B] = { 64, 81 };
  _6 = a[0];
  _28 = a[1];
  res_29 = _6 + _28;
  _35 = a[2];
  res_36 = res_29 + _35;
  _42 = a[3];
  res_43 = res_36 + _42;
  _49 = a[4];
  res_50 = res_43 + _49;
  _56 = a[5];
  res_57 = res_50 + _56;
  _63 = a[6];
  res_64 = res_57 + _63;
  _70 = a[7];
  res_71 = res_64 + _70;
  _77 = a[8];
  res_78 = res_71 + _77;
  _2 = a[9];
  res_11 = _2 + res_78;
  a ={v} {CLOBBER};
  return res_11;
...

The stores and loads are eliminated by dse1 in the RTL phase, and in the
end we have:
...
.visible .func (.param.u32 %value_out) foo
{
  .reg.u32 %value;
  .local .align 16 .b8 %frame_ar[48];
  .reg.u64 %frame;
  cvta.local.u64 %frame, %frame_ar;
  mov.u32 %value, 285;
  st.param.u32 [%value_out], %value;
  ret;
}
...

> That's precisely what the PR was about... which means it isn't fixed
> for nvptx :/

Indeed the assembly is not optimal, and it would be if we had optimal
code at optimized.

FWIW, using this patch we generate optimal code at optimized:
...
diff --git a/gcc/passes.def b/gcc/passes.def
index 3ebcfc30349..6b64f600c4a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_tracer);
       NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
+      NEXT_PASS (pass_fre);
       NEXT_PASS (pass_strlen);
       NEXT_PASS (pass_thread_jumps);
       NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
...

and we get:
...
.visible .func (.param.u32 %value_out) foo
{
  .reg.u32 %value;
  mov.u32 %value, 285;
  st.param.u32 [%value_out], %value;
  ret;
}
...

I could file a missing optimization PR for nvptx, but I'm not sure where
this should be fixed.

Thanks,
- Tom
Re: [PATCH] Fix PR84512
On Fri, 16 Mar 2018, Tom de Vries wrote:

> On 02/27/2018 01:42 PM, Richard Biener wrote:
> > Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> > [...]
> > +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
>
> This fails for nvptx, because it doesn't have the required vector operations.
> To fix the fail, I've added requiring effective target vect_int_mult.

On targets that do not vectorize you should see the scalar loops unrolled
instead.  Or do you have only one loop vectorized?

That's precisely what the PR was about... which means it isn't fixed
for nvptx :/

Richard.

> Thanks,
> - Tom
>

-- 
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Fix PR84512
On 02/27/2018 01:42 PM, Richard Biener wrote:
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (nonexistent)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (working copy)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int foo()
> +{
> +  int a[10];
> +  for(int i = 0; i < 10; ++i)
> +    a[i] = i*i;
> +  int res = 0;
> +  for(int i = 0; i < 10; ++i)
> +    res += a[i];
> +  return res;
> +}
> +
> +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */

This fails for nvptx, because it doesn't have the required vector operations.
To fix the fail, I've added requiring effective target vect_int_mult.

Thanks,
- Tom

[testsuite] Require vect_int_mult in pr84512.c

2018-03-16  Tom de Vries

	* gcc.dg/tree-ssa/pr84512.c: Require effective target vect_int_mult.

---
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
index 288fa5d..41b6c06 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-require-effective-target vect_int_mult } */
 
 int foo()
 {