Re: [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
On Mon, 8 Jul 2019, Kewen.Lin wrote: > Hi Richard, > > Thanks a lot for your review and reply, I've updated the patch accordingly. > > Main changes compared to previous one: > - Consider SVE (poly_int) on bit_field_ref size/offset. > - Filter valid candidates with sbitmap first. > - Support multiple modes (TODO 1). > - Test cases: add SSE2 support for 1..5, update run result check for 1, > add two more cases (5,6)to verify multiple mode support (x86). > > Bootstrapped and regression test passed on powerpc64le-unknown-linux-gnu > and x86_64-linux-gnu. > > Could you help to review it again? Thanks in advance! + tree rhs = gimple_assign_rhs1 (oe1def); + tree vec = TREE_OPERAND (rhs, 0); + tree vec_type = TREE_TYPE (vec); + + if (TREE_CODE (vec) != SSA_NAME || !VECTOR_TYPE_P (vec_type)) + continue; ... + /* Ignore it if target machine can't support this VECTOR type. */ + if (!VECTOR_MODE_P (TYPE_MODE (vec_type))) + continue; put the check below next to the one above, it's cheaper than the poly-int stuff that currently preceeds it. + v_info_ptr *info_ptr = v_info_map.get (vec); + if (info_ptr) + { there is get_or_insert which enables you to mostly merge the two cases with a preceeding if (!existed) v_info_ptr = new v_info; also eliding the extra hash-lookup the put operation otherwise requires. + for (hash_map::iterator it = v_info_map.begin (); + it != v_info_map.end (); ++it) +{ + tree cand_vec = (*it).first; + v_info_ptr cand_info = (*it).second; + unsigned int num_elems = VECTOR_CST_NELTS (cand_vec).to_constant (); please add a quick out like if (cand_info->length () != num_elems) continue; since to have no holes and no overlap you need exactly that many. I think you can avoid the somewhat ugly mode_to_total counter array. Since you have sorted valid_vecs after modes simply do for (unsigned i = 0; i < valid_vecs.length () - 1; ++i) { tree tvec = valid_vecs[i]; enum machine_mode mode = TYPE_MODE (TREE_TYPE (tvec)); (please don't use unsigned int for mode!) /* Skip modes with only a single candidate. */ if (TYPE_MODE (TREE_TYPE (valid_vecs[i+1])) != mode) continue; do { ... } while (...) Richard. > Kewen > > > > gcc/ChangeLog > > 2019-07-08 Kewen Lin > > PR tree-optimization/88497 > * tree-ssa-reassoc.c (reassociate_bb): Swap the positions of > GIMPLE_BINARY_RHS check and gimple_visited_p check, call new > function undistribute_bitref_for_vector. > (undistribute_bitref_for_vector): New function. > (cleanup_vinfo_map): Likewise. > (sort_by_mach_mode): Likewise. > > gcc/testsuite/ChangeLog > > 2019-07-08 Kewen Lin > > * gcc.dg/tree-ssa/pr88497-1.c: New test. > * gcc.dg/tree-ssa/pr88497-2.c: Likewise. > * gcc.dg/tree-ssa/pr88497-3.c: Likewise. > * gcc.dg/tree-ssa/pr88497-4.c: Likewise. > * gcc.dg/tree-ssa/pr88497-5.c: Likewise. > * gcc.dg/tree-ssa/pr88497-6.c: Likewise. > * gcc.dg/tree-ssa/pr88497-7.c: Likewise. > > > -- Richard Biener SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
Re: [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
On Tue, Jul 09, 2019 at 10:28:06AM +0800, Kewen.Lin wrote: > on 2019/7/9 上午12:32, Segher Boessenkool wrote: > > On Mon, Jul 08, 2019 at 04:07:00PM +0800, Kewen.Lin wrote: > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c > >> @@ -0,0 +1,60 @@ > >> +/* { dg-do run } */ > >> +/* { dg-require-effective-target vect_double } */ > >> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } > >> } } */ > > > > For "dg-do run" tests, you need "powerpc_vsx_hw". "_ok" only tests if > > the assembler can handle VSX instructions, not whether the test system > > can run them. (powerpc_vsx_ok is what you need for "dg-do assemble" or > > "dg-do link" tests. It also tests if you can use -mvsx, but that doesn't > > do what you might hope it does: you can use -mvsx together with a -mcpu= > > that doesn't support VSX, for example). > > Thanks, I will update it. But sorry that I can't find "powerpc_vsx_hw" but > "vsx_hw_available". I guess it's the one you are referring to? Yeah, sorry. You can also use just "vsx_hw". > And I happened > to find the vect_double will force powerpc to check vsx_hw_available. Yes :-) So this whole line is unnecessary. Segher
Re: [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
Hi Segher, on 2019/7/9 上午12:32, Segher Boessenkool wrote: > Hi Kewen, > > On Mon, Jul 08, 2019 at 04:07:00PM +0800, Kewen.Lin wrote: >> gcc/ChangeLog > > (You have trailing spaces in the changelog, fwiw). > Thanks for catching! >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c >> @@ -0,0 +1,60 @@ >> +/* { dg-do run } */ >> +/* { dg-require-effective-target vect_double } */ >> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } >> } */ > > For "dg-do run" tests, you need "powerpc_vsx_hw". "_ok" only tests if > the assembler can handle VSX instructions, not whether the test system > can run them. (powerpc_vsx_ok is what you need for "dg-do assemble" or > "dg-do link" tests. It also tests if you can use -mvsx, but that doesn't > do what you might hope it does: you can use -mvsx together with a -mcpu= > that doesn't support VSX, for example). > Thanks, I will update it. But sorry that I can't find "powerpc_vsx_hw" but "vsx_hw_available". I guess it's the one you are referring to? And I happened to find the vect_double will force powerpc to check vsx_hw_available. This reminds me I should use sse2 instead of sse2-runtime for case 2-5. >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c >> @@ -0,0 +1,37 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_float } */ >> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* >> } } } */ > > This one is fine, and the rest of the tests as well. > > > Segher >
Re: [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
Hi Kewen, On Mon, Jul 08, 2019 at 04:07:00PM +0800, Kewen.Lin wrote: > gcc/ChangeLog (You have trailing spaces in the changelog, fwiw). > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c > @@ -0,0 +1,60 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target vect_double } */ > +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } > } */ For "dg-do run" tests, you need "powerpc_vsx_hw". "_ok" only tests if the assembler can handle VSX instructions, not whether the test system can run them. (powerpc_vsx_ok is what you need for "dg-do assemble" or "dg-do link" tests. It also tests if you can use -mvsx, but that doesn't do what you might hope it does: you can use -mvsx together with a -mcpu= that doesn't support VSX, for example). > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c > @@ -0,0 +1,37 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_float } */ > +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* > } } } */ This one is fine, and the rest of the tests as well. Segher
[PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
Hi Richard, Thanks a lot for your review and reply, I've updated the patch accordingly. Main changes compared to previous one: - Consider SVE (poly_int) on bit_field_ref size/offset. - Filter valid candidates with sbitmap first. - Support multiple modes (TODO 1). - Test cases: add SSE2 support for 1..5, update run result check for 1, add two more cases (5,6)to verify multiple mode support (x86). Bootstrapped and regression test passed on powerpc64le-unknown-linux-gnu and x86_64-linux-gnu. Could you help to review it again? Thanks in advance! Kewen gcc/ChangeLog 2019-07-08 Kewen Lin PR tree-optimization/88497 * tree-ssa-reassoc.c (reassociate_bb): Swap the positions of GIMPLE_BINARY_RHS check and gimple_visited_p check, call new function undistribute_bitref_for_vector. (undistribute_bitref_for_vector): New function. (cleanup_vinfo_map): Likewise. (sort_by_mach_mode): Likewise. gcc/testsuite/ChangeLog 2019-07-08 Kewen Lin * gcc.dg/tree-ssa/pr88497-1.c: New test. * gcc.dg/tree-ssa/pr88497-2.c: Likewise. * gcc.dg/tree-ssa/pr88497-3.c: Likewise. * gcc.dg/tree-ssa/pr88497-4.c: Likewise. * gcc.dg/tree-ssa/pr88497-5.c: Likewise. * gcc.dg/tree-ssa/pr88497-6.c: Likewise. * gcc.dg/tree-ssa/pr88497-7.c: Likewise. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c new file mode 100644 index 000..d17de42 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c @@ -0,0 +1,60 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */ +/* { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */ +/* { dg-additional-options "-mvsx" { target { powerpc*-*-* } } } */ +/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */ + +/* To test reassoc can undistribute vector bit_field_ref summation. + + arg1 and arg2 are two arrays whose elements of type vector double. + Assuming: + A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3], + B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3], + + Then: + V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3, + + reassoc transforms + + accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1] + + V3[0] + V3[1]; + + into: + + T = V0 + V1 + V2 + V3 + accumulator += T[0] + T[1]; + + Fewer bit_field_refs, only two for 128 or more bits vector. */ + +typedef double v2df __attribute__ ((vector_size (16))); +__attribute__ ((noinline)) double +test (double accumulator, v2df arg1[], v2df arg2[]) +{ + v2df temp; + temp = arg1[0] * arg2[0]; + accumulator += temp[0] + temp[1]; + temp = arg1[1] * arg2[1]; + accumulator += temp[0] + temp[1]; + temp = arg1[2] * arg2[2]; + accumulator += temp[0] + temp[1]; + temp = arg1[3] * arg2[3]; + accumulator += temp[0] + temp[1]; + return accumulator; +} + +extern void abort (void); + +int +main () +{ + v2df v2[4] = {{1.0, 2.0}, {4.0, 8.0}, {1.0, 3.0}, {9.0, 27.0}}; + v2df v3[4] = {{1.0, 4.0}, {16.0, 64.0}, {1.0, 2.0}, {3.0, 4.0}}; + double acc = 100.0; + double res = test (acc, v2, v3); + if (res != 827.0) +abort (); + return 0; +} +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c new file mode 100644 index 000..ce34c7a --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_float } */ +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */ +/* { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */ +/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */ +/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */ + +/* To test reassoc can undistribute vector bit_field_ref on multiplication. + + v1, v2, v3, v4 of type vector float. + + reassoc transforms + + accumulator *= v1[0] * v1[1] * v1[2] * v1[3] * +v2[0] * v2[1] * v2[2] * v2[3] * +v3[0] * v3[1] * v3[2] * v3[3] * +v4[0] * v4[1] * v4[2] * v4[3] ; + + into: + + T = v1 * v2 * v3 * v4; + accumulator *= T[0] * T[1] * T[2] * T[3]; + + Fewer bit_field_refs, only four for 128 or more bits vector. */ + +typedef float v4sf __attribute__ ((vector_size (16))); +float +test (float accumulator, v4sf v1, v4sf v2, v4sf v3, v4sf v4) +{ + accumulator *= v1[0] * v1[1] * v1[2] * v1[3]; + accumulator *= v2[0] * v2[1] * v2[2] * v2[3]; + accumulator *=