Re: [PATCH] [x86] Some tidy up for RA related hooks.
On Mon, Nov 21, 2022 at 3:17 PM Uros Bizjak wrote: > > On Mon, Nov 21, 2022 at 6:24 AM Hongtao Liu wrote: > > > > On Mon, Nov 21, 2022 at 10:13 AM liuhongt wrote: > > > > > > When i'm working at [1] for ix86_can_change_mode_class, > > > I notice there're some incorrectness/misoptimization in current > > > RA-related hook. > > > This patch tries to do some fix and tidy up for them: > > > > > > 1. We also need to guard size of TO to be > > > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class. > > > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode > > > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move > > > above SSE2, so no need for the condition of AVX512FP16 for those evex > > > sse registers. > > > 3. Allocate DI/HImode to sse register for SSE2 above just like > > > SImode since we've supported 16-bit data move between sse and gpr > > > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI) > > > 0) or else RA will spill it. This enable optimization for > > > pieces-memset-{3,37,39}.c > > > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok. > > > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > > Ok for trunk? > > > > > > gcc/ChangeLog: > > > > > > * config/i386/i386.cc (ix86_can_change_mode_class): Also guard > > > size of TO. > > > (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE > > > * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to > > > .. > > > (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode. > > > (VALID_SSE_REG_MODE): Add DI/HImode. > > > * config/i386/mmx.md (*mov_internal): Add > > > ix86_hard_reg_move_ok to condition. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/i386/pieces-memset-3.c: Remove xfail. > > > * gcc.target/i386/pieces-memset-37.c: Remove xfail. > > > * gcc.target/i386/pieces-memset-39.c: Remove xfail. > > OK. > > This is somehow tricky part of the compiler, so it would be nice if > the patch can be split to a couple of patches to ease bisecting if > something goes wrong. OTOH, recently there were a couple of similar > changes in this area, and there were no problems. Ok, I'll separate ix86_hard_reg_move_ok part into a separate patch. > > Thanks, > Uros. > > > > --- > > > gcc/config/i386/i386.cc | 9 ++--- > > > gcc/config/i386/i386.h | 16 > > > gcc/config/i386/mmx.md | 6 -- > > > gcc/testsuite/gcc.target/i386/pieces-memset-3.c | 4 ++-- > > > gcc/testsuite/gcc.target/i386/pieces-memset-37.c | 4 ++-- > > > gcc/testsuite/gcc.target/i386/pieces-memset-39.c | 4 ++-- > > > 6 files changed, 20 insertions(+), 23 deletions(-) > > > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > > index 292b32c5e99..030c26965ab 100644 > > > --- a/gcc/config/i386/i386.cc > > > +++ b/gcc/config/i386/i386.cc > > > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, > > > machine_mode to, > > > the vec_dupv4hi pattern. > > > NB: SSE2 can load 16bit data to sse register via pinsrw. */ > > >int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4; > > > - if (GET_MODE_SIZE (from) < mov_size) > > > + if (GET_MODE_SIZE (from) < mov_size > > > + || GET_MODE_SIZE (to) < mov_size) > > > return false; > > > } > > > > > > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, > > > machine_mode mode) > > > || VALID_AVX512F_SCALAR_MODE (mode))) > > > return true; > > > > > > - /* For AVX512FP16, vmovw supports movement of HImode > > > -and HFmode between GPR and SSE registers. */ > > > - if (TARGET_AVX512FP16 > > > - && VALID_AVX512FP16_SCALAR_MODE (mode)) > > > - return true; > > > - > > >/* For AVX-5124FMAPS or AVX-5124VNNIW > > > allow V64SF and V64SI modes for special regnos. */ > > >if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW) > > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > > index 3869db8f2d3..d9a1fb0e420 100644 > > > --- a/gcc/config/i386/i386.h > > > +++ b/gcc/config/i386/i386.h > > > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int > > > argc, const char **argv); > > >(VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode) > > > > > > #define VALID_AVX512F_SCALAR_MODE(MODE) > > > \ > > > - ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode\ > > > - || (MODE) == SFmode) > > > - > > > -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \ > > > - ((MODE) == HImode || (MODE) == HFmode) > > > + ((MODE) == DImode || (MODE) == DFmode > > > \ >
Re: [PATCH] [x86] Some tidy up for RA related hooks.
On Mon, Nov 21, 2022 at 6:24 AM Hongtao Liu wrote: > > On Mon, Nov 21, 2022 at 10:13 AM liuhongt wrote: > > > > When i'm working at [1] for ix86_can_change_mode_class, > > I notice there're some incorrectness/misoptimization in current RA-related > > hook. > > This patch tries to do some fix and tidy up for them: > > > > 1. We also need to guard size of TO to be > > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class. > > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode > > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move > > above SSE2, so no need for the condition of AVX512FP16 for those evex > > sse registers. > > 3. Allocate DI/HImode to sse register for SSE2 above just like > > SImode since we've supported 16-bit data move between sse and gpr > > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI) > > 0) or else RA will spill it. This enable optimization for > > pieces-memset-{3,37,39}.c > > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok. > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > * config/i386/i386.cc (ix86_can_change_mode_class): Also guard > > size of TO. > > (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE > > * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to > > .. > > (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode. > > (VALID_SSE_REG_MODE): Add DI/HImode. > > * config/i386/mmx.md (*mov_internal): Add > > ix86_hard_reg_move_ok to condition. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pieces-memset-3.c: Remove xfail. > > * gcc.target/i386/pieces-memset-37.c: Remove xfail. > > * gcc.target/i386/pieces-memset-39.c: Remove xfail. OK. This is somehow tricky part of the compiler, so it would be nice if the patch can be split to a couple of patches to ease bisecting if something goes wrong. OTOH, recently there were a couple of similar changes in this area, and there were no problems. Thanks, Uros. > > --- > > gcc/config/i386/i386.cc | 9 ++--- > > gcc/config/i386/i386.h | 16 > > gcc/config/i386/mmx.md | 6 -- > > gcc/testsuite/gcc.target/i386/pieces-memset-3.c | 4 ++-- > > gcc/testsuite/gcc.target/i386/pieces-memset-37.c | 4 ++-- > > gcc/testsuite/gcc.target/i386/pieces-memset-39.c | 4 ++-- > > 6 files changed, 20 insertions(+), 23 deletions(-) > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index 292b32c5e99..030c26965ab 100644 > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, > > machine_mode to, > > the vec_dupv4hi pattern. > > NB: SSE2 can load 16bit data to sse register via pinsrw. */ > >int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4; > > - if (GET_MODE_SIZE (from) < mov_size) > > + if (GET_MODE_SIZE (from) < mov_size > > + || GET_MODE_SIZE (to) < mov_size) > > return false; > > } > > > > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, > > machine_mode mode) > > || VALID_AVX512F_SCALAR_MODE (mode))) > > return true; > > > > - /* For AVX512FP16, vmovw supports movement of HImode > > -and HFmode between GPR and SSE registers. */ > > - if (TARGET_AVX512FP16 > > - && VALID_AVX512FP16_SCALAR_MODE (mode)) > > - return true; > > - > >/* For AVX-5124FMAPS or AVX-5124VNNIW > > allow V64SF and V64SI modes for special regnos. */ > >if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW) > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > index 3869db8f2d3..d9a1fb0e420 100644 > > --- a/gcc/config/i386/i386.h > > +++ b/gcc/config/i386/i386.h > > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, > > const char **argv); > >(VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode) > > > > #define VALID_AVX512F_SCALAR_MODE(MODE) > > \ > > - ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode\ > > - || (MODE) == SFmode) > > - > > -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \ > > - ((MODE) == HImode || (MODE) == HFmode) > > + ((MODE) == DImode || (MODE) == DFmode > > \ > > + || (MODE) == SImode || (MODE) == SFmode \ > > + || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode) > > > > #define VALID_AVX512F_REG_MODE(MODE) \ > >((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \ > > @@ -1045,13 +1043,15 @@
Re: [PATCH] [x86] Some tidy up for RA related hooks.
On Mon, Nov 21, 2022 at 10:13 AM liuhongt wrote: > > When i'm working at [1] for ix86_can_change_mode_class, > I notice there're some incorrectness/misoptimization in current RA-related > hook. > This patch tries to do some fix and tidy up for them: > > 1. We also need to guard size of TO to be > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class. > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move > above SSE2, so no need for the condition of AVX512FP16 for those evex > sse registers. > 3. Allocate DI/HImode to sse register for SSE2 above just like > SImode since we've supported 16-bit data move between sse and gpr > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI) > 0) or else RA will spill it. This enable optimization for > pieces-memset-{3,37,39}.c > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.cc (ix86_can_change_mode_class): Also guard > size of TO. > (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE > * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to > .. > (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode. > (VALID_SSE_REG_MODE): Add DI/HImode. > * config/i386/mmx.md (*mov_internal): Add > ix86_hard_reg_move_ok to condition. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pieces-memset-3.c: Remove xfail. > * gcc.target/i386/pieces-memset-37.c: Remove xfail. > * gcc.target/i386/pieces-memset-39.c: Remove xfail. > --- > gcc/config/i386/i386.cc | 9 ++--- > gcc/config/i386/i386.h | 16 > gcc/config/i386/mmx.md | 6 -- > gcc/testsuite/gcc.target/i386/pieces-memset-3.c | 4 ++-- > gcc/testsuite/gcc.target/i386/pieces-memset-37.c | 4 ++-- > gcc/testsuite/gcc.target/i386/pieces-memset-39.c | 4 ++-- > 6 files changed, 20 insertions(+), 23 deletions(-) > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index 292b32c5e99..030c26965ab 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, > machine_mode to, > the vec_dupv4hi pattern. > NB: SSE2 can load 16bit data to sse register via pinsrw. */ >int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4; > - if (GET_MODE_SIZE (from) < mov_size) > + if (GET_MODE_SIZE (from) < mov_size > + || GET_MODE_SIZE (to) < mov_size) > return false; > } > > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, > machine_mode mode) > || VALID_AVX512F_SCALAR_MODE (mode))) > return true; > > - /* For AVX512FP16, vmovw supports movement of HImode > -and HFmode between GPR and SSE registers. */ > - if (TARGET_AVX512FP16 > - && VALID_AVX512FP16_SCALAR_MODE (mode)) > - return true; > - >/* For AVX-5124FMAPS or AVX-5124VNNIW > allow V64SF and V64SI modes for special regnos. */ >if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW) > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > index 3869db8f2d3..d9a1fb0e420 100644 > --- a/gcc/config/i386/i386.h > +++ b/gcc/config/i386/i386.h > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, > const char **argv); >(VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode) > > #define VALID_AVX512F_SCALAR_MODE(MODE) > \ > - ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode\ > - || (MODE) == SFmode) > - > -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \ > - ((MODE) == HImode || (MODE) == HFmode) > + ((MODE) == DImode || (MODE) == DFmode > \ > + || (MODE) == SImode || (MODE) == SFmode \ > + || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode) > > #define VALID_AVX512F_REG_MODE(MODE) \ >((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \ > @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, > const char **argv); > || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode \ > || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode \ > || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ > - || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode \ > - || (MODE) == HFmode || (MODE) == BFmode) > + || (MODE) == V2DImode || (MODE) == V2QImode \ > + || (MODE) == DFmode || (MODE) ==
[PATCH] [x86] Some tidy up for RA related hooks.
When i'm working at [1] for ix86_can_change_mode_class, I notice there're some incorrectness/misoptimization in current RA-related hook. This patch tries to do some fix and tidy up for them: 1. We also need to guard size of TO to be less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class. 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move above SSE2, so no need for the condition of AVX512FP16 for those evex sse registers. 3. Allocate DI/HImode to sse register for SSE2 above just like SImode since we've supported 16-bit data move between sse and gpr above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI) 0) or else RA will spill it. This enable optimization for pieces-memset-{3,37,39}.c 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok. [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/i386.cc (ix86_can_change_mode_class): Also guard size of TO. (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to .. (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode. (VALID_SSE_REG_MODE): Add DI/HImode. * config/i386/mmx.md (*mov_internal): Add ix86_hard_reg_move_ok to condition. gcc/testsuite/ChangeLog: * gcc.target/i386/pieces-memset-3.c: Remove xfail. * gcc.target/i386/pieces-memset-37.c: Remove xfail. * gcc.target/i386/pieces-memset-39.c: Remove xfail. --- gcc/config/i386/i386.cc | 9 ++--- gcc/config/i386/i386.h | 16 gcc/config/i386/mmx.md | 6 -- gcc/testsuite/gcc.target/i386/pieces-memset-3.c | 4 ++-- gcc/testsuite/gcc.target/i386/pieces-memset-37.c | 4 ++-- gcc/testsuite/gcc.target/i386/pieces-memset-39.c | 4 ++-- 6 files changed, 20 insertions(+), 23 deletions(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 292b32c5e99..030c26965ab 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to, the vec_dupv4hi pattern. NB: SSE2 can load 16bit data to sse register via pinsrw. */ int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4; - if (GET_MODE_SIZE (from) < mov_size) + if (GET_MODE_SIZE (from) < mov_size + || GET_MODE_SIZE (to) < mov_size) return false; } @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode) || VALID_AVX512F_SCALAR_MODE (mode))) return true; - /* For AVX512FP16, vmovw supports movement of HImode -and HFmode between GPR and SSE registers. */ - if (TARGET_AVX512FP16 - && VALID_AVX512FP16_SCALAR_MODE (mode)) - return true; - /* For AVX-5124FMAPS or AVX-5124VNNIW allow V64SF and V64SI modes for special regnos. */ if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW) diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 3869db8f2d3..d9a1fb0e420 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode) #define VALID_AVX512F_SCALAR_MODE(MODE) \ - ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode\ - || (MODE) == SFmode) - -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \ - ((MODE) == HImode || (MODE) == HFmode) + ((MODE) == DImode || (MODE) == DFmode \ + || (MODE) == SImode || (MODE) == SFmode \ + || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode) #define VALID_AVX512F_REG_MODE(MODE) \ ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \ @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode \ || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode \ || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ - || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode \ - || (MODE) == HFmode || (MODE) == BFmode) + || (MODE) == V2DImode || (MODE) == V2QImode \ + || (MODE) == DFmode || (MODE) == DImode \ + || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode) #define VALID_SSE_REG_MODE(MODE) \ ((MODE) == V1TImode || (MODE) == TImode \