Re: [PATCH] [x86] Some tidy up for RA related hooks.

2022-11-21 Thread Hongtao Liu via Gcc-patches
On Mon, Nov 21, 2022 at 3:17 PM Uros Bizjak  wrote:
>
> On Mon, Nov 21, 2022 at 6:24 AM Hongtao Liu  wrote:
> >
> > On Mon, Nov 21, 2022 at 10:13 AM liuhongt  wrote:
> > >
> > > When i'm working at [1] for ix86_can_change_mode_class,
> > > I notice there're some incorrectness/misoptimization in current 
> > > RA-related hook.
> > > This patch tries to do some fix and tidy up for them:
> > >
> > > 1. We also need to guard size of TO to be
> > > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
> > > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
> > > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
> > > above SSE2, so no need for the condition of AVX512FP16 for those evex
> > > sse registers.
> > > 3. Allocate DI/HImode to sse register for SSE2 above just like
> > > SImode since we've supported 16-bit data move between sse and gpr
> > > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
> > > 0) or else RA will spill it. This enable optimization for
> > > pieces-memset-{3,37,39}.c
> > > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386.cc (ix86_can_change_mode_class): Also guard
> > > size of TO.
> > > (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
> > > * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
> > > ..
> > > (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
> > > (VALID_SSE_REG_MODE): Add DI/HImode.
> > > * config/i386/mmx.md (*mov_internal): Add
> > > ix86_hard_reg_move_ok to condition.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pieces-memset-3.c: Remove xfail.
> > > * gcc.target/i386/pieces-memset-37.c: Remove xfail.
> > > * gcc.target/i386/pieces-memset-39.c: Remove xfail.
>
> OK.
>
> This is somehow tricky part of the compiler, so it would be nice if
> the patch can be split to a couple of patches to ease bisecting if
> something goes wrong. OTOH, recently there were a couple of similar
> changes in this area, and there were no problems.
Ok, I'll separate ix86_hard_reg_move_ok part into a separate patch.
>
> Thanks,
> Uros.
>
> > > ---
> > >  gcc/config/i386/i386.cc  |  9 ++---
> > >  gcc/config/i386/i386.h   | 16 
> > >  gcc/config/i386/mmx.md   |  6 --
> > >  gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
> > >  gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
> > >  gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
> > >  6 files changed, 20 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 292b32c5e99..030c26965ab 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, 
> > > machine_mode to,
> > >  the vec_dupv4hi pattern.
> > >  NB: SSE2 can load 16bit data to sse register via pinsrw.  */
> > >int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
> > > -  if (GET_MODE_SIZE (from) < mov_size)
> > > +  if (GET_MODE_SIZE (from) < mov_size
> > > + || GET_MODE_SIZE (to) < mov_size)
> > > return false;
> > >  }
> > >
> > > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, 
> > > machine_mode mode)
> > >   || VALID_AVX512F_SCALAR_MODE (mode)))
> > > return true;
> > >
> > > -  /* For AVX512FP16, vmovw supports movement of HImode
> > > -and HFmode between GPR and SSE registers.  */
> > > -  if (TARGET_AVX512FP16
> > > - && VALID_AVX512FP16_SCALAR_MODE (mode))
> > > -   return true;
> > > -
> > >/* For AVX-5124FMAPS or AVX-5124VNNIW
> > >  allow V64SF and V64SI modes for special regnos.  */
> > >if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
> > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > > index 3869db8f2d3..d9a1fb0e420 100644
> > > --- a/gcc/config/i386/i386.h
> > > +++ b/gcc/config/i386/i386.h
> > > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int 
> > > argc, const char **argv);
> > >(VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
> > >
> > >  #define VALID_AVX512F_SCALAR_MODE(MODE)  
> > >   \
> > > -  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode\
> > > -   || (MODE) == SFmode)
> > > -
> > > -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \
> > > -  ((MODE) == HImode || (MODE) == HFmode)
> > > +  ((MODE) == DImode || (MODE) == DFmode  
> > >   \
> 

Re: [PATCH] [x86] Some tidy up for RA related hooks.

2022-11-20 Thread Uros Bizjak via Gcc-patches
On Mon, Nov 21, 2022 at 6:24 AM Hongtao Liu  wrote:
>
> On Mon, Nov 21, 2022 at 10:13 AM liuhongt  wrote:
> >
> > When i'm working at [1] for ix86_can_change_mode_class,
> > I notice there're some incorrectness/misoptimization in current RA-related 
> > hook.
> > This patch tries to do some fix and tidy up for them:
> >
> > 1. We also need to guard size of TO to be
> > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
> > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
> > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
> > above SSE2, so no need for the condition of AVX512FP16 for those evex
> > sse registers.
> > 3. Allocate DI/HImode to sse register for SSE2 above just like
> > SImode since we've supported 16-bit data move between sse and gpr
> > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
> > 0) or else RA will spill it. This enable optimization for
> > pieces-memset-{3,37,39}.c
> > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.cc (ix86_can_change_mode_class): Also guard
> > size of TO.
> > (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
> > * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
> > ..
> > (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
> > (VALID_SSE_REG_MODE): Add DI/HImode.
> > * config/i386/mmx.md (*mov_internal): Add
> > ix86_hard_reg_move_ok to condition.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pieces-memset-3.c: Remove xfail.
> > * gcc.target/i386/pieces-memset-37.c: Remove xfail.
> > * gcc.target/i386/pieces-memset-39.c: Remove xfail.

OK.

This is somehow tricky part of the compiler, so it would be nice if
the patch can be split to a couple of patches to ease bisecting if
something goes wrong. OTOH, recently there were a couple of similar
changes in this area, and there were no problems.

Thanks,
Uros.

> > ---
> >  gcc/config/i386/i386.cc  |  9 ++---
> >  gcc/config/i386/i386.h   | 16 
> >  gcc/config/i386/mmx.md   |  6 --
> >  gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
> >  gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
> >  gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
> >  6 files changed, 20 insertions(+), 23 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 292b32c5e99..030c26965ab 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, 
> > machine_mode to,
> >  the vec_dupv4hi pattern.
> >  NB: SSE2 can load 16bit data to sse register via pinsrw.  */
> >int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
> > -  if (GET_MODE_SIZE (from) < mov_size)
> > +  if (GET_MODE_SIZE (from) < mov_size
> > + || GET_MODE_SIZE (to) < mov_size)
> > return false;
> >  }
> >
> > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, 
> > machine_mode mode)
> >   || VALID_AVX512F_SCALAR_MODE (mode)))
> > return true;
> >
> > -  /* For AVX512FP16, vmovw supports movement of HImode
> > -and HFmode between GPR and SSE registers.  */
> > -  if (TARGET_AVX512FP16
> > - && VALID_AVX512FP16_SCALAR_MODE (mode))
> > -   return true;
> > -
> >/* For AVX-5124FMAPS or AVX-5124VNNIW
> >  allow V64SF and V64SI modes for special regnos.  */
> >if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 3869db8f2d3..d9a1fb0e420 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, 
> > const char **argv);
> >(VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
> >
> >  #define VALID_AVX512F_SCALAR_MODE(MODE)
> > \
> > -  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode\
> > -   || (MODE) == SFmode)
> > -
> > -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \
> > -  ((MODE) == HImode || (MODE) == HFmode)
> > +  ((MODE) == DImode || (MODE) == DFmode
> > \
> > +   || (MODE) == SImode || (MODE) == SFmode \
> > +   || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
> >
> >  #define VALID_AVX512F_REG_MODE(MODE)   \
> >((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \
> > @@ -1045,13 +1043,15 @@ 

Re: [PATCH] [x86] Some tidy up for RA related hooks.

2022-11-20 Thread Hongtao Liu via Gcc-patches
On Mon, Nov 21, 2022 at 10:13 AM liuhongt  wrote:
>
> When i'm working at [1] for ix86_can_change_mode_class,
> I notice there're some incorrectness/misoptimization in current RA-related 
> hook.
> This patch tries to do some fix and tidy up for them:
>
> 1. We also need to guard size of TO to be
> less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
> 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
> into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
> above SSE2, so no need for the condition of AVX512FP16 for those evex
> sse registers.
> 3. Allocate DI/HImode to sse register for SSE2 above just like
> SImode since we've supported 16-bit data move between sse and gpr
> above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
> 0) or else RA will spill it. This enable optimization for
> pieces-memset-{3,37,39}.c
> 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_can_change_mode_class): Also guard
> size of TO.
> (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
> * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
> ..
> (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
> (VALID_SSE_REG_MODE): Add DI/HImode.
> * config/i386/mmx.md (*mov_internal): Add
> ix86_hard_reg_move_ok to condition.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pieces-memset-3.c: Remove xfail.
> * gcc.target/i386/pieces-memset-37.c: Remove xfail.
> * gcc.target/i386/pieces-memset-39.c: Remove xfail.
> ---
>  gcc/config/i386/i386.cc  |  9 ++---
>  gcc/config/i386/i386.h   | 16 
>  gcc/config/i386/mmx.md   |  6 --
>  gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
>  gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
>  gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
>  6 files changed, 20 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 292b32c5e99..030c26965ab 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, 
> machine_mode to,
>  the vec_dupv4hi pattern.
>  NB: SSE2 can load 16bit data to sse register via pinsrw.  */
>int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
> -  if (GET_MODE_SIZE (from) < mov_size)
> +  if (GET_MODE_SIZE (from) < mov_size
> + || GET_MODE_SIZE (to) < mov_size)
> return false;
>  }
>
> @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, 
> machine_mode mode)
>   || VALID_AVX512F_SCALAR_MODE (mode)))
> return true;
>
> -  /* For AVX512FP16, vmovw supports movement of HImode
> -and HFmode between GPR and SSE registers.  */
> -  if (TARGET_AVX512FP16
> - && VALID_AVX512FP16_SCALAR_MODE (mode))
> -   return true;
> -
>/* For AVX-5124FMAPS or AVX-5124VNNIW
>  allow V64SF and V64SI modes for special regnos.  */
>if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 3869db8f2d3..d9a1fb0e420 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, 
> const char **argv);
>(VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
>
>  #define VALID_AVX512F_SCALAR_MODE(MODE)  
>   \
> -  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode\
> -   || (MODE) == SFmode)
> -
> -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \
> -  ((MODE) == HImode || (MODE) == HFmode)
> +  ((MODE) == DImode || (MODE) == DFmode  
>   \
> +   || (MODE) == SImode || (MODE) == SFmode \
> +   || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
>
>  #define VALID_AVX512F_REG_MODE(MODE)   \
>((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \
> @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, 
> const char **argv);
> || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode   \
> || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode   \
> || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
> -   || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode \
> -   || (MODE) == HFmode || (MODE) == BFmode)
> +   || (MODE) == V2DImode || (MODE) == V2QImode \
> +   || (MODE) == DFmode || (MODE) == 

[PATCH] [x86] Some tidy up for RA related hooks.

2022-11-20 Thread liuhongt via Gcc-patches
When i'm working at [1] for ix86_can_change_mode_class,
I notice there're some incorrectness/misoptimization in current RA-related hook.
This patch tries to do some fix and tidy up for them:

1. We also need to guard size of TO to be
less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
above SSE2, so no need for the condition of AVX512FP16 for those evex
sse registers.
3. Allocate DI/HImode to sse register for SSE2 above just like
SImode since we've supported 16-bit data move between sse and gpr
above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
0) or else RA will spill it. This enable optimization for
pieces-memset-{3,37,39}.c
4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.cc (ix86_can_change_mode_class): Also guard
size of TO.
(ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
* config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
..
(VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
(VALID_SSE_REG_MODE): Add DI/HImode.
* config/i386/mmx.md (*mov_internal): Add
ix86_hard_reg_move_ok to condition.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pieces-memset-3.c: Remove xfail.
* gcc.target/i386/pieces-memset-37.c: Remove xfail.
* gcc.target/i386/pieces-memset-39.c: Remove xfail.
---
 gcc/config/i386/i386.cc  |  9 ++---
 gcc/config/i386/i386.h   | 16 
 gcc/config/i386/mmx.md   |  6 --
 gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
 gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
 gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
 6 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 292b32c5e99..030c26965ab 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, 
machine_mode to,
 the vec_dupv4hi pattern.
 NB: SSE2 can load 16bit data to sse register via pinsrw.  */
   int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
-  if (GET_MODE_SIZE (from) < mov_size)
+  if (GET_MODE_SIZE (from) < mov_size
+ || GET_MODE_SIZE (to) < mov_size)
return false;
 }
 
@@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
  || VALID_AVX512F_SCALAR_MODE (mode)))
return true;
 
-  /* For AVX512FP16, vmovw supports movement of HImode
-and HFmode between GPR and SSE registers.  */
-  if (TARGET_AVX512FP16
- && VALID_AVX512FP16_SCALAR_MODE (mode))
-   return true;
-
   /* For AVX-5124FMAPS or AVX-5124VNNIW
 allow V64SF and V64SI modes for special regnos.  */
   if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3869db8f2d3..d9a1fb0e420 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, 
const char **argv);
   (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
 
 #define VALID_AVX512F_SCALAR_MODE(MODE)
\
-  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode\
-   || (MODE) == SFmode)
-
-#define VALID_AVX512FP16_SCALAR_MODE(MODE) \
-  ((MODE) == HImode || (MODE) == HFmode)
+  ((MODE) == DImode || (MODE) == DFmode
\
+   || (MODE) == SImode || (MODE) == SFmode \
+   || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
 
 #define VALID_AVX512F_REG_MODE(MODE)   \
   ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \
@@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, 
const char **argv);
|| (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode   \
|| (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode   \
|| (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
-   || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode \
-   || (MODE) == HFmode || (MODE) == BFmode)
+   || (MODE) == V2DImode || (MODE) == V2QImode \
+   || (MODE) == DFmode || (MODE) == DImode \
+   || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode)
 
 #define VALID_SSE_REG_MODE(MODE)   \
   ((MODE) == V1TImode || (MODE) == TImode  \