Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]

2023-01-30 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches  writes:
> "Andre Vieira (lists)"  writes:
>> Hi,
>>
>> This patch teaches GCC that zero-extending a MVE predicate from 16-bits 
>> to 32-bits and then only using 16-bits is a no-op.
>> It does so in two steps:
>> - it lets gcc know that it can access any MVE predicate mode using any 
>> other MVE predicate mode without needing to copy it, using the 
>> TARGET_MODES_TIEABLE_P hook,
>> - it teaches simplify_subreg to optimize a subreg with a vector 
>> outermode, by replacing this outermode with a same-sized integer mode 
>> and trying the avalailable optimizations, then if successful it 
>> surrounds the result with a subreg casting it back to the original 
>> vector outermode.
>>
>> This removes the unnecessary zero-extending shown on PR 107674 (though 
>> it's a sign-extend there), that was introduced in gcc 11.
>>
>> Bootstrapped on aarch64-none-linux-gnu and regression tested on 
>> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.
>>
>> OK for trunk?
>>
>> gcc/ChangeLog:
>>
>>  PR target/107674
>>  * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
>>  (arm_modes_tieable_p): Make MVE predicate modes tieable.
>>  * config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
>>  * simplify-rtx.cc (simplify_context::simplify_subreg): Teach
>>  simplify_subreg to simplify subregs where the outermode is not scalar.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary
>>  zero-extend.
>>
>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>> index 
>> 6f7ecf9128047647fc41677e634cd9612a13242b..4352c830cb6d2e632a225edea861b5ceb35dd035
>>  100644
>> --- a/gcc/config/arm/arm.h
>> +++ b/gcc/config/arm/arm.h
>> @@ -1091,6 +1091,10 @@ extern const int arm_arch_cde_coproc_bits[];
>> || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \
>> || (MODE) == V2DFmode)
>>  
>> +#define VALID_MVE_PRED_MODE(MODE) \
>> +  ((MODE) == HImode \
>> +   || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode)
>> +
>>  #define VALID_MVE_SI_MODE(MODE) \
>>((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
>> || (MODE) == V16QImode)
>> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
>> index 
>> 3f171188de513e258369397e4726afe27bd9fdbf..18460ef5280be8c1df85eff424a1bf66d6019c0a
>>  100644
>> --- a/gcc/config/arm/arm.cc
>> +++ b/gcc/config/arm/arm.cc
>> @@ -25564,10 +25564,7 @@ arm_hard_regno_mode_ok (unsigned int regno, 
>> machine_mode mode)
>>  return false;
>>  
>>if (IS_VPR_REGNUM (regno))
>> -return mode == HImode
>> -  || mode == V16BImode
>> -  || mode == V8BImode
>> -  || mode == V4BImode;
>> +return VALID_MVE_PRED_MODE (mode);
>>  
>>if (TARGET_THUMB1)
>>  /* For the Thumb we only allow values bigger than SImode in
>> @@ -25646,6 +25643,10 @@ arm_modes_tieable_p (machine_mode mode1, 
>> machine_mode mode2)
>>if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2))
>>  return true;
>>  
>> +  if (TARGET_HAVE_MVE
>> +  && (VALID_MVE_PRED_MODE (mode1) && VALID_MVE_PRED_MODE (mode2)))
>> +return true;
>> +
>>/* We specifically want to allow elements of "structure" modes to
>>   be tieable to the structure.  This more general condition allows
>>   other rarer situations too.  */
>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>> index 
>> 7fb1e97fbea4e7b8b091f5724ebe0cb61eee7ec3..a951272186585c0a5cc3e0155285e7a635865f42
>>  100644
>> --- a/gcc/simplify-rtx.cc
>> +++ b/gcc/simplify-rtx.cc
>> @@ -7652,6 +7652,22 @@ simplify_context::simplify_subreg (machine_mode 
>> outermode, rtx op,
>>  }
>>  }
>>  
>> +  /* Try simplifying a SUBREG expression of a non-integer OUTERMODE by 
>> using a
>> + NEW_OUTERMODE of the same size instead, other simplifications rely on
>> + integer to integer subregs and we'd potentially miss out on 
>> optimizations
>> + otherwise.  */
>
> How about:
>
>   /* If the outer mode is not integral, try taking a subreg with the
>  equivalent integer outer mode and then bitcasting the result.
>  Other simplifications rely on integer to integer subregs and we'd
>  potentially miss out on optimizations otherwise.  */
>
> (Think it's easier to read as two sentences, and there's no
> new_outermode in the code.)
>
>> +  if (known_gt (GET_MODE_SIZE (innermode),
>> +GET_MODE_SIZE (outermode))
>> +  && SCALAR_INT_MODE_P (innermode)
>> +  && !SCALAR_INT_MODE_P (outermode)
>> +  && int_mode_for_size (GET_MODE_BITSIZE (outermode),
>> +0).exists (_outermode))
>> +{
>> +  rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
>> +  if (tem)
>> +return simplify_gen_subreg (outermode, tem, GET_MODE (tem), byte);
>
> Perhaps safer as s/GET_MODE (tem)/int_outermode/, in case TEM turns
> out to be constant.


Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]

2023-01-30 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch teaches GCC that zero-extending a MVE predicate from 16-bits 
> to 32-bits and then only using 16-bits is a no-op.
> It does so in two steps:
> - it lets gcc know that it can access any MVE predicate mode using any 
> other MVE predicate mode without needing to copy it, using the 
> TARGET_MODES_TIEABLE_P hook,
> - it teaches simplify_subreg to optimize a subreg with a vector 
> outermode, by replacing this outermode with a same-sized integer mode 
> and trying the avalailable optimizations, then if successful it 
> surrounds the result with a subreg casting it back to the original 
> vector outermode.
>
> This removes the unnecessary zero-extending shown on PR 107674 (though 
> it's a sign-extend there), that was introduced in gcc 11.
>
> Bootstrapped on aarch64-none-linux-gnu and regression tested on 
> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
>   PR target/107674
>  * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
>  (arm_modes_tieable_p): Make MVE predicate modes tieable.
>   * config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
>   * simplify-rtx.cc (simplify_context::simplify_subreg): Teach
>   simplify_subreg to simplify subregs where the outermode is not scalar.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary
>   zero-extend.
>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 
> 6f7ecf9128047647fc41677e634cd9612a13242b..4352c830cb6d2e632a225edea861b5ceb35dd035
>  100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1091,6 +1091,10 @@ extern const int arm_arch_cde_coproc_bits[];
> || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \
> || (MODE) == V2DFmode)
>  
> +#define VALID_MVE_PRED_MODE(MODE) \
> +  ((MODE) == HImode  \
> +   || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode)
> +
>  #define VALID_MVE_SI_MODE(MODE) \
>((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
> || (MODE) == V16QImode)
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 
> 3f171188de513e258369397e4726afe27bd9fdbf..18460ef5280be8c1df85eff424a1bf66d6019c0a
>  100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -25564,10 +25564,7 @@ arm_hard_regno_mode_ok (unsigned int regno, 
> machine_mode mode)
>  return false;
>  
>if (IS_VPR_REGNUM (regno))
> -return mode == HImode
> -  || mode == V16BImode
> -  || mode == V8BImode
> -  || mode == V4BImode;
> +return VALID_MVE_PRED_MODE (mode);
>  
>if (TARGET_THUMB1)
>  /* For the Thumb we only allow values bigger than SImode in
> @@ -25646,6 +25643,10 @@ arm_modes_tieable_p (machine_mode mode1, 
> machine_mode mode2)
>if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2))
>  return true;
>  
> +  if (TARGET_HAVE_MVE
> +  && (VALID_MVE_PRED_MODE (mode1) && VALID_MVE_PRED_MODE (mode2)))
> +return true;
> +
>/* We specifically want to allow elements of "structure" modes to
>   be tieable to the structure.  This more general condition allows
>   other rarer situations too.  */
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 
> 7fb1e97fbea4e7b8b091f5724ebe0cb61eee7ec3..a951272186585c0a5cc3e0155285e7a635865f42
>  100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -7652,6 +7652,22 @@ simplify_context::simplify_subreg (machine_mode 
> outermode, rtx op,
>   }
>  }
>  
> +  /* Try simplifying a SUBREG expression of a non-integer OUTERMODE by using 
> a
> + NEW_OUTERMODE of the same size instead, other simplifications rely on
> + integer to integer subregs and we'd potentially miss out on 
> optimizations
> + otherwise.  */

How about:

  /* If the outer mode is not integral, try taking a subreg with the
 equivalent integer outer mode and then bitcasting the result.
 Other simplifications rely on integer to integer subregs and we'd
 potentially miss out on optimizations otherwise.  */

(Think it's easier to read as two sentences, and there's no
new_outermode in the code.)

> +  if (known_gt (GET_MODE_SIZE (innermode),
> + GET_MODE_SIZE (outermode))
> +  && SCALAR_INT_MODE_P (innermode)
> +  && !SCALAR_INT_MODE_P (outermode)
> +  && int_mode_for_size (GET_MODE_BITSIZE (outermode),
> + 0).exists (_outermode))
> +{
> +  rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
> +  if (tem)
> + return simplify_gen_subreg (outermode, tem, GET_MODE (tem), byte);

Perhaps safer as s/GET_MODE (tem)/int_outermode/, in case TEM turns
out to be constant.

OK for the simplify-rtx.cc part with those changes.

Thanks,
Richard


> +}
> +
>/* If OP is a vector comparison and the subreg is not changing the
>   

Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]

2023-01-30 Thread Andre Vieira (lists) via Gcc-patches
Changed the testcase to be more robust (as per the discussion for the 
first patch).


Still need the OK for the mid-end (simplify-rtx) part.

Kind regards,
Andre

On 27/01/2023 09:59, Kyrylo Tkachov wrote:




-Original Message-
From: Andre Vieira (lists) 
Sent: Friday, January 27, 2023 9:58 AM
To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
Cc: Richard Sandiford ; Richard Earnshaw
; Richard Biener 
Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
predicates before use [PR 107674]



On 26/01/2023 15:06, Kyrylo Tkachov wrote:

Hi Andre,


-Original Message-
From: Andre Vieira (lists) 
Sent: Tuesday, January 24, 2023 1:54 PM
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford ; Richard Earnshaw
; Richard Biener ;
Kyrylo Tkachov 
Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
predicates before use [PR 107674]

Hi,

This patch teaches GCC that zero-extending a MVE predicate from 16-bits
to 32-bits and then only using 16-bits is a no-op.
It does so in two steps:
- it lets gcc know that it can access any MVE predicate mode using any
other MVE predicate mode without needing to copy it, using the
TARGET_MODES_TIEABLE_P hook,
- it teaches simplify_subreg to optimize a subreg with a vector
outermode, by replacing this outermode with a same-sized integer mode
and trying the avalailable optimizations, then if successful it
surrounds the result with a subreg casting it back to the original
vector outermode.

This removes the unnecessary zero-extending shown on PR 107674

(though

it's a sign-extend there), that was introduced in gcc 11.

Bootstrapped on aarch64-none-linux-gnu and regression tested on
arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.

OK for trunk?

gcc/ChangeLog:

PR target/107674
   * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
   (arm_modes_tieable_p): Make MVE predicate modes tieable.
* config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
* simplify-rtx.cc (simplify_context::simplify_subreg): Teach
simplify_subreg to simplify subregs where the outermode is not
scalar.


The arm changes look ok to me. We'll want a midend maintainer to have a

look at simplify-rtx.cc




gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary
zero-extend.


diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c

b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c

index

26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5
acf9af0a2155b5c5 100644

--- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
+++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
@@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
   **   vldrb.8 q2, \[r0\]
   **   vldrb.8 q1, \[r1\]
   **   vcmp.i8 eq, q2, q1
-** vmrsr3, p0  @ movhi
-** uxthr3, r3
-** vmsrp0, r3  @ movhi
   **   vpst
   **   vaddt.i8q3, q2, q1
   **   vpst

Ah I see, that's the testcase from patch 1/3 that I criticized :)
Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more

robust?

Thanks,
Kyrill

I could, but I would rather not. I have a patch series waiting for GCC
14 that does further improvements to this (and other VPST codegen)
sequences and if I do scan for 'absence' of an instruction I have to
break them up into single tests each. Also it wouldn't then fail if we
start spilling the predicate directly to memory for instance. Like I
mentioned in the previous patch, the sequence is unlikely to be able to
change through scheduling (other than maybe the reordering of the loads
through some bad luck, but I could make it robust to that).


Ok, looks like it was thought through, so fine by me.
Thanks,
Kyrilldiff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
632728371d5cef364e47bf33bfa0faba738db871..8325e7a876e2e03f14cba07385cc5a1ddd771655
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1104,6 +1104,10 @@ extern const int arm_arch_cde_coproc_bits[];
|| (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \
|| (MODE) == V2DFmode)
 
+#define VALID_MVE_PRED_MODE(MODE) \
+  ((MODE) == HImode\
+   || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode)
+
 #define VALID_MVE_SI_MODE(MODE) \
   ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
|| (MODE) == V16QImode)
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
efc48349dd3508e6790c1a9f3bba5da689a986bc..4d9d202cad1f39ba386df9d8e4277007fd960262
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -25656,10 +25656,7 @@ arm_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
 return false;
 
   if (IS_VPR_REGNUM (regno))
-return mode == HImode
-  || mode == V16BImode
-  || mode == V8BImode
-  || mode == V4BImode;
+return VALID_MVE_PRED_MODE (mode);
 
   if (TARGET_THUMB1)
 /* For the Thumb we only

RE: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]

2023-01-27 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Friday, January 27, 2023 9:58 AM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Richard Biener 
> Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
> predicates before use [PR 107674]
> 
> 
> 
> On 26/01/2023 15:06, Kyrylo Tkachov wrote:
> > Hi Andre,
> >
> >> -Original Message-
> >> From: Andre Vieira (lists) 
> >> Sent: Tuesday, January 24, 2023 1:54 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Richard Sandiford ; Richard Earnshaw
> >> ; Richard Biener ;
> >> Kyrylo Tkachov 
> >> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
> >> predicates before use [PR 107674]
> >>
> >> Hi,
> >>
> >> This patch teaches GCC that zero-extending a MVE predicate from 16-bits
> >> to 32-bits and then only using 16-bits is a no-op.
> >> It does so in two steps:
> >> - it lets gcc know that it can access any MVE predicate mode using any
> >> other MVE predicate mode without needing to copy it, using the
> >> TARGET_MODES_TIEABLE_P hook,
> >> - it teaches simplify_subreg to optimize a subreg with a vector
> >> outermode, by replacing this outermode with a same-sized integer mode
> >> and trying the avalailable optimizations, then if successful it
> >> surrounds the result with a subreg casting it back to the original
> >> vector outermode.
> >>
> >> This removes the unnecessary zero-extending shown on PR 107674
> (though
> >> it's a sign-extend there), that was introduced in gcc 11.
> >>
> >> Bootstrapped on aarch64-none-linux-gnu and regression tested on
> >> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.
> >>
> >> OK for trunk?
> >>
> >> gcc/ChangeLog:
> >>
> >>PR target/107674
> >>   * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
> >>   (arm_modes_tieable_p): Make MVE predicate modes tieable.
> >>* config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
> >>* simplify-rtx.cc (simplify_context::simplify_subreg): Teach
> >>simplify_subreg to simplify subregs where the outermode is not
> >> scalar.
> >
> > The arm changes look ok to me. We'll want a midend maintainer to have a
> look at simplify-rtx.cc
> >
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary
> >>zero-extend.
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> > index
> 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5
> acf9af0a2155b5c5 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
> > @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
> >   **vldrb.8 q2, \[r0\]
> >   **vldrb.8 q1, \[r1\]
> >   **vcmp.i8 eq, q2, q1
> > -** vmrsr3, p0  @ movhi
> > -** uxthr3, r3
> > -** vmsrp0, r3  @ movhi
> >   **vpst
> >   **vaddt.i8q3, q2, q1
> >   **vpst
> >
> > Ah I see, that's the testcase from patch 1/3 that I criticized :)
> > Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more
> robust?
> > Thanks,
> > Kyrill
> I could, but I would rather not. I have a patch series waiting for GCC
> 14 that does further improvements to this (and other VPST codegen)
> sequences and if I do scan for 'absence' of an instruction I have to
> break them up into single tests each. Also it wouldn't then fail if we
> start spilling the predicate directly to memory for instance. Like I
> mentioned in the previous patch, the sequence is unlikely to be able to
> change through scheduling (other than maybe the reordering of the loads
> through some bad luck, but I could make it robust to that).

Ok, looks like it was thought through, so fine by me.
Thanks,
Kyrill


Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]

2023-01-27 Thread Andre Vieira (lists) via Gcc-patches




On 26/01/2023 15:06, Kyrylo Tkachov wrote:

Hi Andre,


-Original Message-
From: Andre Vieira (lists) 
Sent: Tuesday, January 24, 2023 1:54 PM
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford ; Richard Earnshaw
; Richard Biener ;
Kyrylo Tkachov 
Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
predicates before use [PR 107674]

Hi,

This patch teaches GCC that zero-extending a MVE predicate from 16-bits
to 32-bits and then only using 16-bits is a no-op.
It does so in two steps:
- it lets gcc know that it can access any MVE predicate mode using any
other MVE predicate mode without needing to copy it, using the
TARGET_MODES_TIEABLE_P hook,
- it teaches simplify_subreg to optimize a subreg with a vector
outermode, by replacing this outermode with a same-sized integer mode
and trying the avalailable optimizations, then if successful it
surrounds the result with a subreg casting it back to the original
vector outermode.

This removes the unnecessary zero-extending shown on PR 107674 (though
it's a sign-extend there), that was introduced in gcc 11.

Bootstrapped on aarch64-none-linux-gnu and regression tested on
arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.

OK for trunk?

gcc/ChangeLog:

PR target/107674
  * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
  (arm_modes_tieable_p): Make MVE predicate modes tieable.
* config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
* simplify-rtx.cc (simplify_context::simplify_subreg): Teach
simplify_subreg to simplify subregs where the outermode is not
scalar.


The arm changes look ok to me. We'll want a midend maintainer to have a look at 
simplify-rtx.cc



gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary
zero-extend.


diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c 
b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
index 
26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5
 100644
--- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
+++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
@@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
  **vldrb.8 q2, \[r0\]
  **vldrb.8 q1, \[r1\]
  **vcmp.i8 eq, q2, q1
-** vmrsr3, p0  @ movhi
-** uxthr3, r3
-** vmsrp0, r3  @ movhi
  **vpst
  **vaddt.i8q3, q2, q1
  **vpst

Ah I see, that's the testcase from patch 1/3 that I criticized :)
Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more 
robust?
Thanks,
Kyrill
I could, but I would rather not. I have a patch series waiting for GCC 
14 that does further improvements to this (and other VPST codegen) 
sequences and if I do scan for 'absence' of an instruction I have to 
break them up into single tests each. Also it wouldn't then fail if we 
start spilling the predicate directly to memory for instance. Like I 
mentioned in the previous patch, the sequence is unlikely to be able to 
change through scheduling (other than maybe the reordering of the loads 
through some bad luck, but I could make it robust to that).


RE: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]

2023-01-26 Thread Kyrylo Tkachov via Gcc-patches
Hi Andre,

> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Tuesday, January 24, 2023 1:54 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Richard Biener ;
> Kyrylo Tkachov 
> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE
> predicates before use [PR 107674]
> 
> Hi,
> 
> This patch teaches GCC that zero-extending a MVE predicate from 16-bits
> to 32-bits and then only using 16-bits is a no-op.
> It does so in two steps:
> - it lets gcc know that it can access any MVE predicate mode using any
> other MVE predicate mode without needing to copy it, using the
> TARGET_MODES_TIEABLE_P hook,
> - it teaches simplify_subreg to optimize a subreg with a vector
> outermode, by replacing this outermode with a same-sized integer mode
> and trying the avalailable optimizations, then if successful it
> surrounds the result with a subreg casting it back to the original
> vector outermode.
> 
> This removes the unnecessary zero-extending shown on PR 107674 (though
> it's a sign-extend there), that was introduced in gcc 11.
> 
> Bootstrapped on aarch64-none-linux-gnu and regression tested on
> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
> 
>   PR target/107674
>  * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
>  (arm_modes_tieable_p): Make MVE predicate modes tieable.
>   * config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
>   * simplify-rtx.cc (simplify_context::simplify_subreg): Teach
>   simplify_subreg to simplify subregs where the outermode is not
> scalar.

The arm changes look ok to me. We'll want a midend maintainer to have a look at 
simplify-rtx.cc

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary
>   zero-extend.

diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c 
b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
index 
26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5
 100644
--- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
+++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c
@@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c)
 ** vldrb.8 q2, \[r0\]
 ** vldrb.8 q1, \[r1\]
 ** vcmp.i8 eq, q2, q1
-** vmrsr3, p0  @ movhi
-** uxthr3, r3
-** vmsrp0, r3  @ movhi
 ** vpst
 ** vaddt.i8q3, q2, q1
 ** vpst

Ah I see, that's the testcase from patch 1/3 that I criticized :)
Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more 
robust?
Thanks,
Kyrill