Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]
Richard Sandiford via Gcc-patches writes: > "Andre Vieira (lists)" writes: >> Hi, >> >> This patch teaches GCC that zero-extending a MVE predicate from 16-bits >> to 32-bits and then only using 16-bits is a no-op. >> It does so in two steps: >> - it lets gcc know that it can access any MVE predicate mode using any >> other MVE predicate mode without needing to copy it, using the >> TARGET_MODES_TIEABLE_P hook, >> - it teaches simplify_subreg to optimize a subreg with a vector >> outermode, by replacing this outermode with a same-sized integer mode >> and trying the avalailable optimizations, then if successful it >> surrounds the result with a subreg casting it back to the original >> vector outermode. >> >> This removes the unnecessary zero-extending shown on PR 107674 (though >> it's a sign-extend there), that was introduced in gcc 11. >> >> Bootstrapped on aarch64-none-linux-gnu and regression tested on >> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. >> >> OK for trunk? >> >> gcc/ChangeLog: >> >> PR target/107674 >> * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. >> (arm_modes_tieable_p): Make MVE predicate modes tieable. >> * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. >> * simplify-rtx.cc (simplify_context::simplify_subreg): Teach >> simplify_subreg to simplify subregs where the outermode is not scalar. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary >> zero-extend. >> >> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h >> index >> 6f7ecf9128047647fc41677e634cd9612a13242b..4352c830cb6d2e632a225edea861b5ceb35dd035 >> 100644 >> --- a/gcc/config/arm/arm.h >> +++ b/gcc/config/arm/arm.h >> @@ -1091,6 +1091,10 @@ extern const int arm_arch_cde_coproc_bits[]; >> || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \ >> || (MODE) == V2DFmode) >> >> +#define VALID_MVE_PRED_MODE(MODE) \ >> + ((MODE) == HImode \ >> + || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode) >> + >> #define VALID_MVE_SI_MODE(MODE) \ >>((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \ >> || (MODE) == V16QImode) >> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc >> index >> 3f171188de513e258369397e4726afe27bd9fdbf..18460ef5280be8c1df85eff424a1bf66d6019c0a >> 100644 >> --- a/gcc/config/arm/arm.cc >> +++ b/gcc/config/arm/arm.cc >> @@ -25564,10 +25564,7 @@ arm_hard_regno_mode_ok (unsigned int regno, >> machine_mode mode) >> return false; >> >>if (IS_VPR_REGNUM (regno)) >> -return mode == HImode >> - || mode == V16BImode >> - || mode == V8BImode >> - || mode == V4BImode; >> +return VALID_MVE_PRED_MODE (mode); >> >>if (TARGET_THUMB1) >> /* For the Thumb we only allow values bigger than SImode in >> @@ -25646,6 +25643,10 @@ arm_modes_tieable_p (machine_mode mode1, >> machine_mode mode2) >>if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2)) >> return true; >> >> + if (TARGET_HAVE_MVE >> + && (VALID_MVE_PRED_MODE (mode1) && VALID_MVE_PRED_MODE (mode2))) >> +return true; >> + >>/* We specifically want to allow elements of "structure" modes to >> be tieable to the structure. This more general condition allows >> other rarer situations too. */ >> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >> index >> 7fb1e97fbea4e7b8b091f5724ebe0cb61eee7ec3..a951272186585c0a5cc3e0155285e7a635865f42 >> 100644 >> --- a/gcc/simplify-rtx.cc >> +++ b/gcc/simplify-rtx.cc >> @@ -7652,6 +7652,22 @@ simplify_context::simplify_subreg (machine_mode >> outermode, rtx op, >> } >> } >> >> + /* Try simplifying a SUBREG expression of a non-integer OUTERMODE by >> using a >> + NEW_OUTERMODE of the same size instead, other simplifications rely on >> + integer to integer subregs and we'd potentially miss out on >> optimizations >> + otherwise. */ > > How about: > > /* If the outer mode is not integral, try taking a subreg with the > equivalent integer outer mode and then bitcasting the result. > Other simplifications rely on integer to integer subregs and we'd > potentially miss out on optimizations otherwise. */ > > (Think it's easier to read as two sentences, and there's no > new_outermode in the code.) > >> + if (known_gt (GET_MODE_SIZE (innermode), >> +GET_MODE_SIZE (outermode)) >> + && SCALAR_INT_MODE_P (innermode) >> + && !SCALAR_INT_MODE_P (outermode) >> + && int_mode_for_size (GET_MODE_BITSIZE (outermode), >> +0).exists (_outermode)) >> +{ >> + rtx tem = simplify_subreg (int_outermode, op, innermode, byte); >> + if (tem) >> +return simplify_gen_subreg (outermode, tem, GET_MODE (tem), byte); > > Perhaps safer as s/GET_MODE (tem)/int_outermode/, in case TEM turns > out to be constant.
Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]
"Andre Vieira (lists)" writes: > Hi, > > This patch teaches GCC that zero-extending a MVE predicate from 16-bits > to 32-bits and then only using 16-bits is a no-op. > It does so in two steps: > - it lets gcc know that it can access any MVE predicate mode using any > other MVE predicate mode without needing to copy it, using the > TARGET_MODES_TIEABLE_P hook, > - it teaches simplify_subreg to optimize a subreg with a vector > outermode, by replacing this outermode with a same-sized integer mode > and trying the avalailable optimizations, then if successful it > surrounds the result with a subreg casting it back to the original > vector outermode. > > This removes the unnecessary zero-extending shown on PR 107674 (though > it's a sign-extend there), that was introduced in gcc 11. > > Bootstrapped on aarch64-none-linux-gnu and regression tested on > arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. > > OK for trunk? > > gcc/ChangeLog: > > PR target/107674 > * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. > (arm_modes_tieable_p): Make MVE predicate modes tieable. > * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. > * simplify-rtx.cc (simplify_context::simplify_subreg): Teach > simplify_subreg to simplify subregs where the outermode is not scalar. > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary > zero-extend. > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index > 6f7ecf9128047647fc41677e634cd9612a13242b..4352c830cb6d2e632a225edea861b5ceb35dd035 > 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -1091,6 +1091,10 @@ extern const int arm_arch_cde_coproc_bits[]; > || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \ > || (MODE) == V2DFmode) > > +#define VALID_MVE_PRED_MODE(MODE) \ > + ((MODE) == HImode \ > + || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode) > + > #define VALID_MVE_SI_MODE(MODE) \ >((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \ > || (MODE) == V16QImode) > diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc > index > 3f171188de513e258369397e4726afe27bd9fdbf..18460ef5280be8c1df85eff424a1bf66d6019c0a > 100644 > --- a/gcc/config/arm/arm.cc > +++ b/gcc/config/arm/arm.cc > @@ -25564,10 +25564,7 @@ arm_hard_regno_mode_ok (unsigned int regno, > machine_mode mode) > return false; > >if (IS_VPR_REGNUM (regno)) > -return mode == HImode > - || mode == V16BImode > - || mode == V8BImode > - || mode == V4BImode; > +return VALID_MVE_PRED_MODE (mode); > >if (TARGET_THUMB1) > /* For the Thumb we only allow values bigger than SImode in > @@ -25646,6 +25643,10 @@ arm_modes_tieable_p (machine_mode mode1, > machine_mode mode2) >if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2)) > return true; > > + if (TARGET_HAVE_MVE > + && (VALID_MVE_PRED_MODE (mode1) && VALID_MVE_PRED_MODE (mode2))) > +return true; > + >/* We specifically want to allow elements of "structure" modes to > be tieable to the structure. This more general condition allows > other rarer situations too. */ > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index > 7fb1e97fbea4e7b8b091f5724ebe0cb61eee7ec3..a951272186585c0a5cc3e0155285e7a635865f42 > 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -7652,6 +7652,22 @@ simplify_context::simplify_subreg (machine_mode > outermode, rtx op, > } > } > > + /* Try simplifying a SUBREG expression of a non-integer OUTERMODE by using > a > + NEW_OUTERMODE of the same size instead, other simplifications rely on > + integer to integer subregs and we'd potentially miss out on > optimizations > + otherwise. */ How about: /* If the outer mode is not integral, try taking a subreg with the equivalent integer outer mode and then bitcasting the result. Other simplifications rely on integer to integer subregs and we'd potentially miss out on optimizations otherwise. */ (Think it's easier to read as two sentences, and there's no new_outermode in the code.) > + if (known_gt (GET_MODE_SIZE (innermode), > + GET_MODE_SIZE (outermode)) > + && SCALAR_INT_MODE_P (innermode) > + && !SCALAR_INT_MODE_P (outermode) > + && int_mode_for_size (GET_MODE_BITSIZE (outermode), > + 0).exists (_outermode)) > +{ > + rtx tem = simplify_subreg (int_outermode, op, innermode, byte); > + if (tem) > + return simplify_gen_subreg (outermode, tem, GET_MODE (tem), byte); Perhaps safer as s/GET_MODE (tem)/int_outermode/, in case TEM turns out to be constant. OK for the simplify-rtx.cc part with those changes. Thanks, Richard > +} > + >/* If OP is a vector comparison and the subreg is not changing the >
Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]
Changed the testcase to be more robust (as per the discussion for the first patch). Still need the OK for the mid-end (simplify-rtx) part. Kind regards, Andre On 27/01/2023 09:59, Kyrylo Tkachov wrote: -Original Message- From: Andre Vieira (lists) Sent: Friday, January 27, 2023 9:58 AM To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org Cc: Richard Sandiford ; Richard Earnshaw ; Richard Biener Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674] On 26/01/2023 15:06, Kyrylo Tkachov wrote: Hi Andre, -Original Message- From: Andre Vieira (lists) Sent: Tuesday, January 24, 2023 1:54 PM To: gcc-patches@gcc.gnu.org Cc: Richard Sandiford ; Richard Earnshaw ; Richard Biener ; Kyrylo Tkachov Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674] Hi, This patch teaches GCC that zero-extending a MVE predicate from 16-bits to 32-bits and then only using 16-bits is a no-op. It does so in two steps: - it lets gcc know that it can access any MVE predicate mode using any other MVE predicate mode without needing to copy it, using the TARGET_MODES_TIEABLE_P hook, - it teaches simplify_subreg to optimize a subreg with a vector outermode, by replacing this outermode with a same-sized integer mode and trying the avalailable optimizations, then if successful it surrounds the result with a subreg casting it back to the original vector outermode. This removes the unnecessary zero-extending shown on PR 107674 (though it's a sign-extend there), that was introduced in gcc 11. Bootstrapped on aarch64-none-linux-gnu and regression tested on arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. OK for trunk? gcc/ChangeLog: PR target/107674 * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. (arm_modes_tieable_p): Make MVE predicate modes tieable. * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. * simplify-rtx.cc (simplify_context::simplify_subreg): Teach simplify_subreg to simplify subregs where the outermode is not scalar. The arm changes look ok to me. We'll want a midend maintainer to have a look at simplify-rtx.cc gcc/testsuite/ChangeLog: * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary zero-extend. diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5 acf9af0a2155b5c5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) ** vldrb.8 q2, \[r0\] ** vldrb.8 q1, \[r1\] ** vcmp.i8 eq, q2, q1 -** vmrsr3, p0 @ movhi -** uxthr3, r3 -** vmsrp0, r3 @ movhi ** vpst ** vaddt.i8q3, q2, q1 ** vpst Ah I see, that's the testcase from patch 1/3 that I criticized :) Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more robust? Thanks, Kyrill I could, but I would rather not. I have a patch series waiting for GCC 14 that does further improvements to this (and other VPST codegen) sequences and if I do scan for 'absence' of an instruction I have to break them up into single tests each. Also it wouldn't then fail if we start spilling the predicate directly to memory for instance. Like I mentioned in the previous patch, the sequence is unlikely to be able to change through scheduling (other than maybe the reordering of the loads through some bad luck, but I could make it robust to that). Ok, looks like it was thought through, so fine by me. Thanks, Kyrilldiff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 632728371d5cef364e47bf33bfa0faba738db871..8325e7a876e2e03f14cba07385cc5a1ddd771655 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1104,6 +1104,10 @@ extern const int arm_arch_cde_coproc_bits[]; || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \ || (MODE) == V2DFmode) +#define VALID_MVE_PRED_MODE(MODE) \ + ((MODE) == HImode\ + || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode) + #define VALID_MVE_SI_MODE(MODE) \ ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \ || (MODE) == V16QImode) diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index efc48349dd3508e6790c1a9f3bba5da689a986bc..4d9d202cad1f39ba386df9d8e4277007fd960262 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -25656,10 +25656,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode) return false; if (IS_VPR_REGNUM (regno)) -return mode == HImode - || mode == V16BImode - || mode == V8BImode - || mode == V4BImode; +return VALID_MVE_PRED_MODE (mode); if (TARGET_THUMB1) /* For the Thumb we only
RE: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]
> -Original Message- > From: Andre Vieira (lists) > Sent: Friday, January 27, 2023 9:58 AM > To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Richard Earnshaw > ; Richard Biener > Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE > predicates before use [PR 107674] > > > > On 26/01/2023 15:06, Kyrylo Tkachov wrote: > > Hi Andre, > > > >> -Original Message- > >> From: Andre Vieira (lists) > >> Sent: Tuesday, January 24, 2023 1:54 PM > >> To: gcc-patches@gcc.gnu.org > >> Cc: Richard Sandiford ; Richard Earnshaw > >> ; Richard Biener ; > >> Kyrylo Tkachov > >> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE > >> predicates before use [PR 107674] > >> > >> Hi, > >> > >> This patch teaches GCC that zero-extending a MVE predicate from 16-bits > >> to 32-bits and then only using 16-bits is a no-op. > >> It does so in two steps: > >> - it lets gcc know that it can access any MVE predicate mode using any > >> other MVE predicate mode without needing to copy it, using the > >> TARGET_MODES_TIEABLE_P hook, > >> - it teaches simplify_subreg to optimize a subreg with a vector > >> outermode, by replacing this outermode with a same-sized integer mode > >> and trying the avalailable optimizations, then if successful it > >> surrounds the result with a subreg casting it back to the original > >> vector outermode. > >> > >> This removes the unnecessary zero-extending shown on PR 107674 > (though > >> it's a sign-extend there), that was introduced in gcc 11. > >> > >> Bootstrapped on aarch64-none-linux-gnu and regression tested on > >> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. > >> > >> OK for trunk? > >> > >> gcc/ChangeLog: > >> > >>PR target/107674 > >> * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. > >> (arm_modes_tieable_p): Make MVE predicate modes tieable. > >>* config/arm/arm.h (VALID_MVE_PRED_MODE): New define. > >>* simplify-rtx.cc (simplify_context::simplify_subreg): Teach > >>simplify_subreg to simplify subregs where the outermode is not > >> scalar. > > > > The arm changes look ok to me. We'll want a midend maintainer to have a > look at simplify-rtx.cc > > > >> > >> gcc/testsuite/ChangeLog: > >> > >>* gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary > >>zero-extend. > > > > diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > > index > 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5 > acf9af0a2155b5c5 100644 > > --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > > +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > > @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) > > **vldrb.8 q2, \[r0\] > > **vldrb.8 q1, \[r1\] > > **vcmp.i8 eq, q2, q1 > > -** vmrsr3, p0 @ movhi > > -** uxthr3, r3 > > -** vmsrp0, r3 @ movhi > > **vpst > > **vaddt.i8q3, q2, q1 > > **vpst > > > > Ah I see, that's the testcase from patch 1/3 that I criticized :) > > Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more > robust? > > Thanks, > > Kyrill > I could, but I would rather not. I have a patch series waiting for GCC > 14 that does further improvements to this (and other VPST codegen) > sequences and if I do scan for 'absence' of an instruction I have to > break them up into single tests each. Also it wouldn't then fail if we > start spilling the predicate directly to memory for instance. Like I > mentioned in the previous patch, the sequence is unlikely to be able to > change through scheduling (other than maybe the reordering of the loads > through some bad luck, but I could make it robust to that). Ok, looks like it was thought through, so fine by me. Thanks, Kyrill
Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]
On 26/01/2023 15:06, Kyrylo Tkachov wrote: Hi Andre, -Original Message- From: Andre Vieira (lists) Sent: Tuesday, January 24, 2023 1:54 PM To: gcc-patches@gcc.gnu.org Cc: Richard Sandiford ; Richard Earnshaw ; Richard Biener ; Kyrylo Tkachov Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674] Hi, This patch teaches GCC that zero-extending a MVE predicate from 16-bits to 32-bits and then only using 16-bits is a no-op. It does so in two steps: - it lets gcc know that it can access any MVE predicate mode using any other MVE predicate mode without needing to copy it, using the TARGET_MODES_TIEABLE_P hook, - it teaches simplify_subreg to optimize a subreg with a vector outermode, by replacing this outermode with a same-sized integer mode and trying the avalailable optimizations, then if successful it surrounds the result with a subreg casting it back to the original vector outermode. This removes the unnecessary zero-extending shown on PR 107674 (though it's a sign-extend there), that was introduced in gcc 11. Bootstrapped on aarch64-none-linux-gnu and regression tested on arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. OK for trunk? gcc/ChangeLog: PR target/107674 * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. (arm_modes_tieable_p): Make MVE predicate modes tieable. * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. * simplify-rtx.cc (simplify_context::simplify_subreg): Teach simplify_subreg to simplify subregs where the outermode is not scalar. The arm changes look ok to me. We'll want a midend maintainer to have a look at simplify-rtx.cc gcc/testsuite/ChangeLog: * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary zero-extend. diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) **vldrb.8 q2, \[r0\] **vldrb.8 q1, \[r1\] **vcmp.i8 eq, q2, q1 -** vmrsr3, p0 @ movhi -** uxthr3, r3 -** vmsrp0, r3 @ movhi **vpst **vaddt.i8q3, q2, q1 **vpst Ah I see, that's the testcase from patch 1/3 that I criticized :) Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more robust? Thanks, Kyrill I could, but I would rather not. I have a patch series waiting for GCC 14 that does further improvements to this (and other VPST codegen) sequences and if I do scan for 'absence' of an instruction I have to break them up into single tests each. Also it wouldn't then fail if we start spilling the predicate directly to memory for instance. Like I mentioned in the previous patch, the sequence is unlikely to be able to change through scheduling (other than maybe the reordering of the loads through some bad luck, but I could make it robust to that).
RE: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]
Hi Andre, > -Original Message- > From: Andre Vieira (lists) > Sent: Tuesday, January 24, 2023 1:54 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Richard Earnshaw > ; Richard Biener ; > Kyrylo Tkachov > Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE > predicates before use [PR 107674] > > Hi, > > This patch teaches GCC that zero-extending a MVE predicate from 16-bits > to 32-bits and then only using 16-bits is a no-op. > It does so in two steps: > - it lets gcc know that it can access any MVE predicate mode using any > other MVE predicate mode without needing to copy it, using the > TARGET_MODES_TIEABLE_P hook, > - it teaches simplify_subreg to optimize a subreg with a vector > outermode, by replacing this outermode with a same-sized integer mode > and trying the avalailable optimizations, then if successful it > surrounds the result with a subreg casting it back to the original > vector outermode. > > This removes the unnecessary zero-extending shown on PR 107674 (though > it's a sign-extend there), that was introduced in gcc 11. > > Bootstrapped on aarch64-none-linux-gnu and regression tested on > arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. > > OK for trunk? > > gcc/ChangeLog: > > PR target/107674 > * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. > (arm_modes_tieable_p): Make MVE predicate modes tieable. > * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. > * simplify-rtx.cc (simplify_context::simplify_subreg): Teach > simplify_subreg to simplify subregs where the outermode is not > scalar. The arm changes look ok to me. We'll want a midend maintainer to have a look at simplify-rtx.cc > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary > zero-extend. diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) ** vldrb.8 q2, \[r0\] ** vldrb.8 q1, \[r1\] ** vcmp.i8 eq, q2, q1 -** vmrsr3, p0 @ movhi -** uxthr3, r3 -** vmsrp0, r3 @ movhi ** vpst ** vaddt.i8q3, q2, q1 ** vpst Ah I see, that's the testcase from patch 1/3 that I criticized :) Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more robust? Thanks, Kyrill