Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split
On Sat 6. Apr 2024 at 06:52, Jeff Law wrote: > > > On 3/27/24 4:55 AM, Philipp Tomsich wrote: > > Jeff, > > > > just a heads-up that that trunk (i.e., the soon-to-be GCC14) still > > generates the suboptimal sequence: > >https://godbolt.org/z/K9YYEPsvY > Realistically it's too late to get this into gcc-14. I didn’t expect this for 14, but wanted to make sure we didn’t forget about it once the branch for 15 opens up. Thanks, Philipp. >
Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split
On 3/27/24 4:55 AM, Philipp Tomsich wrote: Jeff, just a heads-up that that trunk (i.e., the soon-to-be GCC14) still generates the suboptimal sequence: https://godbolt.org/z/K9YYEPsvY Realistically it's too late to get this into gcc-14. Jeff
Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split
Jeff, just a heads-up that that trunk (i.e., the soon-to-be GCC14) still generates the suboptimal sequence: https://godbolt.org/z/K9YYEPsvY Thanks, Philipp. On Mon, 21 Nov 2022 at 18:00, Philipp Tomsich wrote: > > On Sun, 20 Nov 2022 at 17:38, Jeff Law wrote: > > > > > > On 11/9/22 16:10, Philipp Tomsich wrote: > > > The current method of treating shifts of extended values on RISC-V > > > frequently causes sequences of 3 shifts, despite the presence of the > > > 'zero_extendsidi2_shifted' pattern. > > > > > > Consider: > > > unsigned long f(unsigned int a, unsigned long b) > > > { > > > a = a << 1; > > > unsigned long c = (unsigned long) a; > > > c = b + (c<<4); > > > return c; > > > } > > > which will present at combine-time as: > > > Trying 7, 8 -> 9: > > > 7: r78:SI=r81:DI#0<<0x1 > > >REG_DEAD r81:DI > > > 8: r79:DI=zero_extend(r78:SI) > > >REG_DEAD r78:SI > > > 9: r72:DI=r79:DI<<0x4 > > >REG_DEAD r79:DI > > > Failed to match this instruction: > > > (set (reg:DI 72 [ _1 ]) > > > (and:DI (ashift:DI (reg:DI 81) > > > (const_int 5 [0x5])) > > > (const_int 68719476704 [0xfffe0]))) > > > and produce the following (optimized) assembly: > > > f: > > > slliw a5,a0,1 > > > sllia5,a5,32 > > > srlia5,a5,28 > > > add a0,a5,a1 > > > ret > > > > > > The current way of handling this (in 'zero_extendsidi2_shifted') > > > doesn't apply for two reasons: > > > - this is seen before reload, and > > > - (more importantly) the constant mask is not 0xul. > > > > > > To address this, we introduce a generalized version of shifting > > > zero-extended values that supports any mask of consecutive ones as > > > long as the number of training zeros is the inner shift-amount. > > > > > > With this new split, we generate the following assembly for the > > > aforementioned function: > > > f: > > > sllia0,a0,33 > > > srlia0,a0,28 > > > add a0,a0,a1 > > > ret > > > > > > Unfortunately, all of this causes some fallout (especially in how it > > > interacts with Zb* extensions and zero_extract expressions formed > > > during combine): this is addressed through additional instruction > > > splitting and handling of zero_extract. > > > > > > gcc/ChangeLog: > > > > > > * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed > > > as an and:DI. > > > (*andi_add.uw): New pattern. > > > (*slli_slli_uw): New pattern. > > > (*shift_then_shNadd.uw): New pattern. > > > (*slliuw): Rename to riscv_slli_uw. > > > (riscv_slli_uw): Renamed from *slliuw. > > > (*zeroextract2_highbits): New pattern. > > > (*zero_extract): New pattern, which will be split to > > > shift-left + shift-right. > > > * config/riscv/predicates.md (dimode_shift_operand): > > > * config/riscv/riscv.md (*zero_extract_lowbits): > > > (zero_extendsidi2_shifted): Rename. > > > (*zero_extendsidi2_shifted): Generalize. > > > (*shift_truthvalue): New pattern. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/riscv/shift-shift-6.c: New test. > > > * gcc.target/riscv/shift-shift-7.c: New test. > > > * gcc.target/riscv/shift-shift-8.c: New test. > > > * gcc.target/riscv/shift-shift-9.c: New test. > > > * gcc.target/riscv/snez.c: New test. > > > > > > Commit notes: > > > - Depends on a predicate posted in "RISC-V: Optimize branches testing > > >a bit-range or a shifted immediate". Depending on the order of > > >applying these, I'll take care to pull that part out of the other > > >patch if needed. > > > > > > Version-changes: 2 > > > - refactor > > > - optimise for additional corner cases and deal with fallout > > > > > > Version-changes: 3 > > > - removed the [WIP] from the commit message (no other changes) > > > > > > Signed-off-by: Philipp Tomsich > > > --- > > > > > > (no changes since v1) > > > > > > gcc/config/riscv/bitmanip.md | 142 ++ > > > gcc/config/riscv/predicates.md| 5 + > > > gcc/config/riscv/riscv.md | 75 +++-- > > > .../gcc.target/riscv/shift-shift-6.c | 14 ++ > > > .../gcc.target/riscv/shift-shift-7.c | 16 ++ > > > .../gcc.target/riscv/shift-shift-8.c | 20 +++ > > > .../gcc.target/riscv/shift-shift-9.c | 15 ++ > > > gcc/testsuite/gcc.target/riscv/snez.c | 14 ++ > > > 8 files changed, 261 insertions(+), 40 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c > > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c > > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c > > > create mode 100644
Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split
On Sun, 20 Nov 2022 at 17:38, Jeff Law wrote: > > > On 11/9/22 16:10, Philipp Tomsich wrote: > > The current method of treating shifts of extended values on RISC-V > > frequently causes sequences of 3 shifts, despite the presence of the > > 'zero_extendsidi2_shifted' pattern. > > > > Consider: > > unsigned long f(unsigned int a, unsigned long b) > > { > > a = a << 1; > > unsigned long c = (unsigned long) a; > > c = b + (c<<4); > > return c; > > } > > which will present at combine-time as: > > Trying 7, 8 -> 9: > > 7: r78:SI=r81:DI#0<<0x1 > >REG_DEAD r81:DI > > 8: r79:DI=zero_extend(r78:SI) > >REG_DEAD r78:SI > > 9: r72:DI=r79:DI<<0x4 > >REG_DEAD r79:DI > > Failed to match this instruction: > > (set (reg:DI 72 [ _1 ]) > > (and:DI (ashift:DI (reg:DI 81) > > (const_int 5 [0x5])) > > (const_int 68719476704 [0xfffe0]))) > > and produce the following (optimized) assembly: > > f: > > slliw a5,a0,1 > > sllia5,a5,32 > > srlia5,a5,28 > > add a0,a5,a1 > > ret > > > > The current way of handling this (in 'zero_extendsidi2_shifted') > > doesn't apply for two reasons: > > - this is seen before reload, and > > - (more importantly) the constant mask is not 0xul. > > > > To address this, we introduce a generalized version of shifting > > zero-extended values that supports any mask of consecutive ones as > > long as the number of training zeros is the inner shift-amount. > > > > With this new split, we generate the following assembly for the > > aforementioned function: > > f: > > sllia0,a0,33 > > srlia0,a0,28 > > add a0,a0,a1 > > ret > > > > Unfortunately, all of this causes some fallout (especially in how it > > interacts with Zb* extensions and zero_extract expressions formed > > during combine): this is addressed through additional instruction > > splitting and handling of zero_extract. > > > > gcc/ChangeLog: > > > > * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed > > as an and:DI. > > (*andi_add.uw): New pattern. > > (*slli_slli_uw): New pattern. > > (*shift_then_shNadd.uw): New pattern. > > (*slliuw): Rename to riscv_slli_uw. > > (riscv_slli_uw): Renamed from *slliuw. > > (*zeroextract2_highbits): New pattern. > > (*zero_extract): New pattern, which will be split to > > shift-left + shift-right. > > * config/riscv/predicates.md (dimode_shift_operand): > > * config/riscv/riscv.md (*zero_extract_lowbits): > > (zero_extendsidi2_shifted): Rename. > > (*zero_extendsidi2_shifted): Generalize. > > (*shift_truthvalue): New pattern. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/shift-shift-6.c: New test. > > * gcc.target/riscv/shift-shift-7.c: New test. > > * gcc.target/riscv/shift-shift-8.c: New test. > > * gcc.target/riscv/shift-shift-9.c: New test. > > * gcc.target/riscv/snez.c: New test. > > > > Commit notes: > > - Depends on a predicate posted in "RISC-V: Optimize branches testing > >a bit-range or a shifted immediate". Depending on the order of > >applying these, I'll take care to pull that part out of the other > >patch if needed. > > > > Version-changes: 2 > > - refactor > > - optimise for additional corner cases and deal with fallout > > > > Version-changes: 3 > > - removed the [WIP] from the commit message (no other changes) > > > > Signed-off-by: Philipp Tomsich > > --- > > > > (no changes since v1) > > > > gcc/config/riscv/bitmanip.md | 142 ++ > > gcc/config/riscv/predicates.md| 5 + > > gcc/config/riscv/riscv.md | 75 +++-- > > .../gcc.target/riscv/shift-shift-6.c | 14 ++ > > .../gcc.target/riscv/shift-shift-7.c | 16 ++ > > .../gcc.target/riscv/shift-shift-8.c | 20 +++ > > .../gcc.target/riscv/shift-shift-9.c | 15 ++ > > gcc/testsuite/gcc.target/riscv/snez.c | 14 ++ > > 8 files changed, 261 insertions(+), 40 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c > > create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c > > > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md > > index 78fdf02c2ec..06126ac4819 100644 > > --- a/gcc/config/riscv/bitmanip.md > > +++ b/gcc/config/riscv/bitmanip.md > > @@ -29,7 +29,20 @@ > > [(set_attr "type" "bitmanip,load") > > (set_attr "mode" "DI")]) > > > > -(define_insn "riscv_shNadd" > > +;; We may end up forming a slli.uw
Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split
On 11/9/22 16:10, Philipp Tomsich wrote: The current method of treating shifts of extended values on RISC-V frequently causes sequences of 3 shifts, despite the presence of the 'zero_extendsidi2_shifted' pattern. Consider: unsigned long f(unsigned int a, unsigned long b) { a = a << 1; unsigned long c = (unsigned long) a; c = b + (c<<4); return c; } which will present at combine-time as: Trying 7, 8 -> 9: 7: r78:SI=r81:DI#0<<0x1 REG_DEAD r81:DI 8: r79:DI=zero_extend(r78:SI) REG_DEAD r78:SI 9: r72:DI=r79:DI<<0x4 REG_DEAD r79:DI Failed to match this instruction: (set (reg:DI 72 [ _1 ]) (and:DI (ashift:DI (reg:DI 81) (const_int 5 [0x5])) (const_int 68719476704 [0xfffe0]))) and produce the following (optimized) assembly: f: slliw a5,a0,1 sllia5,a5,32 srlia5,a5,28 add a0,a5,a1 ret The current way of handling this (in 'zero_extendsidi2_shifted') doesn't apply for two reasons: - this is seen before reload, and - (more importantly) the constant mask is not 0xul. To address this, we introduce a generalized version of shifting zero-extended values that supports any mask of consecutive ones as long as the number of training zeros is the inner shift-amount. With this new split, we generate the following assembly for the aforementioned function: f: sllia0,a0,33 srlia0,a0,28 add a0,a0,a1 ret Unfortunately, all of this causes some fallout (especially in how it interacts with Zb* extensions and zero_extract expressions formed during combine): this is addressed through additional instruction splitting and handling of zero_extract. gcc/ChangeLog: * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed as an and:DI. (*andi_add.uw): New pattern. (*slli_slli_uw): New pattern. (*shift_then_shNadd.uw): New pattern. (*slliuw): Rename to riscv_slli_uw. (riscv_slli_uw): Renamed from *slliuw. (*zeroextract2_highbits): New pattern. (*zero_extract): New pattern, which will be split to shift-left + shift-right. * config/riscv/predicates.md (dimode_shift_operand): * config/riscv/riscv.md (*zero_extract_lowbits): (zero_extendsidi2_shifted): Rename. (*zero_extendsidi2_shifted): Generalize. (*shift_truthvalue): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/shift-shift-6.c: New test. * gcc.target/riscv/shift-shift-7.c: New test. * gcc.target/riscv/shift-shift-8.c: New test. * gcc.target/riscv/shift-shift-9.c: New test. * gcc.target/riscv/snez.c: New test. Commit notes: - Depends on a predicate posted in "RISC-V: Optimize branches testing a bit-range or a shifted immediate". Depending on the order of applying these, I'll take care to pull that part out of the other patch if needed. Version-changes: 2 - refactor - optimise for additional corner cases and deal with fallout Version-changes: 3 - removed the [WIP] from the commit message (no other changes) Signed-off-by: Philipp Tomsich --- (no changes since v1) gcc/config/riscv/bitmanip.md | 142 ++ gcc/config/riscv/predicates.md| 5 + gcc/config/riscv/riscv.md | 75 +++-- .../gcc.target/riscv/shift-shift-6.c | 14 ++ .../gcc.target/riscv/shift-shift-7.c | 16 ++ .../gcc.target/riscv/shift-shift-8.c | 20 +++ .../gcc.target/riscv/shift-shift-9.c | 15 ++ gcc/testsuite/gcc.target/riscv/snez.c | 14 ++ 8 files changed, 261 insertions(+), 40 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md index 78fdf02c2ec..06126ac4819 100644 --- a/gcc/config/riscv/bitmanip.md +++ b/gcc/config/riscv/bitmanip.md @@ -29,7 +29,20 @@ [(set_attr "type" "bitmanip,load") (set_attr "mode" "DI")]) -(define_insn "riscv_shNadd" +;; We may end up forming a slli.uw with an immediate of 0 (while +;; splitting through "*slli_slli_uw", below). +;; Match this back to a zext.w +(define_insn "*zext.w" + [(set (match_operand:DI 0 "register_operand" "=r") + (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r") + (const_int 0)) + (const_int 4294967295)))] + "TARGET_64BIT && TARGET_ZBA" + "zext.w\t%0,%1" + [(set_attr "type" "bitmanip") + (set_attr "mode" "DI")]) Would it be
[PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split
The current method of treating shifts of extended values on RISC-V frequently causes sequences of 3 shifts, despite the presence of the 'zero_extendsidi2_shifted' pattern. Consider: unsigned long f(unsigned int a, unsigned long b) { a = a << 1; unsigned long c = (unsigned long) a; c = b + (c<<4); return c; } which will present at combine-time as: Trying 7, 8 -> 9: 7: r78:SI=r81:DI#0<<0x1 REG_DEAD r81:DI 8: r79:DI=zero_extend(r78:SI) REG_DEAD r78:SI 9: r72:DI=r79:DI<<0x4 REG_DEAD r79:DI Failed to match this instruction: (set (reg:DI 72 [ _1 ]) (and:DI (ashift:DI (reg:DI 81) (const_int 5 [0x5])) (const_int 68719476704 [0xfffe0]))) and produce the following (optimized) assembly: f: slliw a5,a0,1 sllia5,a5,32 srlia5,a5,28 add a0,a5,a1 ret The current way of handling this (in 'zero_extendsidi2_shifted') doesn't apply for two reasons: - this is seen before reload, and - (more importantly) the constant mask is not 0xul. To address this, we introduce a generalized version of shifting zero-extended values that supports any mask of consecutive ones as long as the number of training zeros is the inner shift-amount. With this new split, we generate the following assembly for the aforementioned function: f: sllia0,a0,33 srlia0,a0,28 add a0,a0,a1 ret Unfortunately, all of this causes some fallout (especially in how it interacts with Zb* extensions and zero_extract expressions formed during combine): this is addressed through additional instruction splitting and handling of zero_extract. gcc/ChangeLog: * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed as an and:DI. (*andi_add.uw): New pattern. (*slli_slli_uw): New pattern. (*shift_then_shNadd.uw): New pattern. (*slliuw): Rename to riscv_slli_uw. (riscv_slli_uw): Renamed from *slliuw. (*zeroextract2_highbits): New pattern. (*zero_extract): New pattern, which will be split to shift-left + shift-right. * config/riscv/predicates.md (dimode_shift_operand): * config/riscv/riscv.md (*zero_extract_lowbits): (zero_extendsidi2_shifted): Rename. (*zero_extendsidi2_shifted): Generalize. (*shift_truthvalue): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/shift-shift-6.c: New test. * gcc.target/riscv/shift-shift-7.c: New test. * gcc.target/riscv/shift-shift-8.c: New test. * gcc.target/riscv/shift-shift-9.c: New test. * gcc.target/riscv/snez.c: New test. Commit notes: - Depends on a predicate posted in "RISC-V: Optimize branches testing a bit-range or a shifted immediate". Depending on the order of applying these, I'll take care to pull that part out of the other patch if needed. Version-changes: 2 - refactor - optimise for additional corner cases and deal with fallout Version-changes: 3 - removed the [WIP] from the commit message (no other changes) Signed-off-by: Philipp Tomsich --- (no changes since v1) gcc/config/riscv/bitmanip.md | 142 ++ gcc/config/riscv/predicates.md| 5 + gcc/config/riscv/riscv.md | 75 +++-- .../gcc.target/riscv/shift-shift-6.c | 14 ++ .../gcc.target/riscv/shift-shift-7.c | 16 ++ .../gcc.target/riscv/shift-shift-8.c | 20 +++ .../gcc.target/riscv/shift-shift-9.c | 15 ++ gcc/testsuite/gcc.target/riscv/snez.c | 14 ++ 8 files changed, 261 insertions(+), 40 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md index 78fdf02c2ec..06126ac4819 100644 --- a/gcc/config/riscv/bitmanip.md +++ b/gcc/config/riscv/bitmanip.md @@ -29,7 +29,20 @@ [(set_attr "type" "bitmanip,load") (set_attr "mode" "DI")]) -(define_insn "riscv_shNadd" +;; We may end up forming a slli.uw with an immediate of 0 (while +;; splitting through "*slli_slli_uw", below). +;; Match this back to a zext.w +(define_insn "*zext.w" + [(set (match_operand:DI 0 "register_operand" "=r") + (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r") + (const_int 0)) + (const_int 4294967295)))] + "TARGET_64BIT && TARGET_ZBA" + "zext.w\t%0,%1" + [(set_attr "type" "bitmanip") + (set_attr "mode" "DI")]) + +(define_insn "*shNadd" [(set (match_operand:X 0 "register_operand" "=r") (plus:X