Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-04-06 Thread Philipp Tomsich
On Sat 6. Apr 2024 at 06:52, Jeff Law  wrote:

>
>
> On 3/27/24 4:55 AM, Philipp Tomsich wrote:
> > Jeff,
> >
> > just a heads-up that trunk (i.e., the soon-to-be GCC14) still
> > generates the suboptimal sequence:
> >https://godbolt.org/z/K9YYEPsvY
> Realistically it's too late to get this into gcc-14.


I didn’t expect this for 14, but wanted to make sure we didn’t forget about
it once the branch for 15 opens up.

Thanks,
Philipp.

>


Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-04-05 Thread Jeff Law




On 3/27/24 4:55 AM, Philipp Tomsich wrote:

Jeff,

just a heads-up that trunk (i.e., the soon-to-be GCC14) still
generates the suboptimal sequence:
   https://godbolt.org/z/K9YYEPsvY

Realistically it's too late to get this into gcc-14.

Jeff



Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-03-27 Thread Philipp Tomsich
Jeff,

just a heads-up that trunk (i.e., the soon-to-be GCC14) still
generates the suboptimal sequence:
  https://godbolt.org/z/K9YYEPsvY

Thanks,
Philipp.


On Mon, 21 Nov 2022 at 18:00, Philipp Tomsich  wrote:
>
> On Sun, 20 Nov 2022 at 17:38, Jeff Law  wrote:
> >
> >
> > On 11/9/22 16:10, Philipp Tomsich wrote:
> > > [quoted v3 patch trimmed; the full text appears in the original posting below]

Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2022-11-21 Thread Philipp Tomsich
On Sun, 20 Nov 2022 at 17:38, Jeff Law  wrote:
>
>
> On 11/9/22 16:10, Philipp Tomsich wrote:
> > [quoted v3 patch trimmed; the full text appears in the original posting below]

Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2022-11-20 Thread Jeff Law via Gcc-patches



On 11/9/22 16:10, Philipp Tomsich wrote:

The current method of treating shifts of extended values on RISC-V
frequently causes sequences of 3 shifts, despite the presence of the
'zero_extendsidi2_shifted' pattern.

[quoted commit message, ChangeLog, and diffstat trimmed; the full text appears in the original posting below]
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 78fdf02c2ec..06126ac4819 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -29,7 +29,20 @@
[(set_attr "type" "bitmanip,load")
 (set_attr "mode" "DI")])
  
-(define_insn "riscv_shNadd"

+;; We may end up forming a slli.uw with an immediate of 0 (while
+;; splitting through "*slli_slli_uw", below).
+;; Match this back to a zext.w
+(define_insn "*zext.w"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+  (const_int 0))
+   (const_int 4294967295)))]
+  "TARGET_64BIT && TARGET_ZBA"
+  "zext.w\t%0,%1"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "DI")])


Would it be 

[PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2022-11-09 Thread Philipp Tomsich
The current method of treating shifts of extended values on RISC-V
frequently causes sequences of 3 shifts, despite the presence of the
'zero_extendsidi2_shifted' pattern.

Consider:
unsigned long f(unsigned int a, unsigned long b)
{
  a = a << 1;
  unsigned long c = (unsigned long) a;
  c = b + (c << 4);
  return c;
}
which will present at combine-time as:
Trying 7, 8 -> 9:
7: r78:SI=r81:DI#0<<0x1
  REG_DEAD r81:DI
8: r79:DI=zero_extend(r78:SI)
  REG_DEAD r78:SI
9: r72:DI=r79:DI<<0x4
  REG_DEAD r79:DI
Failed to match this instruction:
(set (reg:DI 72 [ _1 ])
(and:DI (ashift:DI (reg:DI 81)
(const_int 5 [0x5]))
(const_int 68719476704 [0xfffffffe0])))
and produce the following (optimized) assembly:
f:
        slliw   a5,a0,1
        slli    a5,a5,32
        srli    a5,a5,28
        add     a0,a5,a1
        ret

The current way of handling this (in 'zero_extendsidi2_shifted')
doesn't apply for two reasons:
- this is seen before reload, and
- (more importantly) the constant mask is not 0xul.

To address this, we introduce a generalized version of shifting
zero-extended values that supports any mask of consecutive ones as
long as the number of trailing zeros is the inner shift-amount.

With this new split, we generate the following assembly for the
aforementioned function:
f:
        slli    a0,a0,33
        srli    a0,a0,28
        add     a0,a0,a1
        ret

Unfortunately, all of this causes some fallout (especially in how it
interacts with Zb* extensions and zero_extract expressions formed
during combine): this is addressed through additional instruction
splitting and handling of zero_extract.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed
as an and:DI.
(*andi_add.uw): New pattern.
(*slli_slli_uw): New pattern.
(*shift_then_shNadd.uw): New pattern.
(*slliuw): Rename to riscv_slli_uw.
(riscv_slli_uw): Renamed from *slliuw.
(*zeroextract2_highbits): New pattern.
(*zero_extract): New pattern, which will be split to
shift-left + shift-right.
* config/riscv/predicates.md (dimode_shift_operand):
* config/riscv/riscv.md (*zero_extract_lowbits):
(zero_extendsidi2_shifted): Rename.
(*zero_extendsidi2_shifted): Generalize.
(*shift_truthvalue): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/shift-shift-6.c: New test.
* gcc.target/riscv/shift-shift-7.c: New test.
* gcc.target/riscv/shift-shift-8.c: New test.
* gcc.target/riscv/shift-shift-9.c: New test.
* gcc.target/riscv/snez.c: New test.

Commit notes:
- Depends on a predicate posted in "RISC-V: Optimize branches testing
  a bit-range or a shifted immediate".  Depending on the order of
  applying these, I'll take care to pull that part out of the other
  patch if needed.

Version-changes: 2
- refactor
- optimise for additional corner cases and deal with fallout

Version-changes: 3
- removed the [WIP] from the commit message (no other changes)

Signed-off-by: Philipp Tomsich 
---

(no changes since v1)

 gcc/config/riscv/bitmanip.md  | 142 ++
 gcc/config/riscv/predicates.md|   5 +
 gcc/config/riscv/riscv.md |  75 +++--
 .../gcc.target/riscv/shift-shift-6.c  |  14 ++
 .../gcc.target/riscv/shift-shift-7.c  |  16 ++
 .../gcc.target/riscv/shift-shift-8.c  |  20 +++
 .../gcc.target/riscv/shift-shift-9.c  |  15 ++
 gcc/testsuite/gcc.target/riscv/snez.c |  14 ++
 8 files changed, 261 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 78fdf02c2ec..06126ac4819 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -29,7 +29,20 @@
   [(set_attr "type" "bitmanip,load")
(set_attr "mode" "DI")])
 
-(define_insn "riscv_shNadd"
+;; We may end up forming a slli.uw with an immediate of 0 (while
+;; splitting through "*slli_slli_uw", below).
+;; Match this back to a zext.w
+(define_insn "*zext.w"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+  (const_int 0))
+   (const_int 4294967295)))]
+  "TARGET_64BIT && TARGET_ZBA"
+  "zext.w\t%0,%1"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "DI")])
+
+(define_insn "*shNadd"
   [(set (match_operand:X 0 "register_operand" "=r")
(plus:X