Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-07-03 Thread Lehua Ding
Commited, thanks Robin, Kito, and Jeff.


--Original--
From: "juzhe.zh...@rivai.ai"

Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-07-03 Thread juzhe.zh...@rivai.ai
Thanks kito. 
Lehua will merge it for me.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-03 17:01
To: Robin Dapp
CC: juzhe.zh...@rivai.ai; jeffreyalaw; gcc-patches; Kito.cheng; palmer; palmer
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
Tried on local, widen-complicate-7.c, widen-complicate-8.c and
widen-complicate-9.c need those bridge pattern, otherwise will fail to
combine, so give an explicitly LGTM from my side.
 
On Mon, Jul 3, 2023 at 3:48 PM Robin Dapp via Gcc-patches
 wrote:
>
> To reiterate, this is OK from my side.  As discussed in the other
> thread, Jeff would like to have more info on whether a bridge pattern
> is needed at all and I agreed to get back to it in a while.  Until
> then, we can merge this.
>
> Regards
>  Robin
>
 


Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-07-03 Thread Kito Cheng via Gcc-patches
Tried on local, widen-complicate-7.c, widen-complicate-8.c and
widen-complicate-9.c need those bridge pattern, otherwise will fail to
combine, so give an explicitly LGTM from my side.

On Mon, Jul 3, 2023 at 3:48 PM Robin Dapp via Gcc-patches
 wrote:
>
> To reiterate, this is OK from my side.  As discussed in the other
> thread, Jeff would like to have more info on whether a bridge pattern
> is needed at all and I agreed to get back to it in a while.  Until
> then, we can merge this.
>
> Regards
>  Robin
>


Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-07-03 Thread Robin Dapp via Gcc-patches
To reiterate, this is OK from my side.  As discussed in the other
thread, Jeff would like to have more info on whether a bridge pattern
is needed at all and I agreed to get back to it in a while.  Until
then, we can merge this.

Regards
 Robin



Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread juzhe.zh...@rivai.ai
>> I've triple checked this already.
You mean you still didn't see vfwmul.vv ?

That's odd. Let's wait for kito or Robin test this patch.
Then, I believe they will know what I am saying.

>> I would strongly suggest looking at a dependency height reduction
>> pattern if you want to optimize that code further.
I did it long time ago. Turns out it's better to do that on Combine PASS in 
both GCC and LLVM.

Never mind, I always have this implementation in my downstream and won't affect 
my downstream GCC maintainment.
It's ok that this patch is not approved since I can get the perfect codegen in 
my downstream. 

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 09:26
To: juzhe.zh...@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; palmer; palmer; Robin Dapp
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
 
 
On 6/29/23 19:14, juzhe.zh...@rivai.ai wrote:
> No, reduction patterns won't help.
> As I said in vfwmul patch. You should make sure your environment is 
> working then try again.
I've triple checked this already.
 
I checked it again and your patch does not impact behavior, nor should 
it.   I checked it on top of these trunk commits:
 
14bfda6084eaca07c842566a34316974907958e2
e714af12e3bee0032d8d226f87d92c9bc46f0269
 
I checked it with the code from the godbolt links you suggested with the 
options shown in those links.
 
More importantly, your explanation of what the pattern is supposed to do 
shows a misunderstanding of what combine's capabilities actually are.  A 
bridge or intermediate pattern is not needed here, combine can 
substitute multiple sources in combination attempts as can be clearly 
seen from the dump fragments I posted.
 
The only reason I didn't reject the patch at the outset was the 
possibility that maybe we were trying to combine more than 4 
instructions or that possibility something about the number of operands, 
unspecs, whatever were getting in the way.
 
This patch is not needed and does not affect code generation.
 
I would strongly suggest looking at a dependency height reduction 
pattern if you want to optimize that code further.
 
Jeff
 


Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches




On 6/29/23 19:14, juzhe.zh...@rivai.ai wrote:

No, reduction patterns won't help.
As I said in vfwmul patch. You should make sure your environment is 
working then try again.

I've triple checked this already.

I checked it again and your patch does not impact behavior, nor should 
it.   I checked it on top of these trunk commits:


14bfda6084eaca07c842566a34316974907958e2
e714af12e3bee0032d8d226f87d92c9bc46f0269

I checked it with the code from the godbolt links you suggested with the 
options shown in those links.


More importantly, your explanation of what the pattern is supposed to do 
shows a misunderstanding of what combine's capabilities actually are.  A 
bridge or intermediate pattern is not needed here, combine can 
substitute multiple sources in combination attempts as can be clearly 
seen from the dump fragments I posted.


The only reason I didn't reject the patch at the outset was the 
possibility that maybe we were trying to combine more than 4 
instructions or that possibility something about the number of operands, 
unspecs, whatever were getting in the way.


This patch is not needed and does not affect code generation.

I would strongly suggest looking at a dependency height reduction 
pattern if you want to optimize that code further.


Jeff


Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread juzhe.zh...@rivai.ai
No, reduction patterns won't help. 
As I said in vfwmul patch. You should make sure your environment is working 
then try again.

Thanks.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-30 07:43
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
 
 
On 6/28/23 16:56, 钟居哲 wrote:
> 
> 
> 
> juzhe.zh...@rivai.ai
> 
> *From:* Jeff Law <mailto:jeffreya...@gmail.com>
> *Date:* 2023-06-29 06:43
> *To:* 钟居哲 <mailto:juzhe.zh...@rivai.ai>; gcc-patches
> <mailto:gcc-patches@gcc.gnu.org>
> *CC:* kito.cheng <mailto:kito.ch...@gmail.com>; kito.cheng
> <mailto:kito.ch...@sifive.com>; palmer <mailto:pal...@dabbelt.com>;
> palmer <mailto:pal...@rivosinc.com>; rdapp.gcc
> <mailto:rdapp....@gmail.com>
>     *Subject:* Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac
> combine lowering
> On 6/28/23 16:10, 钟居哲 wrot
>  > Sure.
>  >
>  > https://godbolt.org/z/8857KzTno <https://godbolt.org/z/8857KzTno>
>  >
>  > Failed to match this instruction:
>  > (set (reg:VNx2DF 134 [ vect__31.47 ])
>  >  (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [
>  > vect__28.44 ])))
>  >  (reg:VNx2DF 150 [ vect__8.12 ])
>  >  (reg:VNx2DF 171 [ vect__29.45 ])))
> Please attach the full dump.  I would expect to see additional attempts
> with more operands replaced.
THanks for the dump.  I think this fundamentally the same issue as the 
widening problem.
 
Drop those intermediate patterns.  They're not needed/helpful.  You may 
need a dependency height reduction pattern to get the code you want, but 
I see no evidence those extra patterns will solve anything.
 
jeff
 
 


Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-29 Thread Jeff Law via Gcc-patches




On 6/28/23 16:56, 钟居哲 wrote:




juzhe.zh...@rivai.ai

*From:* Jeff Law <mailto:jeffreya...@gmail.com>
*Date:* 2023-06-29 06:43
*To:* 钟居哲 <mailto:juzhe.zh...@rivai.ai>; gcc-patches
<mailto:gcc-patches@gcc.gnu.org>
*CC:* kito.cheng <mailto:kito.ch...@gmail.com>; kito.cheng
<mailto:kito.ch...@sifive.com>; palmer <mailto:pal...@dabbelt.com>;
palmer <mailto:pal...@rivosinc.com>; rdapp.gcc
<mailto:rdapp....@gmail.com>
    *Subject:* Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac
combine lowering
On 6/28/23 16:10, 钟居哲 wrot
 > Sure.
 >
 > https://godbolt.org/z/8857KzTno <https://godbolt.org/z/8857KzTno>
 >
 > Failed to match this instruction:
 > (set (reg:VNx2DF 134 [ vect__31.47 ])
 >      (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [
 > vect__28.44 ])))
 >          (reg:VNx2DF 150 [ vect__8.12 ])
 >          (reg:VNx2DF 171 [ vect__29.45 ])))
Please attach the full dump.  I would expect to see additional attempts
with more operands replaced.
THanks for the dump.  I think this fundamentally the same issue as the 
widening problem.


Drop those intermediate patterns.  They're not needed/helpful.  You may 
need a dependency height reduction pattern to get the code you want, but 
I see no evidence those extra patterns will solve anything.


jeff



Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/28/23 16:10, 钟居哲 wrote:

Sure.

https://godbolt.org/z/8857KzTno 

Failed to match this instruction:
(set (reg:VNx2DF 134 [ vect__31.47 ])
     (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [ 
vect__28.44 ])))

         (reg:VNx2DF 150 [ vect__8.12 ])
         (reg:VNx2DF 171 [ vect__29.45 ])))
Please attach the full dump.  I would expect to see additional attempts 
with more operands replaced.


jeff


Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-28 Thread 钟居哲
Sure.

https://godbolt.org/z/8857KzTno 

Failed to match this instruction:
(set (reg:VNx2DF 134 [ vect__31.47 ])
(fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [ vect__28.44 
])))
(reg:VNx2DF 150 [ vect__8.12 ])
(reg:VNx2DF 171 [ vect__29.45 ])))



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-29 02:16
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
 
 
On 6/28/23 05:55, Juzhe-Zhong wrote:
> Similar to vfwmacc. Add combine patterns as follows:
> 
> For vfwnmsac:
> 1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (reg) )))
> 2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg) )))
> 
> For vfwmsac:
> 1. (set (reg) (fma (float_extend (reg)) (float_extend (reg))) (neg (reg)) )))
> 2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg)) )))
> 
> For vfwnmacc:
> 1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (neg 
> (reg)) )))
> 2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg)) )))
> 
> gcc/ChangeLog:
> 
>  * config/riscv/autovec-opt.md (*double_widen_fnma): New 
> pattern.
>  (*single_widen_fnma): Ditto.
>  (*double_widen_fms): Ditto.
>  (*single_widen_fms): Ditto.
>  (*double_widen_fnms): Ditto.
>  (*single_widen_fnms): Ditto.
> 
 
> +
> +;; This helps to match ext + fnma.
> +(define_insn_and_split "*single_widen_fnma"
> +  [(set (match_operand:VWEXTF 0 "register_operand")
> + (fma:VWEXTF
> +   (neg:VWEXTF
> + (float_extend:VWEXTF
> +   (match_operand: 2 "register_operand")))
> +   (match_operand:VWEXTF 3 "register_operand")
> +   (match_operand:VWEXTF 1 "register_operand")))]
I'd like to understand this better.  It looks like it's meant to be a 
bridge to another pattern.  However, it looks like it would be a 4->1 
pattern without needing a bridge.  So I'd like to know why that code 
isn't working.
 
Can you send the before/after combine dumps which show this bridge 
pattern being used?
 
The same concern exists with the other bridge patterns, but I don't 
think I need to see the before/after for each of them.
 
 
 
Thanks,
Jeff
 
 


Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-28 Thread Jeff Law via Gcc-patches




On 6/28/23 05:55, Juzhe-Zhong wrote:

Similar to vfwmacc. Add combine patterns as follows:

For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (reg) )))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg) )))

For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg))) (neg (reg)) )))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg)) )))

For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (neg (reg)) 
)))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg)) )))

gcc/ChangeLog:

 * config/riscv/autovec-opt.md (*double_widen_fnma): New pattern.
 (*single_widen_fnma): Ditto.
 (*double_widen_fms): Ditto.
 (*single_widen_fms): Ditto.
 (*double_widen_fnms): Ditto.
 (*single_widen_fnms): Ditto.




+
+;; This helps to match ext + fnma.
+(define_insn_and_split "*single_widen_fnma"
+  [(set (match_operand:VWEXTF 0 "register_operand")
+   (fma:VWEXTF
+ (neg:VWEXTF
+   (float_extend:VWEXTF
+ (match_operand: 2 "register_operand")))
+ (match_operand:VWEXTF 3 "register_operand")
+ (match_operand:VWEXTF 1 "register_operand")))]
I'd like to understand this better.  It looks like it's meant to be a 
bridge to another pattern.  However, it looks like it would be a 4->1 
pattern without needing a bridge.  So I'd like to know why that code 
isn't working.


Can you send the before/after combine dumps which show this bridge 
pattern being used?


The same concern exists with the other bridge patterns, but I don't 
think I need to see the before/after for each of them.




Thanks,
Jeff



[PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering

2023-06-28 Thread Juzhe-Zhong
Similar to vfwmacc. Add combine patterns as follows:

For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (reg) )))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg) )))

For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg))) (neg (reg)) )))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg)) )))

For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg))) (neg (reg)) 
)))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg)) )))

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*double_widen_fnma): New pattern.
(*single_widen_fnma): Ditto.
(*double_widen_fms): Ditto.
(*single_widen_fms): Ditto.
(*double_widen_fnms): Ditto.
(*single_widen_fnms): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 182 ++
 .../riscv/rvv/autovec/widen/widen-10.c|  22 +++
 .../riscv/rvv/autovec/widen/widen-11.c|  22 +++
 .../riscv/rvv/autovec/widen/widen-12.c|  22 +++
 .../rvv/autovec/widen/widen-complicate-7.c|  27 +++
 .../rvv/autovec/widen/widen-complicate-8.c|  27 +++
 .../rvv/autovec/widen/widen-complicate-9.c|  27 +++
 .../riscv/rvv/autovec/widen/widen_run-10.c|  32 +++
 .../riscv/rvv/autovec/widen/widen_run-11.c|  32 +++
 .../riscv/rvv/autovec/widen/widen_run-12.c|  32 +++
 .../rvv/autovec/widen/widen_run_zvfh-10.c |  32 +++
 .../rvv/autovec/widen/widen_run_zvfh-11.c |  32 +++
 .../rvv/autovec/widen/widen_run_zvfh-12.c |  32 +++
 13 files changed, 521 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 1a1cef0eaa5..0c0ba685d6b 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -502,3 +502,185 @@
   }
   [(set_attr "type" "vfwmuladd")
(set_attr "mode" "")])
+
+;; -
+;;  [FP] VFWNMSAC
+;; -
+;; Includes:
+;; - vfwnmsac.vv
+;; -
+
+;; Combine ext + ext + fnma ===> widen fnma.
+;; Most of circumstantces, LoopVectorizer will generate the following IR:
+;; vect__8.176_40 = (vector([2,2]) double) vect__7.175_41;
+;; vect__11.180_35 = (vector([2,2]) double) vect__10.179_36;
+;; vect__13.182_33 = .FNMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
+(define_insn_and_split "*double_widen_fnma"
+  [(set (match_operand:VWEXTF 0 "register_operand")
+   (fma:VWEXTF
+ (neg:VWEXTF
+   (float_extend:VWEXTF
+ (match_operand: 2 "register_operand")))
+ (float_extend:VWEXTF
+   (match_operand: 3 "register_operand"))
+ (match_operand:VWEXTF 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_widen_mul_neg 
(PLUS, mode),
+