Re: [PATCH] [RISC-V] optimize Zicond conditional select cases.

2024-04-15 Thread Fei Gao
Committed. Thanks Kito and Jeff for the review.

BR
Fei
>
>
>On 4/15/24 7:27 PM, Fei Gao wrote:
>> On 2024-04-15 21:04  Jeff Law  wrote:
>>>
>>>
>>>
>>> On 4/15/24 6:58 AM, Kito Cheng wrote:
>>>> It's simple enough, so LGTM for trunk :)
>>> We're already doing this internally.  I just hadn't submitted it due to
>>> being deep into stage4.
>>>
>>> Jeff
>>
>> Hi Jeff
>>
>> Would you like me to commit it now or leave it to you with your bunch of 
>> optimizations in Zicond?
>Go ahead and commit it.  It's extremely low risk.
>
>jeff

Re: Re: [PATCH] [RISC-V] optimize Zicond conditional select cases.

2024-04-15 Thread Fei Gao
On 2024-04-15 21:04  Jeff Law  wrote:
>
>
>
>On 4/15/24 6:58 AM, Kito Cheng wrote:
>> It's simple enough, so LGTM for trunk :)
>We're already doing this internally.  I just hadn't submitted it due to
>being deep into stage4.
>
>Jeff 

Hi Jeff

Would you like me to commit it now or leave it to you with your bunch of 
optimizations in Zicond?

BR
Fei

[PATCH] [RISC-V] optimize Zicond conditional select cases.

2024-04-15 Thread Fei Gao
When one of the two input operands is 0, ADD and IOR are functionally
equivalent.
ADD is slightly preferred over IOR because it is more likely to be
encoded as a compressed instruction.
C.ADD uses the CR format, with any of the 32 RVI registers available,
while C.OR uses the CA format, which is limited to just 8 of them.
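
To illustrate the register constraint, here is a rough sketch in C of when each instruction can take its 16-bit form. The helper names are hypothetical (they are not GCC or binutils functions), and the rules are my reading of the C extension: C.ADD needs the destination to match one source and cannot use x0, while C.OR additionally needs both registers in x8..x15.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers, for illustration only.  Registers are given
   by number (0..31).  */

/* True if REG is one of x8..x15, the only registers reachable through
   the 3-bit register fields of the CA format.  */
static bool
in_compressed_reg_set (unsigned reg)
{
  return reg >= 8 && reg <= 15;
}

/* Can "add rd, rs1, rs2" be encoded as C.ADD (CR format)?
   The destination must equal one of the sources, and x0 is excluded.  */
static bool
add_is_compressible (unsigned rd, unsigned rs1, unsigned rs2)
{
  if (rd == 0 || rs1 == 0 || rs2 == 0)
    return false;
  return rd == rs1 || rd == rs2;
}

/* Can "or rd, rs1, rs2" be encoded as C.OR (CA format)?
   Same destination rule, but both registers must be in x8..x15.  */
static bool
or_is_compressible (unsigned rd, unsigned rs1, unsigned rs2)
{
  if (!in_compressed_reg_set (rs1) || !in_compressed_reg_set (rs2))
    return false;
  return rd == rs1 || rd == rs2;
}
```

For example, with t0/t1 (x5/x6) the ADD form is compressible while the OR form is not; with a0/a1 (x10/x11) both are.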

Conditional select, if zero case:
rd = (rc == 0) ? rs1 : rs2

before patch:

  czero.nez rd, rs1, rc
  czero.eqz rtmp, rs2, rc
  or rd, rd, rtmp

after patch:

  czero.nez rd, rs1, rc
  czero.eqz rtmp, rs2, rc
  add rd, rd, rtmp

The same trick applies to the conditional select, if non-zero case:
rd = (rc != 0) ? rs1 : rs2
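
As a sanity check of the equivalence, here is a small C model of the if-zero select. It is only a sketch: the function names are made up for illustration, and the czero helpers follow the Zicond definitions (czero.eqz yields 0 when the condition is zero, czero.nez yields 0 when it is non-zero). Since one of the two czero results is always 0, OR and ADD give the same value and no carry can occur.

```c
#include <assert.h>
#include <stdint.h>

/* czero.eqz rd, rs1, rc: rd = (rc == 0) ? 0 : rs1.  */
static uint64_t
czero_eqz (uint64_t rs1, uint64_t rc)
{
  return rc == 0 ? 0 : rs1;
}

/* czero.nez rd, rs1, rc: rd = (rc != 0) ? 0 : rs1.  */
static uint64_t
czero_nez (uint64_t rs1, uint64_t rc)
{
  return rc != 0 ? 0 : rs1;
}

/* rd = (rc == 0) ? rs1 : rs2, combined with OR.  */
static uint64_t
select_if_zero_ior (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  return czero_nez (rs1, rc) | czero_eqz (rs2, rc);
}

/* Same select, combined with ADD: one addend is always 0.  */
static uint64_t
select_if_zero_add (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  return czero_nez (rs1, rc) + czero_eqz (rs2, rc);
}
```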

The riscv-gnu-toolchain regression tests pass with no new failures.
---
 gcc/config/riscv/riscv.cc|  2 +-
 .../gcc.target/riscv/zicond-prefer-add-to-or.c   | 16 
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-prefer-add-to-or.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e5f00806bb9..93c736549c9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4709,7 +4709,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
  gen_rtx_IF_THEN_ELSE (mode, cond1,
CONST0_RTX (mode),
alt)));
- riscv_emit_binary (IOR, dest, reg1, reg2);
+ riscv_emit_binary (PLUS, dest, reg1, reg2);
  return true;
}
 }
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-prefer-add-to-or.c 
b/gcc/testsuite/gcc.target/riscv/zicond-prefer-add-to-or.c
new file mode 100644
index 000..f3f7beb0b5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-prefer-add-to-or.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=4" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f -mbranch-cost=4" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Os" "-Oz"} } */
+
+long cond_select_if_zero(long a, long b, long c) {
+  return a == 0 ? c : b;
+}
+
+long cond_select_if_non_zero(long a, long b, long c) {
+  return a != 0 ? c : b;
+}
+
+/* { dg-final { scan-assembler-times {add\t}  2 } } */
+/* { dg-final { scan-assembler-not {or\t} } } */
+
-- 
2.17.1



Re: [PATCH] RISC-V: Fix misaligned stack offset for interrupt function

2023-12-28 Thread Fei Gao
On 2023-12-25 16:45  Kito Cheng  wrote:

>+++ b/gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c
>@@ -0,0 +1,29 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -fno-schedule-insns 
>-fno-schedule-insns2" } */
>+/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
>+
>+/*  Make sure no stack offset are misaligned.
>+**  interrupt:
>+**  ...
>+**    sd\tt0,40\(sp\)
>+**    frcsr\tt0
>+**    sw\tt0,32\(sp\)
>+**    sd\tt1,24\(sp\)
>+**    fsd\tft0,8\(sp\)
>+**  ...
>+**    lw\tt0,32\(sp\)
>+**    fscsr\tt0
>+**    ld\tt0,40\(sp\)
>+**    ld\tt1,24\(sp\)
>+**    fld\tft0,8\(sp\)
>+**  ...
>+*/
Hi Kito

The fix is fine, but using s0 instead of t0 might be better:
1. simpler code
2. smaller stack size

current implementation:
>+**        sd\tt0,40\(sp\)
>+**        frcsr\tt0
>+**        sw\tt0,32\(sp\)      //save content of frcsr in stack

use s0:
>+**        sd\tt0,40\(sp\)
>+**        frcsr\ts0                //save content of frcsr in s0 instead of the
>stack. If s0 is used as a callee-saved register, it will be saved again later by
>the existing code.

Also, making this change in riscv_expand_prologue and riscv_expand_epilogue would be
consistent with the current stack allocation logic.

I can try it if you think it is necessary.

BR
Fei
>+
>+
>+void interrupt(void) __attribute__((interrupt));
>+void interrupt(void)
>+{
>+  asm volatile ("# clobber!":::"t0", "t1", "ft0");
>+}
>+
>+/* { dg-final { check-function-bodies "**" "" } } */
>--
>2.40.1

Re: Re: [PATCH 5/5] [ifcvt] optimize extension for x=c ? (y op z) : y by RISC-V Zicond like insns

2023-12-14 Thread Fei Gao
On 2023-12-11 13:46  Jeff Law  wrote:
>
>
>
>On 12/5/23 01:12, Fei Gao wrote:
>> SIGN_EXTEND, ZERO_EXTEND and SUBREG has been considered
>> to support SImode in 64-bit machine.
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>> * ifcvt.cc (noce_cond_zero_binary_op_supported): add support for extension
>>  (noce_bbs_ok_for_cond_zero_arith): likewise
>>  (noce_try_cond_zero_arith): support extension of LSHIFTRT case
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for extension
>So I think this needs to defer to gcc-15.  But even so I think getting
>some review on the effort is useful.
>
>
>> ---
>>   gcc/ifcvt.cc  |  16 ++-
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c   | 126 +-
>>   2 files changed, 139 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index b84be53ec5c..306497a8e37 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> @@ -2934,6 +2934,10 @@ noce_cond_zero_binary_op_supported (rtx op)
>>   {
>> enum rtx_code opcode = GET_CODE (op);
>>  
>> +  /* Strip SIGN_EXTEND or ZERO_EXTEND if any.  */
>> +  if (opcode == SIGN_EXTEND || opcode == ZERO_EXTEND)
>> +    opcode = GET_CODE (XEXP (op, 0));
>So it seems to me like that you need to record what the extension was so
>that you can re-apply it to the result.
>
>> @@ -3114,7 +3122,11 @@ noce_try_cond_zero_arith (struct noce_if_info 
>> *if_info)
>> if (CONST_INT_P (*to_replace))
>>   {
>>     if (noce_cond_zero_shift_op_supported (bin_code))
>> -    *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
>> +    {
>> +  *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
>> +  if (GET_CODE (a) == ZERO_EXTEND && bin_code == LSHIFTRT)
>> +PUT_CODE (a, SIGN_EXTEND);
>> +    }
>This doesn't look correct (ignoring the SUBREG issues with patch #4 in
>this series). 
Agreed, there's an issue here for the const_int case, as you mentioned in
[PATCH 4/5] [ifcvt] optimize x=c ? (y op const_int) : y.

>
>When we looked at this internally the conclusion was we needed to first
>strip the extension, recording what kind of extension it was, then
>reapply the same extension to the result of the now conditional
>operation.  And it's independent of SUBREG handling. 
Ignoring the const_int case, we can reuse the RTL pattern and replace
the z (SUBREG or REG) in INSN_A (x = y op z) without recording what kind
of extension it was.

A new patch will be sent for GCC 15.

BR, 
Fei

>
>
>Jeff

Re: Re: [PATCH 4/5] [ifcvt] optimize x=c ? (y op const_int) : y by RISC-V Zicond like insns

2023-12-14 Thread Fei Gao
On 2023-12-11 13:38  Jeff Law  wrote:
>
>
>
>On 12/5/23 01:12, Fei Gao wrote:
>> op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT, AND]
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>>  * ifcvt.cc (noce_cond_zero_shift_op_supported): check if OP is 
>>shift like operation
>>  (noce_cond_zero_binary_op_supported): restructure & call 
>>noce_cond_zero_shift_op_supported
>>  (noce_bbs_ok_for_cond_zero_arith): add support for const_int
>>  (noce_try_cond_zero_arith): add support for x=c ? (y op const_int)
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for x=c ? (y op 
>>const_int) : y


>> @@ -3089,7 +3111,18 @@ noce_try_cond_zero_arith (struct noce_if_info 
>> *if_info)
>>     return false;
>>   }
>>  
>> -  *to_replace = target;
>> +  if (CONST_INT_P (*to_replace))
>> +{
>> +  if (noce_cond_zero_shift_op_supported (bin_code))
>> +    *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
>> +  else if (SUBREG_P (bin_op0))
>> +    *to_replace = gen_rtx_SUBREG (GET_MODE (bin_op0), target, 0);
>> +  else
>> +    *to_replace = target;
>Not all targets use QImode for their shift counts, so you can't just
>force that argument to QImode. 
Thanks for the info. I hadn't understood the "complex" semantics you mentioned
regarding SUBREG until now.

>
>The way this works in our internal tree is that we re-expand the binary
>operation rather than replacing bits of existing RTL.  That allows the
>expanders to do the right thing automatically for the target WRT
>handling of things like the mode of the shift count.  In fact, I don't
>see how you can ever do replacement of a constant with a register with
>the current scheme since the original constant will be modeless, so you
>never know what mode to use. 
Letting the expander handle the const_int case seems like a target-independent solution.

BR, 
Fei

>
>
>
>Jeff

Re: Re: [PATCH 2/5] [ifcvt] optimize x=c ? (y shift_op z):y by RISC-V Zicond like insns

2023-12-10 Thread Fei Gao
On 2023-12-11 04:43  Jeff Law  wrote:
>
>
>
>On 12/5/23 01:12, Fei Gao wrote:
>> op=[ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]
>>
>> Conditional op, if zero
>> rd = (rc == 0) ? (rs1 op rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> op rd, rs1, rd
>>
>> Conditional op, if non-zero
>> rd = (rc != 0) ? (rs1 op rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> op rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>> * ifcvt.cc (noce_cond_zero_binary_op_supported): add support for shift like 
>> op.
>>  (get_base_reg): add support for subreg to handle shift amount 
>>operand.
>>  (noce_bbs_ok_for_cond_zero_arith): to replace shift amount operand.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for shift like op.
>So I removed the SUBREG handling code which makes this patch merely an
>addition of the shift/rotate ops which trivially work just like PLUS,
>MINUS, IOR, XOR (by conditionally zero-ing the shift count) tested on
>x86 and pushed it to the trunk.
>
>As I noted before while I think handling SUBREGs is important, now is
>not the time to be adding that support. 

Thanks for your review.
Got your point to defer support for SUBREGs.

Shift-like pattern:
(set (reg/v:DI 137 [ y ])
        (ashift:DI (reg/v:DI 137 [ y ])
            (subreg:QI (reg/v:DI 138 [ z ]) 0)))

With the SUBREG handling code removed, no Zicond instructions are generated
for this pattern, which explains your changes to the czero instruction counts
scanned in the testcases.
In that case, this looks like an NFC patch.

BR, 
Fei


>
>Thanks!
>
>jeff

Re: Re: [PATCH 2/4] [ifcvt] optimize x=c ? (y op z) : y by RISC-V Zicond like insns

2023-12-05 Thread Fei Gao


On 2023-11-29 19:09  Fei Gao  wrote:
>
>On 2023-11-29 13:26  Jeff Law  wrote:
>>
>>
>>
>>On 11/27/23 19:32, Fei Gao wrote:
>>> op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]
>>>
>>> SIGN_EXTEND, ZERO_EXTEND and SUBREG has been considered
>>> to support SImode in 64-bit machine.
>>Let's defer these for now.  We're supposed to be wrapping up work that
>>was posted before stage1 closed.  If these opcodes were trivial to
>>support, then I would let them through, but SUBREGs for example can be
>>problematical as their semantics can be complex.
>
>I can delete the 32-bit support on 64-bit machines,
>or split this patch into two parts: word-size support and 32-bit support on
>64-bit machines,
>or keep the current status if the following persuades you :)
>
>Regarding complexity, the current patch differs from 1st version by 
>"to_replace" and reduces
>complexity significantly. noce_simple_bbs() ensures we have only single set 
>for insn_a like x=sext(subreg(y) op subreg(z)).
>Instead of constructing an insn considering complex extension and subreg from 
>scratch, to_replace locates the rtx z
>in noce_bbs_ok_for_cond_zero_arith,
>and then replace it with czero result. In this way, extension and subreg are 
>as simple as reg cases.
>All the cases for extension and subreg have been uploaded in this patch.
>They can be found by searching "int" in gcc.target/riscv/zicond_ifcvt_opt.c
>
>Could you please let me know which you prefer?
I split the patches. See new series 
https://patchwork.sourceware.org/project/gcc/list/?series=27924
>
>>
>>
>>>
>>> Conditional op, if zero
>>> rd = (rc == 0) ? (rs1 op rs2) : rs1
>>> -->
>>> czero.nez rd, rs2, rc
>>> op rd, rs1, rd
>>>
>>> Conditional op, if non-zero
>>> rd = (rc != 0) ? (rs1 op rs2) : rs1
>>> -->
>>> czero.eqz rd, rs2, rc
>>> op rd, rs1, rd
>>>
>>> Co-authored-by: Xiao Zeng
>>>
>>> gcc/ChangeLog:
>>>
>>>  * ifcvt.cc (noce_try_cond_zero_arith): handler for conditional zero
>>>based ifcvt
>>>  (noce_emit_czero): helper for noce_try_cond_zero_arith
>>>  (noce_cond_zero_binary_op_supported): check supported OPs for
>>>conditional zero based ifcvt
>>>  (get_base_reg): get base reg of a subreg or the reg itself
>>>  (noce_bbs_ok_for_cond_zero_arith): check if BBs are OK for
>>>conditional zero based ifcvt
>>>  (noce_process_if_block): add noce_try_cond_zero_arith
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>  * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>>> ---
>>>   gcc/ifcvt.cc  | 210 ++
>>>   .../gcc.target/riscv/zicond_ifcvt_opt.c   | 682 ++
>>>   2 files changed, 892 insertions(+)
>>>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
>>>
>>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>>> index a0af553b9ff..8f6a0e7f5fe 100644
>>> --- a/gcc/ifcvt.cc
>>> +++ b/gcc/ifcvt.cc
>>> @@ -787,6 +787,7 @@ static rtx noce_get_alt_condition (struct noce_if_info 
>>> *, rtx, rtx_insn **);
>>>   static bool noce_try_minmax (struct noce_if_info *);
>>>   static bool noce_try_abs (struct noce_if_info *);
>>>   static bool noce_try_sign_mask (struct noce_if_info *);
>>> +static int noce_try_cond_zero_arith (struct noce_if_info *);
>>>  
>>>   /* Return the comparison code for reversed condition for IF_INFO,
>>>  or UNKNOWN if reversing the condition is not possible.  */
>>> @@ -1831,6 +1832,40 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx 
>>> x, enum rtx_code code,
>>>   return NULL_RTX;
>>>   }
>>>  
>>> +/*  Emit a conditional zero, returning TARGET or NULL_RTX upon failure.
>>> +    IF_INFO describes the if-conversion scenario under consideration.
>>> +    CZERO_CODE selects the condition (EQ/NE).
>>> +    NON_ZERO_OP is the nonzero operand of the conditional move
>>> +    TARGET is the desired output register.  */
>>> +
>>> +static rtx
>>> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code,
>>> +   rtx non_zero_op, rtx target)
>>[ ... ]
>>The code you wrote is safe in that it constructs a suitable if-then-else
>>as a single object, starts a new sequence, then uses emit_insn to put that
>>object onto a sequence.  Then you extr

[PATCH 5/5] [ifcvt] optimize extension for x=c ? (y op z) : y by RISC-V Zicond like insns

2023-12-05 Thread Fei Gao
SIGN_EXTEND, ZERO_EXTEND and SUBREG have been considered
to support SImode on 64-bit machines.

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): add support for 
extension
(noce_bbs_ok_for_cond_zero_arith): likewise
(noce_try_cond_zero_arith): support extension of LSHIFTRT case

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for extension
---
 gcc/ifcvt.cc  |  16 ++-
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 126 +-
 2 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index b84be53ec5c..306497a8e37 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2934,6 +2934,10 @@ noce_cond_zero_binary_op_supported (rtx op)
 {
   enum rtx_code opcode = GET_CODE (op);
 
+  /* Strip SIGN_EXTEND or ZERO_EXTEND if any.  */
+  if (opcode == SIGN_EXTEND || opcode == ZERO_EXTEND)
+opcode = GET_CODE (XEXP (op, 0));
+
   if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR
   || opcode == AND || noce_cond_zero_shift_op_supported (opcode))
 return true;
@@ -3000,7 +3004,11 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
   if (!(noce_cond_zero_binary_op_supported (a) && REG_P (b)))
 return false;
 
-  bin_exp = a;
+  /* Strip sign_extend if any.  */
+  if (GET_CODE (a) == SIGN_EXTEND || GET_CODE (a) == ZERO_EXTEND)
+bin_exp = XEXP (a, 0);
+  else
+bin_exp = a;
 
   /* Canonicalize x = (z op y) : y to x = (y op z) : y */
   op1 = get_base_reg (XEXP (bin_exp, 1));
@@ -3114,7 +3122,11 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   if (CONST_INT_P (*to_replace))
{
  if (noce_cond_zero_shift_op_supported (bin_code))
-   *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
+   {
+ *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
+ if (GET_CODE (a) == ZERO_EXTEND && bin_code == LSHIFTRT)
+   PUT_CODE (a, SIGN_EXTEND);
+   }
  else if (SUBREG_P (bin_op0))
*to_replace = gen_rtx_SUBREG (GET_MODE (bin_op0), target, 0);
  else
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c 
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
index 85743e1734c..53206d76e9f 100644
--- a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
+++ b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
@@ -615,6 +615,69 @@ test_RotateR_eqz (unsigned long x, unsigned long y, 
unsigned long z,
   return x;
 }
 
+int
+test_ADD_ceqz_int (int x, int y, int z, int c)
+{
+  if (c)
+x = y + z;
+  else
+x = y;
+  return x;
+}
+
+int
+test_ShiftLeft_eqz_int (int x, int y, int z, int c)
+{
+  if (c)
+x = y << z;
+  else
+x = y;
+  return x;
+}
+
+int
+test_ShiftR_eqz_int (int x, int y, int z, int c)
+{
+  if (c)
+x = y >> z;
+  else
+x = y;
+  return x;
+}
+
+unsigned int
+test_ShiftR_logical_eqz_int (unsigned int x, unsigned int y, unsigned int z,
+unsigned int c)
+{
+  if (c)
+x = y >> z;
+  else
+x = y;
+  return x;
+}
+
+unsigned int
+test_RotateL_eqz_int (unsigned int x, unsigned int y, unsigned int z,
+ unsigned int c)
+{
+  if (c)
+x = (y << z) | (y >> (32 - z));
+  else
+x = y;
+  return x;
+}
+
+unsigned int
+test_RotateR_eqz_int (unsigned int x, unsigned int y, unsigned int z,
+ unsigned int c)
+{
+  if (c)
+x = (y >> z) | (y << (32 - z));
+  else
+x = y;
+  return x;
+}
+
 long
 test_ADD_ceqz_imm (long x, long y, long c)
 {
@@ -1225,6 +1288,67 @@ test_RotateR_eqz_imm (unsigned long x, unsigned long y, 
unsigned long c)
 x = y;
   return x;
 }
+
+int
+test_ADD_ceqz_imm_int (int x, int y, int c)
+{
+  if (c)
+x = y + 11;
+  else
+x = y;
+  return x;
+}
+
+int
+test_ShiftLeft_eqz_imm_int (int x, int y, int c)
+{
+  if (c)
+x = y << 11;
+  else
+x = y;
+  return x;
+}
+
+int
+test_ShiftR_eqz_imm_int (int x, int y, int c)
+{
+  if (c)
+x = y >> 11;
+  else
+x = y;
+  return x;
+}
+
+unsigned int
+test_ShiftR_logical_eqz_imm_int (unsigned int x, unsigned int y, unsigned int 
c)
+{
+  if (c)
+x = y >> 11;
+  else
+x = y;
+  return x;
+}
+
+unsigned int
+test_RotateL_eqz_imm_int (unsigned int x, unsigned int y, unsigned int c)
+{
+  if (c)
+x = (y << 11) | (y >> (32 - 11));
+  else
+x = y;
+  return x;
+}
+
+unsigned int
+test_RotateR_eqz_imm_int (unsigned int x, unsigned int y, unsigned int c)
+{
+  if (c)
+x = (y >> 11) | (y << (32 - 11));
+  else
+x = y;
+  return x;
+}
+
 long
 test_AND_ceqz (long x, long y, long z, long c)
 {
@@ -1544,5 +1668,5 @@ test_AND_eqz_x_2_imm_reverse_bin_oprands (long x, long c)
 x = 11 & x;
   return x;
 }
-/* { dg-final { scan-assembler-times {czero\.eqz} 82 } } */
+/* { dg-final { scan-assembler-times {czero\.eqz} 94 } } */
 /* { dg-final { scan-assembler-times 

[PATCH 4/5] [ifcvt] optimize x=c ? (y op const_int) : y by RISC-V Zicond like insns

2023-12-05 Thread Fei Gao
op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT, AND]

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_shift_op_supported): check if OP is shift 
like operation
(noce_cond_zero_binary_op_supported): restructure & call 
noce_cond_zero_shift_op_supported
(noce_bbs_ok_for_cond_zero_arith): add support for const_int
(noce_try_cond_zero_arith): add support for x=c ? (y op const_int)

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for x=c ? (y op 
const_int) : y
---
 gcc/ifcvt.cc  |  45 +-
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 774 +-
 2 files changed, 811 insertions(+), 8 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 29f33f956eb..b84be53ec5c 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2910,6 +2910,20 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   return true;
 }
 
+/*  Check if OP is shift-like operation supported by conditional zero
+based if conversion, returning TRUE if satisfied otherwise FALSE.
+
+OP is the operation to check.  */
+static bool
+noce_cond_zero_shift_op_supported (enum rtx_code op)
+{
+  if (op == ASHIFT || op == ASHIFTRT || op == LSHIFTRT || op == ROTATE
+  || op == ROTATERT)
+return true;
+
+  return false;
+}
+
 /*  Check if OP is supported by conditional zero based if conversion,
 returning TRUE if satisfied otherwise FALSE.
 
@@ -2921,8 +2935,7 @@ noce_cond_zero_binary_op_supported (rtx op)
   enum rtx_code opcode = GET_CODE (op);
 
   if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR
-  || opcode == ASHIFT || opcode == ASHIFTRT || opcode == LSHIFTRT
-  || opcode == ROTATE || opcode == ROTATERT || opcode == AND)
+  || opcode == AND || noce_cond_zero_shift_op_supported (opcode))
 return true;
 
   return false;
@@ -3009,7 +3022,7 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
   if (czero_code == UNKNOWN)
 return false;
 
-  if (REG_P (bin_op1))
+  if (CONST_INT_P (bin_op1) || REG_P (bin_op1))
 *to_replace = &XEXP (bin_exp, 1);
   else if (SUBREG_P (bin_op1))
 *to_replace = &SUBREG_REG (XEXP (bin_exp, 1));
@@ -3038,6 +3051,7 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   enum rtx_code czero_code = UNKNOWN;
   rtx bin_exp = NULL_RTX;
   enum rtx_code bin_code = UNKNOWN;
+  rtx bin_op0 = NULL_RTX;
   rtx non_zero_op = NULL_RTX;
   rtx *to_replace = NULL;
 
@@ -3048,6 +3062,7 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   start_sequence ();
 
   bin_code = GET_CODE (bin_exp);
+  bin_op0 = XEXP (bin_exp, 0);
 
   if (bin_code == AND)
 {
@@ -3074,9 +3089,16 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
 }
   else
 {
-  non_zero_op = *to_replace;
+  if (CONST_INT_P (*to_replace))
+   {
+ non_zero_op = gen_reg_rtx (mode);
+ noce_emit_move_insn (non_zero_op, *to_replace);
+   }
+  else
+   non_zero_op = *to_replace;
+
   /* If x is used in both input and out like x = c ? x + z : x,
-use a new reg to avoid modifying x  */
+use a new reg to avoid modifying x  */
   if (common && rtx_equal_p (common, if_info->x))
target = gen_reg_rtx (mode);
   else
@@ -3089,7 +3111,18 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
  return false;
}
 
-  *to_replace = target;
+  if (CONST_INT_P (*to_replace))
+   {
+ if (noce_cond_zero_shift_op_supported (bin_code))
+   *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
+ else if (SUBREG_P (bin_op0))
+   *to_replace = gen_rtx_SUBREG (GET_MODE (bin_op0), target, 0);
+ else
+   *to_replace = target;
+   }
+  else
+   *to_replace = target;
+
   noce_emit_move_insn (if_info->x, a);
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c 
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
index d5310690539..85743e1734c 100644
--- a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
+++ b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
@@ -615,6 +615,616 @@ test_RotateR_eqz (unsigned long x, unsigned long y, 
unsigned long z,
   return x;
 }
 
+long
+test_ADD_ceqz_imm (long x, long y, long c)
+{
+  if (c)
+x = y + 11;
+  else
+x = y;
+  return x;
+}
+
+long
+test_ADD_ceqz_x_imm (long x, long c)
+{
+  if (c)
+x = x + 11;
+
+  return x;
+}
+
+long
+test_ADD_nez_imm (long x, long y, long c)
+{
+  if (c)
+x = y;
+  else
+x = y + 11;
+  return x;
+}
+
+long
+test_ADD_nez_x_imm (long x, long c)
+{
+  if (c)
+{
+}
+  else
+x = x + 11;
+  return x;
+}
+
+long
+test_ADD_nez_2_imm (long x, long y, long c)
+{
+  if (!c)
+x = y + 11;
+  else
+x = y;
+  return x;
+}
+
+long
+test_ADD_nez_x_2_imm (long x, long c)
+{
+  if (!c)
+x = x + 11;
+
+  return x;
+}
+
+long
+test_ADD_eqz_2_imm (long x, long 

[PATCH 3/5] [ifcvt] optimize x=c ? (y AND z) : y by RISC-V Zicond like insns

2023-12-05 Thread Fei Gao
Take the following case for example.

CFLAGS: -march=rv64gc_zbb_zicond -mabi=lp64d -O2

long
test_AND_ceqz (long x, long y, long z, long c)
{
  if (c)
x = y & z;
  else
x = y;
  return x;
}

Before patch:

and a2,a1,a2
czero.eqz   a0,a2,a3
czero.nez   a3,a1,a3
or  a0,a3,a0
ret

After patch:
and a0,a1,a2
czero.nez   a1,a1,a3
or  a0,a1,a0
ret
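
AND needs different handling from the other ops because zeroing z would give y & 0 = 0 rather than y. The "after patch" sequence above instead zeroes a copy of y and ORs it back in. A minimal C model of that sequence follows; the function names are illustrative only, and czero_nez mirrors the Zicond czero.nez definition.

```c
#include <assert.h>
#include <stdint.h>

/* czero.nez rd, rs1, rc: rd = (rc != 0) ? 0 : rs1.  */
static uint64_t
czero_nez (uint64_t rs1, uint64_t rc)
{
  return rc != 0 ? 0 : rs1;
}

/* Model of:  and a0,a1,a2 ; czero.nez a1,a1,a3 ; or a0,a1,a0
   computing  x = c ? (y & z) : y.
   When c != 0 the OR adds nothing; when c == 0 it ORs y back in,
   and (y & z) | y == y.  */
static uint64_t
cond_and (uint64_t y, uint64_t z, uint64_t c)
{
  uint64_t anded = y & z;
  uint64_t kept = czero_nez (y, c);
  return anded | kept;
}
```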

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): Add support for AND.
(noce_bbs_ok_for_cond_zero_arith): Likewise.
(noce_try_cond_zero_arith): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for AND.
---
 gcc/ifcvt.cc  |  69 ++--
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 163 +-
 2 files changed, 211 insertions(+), 21 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 2efae21ebfe..29f33f956eb 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2922,7 +2922,7 @@ noce_cond_zero_binary_op_supported (rtx op)
 
   if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR
   || opcode == ASHIFT || opcode == ASHIFTRT || opcode == LSHIFTRT
-  || opcode == ROTATE || opcode == ROTATERT)
+  || opcode == ROTATE || opcode == ROTATERT || opcode == AND)
 return true;
 
   return false;
@@ -2954,6 +2954,7 @@ get_base_reg (rtx exp)
 
 static bool
 noce_bbs_ok_for_cond_zero_arith (struct noce_if_info *if_info, rtx *common_ptr,
+rtx *bin_exp_ptr,
 enum rtx_code *czero_code_ptr, rtx *a_ptr,
 rtx **to_replace)
 {
@@ -2998,7 +2999,7 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
 {
   common = b;
   bin_op1 = XEXP (bin_exp, 1);
-  czero_code = reverse
+  czero_code = (reverse ^ (GET_CODE (bin_exp) == AND))
 ? noce_reversed_cond_code (if_info)
 : GET_CODE (cond);
 }
@@ -3016,6 +3017,7 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
 return false;
 
   *common_ptr = common;
+  *bin_exp_ptr = bin_exp;
   *czero_code_ptr = czero_code;
   *a_ptr = a;
 
@@ -3029,38 +3031,67 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
 static int
 noce_try_cond_zero_arith (struct noce_if_info *if_info)
 {
-  rtx target, a;
+  rtx target, rtmp, a;
   rtx_insn *seq;
   machine_mode mode = GET_MODE (if_info->x);
   rtx common = NULL_RTX;
   enum rtx_code czero_code = UNKNOWN;
+  rtx bin_exp = NULL_RTX;
+  enum rtx_code bin_code = UNKNOWN;
   rtx non_zero_op = NULL_RTX;
   rtx *to_replace = NULL;
 
-  if (!noce_bbs_ok_for_cond_zero_arith (if_info, &common, &czero_code, &a,
-   &to_replace))
+  if (!noce_bbs_ok_for_cond_zero_arith (if_info, &common, &bin_exp,
+   &czero_code, &a, &to_replace))
 return false;
 
-  non_zero_op = *to_replace;
-
   start_sequence ();
 
-  /* If x is used in both input and out like x = c ? x + z : x,
- use a new reg to avoid modifying x  */
-  if (common && rtx_equal_p (common, if_info->x))
-target = gen_reg_rtx (mode);
-  else
-target = if_info->x;
+  bin_code = GET_CODE (bin_exp);
 
-  target = noce_emit_czero (if_info, czero_code, non_zero_op, target);
-  if (!target || !to_replace)
+  if (bin_code == AND)
 {
-  end_sequence ();
-  return false;
+  rtmp = gen_reg_rtx (mode);
+  noce_emit_move_insn (rtmp, a);
+
+  target = noce_emit_czero (if_info, czero_code, common, if_info->x);
+  if (!target)
+   {
+ end_sequence ();
+ return false;
+   }
+
+  target = expand_simple_binop (mode, IOR, rtmp, target, if_info->x, 0,
+   OPTAB_WIDEN);
+  if (!target)
+   {
+ end_sequence ();
+ return false;
+   }
+
+  if (target != if_info->x)
+   noce_emit_move_insn (if_info->x, target);
 }
+  else
+{
+  non_zero_op = *to_replace;
+  /* If x is used in both input and out like x = c ? x + z : x,
+use a new reg to avoid modifying x  */
+  if (common && rtx_equal_p (common, if_info->x))
+   target = gen_reg_rtx (mode);
+  else
+   target = if_info->x;
 
-  *to_replace = target;
-  noce_emit_move_insn (if_info->x, a);
+  target = noce_emit_czero (if_info, czero_code, non_zero_op, target);
+  if (!target || !to_replace)
+   {
+ end_sequence ();
+ return false;
+   }
+
+  *to_replace = target;
+  noce_emit_move_insn (if_info->x, a);
+}
 
   seq = end_ifcvt_sequence (if_info);
   if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c 
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
index ab5a4909b61..d5310690539 100644
--- a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

[PATCH 2/5] [ifcvt] optimize x=c ? (y shift_op z):y by RISC-V Zicond like insns

2023-12-05 Thread Fei Gao
op=[ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]

Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
op rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
op rd, rs1, rd
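
The shift/rotate case relies on a zero shift count being the identity, so conditionally zeroing the count gives back rs1 unchanged. Here is a hedged C sketch of the if-non-zero form; the names are hypothetical, and the rotate is written with masked counts so the model itself has no undefined behaviour when the count is 0.

```c
#include <assert.h>
#include <stdint.h>

/* czero.eqz rd, rs1, rc: rd = (rc == 0) ? 0 : rs1.  */
static uint64_t
czero_eqz (uint64_t rs1, uint64_t rc)
{
  return rc == 0 ? 0 : rs1;
}

/* Rotate left by N (taken modulo 64), well defined for N == 0.  */
static uint64_t
rotl64 (uint64_t y, unsigned n)
{
  return (y << (n & 63u)) | (y >> ((-n) & 63u));
}

/* x = (c != 0) ? (y << z) : y, with the count zeroed when c == 0.  */
static uint64_t
cond_shl (uint64_t y, uint64_t z, uint64_t c)
{
  return y << (czero_eqz (z, c) & 63u);
}

/* x = (c != 0) ? rotl (y, z) : y, same idea for rotates.  */
static uint64_t
cond_rotl (uint64_t y, uint64_t z, uint64_t c)
{
  return rotl64 (y, (unsigned) czero_eqz (z, c));
}
```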

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): add support for shift 
like op.
(get_base_reg): add support for subreg to handle shift amount operand.
(noce_bbs_ok_for_cond_zero_arith): to replace shift amount operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for shift like op.
---
 gcc/ifcvt.cc  |  8 ++-
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 55 ++-
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 1f0f5414ea1..2efae21ebfe 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2920,7 +2920,9 @@ noce_cond_zero_binary_op_supported (rtx op)
 {
   enum rtx_code opcode = GET_CODE (op);
 
-  if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR)
+  if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR
+  || opcode == ASHIFT || opcode == ASHIFTRT || opcode == LSHIFTRT
+  || opcode == ROTATE || opcode == ROTATERT)
 return true;
 
   return false;
@@ -2934,6 +2936,8 @@ get_base_reg (rtx exp)
 {
   if (REG_P (exp))
 return exp;
+  else if (SUBREG_P (exp))
+return SUBREG_REG (exp);
 
   return NULL_RTX;
 }
@@ -3006,6 +3010,8 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
 
   if (REG_P (bin_op1))
 *to_replace = &XEXP (bin_exp, 1);
+  else if (SUBREG_P (bin_op1))
+*to_replace = &SUBREG_REG (XEXP (bin_exp, 1));
   else
 return false;
 
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c 
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
index dcb21c15d1a..ab5a4909b61 100644
--- a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
+++ b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
@@ -562,5 +562,58 @@ test_XOR_eqz_x_2_reverse_bin_oprands (long x, long z, long 
c)
   return x;
 }
 
-/* { dg-final { scan-assembler-times {czero\.eqz} 28 } } */
+long
+test_ShiftLeft_eqz (long x, long y, long z, long c)
+{
+  if (c)
+x = y << z;
+  else
+x = y;
+  return x;
+}
+
+long
+test_ShiftR_eqz (long x, long y, long z, long c)
+{
+  if (c)
+x = y >> z;
+  else
+x = y;
+  return x;
+}
+
+unsigned long
+test_ShiftR_logical_eqz (unsigned long x, unsigned long y, unsigned long z,
+unsigned long c)
+{
+  if (c)
+x = y >> z;
+  else
+x = y;
+  return x;
+}
+
+unsigned long
+test_RotateL_eqz (unsigned long x, unsigned long y, unsigned long z,
+ unsigned long c)
+{
+  if (c)
+x = (y << z) | (y >> (64 - z));
+  else
+x = y;
+  return x;
+}
+
+unsigned long
+test_RotateR_eqz (unsigned long x, unsigned long y, unsigned long z,
+ unsigned long c)
+{
+  if (c)
+x = (y >> z) | (y << (64 - z));
+  else
+x = y;
+  return x;
+}
+
+/* { dg-final { scan-assembler-times {czero\.eqz} 33 } } */
 /* { dg-final { scan-assembler-times {czero\.nez} 28 } } */
-- 
2.17.1



[PATCH 1/5][V3][ifcvt] optimize x=c ? (y op z) : y by RISC-V Zicond like insns

2023-12-05 Thread Fei Gao
op=[PLUS, MINUS, IOR, XOR]

Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
op rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
op rd, rs1, rd
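
The two sequences above can be checked with a small software model. This is
only a sketch, not code from the patch: czero_eqz/czero_nez model the Zicond
instructions, and enum binop, apply, cond_op_if_zero, cond_op_if_nonzero and
check_cond_ops are hypothetical names introduced for illustration. The point
it shows is that every op in the list has 0 as a right identity, so zeroing
rs2 turns "rs1 op rs2" back into rs1.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the two Zicond instructions (not GCC code):
   czero.eqz rd, rs1, rs2 : rd = (rs2 == 0) ? 0 : rs1
   czero.nez rd, rs1, rs2 : rd = (rs2 != 0) ? 0 : rs1  */
static uint64_t
czero_eqz (uint64_t rs1, uint64_t rc)
{
  return rc == 0 ? 0 : rs1;
}

static uint64_t
czero_nez (uint64_t rs1, uint64_t rc)
{
  return rc != 0 ? 0 : rs1;
}

/* Hypothetical selector for the four ops named in the patch.  */
enum binop { OP_PLUS, OP_MINUS, OP_IOR, OP_XOR };

static uint64_t
apply (enum binop op, uint64_t a, uint64_t b)
{
  switch (op)
    {
    case OP_PLUS:  return a + b;
    case OP_MINUS: return a - b;
    case OP_IOR:   return a | b;
    default:       return a ^ b;
    }
}

/* rd = (rc == 0) ? (rs1 op rs2) : rs1, without a branch.  */
static uint64_t
cond_op_if_zero (enum binop op, uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  uint64_t rd = czero_nez (rs2, rc);  /* rs2 if rc == 0, else 0 */
  return apply (op, rs1, rd);         /* rs1 op 0 == rs1 */
}

/* rd = (rc != 0) ? (rs1 op rs2) : rs1, without a branch.  */
static uint64_t
cond_op_if_nonzero (enum binop op, uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  uint64_t rd = czero_eqz (rs2, rc);  /* rs2 if rc != 0, else 0 */
  return apply (op, rs1, rd);
}

/* Compare both forms against the plain conditional for every op.  */
static void
check_cond_ops (void)
{
  const uint64_t rs1 = 0x1234, rs2 = 0x00ff;

  for (int op = OP_PLUS; op <= OP_XOR; op++)
    for (uint64_t rc = 0; rc < 2; rc++)
      {
        uint64_t full = apply ((enum binop) op, rs1, rs2);
        assert (cond_op_if_zero ((enum binop) op, rs1, rs2, rc)
                == (rc == 0 ? full : rs1));
        assert (cond_op_if_nonzero ((enum binop) op, rs1, rs2, rc)
                == (rc != 0 ? full : rs1));
      }
}
```

Calling check_cond_ops () from a test driver exercises all four ops with rc
both zero and non-zero.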

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_try_cond_zero_arith): handler for conditional zero based
ifcvt
(noce_emit_czero): helper for noce_try_cond_zero_arith
(noce_cond_zero_binary_op_supported): check supported OPs for
conditional zero based ifcvt
(get_base_reg): get the reg itself or NULL_RTX if not a reg
(noce_bbs_ok_for_cond_zero_arith): check if BBs are OK for conditional
zero based ifcvt
(noce_process_if_block): add noce_try_cond_zero_arith

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: New test.
---
 gcc/ifcvt.cc  | 187 ++
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 566 ++
 2 files changed, 753 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..1f0f5414ea1 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -787,6 +787,7 @@ static rtx noce_get_alt_condition (struct noce_if_info *, 
rtx, rtx_insn **);
 static bool noce_try_minmax (struct noce_if_info *);
 static bool noce_try_abs (struct noce_if_info *);
 static bool noce_try_sign_mask (struct noce_if_info *);
+static int noce_try_cond_zero_arith (struct noce_if_info *);
 
 /* Return the comparison code for reversed condition for IF_INFO,
or UNKNOWN if reversing the condition is not possible.  */
@@ -1831,6 +1832,35 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
enum rtx_code code,
 return NULL_RTX;
 }
 
+/*  Emit a conditional zero, returning TARGET or NULL_RTX upon failure.
+IF_INFO describes the if-conversion scenario under consideration.
+CZERO_CODE selects the condition (EQ/NE).
+NON_ZERO_OP is the nonzero operand of the conditional move
+TARGET is the desired output register.  */
+
+static rtx
+noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code,
+rtx non_zero_op, rtx target)
+{
+  machine_mode mode = GET_MODE (target);
+  rtx cond_op0 = XEXP (if_info->cond, 0);
+  rtx czero_cond
+= gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, const0_rtx);
+  rtx if_then_else
+= gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, non_zero_op);
+  rtx set = gen_rtx_SET (target, if_then_else);
+
+  rtx_insn *insn = make_insn_raw (set);
+
+  if (recog_memoized (insn) >= 0)
+{
+  add_insn (insn);
+  return target;
+}
+
+  return NULL_RTX;
+}
+
 /* Try only simple constants and registers here.  More complex cases
are handled in noce_try_cmove_arith after noce_try_store_flag_arith
has had a go at it.  */
@@ -2880,6 +2910,160 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   return true;
 }
 
+/*  Check if OP is supported by conditional zero based if conversion,
+returning TRUE if satisfied otherwise FALSE.
+
+OP is the operation to check.  */
+
+static bool
+noce_cond_zero_binary_op_supported (rtx op)
+{
+  enum rtx_code opcode = GET_CODE (op);
+
+  if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR)
+return true;
+
+  return false;
+}
+
+/*  Helper function to return REG itself,
+otherwise NULL_RTX for other RTX_CODE.  */
+
+static rtx
+get_base_reg (rtx exp)
+{
+  if (REG_P (exp))
+return exp;
+
+  return NULL_RTX;
+}
+
+/*  Check if IF-BB and THEN-BB satisfy the condition for conditional zero
+based if conversion, returning TRUE if satisfied otherwise FALSE.
+
+IF_INFO describes the if-conversion scenario under consideration.
+COMMON_PTR points to the common REG of canonicalized IF_INFO->A and
+IF_INFO->B.
+CZERO_CODE_PTR points to the comparison code to use in czero RTX.
+A_PTR points to the A expression of canonicalized IF_INFO->A.
+TO_REPLACE points to the RTX to be replaced by czero RTX destnation.  */
+
+static bool
+noce_bbs_ok_for_cond_zero_arith (struct noce_if_info *if_info, rtx *common_ptr,
+enum rtx_code *czero_code_ptr, rtx *a_ptr,
+rtx **to_replace)
+{
+  rtx common = NULL_RTX;
+  rtx cond = if_info->cond;
+  rtx a = copy_rtx (if_info->a);
+  rtx b = copy_rtx (if_info->b);
+  rtx bin_op1 = NULL_RTX;
+  enum rtx_code czero_code = UNKNOWN;
+  bool reverse = false;
+  rtx op0, op1, bin_exp;
+
+  if (!noce_simple_bbs (if_info))
+return false;
+
+  /* COND must be EQ or NE comparision of a reg and 0.  */
+  if (GET_CODE (cond) != NE && GET_CODE (cond) != EQ)
+return false;
+  if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
+return false;
+
+  /* Canonicalize x = y : (y op z) to x = (y op z) : y.  */
+  if (REG_P (a) && noce_cond_zero_binary_op_supported (b))
+{
+  std::swap (a, b);
+  

Re: Re: [PATCH v2] RISC-V: Update crypto vector ISA info with latest spec

2023-12-03 Thread Fei Gao
Committed! Thanks Kito.

BR, 
Fei

On 2023-12-04 15:01  Kito Cheng  wrote:
>
>LGTM again :)
>
>On Mon, Dec 4, 2023 at 2:44 PM Feng Wang  wrote:
>>
>> Rebased and resent because this patch was not added to patchwork
>> before. Kito had already reviewed it. Please refer to
>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327499.html
>>
>> This patch adds the Zvkb subset of the crypto vector extension. The
>> corresponding test cases have also been modified.
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc: Add zvkb ISA info.
>> * config/riscv/riscv.opt: Add Mask(ZVKB)
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zvkn-1.c: Replace zvbb with zvkb.
>> * gcc.target/riscv/zvkn.c:   Ditto.
>> * gcc.target/riscv/zvknc-1.c:Ditto.
>> * gcc.target/riscv/zvknc-2.c:Ditto.
>> * gcc.target/riscv/zvknc.c:  Ditto.
>> * gcc.target/riscv/zvkng-1.c:Ditto.
>> * gcc.target/riscv/zvkng-2.c:Ditto.
>> * gcc.target/riscv/zvkng.c:  Ditto.
>> * gcc.target/riscv/zvks-1.c: Ditto.
>> * gcc.target/riscv/zvks.c:   Ditto.
>> * gcc.target/riscv/zvksc-1.c:Ditto.
>> * gcc.target/riscv/zvksc-2.c:Ditto.
>> * gcc.target/riscv/zvksc.c:  Ditto.
>> * gcc.target/riscv/zvksg-1.c:Ditto.
>> * gcc.target/riscv/zvksg-2.c:Ditto.
>> * gcc.target/riscv/zvksg.c:  Ditto.
>> ---
>>  gcc/common/config/riscv/riscv-common.cc  | 6 --
>>  gcc/config/riscv/riscv.opt   | 2 ++
>>  gcc/testsuite/gcc.target/riscv/zvkn-1.c  | 8 
>>  gcc/testsuite/gcc.target/riscv/zvkn.c    | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvknc-1.c | 8 
>>  gcc/testsuite/gcc.target/riscv/zvknc-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvknc.c   | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvkng-1.c | 8 
>>  gcc/testsuite/gcc.target/riscv/zvkng-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvkng.c   | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvks-1.c  | 8 
>>  gcc/testsuite/gcc.target/riscv/zvks.c    | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvksc-1.c | 8 
>>  gcc/testsuite/gcc.target/riscv/zvksc-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvksc.c   | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvksg-1.c | 8 
>>  gcc/testsuite/gcc.target/riscv/zvksg-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvksg.c   | 4 ++--
>>  18 files changed, 50 insertions(+), 46 deletions(-)
>>
>> diff --git a/gcc/common/config/riscv/riscv-common.cc 
>> b/gcc/common/config/riscv/riscv-common.cc
>> index ded85b4c578..6c210412515 100644
>> --- a/gcc/common/config/riscv/riscv-common.cc
>> +++ b/gcc/common/config/riscv/riscv-common.cc
>> @@ -106,7 +106,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>>
>>    {"zvkn", "zvkned"},
>>    {"zvkn", "zvknhb"},
>> -  {"zvkn", "zvbb"},
>> +  {"zvkn", "zvkb"},
>>    {"zvkn", "zvkt"},
>>    {"zvknc", "zvkn"},
>>    {"zvknc", "zvbc"},
>> @@ -114,7 +114,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>>    {"zvkng", "zvkg"},
>>    {"zvks", "zvksed"},
>>    {"zvks", "zvksh"},
>> -  {"zvks", "zvbb"},
>> +  {"zvks", "zvkb"},
>>    {"zvks", "zvkt"},
>>    {"zvksc", "zvks"},
>>    {"zvksc", "zvbc"},
>> @@ -253,6 +253,7 @@ static const struct riscv_ext_version 
>> riscv_ext_version_table[] =
>>
>>    {"zvbb", ISA_SPEC_CLASS_NONE, 1, 0},
>>    {"zvbc", ISA_SPEC_CLASS_NONE, 1, 0},
>> +  {"zvkb", ISA_SPEC_CLASS_NONE, 1, 0},
>>    {"zvkg", ISA_SPEC_CLASS_NONE, 1, 0},
>>    {"zvkned", ISA_SPEC_CLASS_NONE, 1, 0},
>>    {"zvknha", ISA_SPEC_CLASS_NONE, 1, 0},
>> @@ -1624,6 +1625,7 @@ static const riscv_ext_flag_table_t 
>> riscv_ext_flag_table[] =
>>
>>    {"zvbb", &gcc_options::x_riscv_zvb_subext, MASK_ZVBB},
>>    {"zvbc", &gcc_options::x_riscv_zvb_subext, MASK_ZVBC},
>> +  {"zvkb", &gcc_options::x_riscv_zvb_subext, MASK_ZVKB},
>>    {"zvkg", &gcc_options::x_riscv_zvk_subext, MASK_ZVKG},
>>    {"zvkned",   &gcc_options::x_riscv_zvk_subext, MASK_ZVKNED},
>>    {"zvknha",   &gcc_options::x_riscv_zvk_subext, MASK_ZVKNHA},
>> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
>> index 0c6517bdc8b..78186fff6c5 100644
>> --- a/gcc/config/riscv/riscv.opt
>> +++ b/gcc/config/riscv/riscv.opt
>> @@ -319,6 +319,8 @@ Mask(ZVBB) Var(riscv_zvb_subext)
>>
>>  Mask(ZVBC) Var(riscv_zvb_subext)
>>
>> +Mask(ZVKB) Var(riscv_zvb_subext)
>> +
>>  TargetVariable
>>  int riscv_zvk_subext
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/zvkn-1.c 
>> b/gcc/testsuite/gcc.target/riscv/zvkn-1.c
>> index 23b255b4779..069a8f66c92 100644
>> --- a/gcc/testsuite/gcc.target/riscv/zvkn-1.c
>> +++ b/gcc/testsuite/gcc.target/riscv/zvkn-1.c
>> @@ -1,6 +1,6 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-march=rv64gc_zvkned_zvknhb_zvbb_zvkt" { target { rv64 } } 
>> } */
>> -/* { dg-options "-march=rv32gc_zvkned_zvknhb_zvbb_zvkt" { target { rv32 } } 
>> } */
>> +/* { dg-options "-march=rv64gc_zvkned_zvknhb_zvkb_zvkt" { target { rv64 } } 
>> 

Re: Re: [PATCH 1/4] [RISC-V] prefer Zicond primitive semantics to SFB

2023-12-03 Thread Fei Gao
Committed.  Thanks Kito and Jeff.

BR
Fei

On 2023-11-28 13:03  Jeff Law  wrote:
>
>
>
>On 11/27/23 20:09, Kito Cheng wrote:
>> Personally I don't like to play with the pattern order to tweak the
>> code gen since it kinda introduces implicit relation/rule here, but I
>> guess the only way to prevent that is to duplicate the pattern for SFB
>> again, which is not an ideal solution...
>I won't object to this patch, but I don't really like it either.
>
>This patch highlights that the SFB code is not well integrated with the
>rest of the conditional move support.
>
>Jeff

Re: Re: [PATCH 2/4] [ifcvt] optimize x=c ? (y op z) : y by RISC-V Zicond like insns

2023-11-29 Thread Fei Gao
On 2023-11-29 13:26  Jeff Law  wrote:
>
>
>
>On 11/27/23 19:32, Fei Gao wrote:
>> op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]
>>
>> SIGN_EXTEND, ZERO_EXTEND and SUBREG have been considered
>> to support SImode on 64-bit machines.
>Let's defer these for now.  We're supposed to be wrapping up work that
>was posted before stage1 closed.  If these opcodes were trivial to
>support, then I would let them through, but SUBREGs for example can be
>problematical as their semantics can be complex. 

I can delete the 32-bit support on 64-bit machines,
or split this patch into two parts: word-size support and 32-bit support on
64-bit machines,
or keep the current status if the following persuades you :)

Regarding complexity, the current patch differs from the first version by
introducing "to_replace", which reduces complexity significantly.
noce_simple_bbs() ensures we have only a single set for insn_a, like
x=sext(subreg(y) op subreg(z)).
Instead of constructing an insn from scratch while accounting for complex
extensions and subregs, to_replace locates the rtx z in
noce_bbs_ok_for_cond_zero_arith, and z is then replaced with the czero result.
In this way, the extension and subreg cases are as simple as the reg cases.
All the cases for extension and subreg are included in this patch.
They can be found by searching "int" in gcc.target/riscv/zicond_ifcvt_opt.c
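
To illustrate the to_replace idea outside of GCC, here is a minimal sketch.
Everything in it is hypothetical: struct node is a toy stand-in for rtx, and
find_slot_to_replace and demo_replace are names made up for this example. It
only shows the technique of keeping the address of an operand slot, so that
one store through that address swaps z for the czero result while the
surrounding extension and subregs stay as they were.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature expression node; a stand-in for GCC's rtx.  */
enum code { N_REG, N_SUBREG, N_PLUS, N_SIGN_EXTEND };

struct node
{
  enum code code;
  struct node *op[2];  /* operands; unused slots are NULL */
  int regno;           /* only meaningful for N_REG */
};

/* Return the address of the slot that holds the register feeding operand 1
   of BIN_EXP, looking through one N_SUBREG; NULL for anything else.  */
static struct node **
find_slot_to_replace (struct node *bin_exp)
{
  struct node **slot = &bin_exp->op[1];

  if ((*slot)->code == N_REG)
    return slot;
  if ((*slot)->code == N_SUBREG)
    return &(*slot)->op[0];
  return NULL;
}

/* Build x = sext (subreg (y) + subreg (z)), swap z for a temporary through
   the slot, and return the register number now found in z's position.  */
static int
demo_replace (void)
{
  struct node y = { N_REG, { NULL, NULL }, 1 };
  struct node z = { N_REG, { NULL, NULL }, 2 };
  struct node tmp = { N_REG, { NULL, NULL }, 99 };  /* plays the czero result */
  struct node sy = { N_SUBREG, { &y, NULL }, 0 };
  struct node sz = { N_SUBREG, { &z, NULL }, 0 };
  struct node plus = { N_PLUS, { &sy, &sz }, 0 };
  struct node sext = { N_SIGN_EXTEND, { &plus, NULL }, 0 };

  struct node **to_replace = find_slot_to_replace (&plus);
  assert (to_replace != NULL);
  *to_replace = &tmp;

  /* The extension, the PLUS and both subregs are untouched; only the
     register under the second subreg changed.  */
  return sext.op[0]->op[1]->op[0]->regno;
}
```

In this toy model the whole expression is reused as-is after the store, which
is the property that makes the extension and subreg cases no harder than the
plain reg case.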

Could you please let me know which you prefer?

>
>
>>
>> Conditional op, if zero
>> rd = (rc == 0) ? (rs1 op rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> op rd, rs1, rd
>>
>> Conditional op, if non-zero
>> rd = (rc != 0) ? (rs1 op rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> op rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>>  * ifcvt.cc (noce_try_cond_zero_arith):handler for condtional zero 
>>based ifcvt
>>  (noce_emit_czero): helper for noce_try_cond_zero_arith
>>  (noce_cond_zero_binary_op_supported): check supported OPs for 
>>condtional zero based ifcvt
>>  (get_base_reg): get base reg of a subreg or the reg itself
>>  (noce_bbs_ok_for_cond_zero_arith): check if BBs are OK for 
>>condtional zero based ifcvt
>>  (noce_process_if_block): add noce_try_cond_zero_arith
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>> ---
>>   gcc/ifcvt.cc  | 210 ++
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c   | 682 ++
>>   2 files changed, 892 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index a0af553b9ff..8f6a0e7f5fe 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> @@ -787,6 +787,7 @@ static rtx noce_get_alt_condition (struct noce_if_info 
>> *, rtx, rtx_insn **);
>>   static bool noce_try_minmax (struct noce_if_info *);
>>   static bool noce_try_abs (struct noce_if_info *);
>>   static bool noce_try_sign_mask (struct noce_if_info *);
>> +static int noce_try_cond_zero_arith (struct noce_if_info *);
>>  
>>   /* Return the comparison code for reversed condition for IF_INFO,
>>  or UNKNOWN if reversing the condition is not possible.  */
>> @@ -1831,6 +1832,40 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
>> enum rtx_code code,
>>   return NULL_RTX;
>>   }
>>  
>> +/*  Emit a conditional zero, returning TARGET or NULL_RTX upon failure.
>> +    IF_INFO describes the if-conversion scenario under consideration.
>> +    CZERO_CODE selects the condition (EQ/NE).
>> +    NON_ZERO_OP is the nonzero operand of the conditional move
>> +    TARGET is the desired output register.  */
>> +
>> +static rtx
>> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code,
>> +rtx non_zero_op, rtx target)
>[ ... ]
>The code you wrote is safe in that it constructs a suitable if-then-else
>as a single object, starts a new sequence, then uses emit_insn to put that
>object onto a sequence.  Then you extract that one and only one insn
>from the sequence and validate it can be recognized.
>
>In cases where you want to do something like this and know you're going
>to emit one and only one insn you can use 'make_insn_raw' without
>generating a new sequence.
>
>I would suggest you replace all the code starting with start_sequence()
>and ending with end_sequence () (inclusive) with something like
>
>insn = make_insn_raw (set);
>if (recog_memoized (insn) >= 0)
>   {
> emit_insn (insn);
> return target;
> 

Re: [PATCH] [ifcvt][V2] optimize x=c ? (y and z) : y, where z is a reg or imm

2023-11-28 Thread Fei Gao
Hi Jeff and Kito

Please note that this patch is based on
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327149.html
[PATCH 3/4] [ifcvt] optimize x=c ? (y op const_int) : y by RISC-V Zicond like 
insns

Thanks & BR,
Fei

On 2023-11-28 18:10  Fei Gao  wrote:
>
>Take the following case for example.
>
>CFLAGS: -march=rv64gc_zbb_zicond -mabi=lp64d -O2
>
>long
>test_AND_ceqz (long x, long y, long z, long c)
>{
>  if (c)
>    x = y & z;
>  else
>    x = y;
>  return x;
>}
>
>Before patch:
>
>   and a2,a1,a2
>   czero.eqz   a0,a2,a3
>   czero.nez   a3,a1,a3
>   or  a0,a3,a0
>   ret
>
>After patch:
>   and a0,a1,a2
>   czero.nez   a1,a1,a3
>   or  a0,a1,a0
>   ret
>
>Co-authored-by: Xiao Zeng
>
>gcc/ChangeLog:
>
>    * ifcvt.cc (noce_cond_zero_binary_op_supported): Add support for AND.
>    (noce_bbs_ok_for_cond_zero_arith): Likewise.
>    (noce_try_cond_zero_arith): Likewise.
>
>gcc/testsuite/ChangeLog:
>
>    * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for AND.
>---
> gcc/ifcvt.cc  |  86 +++--
> .../gcc.target/riscv/zicond_ifcvt_opt.c   | 323 +-
> 2 files changed, 377 insertions(+), 32 deletions(-)
>
>diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>index 4cc6a125ff0..a1af762b5aa 100644
>--- a/gcc/ifcvt.cc
>+++ b/gcc/ifcvt.cc
>@@ -2940,7 +2940,7 @@ noce_cond_zero_binary_op_supported (rtx op)
> opcode = GET_CODE (XEXP (op, 0));
>
>   if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR
>-  || noce_cond_zero_shift_op_supported (opcode))
>+  || opcode == AND || noce_cond_zero_shift_op_supported (opcode))
> return true;
>
>   return false;
>@@ -3021,7 +3021,7 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
>*if_info, rtx *common_ptr,
> {
>   common = b;
>   bin_op1 = XEXP (bin_exp, 1);
>-  czero_code = reverse
>+  czero_code = (reverse ^ (GET_CODE (bin_exp) == AND))
>  ? noce_reversed_cond_code (if_info)
>  : GET_CODE (cond);
> }
>@@ -3053,7 +3053,7 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
>*if_info, rtx *common_ptr,
> static int
> noce_try_cond_zero_arith (struct noce_if_info *if_info)
> {
>-  rtx target, a;
>+  rtx target, rtmp, a;
>   rtx_insn *seq;
>   machine_mode mode = GET_MODE (if_info->x);
>   rtx common = NULL_RTX;
>@@ -3073,44 +3073,70 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
>   bin_code = GET_CODE (bin_exp);
>   bin_op0 = XEXP (bin_exp, 0);
>
>-  if (CONST_INT_P (*to_replace))
>+  if (bin_code == AND)
> {
>-  non_zero_op = gen_reg_rtx (mode);
>-  noce_emit_move_insn (non_zero_op, *to_replace);
>+  rtmp = gen_reg_rtx (mode);
>+  emit_insn (gen_rtx_SET (rtmp, a));
>+
>+  target = noce_emit_czero (if_info, czero_code, common, if_info->x);
>+  if (!target)
>+  {
>+    end_sequence ();
>+    return false;
>+  }
>+
>+  target = expand_simple_binop (mode, IOR, rtmp, target, if_info->x, 0,
>+      OPTAB_WIDEN);
>+  if (!target)
>+  {
>+    end_sequence ();
>+    return false;
>+  }
>+
>+  if (target != if_info->x)
>+  noce_emit_move_insn (if_info->x, target);
> }
>   else
>-    non_zero_op = *to_replace;
>+    {
>+  if (CONST_INT_P (*to_replace))
>+  {
>+    non_zero_op = gen_reg_rtx (mode);
>+    noce_emit_move_insn (non_zero_op, *to_replace);
>+  }
>+  else
>+  non_zero_op = *to_replace;
>
>-  /* If x is used in both input and out like x = c ? x + z : x,
>- use a new reg to avoid modifying x  */
>-  if (common && rtx_equal_p (common, if_info->x))
>-    target = gen_reg_rtx (mode);
>-  else
>-    target = if_info->x;
>+  /* If x is used in both input and out like x = c ? x + z : x,
>+  use a new reg to avoid modifying x  */
>+  if (common && rtx_equal_p (common, if_info->x))
>+  target = gen_reg_rtx (mode);
>+  else
>+  target = if_info->x;
>
>-  target = noce_emit_czero (if_info, czero_code, non_zero_op, target);
>-  if (!target || !to_replace)
>-    {
>-  end_sequence ();
>-  return false;
>-    }
>+  target = noce_emit_czero (if_info, czero_code, non_zero_op, target);
>+  if (!target || !to_replace)
>+  {
>+    end_sequence ();
>+    return false;
>+  }
>
>-  if (CONST_INT_P (*to_replace))
>-    {
>-  if (noce_cond_zero_shift_op_supported (bin_code))
>+  if (CONST_INT_P (*to_replace))
> {
>- 

[PATCH] [ifcvt][V2] optimize x=c ? (y and z) : y, where z is a reg or imm

2023-11-28 Thread Fei Gao
Take the following case for example.

CFLAGS: -march=rv64gc_zbb_zicond -mabi=lp64d -O2

long
test_AND_ceqz (long x, long y, long z, long c)
{
  if (c)
x = y & z;
  else
x = y;
  return x;
}

Before patch:

and a2,a1,a2
czero.eqz   a0,a2,a3
czero.nez   a3,a1,a3
or  a0,a3,a0
ret

After patch:
and a0,a1,a2
czero.nez   a1,a1,a3
or  a0,a1,a0
ret
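
A small model of the "After patch" sequence, as a sketch only: czero_nez
models the Zicond instruction, and cond_and and check_cond_and are
hypothetical names for this example. AND cannot use the same trick as
PLUS/IOR/XOR because y & 0 is 0 rather than y, so the czero is applied to y
and combined with the AND result by an OR.

```c
#include <assert.h>
#include <stdint.h>

/* czero.nez rd, rs1, rs2 : rd = (rs2 != 0) ? 0 : rs1  */
static uint64_t
czero_nez (uint64_t rs1, uint64_t rc)
{
  return rc != 0 ? 0 : rs1;
}

/* x = c ? (y & z) : y, following the three-instruction sequence.  */
static uint64_t
cond_and (uint64_t y, uint64_t z, uint64_t c)
{
  uint64_t t = y & z;             /* and       a0,a1,a2 */
  uint64_t u = czero_nez (y, c);  /* czero.nez a1,a1,a3 */
  return u | t;                   /* or        a0,a1,a0 */
}

/* c != 0: u is 0, so the result is y & z.
   c == 0: u is y, and y | (y & z) is y.  */
static void
check_cond_and (void)
{
  assert (cond_and (0xf0f0, 0x0ff0, 1) == 0x00f0);
  assert (cond_and (0xf0f0, 0x0ff0, 0) == 0xf0f0);
}
```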

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): Add support for AND.
(noce_bbs_ok_for_cond_zero_arith): Likewise.
(noce_try_cond_zero_arith): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for AND.
---
 gcc/ifcvt.cc  |  86 +++--
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 323 +-
 2 files changed, 377 insertions(+), 32 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 4cc6a125ff0..a1af762b5aa 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2940,7 +2940,7 @@ noce_cond_zero_binary_op_supported (rtx op)
 opcode = GET_CODE (XEXP (op, 0));
 
   if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR
-  || noce_cond_zero_shift_op_supported (opcode))
+  || opcode == AND || noce_cond_zero_shift_op_supported (opcode))
 return true;
 
   return false;
@@ -3021,7 +3021,7 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
 {
   common = b;
   bin_op1 = XEXP (bin_exp, 1);
-  czero_code = reverse
+  czero_code = (reverse ^ (GET_CODE (bin_exp) == AND))
 ? noce_reversed_cond_code (if_info)
 : GET_CODE (cond);
 }
@@ -3053,7 +3053,7 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
 static int
 noce_try_cond_zero_arith (struct noce_if_info *if_info)
 {
-  rtx target, a;
+  rtx target, rtmp, a;
   rtx_insn *seq;
   machine_mode mode = GET_MODE (if_info->x);
   rtx common = NULL_RTX;
@@ -3073,44 +3073,70 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   bin_code = GET_CODE (bin_exp);
   bin_op0 = XEXP (bin_exp, 0);
 
-  if (CONST_INT_P (*to_replace))
+  if (bin_code == AND)
 {
-  non_zero_op = gen_reg_rtx (mode);
-  noce_emit_move_insn (non_zero_op, *to_replace);
+  rtmp = gen_reg_rtx (mode);
+  emit_insn (gen_rtx_SET (rtmp, a));
+
+  target = noce_emit_czero (if_info, czero_code, common, if_info->x);
+  if (!target)
+   {
+ end_sequence ();
+ return false;
+   }
+
+  target = expand_simple_binop (mode, IOR, rtmp, target, if_info->x, 0,
+   OPTAB_WIDEN);
+  if (!target)
+   {
+ end_sequence ();
+ return false;
+   }
+
+  if (target != if_info->x)
+   noce_emit_move_insn (if_info->x, target);
 }
   else
-non_zero_op = *to_replace;
+{
+  if (CONST_INT_P (*to_replace))
+   {
+ non_zero_op = gen_reg_rtx (mode);
+ noce_emit_move_insn (non_zero_op, *to_replace);
+   }
+  else
+   non_zero_op = *to_replace;
 
-  /* If x is used in both input and out like x = c ? x + z : x,
- use a new reg to avoid modifying x  */
-  if (common && rtx_equal_p (common, if_info->x))
-target = gen_reg_rtx (mode);
-  else
-target = if_info->x;
+  /* If x is used in both input and out like x = c ? x + z : x,
+use a new reg to avoid modifying x  */
+  if (common && rtx_equal_p (common, if_info->x))
+   target = gen_reg_rtx (mode);
+  else
+   target = if_info->x;
 
-  target = noce_emit_czero (if_info, czero_code, non_zero_op, target);
-  if (!target || !to_replace)
-{
-  end_sequence ();
-  return false;
-}
+  target = noce_emit_czero (if_info, czero_code, non_zero_op, target);
+  if (!target || !to_replace)
+   {
+ end_sequence ();
+ return false;
+   }
 
-  if (CONST_INT_P (*to_replace))
-{
-  if (noce_cond_zero_shift_op_supported (bin_code))
+  if (CONST_INT_P (*to_replace))
{
- *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
- if (GET_CODE (a) == ZERO_EXTEND && bin_code == LSHIFTRT)
-   PUT_CODE (a, SIGN_EXTEND);
+ if (noce_cond_zero_shift_op_supported (bin_code))
+   {
+ *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
+ if (GET_CODE (a) == ZERO_EXTEND && bin_code == LSHIFTRT)
+   PUT_CODE (a, SIGN_EXTEND);
+   }
+ else if (SUBREG_P (bin_op0))
+   *to_replace = gen_rtx_SUBREG (GET_MODE (bin_op0), target, 0);
+ else
+   *to_replace = target;
}
-  else if (SUBREG_P (bin_op0))
-   *to_replace = gen_rtx_SUBREG (GET_MODE (bin_op0), target, 0);
   else
*to_replace = target;
+  emit_insn (gen_rtx_SET (if_info->x, a));
 }
-  else
-

Re: Re: [PATCH 4/4] [ifcvt] if convert x=c ? y : y by RISC-V Zicond like insns

2023-11-27 Thread Fei Gao
On 2023-11-20 15:10  Jeff Law  wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index 6e341fc4d4b..cfa9bc4b850 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> @@ -2911,7 +2911,7 @@ noce_try_sign_mask (struct noce_if_info *if_info)
>>   static bool
>>   noce_cond_zero_binary_op_supported (enum rtx_code op)
>>   {
>> -  if (op == PLUS || op == MINUS || op == IOR || op == XOR)
>> +  if (op == PLUS || op == MINUS || op == IOR || op == XOR || op == AND)
>>   return true;
>Include ASHIFT, LSHIFTRT, ASHIFTRT, ROTATE, ROTATERT.  That should pick
>up that critical conditional-shift-by-6 in leela. 

Done.
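
For reference, a sketch of why the shift codes fit the same scheme. It is a
software model, not code from the patch: czero_eqz models the Zicond
instruction, and cond_shl, cond_lshr and check_cond_shift are hypothetical
names. A shift by 0 leaves the value unchanged, so zeroing the shift amount
has the same effect as zeroing the second operand of PLUS.

```c
#include <assert.h>
#include <stdint.h>

/* czero.eqz rd, rs1, rs2 : rd = (rs2 == 0) ? 0 : rs1  */
static uint64_t
czero_eqz (uint64_t rs1, uint64_t rc)
{
  return rc == 0 ? 0 : rs1;
}

/* x = c ? (y << z) : y  */
static uint64_t
cond_shl (uint64_t y, uint64_t z, uint64_t c)
{
  uint64_t amt = czero_eqz (z, c);  /* z if c != 0, else 0 */
  return y << (amt & 63);           /* RV64 sll uses the low 6 bits */
}

/* x = c ? (y >> z) : y, logical shift  */
static uint64_t
cond_lshr (uint64_t y, uint64_t z, uint64_t c)
{
  uint64_t amt = czero_eqz (z, c);
  return y >> (amt & 63);
}

static void
check_cond_shift (void)
{
  assert (cond_shl (3, 4, 1) == 48);
  assert (cond_shl (3, 4, 0) == 3);
  assert (cond_lshr (48, 4, 1) == 3);
  assert (cond_lshr (48, 4, 0) == 48);
}
```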

>
>
>
>
>> +  if (opcode == AND)
>> +    {
>> +  tmp
>> += expand_simple_binop (mode, AND, common, z, NULL_RTX, 0, OPTAB_DIRECT);
>OPTAB_WIDEN here I think. 
I restructured the code for a simpler implementation. But AND is different
from the other operations in czero-based ifcvt, so I have kept the ugly code
locally. I will refine it and post it if the current code is accepted.

>
>
>> +  if (!tmp)
>> +{
>> +  end_sequence ();
>> +  return FALSE;
>> +}
>>  
>> -  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  */
>> -  if (common && rtx_equal_p (common, if_info->x))
>> -    target = gen_reg_rtx (mode);
>> -  else
>> -    target = if_info->x;
>> +  target = noce_emit_czero (if_info, czero_code, common, if_info->x);
>> +  if (!target)
>> +{
>> +  end_sequence ();
>> +  return FALSE;
>Please try to be consistent with upper/lower case.  In your prior
>patches you used lower case for true/false.  In this patch you're using
>upper case.  Lower case seems to be the standard in that file, so use
>lower case. 
>
>> +}
>>  
>> -  target = noce_emit_czero (if_info, czero_code, z, target);
>> -  if (!target)
>> -    {
>> -  end_sequence ();
>> -  return false;
>> +  target = expand_simple_binop (mode, IOR, tmp, target, if_info->x, 0,
>> +    OPTAB_DIRECT);
>>   }
>> +  else
>> +    {
>> +  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  
>> */
>> +  if (common && rtx_equal_p (common, if_info->x))
>> +target = gen_reg_rtx (mode);
>> +  else
>> +target = if_info->x;
>As noted before you may not be able to generate a new register when
>ifcvt is run after register allocation.  Your code needs to handle that
>correctly.
>
>
>> +
>> +  target = noce_emit_czero (if_info, czero_code, z, target);
>> +  if (!target)
>> +{
>> +  end_sequence ();
>> +  return false;
>> +}
>>  
>> -  target = expand_simple_binop (mode, opcode, common, target, if_info->x, 0,
>> -OPTAB_DIRECT);
>> +  target = expand_simple_binop (mode, opcode, common, target, 
>> if_info->x, 0,
>> +    OPTAB_DIRECT);
>OPTAB_WIDEN.
>
>And the usual comments about avoiding explicit registers in the tests.
>
>
>I would suggest you try to handle this case as well, I don't think it's
>handled by your current code:
>
>long
>eq2 (long a, long b)
>{
>   if (a == 0)
> return b;
>
>   return 0;
>} 
I tried this with both the old and new series. Zicond insns are generated in
both.

BR, 
Fei
>
>
>There's probably also a negated version of that to be handled as well.
>
>
>Overall I think we can go forward with your patches after things are
>fixed.  I'm inclined to wait until after Maciej has integrated his
>changes before actually committing them.  While I don't expect problems,
>I wouldn't want Maciej to have to respin a 40+ patch series.
>
>Note that while we transition to stage3 development today, your patch
>was posted while we were in stage1, so you've met the deadline.  We just
>need to get the updates done relatively soon rather than having it drag
>late into stage3.
>
>Jeff

Re: Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-11-27 Thread Fei Gao
On 2023-11-20 14:59  Jeff Law  wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>>  * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>  (noce_try_cond_zero_arith): handler for condtional zero op
>>  (noce_process_if_block): add noce_try_cond_zero_arith with hook 
>>control
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>> ---
>>   gcc/ifcvt.cc  | 112 +++
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c   | 130 ++
>>   2 files changed, 242 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index a0af553b9ff..4f98c1c7bf9 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> @@ -781,12 +781,14 @@ static bool noce_try_store_flag_constants (struct 
>> noce_if_info *);
>>   static bool noce_try_store_flag_mask (struct noce_if_info *);
>>   static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
>>       rtx, rtx, rtx, rtx = NULL, rtx = NULL);
>> +static rtx noce_emit_czero (struct noce_if_info *, enum rtx_code, rtx, rtx);
>>   static bool noce_try_cmove (struct noce_if_info *);
>>   static bool noce_try_cmove_arith (struct noce_if_info *);
>>   static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn 
>>**);
>>   static bool noce_try_minmax (struct noce_if_info *);
>>   static bool noce_try_abs (struct noce_if_info *);
>>   static bool noce_try_sign_mask (struct noce_if_info *);
>> +static bool noce_try_cond_zero_arith (struct noce_if_info *);
>>  
>>   /* Return the comparison code for reversed condition for IF_INFO,
>>  or UNKNOWN if reversing the condition is not possible.  */
>> @@ -1831,6 +1833,32 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
>> enum rtx_code code,
>>   return NULL_RTX;
>>   }
>>  
>> +static rtx
>> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, 
>> rtx z, rtx target)
>Every function needs a comment describing what the function does, it's
>return value(s) and its arguments.  There are many examples in ifcvt.cc
>you can use to guide you.  I might start with something like this:
>
>/* Emit a conditional zero, returning the location of the result
>    or NULL_RTX upon failure.
>
>    IF_INFO describes the if-conversion scenario under consideration.
>    CZERO_CODE selects the condition (EQ/NE).
>    Z is the nonzero operand of the conditional move
>    TARGET is the desired output register.  */
>
>Or something like that.  I would suggest renaming "Z" to something more
>meaningful. 
Hi Jeff

Thanks for your patience. All comments regarding coding style have been
addressed in the new patches.

>
>
>
>>  
>> +/* Convert x = c ? y + z : y or x = c ? y : y + z. */
>> +
>> +static bool
>> +noce_try_cond_zero_arith (struct noce_if_info *if_info)
>The function comment really should be improved.  For example it doesn't
>indicate what the return value is.
>
>> +
>> +  /* cond must be EQ or NEQ comparision of a reg and 0.  */
>In general when you refer to a variable in a comment, do so in upper
>case.  Use NE rather than NEQ as the former is how most code refers to a
>not-equal rtx code.
>
>
>> +  if (GET_CODE (cond) != NE && GET_CODE (cond) != EQ)
>> +    return false;
>> +  if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
>> +    return false;
>> +
>> +  /* check y + z:y*/
>> +  if (GET_CODE (a) == PLUS && REG_P (XEXP (a, 0)) && REG_P (XEXP (a, 1))
>> +  && REG_P (b) && rtx_equal_p (XEXP (a, 0), b))
>Write comments as complete sentences.
>
>> +    {
>> +  common = b;
>> +  z = XEXP (a, 1);
>Rather than "z" use a more descriptive variable name.
>
>
>> +
>> +  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  */
>> +  if (common && rtx_equal_p (common, if_info->x))
>> +    target = gen_reg_rtx (mode);
>> +  else
>> +    target = if_info->x;
>if-conv

Re: Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-11-27 Thread Fei Gao
On 2023-11-20 14:46  Jeff Law  wrote:
>
>
>
>On 10/30/23 21:35, Fei Gao wrote:
>
>>> So just a few notes to further illustrate why I'm currently looking to
>>> take the VRULL+Ventana implementation.  The code above would be much
>>> better handled by just calling noce_emit_cmove.  noce_emit_cmove will go
>>> through the conditional move expander.  So any improvement we make in
>>> the expander "just work" when called from the if-converter.
>> noce_emit_czero is used here to make sure czero insns are emitted.
>> noce_emit_cmove includes SFB and Thead movcc, which will take precedence
>> over Zicond in RISC-V if enabled. Unfortunately, we have products with both
>> SFB and Zicond available and saw such a conflict.
>> That is also the reason for adding the hook TARGET_HAVE_COND_ZERO
>> in [PATCH 1/4]: to disallow the inefficient code emitted in the SFB-enabled,
>> Zicond-disabled case.
>I understand what you're trying to do, but I would consider the
>TARGET_HAVE_COND_ZERO fundamentally the wrong approach. 
Hi Jeff

Thanks for your review. I just posted the new series.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327148.html
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327151.html
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327149.html
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327150.html

TARGET_HAVE_COND_ZERO has been deleted. 

>
>I'm willing to defer routing everything through noce_emit_cmove for now,
>but that's really where this code needs to be going.  If that's causing
>a conflict for a particular implementation with both SFB and Zicond,
>then we'll have to look at the details and adjust things in the target
>files. 
Agreed. We can try noce_emit_cmove later, with the additional TCs integrated
recently. I also tried to resolve the conflict found in my TCs in [PATCH 1/4]
and [PATCH 4/4].

>
>
>> Cool, and waiting for your submission. Shifts/rotates can be added in
>> noce_try_cond_zero_arith.
>Fully agreed.  Those are easy. 
Shifts/rotates have been added. 

BR, 
Fei
>
>> I tried to keep noce_try_cond_zero_arith simple, without introducing SCC and
>> other stuff, as additional insns will be generated for greater-than-like
>> comparisons but may not be generated for branch-insn based SFB.
>And I think the result is we're going to fail to implement many
>profitable if-conversions.
>
>
>Jeff

[PATCH 2/4] [ifcvt] optimize x=c ? (y op z) : y by RISC-V Zicond like insns

2023-11-27 Thread Fei Gao
op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]

SIGN_EXTEND, ZERO_EXTEND and SUBREG have been considered
to support SImode on 64-bit machines.

Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
op rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
op rd, rs1, rd
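
The two sequences above rely on 0 being a right identity for every op in the list (x op 0 == x), which is also why AND is absent from it. A minimal C sketch of that reasoning follows. The helper names are hypothetical and only model the instruction semantics; they are not code from ifcvt.cc.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the two Zicond instructions (an illustration only):
     czero.eqz rd, rs1, rs2   ->  rd = (rs2 == 0) ? 0 : rs1
     czero.nez rd, rs1, rs2   ->  rd = (rs2 != 0) ? 0 : rs1  */
static uint64_t
czero_eqz (uint64_t rs1, uint64_t rc)
{
  return rc == 0 ? 0 : rs1;
}

static uint64_t
czero_nez (uint64_t rs1, uint64_t rc)
{
  return rc != 0 ? 0 : rs1;
}

/* rd = (rc == 0) ? (rs1 - rs2) : rs1, written as the two-insn sequence.  */
static uint64_t
cond_sub_if_zero (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  uint64_t rd = czero_nez (rs2, rc);	/* czero.nez rd, rs2, rc  */
  return rs1 - rd;			/* sub rd, rs1, rd  */
}

/* rd = (rc != 0) ? (rs1 << rs2) : rs1, written as the two-insn sequence.
   The shift amount is masked to 6 bits, as sll does on RV64.  */
static uint64_t
cond_shl_if_nonzero (uint64_t rs1, uint64_t rs2, uint64_t rc)
{
  uint64_t rd = czero_eqz (rs2, rc);	/* czero.eqz rd, rs2, rc  */
  return rs1 << (rd & 63);		/* sll rd, rs1, rd  */
}
```

When the condition selects the plain rs1 arm, the czero result is 0 and the op degenerates to "rs1 op 0", which returns rs1 unchanged.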

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_try_cond_zero_arith): handler for conditional zero based
ifcvt
(noce_emit_czero): helper for noce_try_cond_zero_arith
(noce_cond_zero_binary_op_supported): check supported OPs for
conditional zero based ifcvt
(get_base_reg): get base reg of a subreg or the reg itself
(noce_bbs_ok_for_cond_zero_arith): check if BBs are OK for conditional
zero based ifcvt
(noce_process_if_block): add noce_try_cond_zero_arith

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: New test.
---
 gcc/ifcvt.cc  | 210 ++
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 682 ++
 2 files changed, 892 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..8f6a0e7f5fe 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -787,6 +787,7 @@ static rtx noce_get_alt_condition (struct noce_if_info *, 
rtx, rtx_insn **);
 static bool noce_try_minmax (struct noce_if_info *);
 static bool noce_try_abs (struct noce_if_info *);
 static bool noce_try_sign_mask (struct noce_if_info *);
+static int noce_try_cond_zero_arith (struct noce_if_info *);
 
 /* Return the comparison code for reversed condition for IF_INFO,
or UNKNOWN if reversing the condition is not possible.  */
@@ -1831,6 +1832,40 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
enum rtx_code code,
 return NULL_RTX;
 }
 
+/*  Emit a conditional zero, returning TARGET or NULL_RTX upon failure.
+IF_INFO describes the if-conversion scenario under consideration.
+CZERO_CODE selects the condition (EQ/NE).
+NON_ZERO_OP is the nonzero operand of the conditional move
+TARGET is the desired output register.  */
+
+static rtx
+noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code,
+rtx non_zero_op, rtx target)
+{
+  machine_mode mode = GET_MODE (target);
+  rtx cond_op0 = XEXP (if_info->cond, 0);
+  rtx czero_cond
+= gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, const0_rtx);
+  rtx if_then_else
+= gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, non_zero_op);
+  rtx set = gen_rtx_SET (target, if_then_else);
+
+  start_sequence ();
+  rtx_insn *insn = emit_insn (set);
+
+  if (recog_memoized (insn) >= 0)
+{
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  emit_insn (seq);
+
+  return target;
+}
+
+  end_sequence ();
+  return NULL_RTX;
+}
+
 /* Try only simple constants and registers here.  More complex cases
are handled in noce_try_cmove_arith after noce_try_store_flag_arith
has had a go at it.  */
@@ -2880,6 +2915,178 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   return true;
 }
 
+/*  Check if OP is supported by conditional zero based if conversion,
+returning TRUE if satisfied otherwise FALSE.
+
+OP is the operation to check.  */
+
+static bool
+noce_cond_zero_binary_op_supported (rtx op)
+{
+  enum rtx_code opcode = GET_CODE (op);
+
+  /* Strip SIGN_EXTEND or ZERO_EXTEND if any.  */
+  if (opcode == SIGN_EXTEND || opcode == ZERO_EXTEND)
+opcode = GET_CODE (XEXP (op, 0));
+
+  if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR
+  || opcode == ASHIFT || opcode == ASHIFTRT || opcode == LSHIFTRT
+  || opcode == ROTATE || opcode == ROTATERT)
+return true;
+
+  return false;
+}
+
+/*  Helper function to return REG itself or inner expression of a SUBREG,
+otherwise NULL_RTX for other RTX_CODE.  */
+
+static rtx
+get_base_reg (rtx exp)
+{
+  if (REG_P (exp))
+return exp;
+  else if (SUBREG_P (exp))
+return SUBREG_REG (exp);
+
+  return NULL_RTX;
+}
+
+/*  Check if IF-BB and THEN-BB satisfy the condition for conditional zero
+based if conversion, returning TRUE if satisfied otherwise FALSE.
+
+IF_INFO describes the if-conversion scenario under consideration.
+COMMON_PTR points to the common REG of canonicalized IF_INFO->A and
+IF_INFO->B.
+CZERO_CODE_PTR points to the comparison code to use in czero RTX.
+A_PTR points to the A expression of canonicalized IF_INFO->A.
+TO_REPLACE points to the RTX to be replaced by czero RTX destination.  */
+
+static bool
+noce_bbs_ok_for_cond_zero_arith (struct noce_if_info *if_info, rtx *common_ptr,
+enum rtx_code *czero_code_ptr, rtx *a_ptr,
+rtx **to_replace)
+{
+  rtx common = NULL_RTX;
+  rtx cond = if_info->cond;
+  rtx a = copy_rtx (if_info->a);
+  rtx 

[PATCH 4/4] [V2] [ifcvt] prefer SFB to Zicond for x=c ? (y op CONST) : y.

2023-11-27 Thread Fei Gao
In x=c ? (y op CONST) : y cases, Zicond based czero ifcvt generates
more true dependencies in the code sequence than SFB based movcc. So exit
noce_try_cond_zero_arith in such cases and let noce_try_cmove_arith
generate a better code sequence.

Take the following case for example.

CFLAGS: -mtune=sifive-7-series -march=rv64gc_zbb_zicond -mabi=lp64d -O2

unsigned int
test_RotateR_eqz_imm_int (unsigned int x, unsigned int y, unsigned int c)
{
  if (c)
x = (y >> 11) | (y << (32 - 11));
  else
x = y;
  return x;
}

before patch:
li  a5,11
czero.eqz   a2,a5,a2
rorw    a0,a1,a2
ret

after patch:
roriw   a0,a1,11
bne a2,zero,1f  # movcc
mv  a0,a1
1:
ret
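
To make the dependency argument concrete, here is a rough C model of the two sequences for this test case. The function names are hypothetical, and the model ignores the sign extension that rorw/roriw apply to the 64-bit destination register.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate the 32-bit value Y right by the low 5 bits of N.  */
static uint32_t
ror32 (uint32_t y, uint32_t n)
{
  n &= 31;
  return (y >> n) | (y << ((32 - n) & 31));
}

/* Zicond form (before patch): each insn consumes the previous result,
   so the three insns form one serial chain.  */
static uint32_t
rotr11_zicond (uint32_t y, uint32_t c)
{
  uint32_t amount = 11;			/* li a5,11  */
  amount = c == 0 ? 0 : amount;		/* czero.eqz a2,a5,a2  */
  return ror32 (y, amount);		/* rorw a0,a1,a2  */
}

/* SFB form (after patch): the rotate does not wait on the condition;
   only the short forward branch over the mv does.  */
static uint32_t
rotr11_sfb (uint32_t y, uint32_t c)
{
  uint32_t x = ror32 (y, 11);		/* roriw a0,a1,11  */
  if (c == 0)				/* bne a2,zero,1f  */
    x = y;				/* mv a0,a1  */
  return x;
}
```

Both functions compute the same value for every input; they differ only in how the result depends on c, which is the point the commit message makes.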

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_have_sfb): hook implementation
(TARGET_HAVE_SFB): define hook in riscv
* doc/tm.texi: add TARGET_HAVE_SFB
* doc/tm.texi.in: add TARGET_HAVE_SFB
* ifcvt.cc (noce_try_cond_zero_arith): prefer SFB for x=c ? (y op 
CONST) : y
* target.def: add TARGET_HAVE_SFB

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_sfb_ifcvt_opt.c: New test.
---
 gcc/config/riscv/riscv.cc |   12 +
 gcc/doc/tm.texi   |4 +
 gcc/doc/tm.texi.in|2 +
 gcc/ifcvt.cc  |4 +-
 gcc/target.def|7 +
 .../gcc.target/riscv/zicond_sfb_ifcvt_opt.c   | 1354 +
 6 files changed, 1382 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_sfb_ifcvt_opt.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d0efb939bf2..91fb4ebd653 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10191,6 +10191,14 @@ riscv_vectorize_related_mode (machine_mode 
vector_mode, scalar_mode element_mode
   return default_vectorize_related_mode (vector_mode, element_mode, nunits);
 }
 
+/* Implement TARGET_HAVE_SFB.  */
+
+bool
+riscv_have_sfb (void)
+{
+  return TARGET_SFB_ALU;
+}
+
 /* Implement TARGET_VECTORIZE_VEC_PERM_CONST.  */
 
 static bool
@@ -10536,6 +10544,10 @@ extract_base_offset_in_addr (rtx mem, rtx *base, rtx 
*offset)
 #define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_ENDIAN)
 #endif
 
+#undef TARGET_HAVE_SFB
+#define TARGET_HAVE_SFB \
+riscv_have_sfb
+
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P riscv_vector_mode_supported_p
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 645559ea084..9b4e3f71569 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12149,6 +12149,10 @@ This target hook is required only when the target has 
several different
 modes and they have different conditional execution capability, such as ARM.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_HAVE_SFB (void)
+This target hook returns true if the target supports Short Forward Branch.
+@end deftypefn
+
 @deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (rtx_insn **@var{prep_seq}, 
rtx_insn **@var{gen_seq}, rtx_code @var{code}, tree @var{op0}, tree @var{op1})
 This function prepares to emit a comparison insn for the first compare in a
  sequence of conditional comparisions.  It returns an appropriate comparison
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ddc8507ed9..6dac432605f 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7843,6 +7843,8 @@ lists.
 
 @hook TARGET_HAVE_CONDITIONAL_EXECUTION
 
+@hook TARGET_HAVE_SFB
+
 @hook TARGET_GEN_CCMP_FIRST
 
 @hook TARGET_GEN_CCMP_NEXT
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 4cc6a125ff0..c0f42a7ab1f 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3068,10 +3068,12 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   &a, &to_replace))
 return false;
 
-  start_sequence ();
+  if (targetm.have_sfb () && CONST_INT_P (*to_replace))
+return false;
 
   bin_code = GET_CODE (bin_exp);
   bin_op0 = XEXP (bin_exp, 0);
+  start_sequence ();
 
   if (CONST_INT_P (*to_replace))
 {
diff --git a/gcc/target.def b/gcc/target.def
index 475c55c22c1..6d9b71e165b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2726,6 +2726,13 @@ modes and they have different conditional execution 
capability, such as ARM.",
  bool, (void),
  default_have_conditional_execution)
 
+/* Return true if the target supports SFB.  */
+DEFHOOK
+(have_sfb,
+ "This target hook returns true if the target supports Short Forward Branch.",
+ bool, (void),
+ hook_bool_void_false)
+
 DEFHOOK
 (gen_ccmp_first,
  "This function prepares to emit a comparison insn for the first compare in 
a\n\
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_sfb_ifcvt_opt.c 
b/gcc/testsuite/gcc.target/riscv/zicond_sfb_ifcvt_opt.c
new file mode 100644
index 000..a9cad788956
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond_sfb_ifcvt_opt.c
@@ -0,0 +1,1354 

[PATCH 3/4] [ifcvt] optimize x=c ? (y op const_int) : y by RISC-V Zicond like insns

2023-11-27 Thread Fei Gao
op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_shift_op_supported): check if OP is shift 
like operation
(noce_cond_zero_binary_op_supported): restructure & call 
noce_cond_zero_shift_op_supported
(noce_bbs_ok_for_cond_zero_arith): add bin_exp_ptr interface
(noce_try_cond_zero_arith): add support for x=c ? (y op const_int)

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for x=c ? (y op 
const_int) : y
---
 gcc/ifcvt.cc  |  53 +-
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 675 +-
 2 files changed, 716 insertions(+), 12 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 8f6a0e7f5fe..4cc6a125ff0 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2920,6 +2920,16 @@ noce_try_sign_mask (struct noce_if_info *if_info)
 
 OP is the operation to check.  */
 
+static bool
+noce_cond_zero_shift_op_supported (enum rtx_code op)
+{
+  if (op == ASHIFT || op == ASHIFTRT || op == LSHIFTRT || op == ROTATE
+  || op == ROTATERT)
+return true;
+
+  return false;
+}
+
 static bool
 noce_cond_zero_binary_op_supported (rtx op)
 {
@@ -2930,8 +2940,7 @@ noce_cond_zero_binary_op_supported (rtx op)
 opcode = GET_CODE (XEXP (op, 0));
 
   if (opcode == PLUS || opcode == MINUS || opcode == IOR || opcode == XOR
-  || opcode == ASHIFT || opcode == ASHIFTRT || opcode == LSHIFTRT
-  || opcode == ROTATE || opcode == ROTATERT)
+  || noce_cond_zero_shift_op_supported (opcode))
 return true;
 
   return false;
@@ -2963,6 +2972,7 @@ get_base_reg (rtx exp)
 
 static bool
 noce_bbs_ok_for_cond_zero_arith (struct noce_if_info *if_info, rtx *common_ptr,
+rtx *bin_exp_ptr,
 enum rtx_code *czero_code_ptr, rtx *a_ptr,
 rtx **to_replace)
 {
@@ -3029,6 +3039,7 @@ noce_bbs_ok_for_cond_zero_arith (struct noce_if_info 
*if_info, rtx *common_ptr,
 return false;
 
   *common_ptr = common;
+  *bin_exp_ptr = bin_exp;
   *czero_code_ptr = czero_code;
   *a_ptr = a;
 
@@ -3047,20 +3058,28 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   machine_mode mode = GET_MODE (if_info->x);
   rtx common = NULL_RTX;
   enum rtx_code czero_code = UNKNOWN;
+  rtx bin_exp = NULL_RTX;
+  enum rtx_code bin_code = UNKNOWN;
+  rtx bin_op0 = NULL_RTX;
   rtx non_zero_op = NULL_RTX;
   rtx *to_replace = NULL;
 
-  if (!noce_bbs_ok_for_cond_zero_arith (if_info, &common, &czero_code, &a,
-   &to_replace))
+  if (!noce_bbs_ok_for_cond_zero_arith (if_info, &common, &bin_exp, &czero_code,
+   &a, &to_replace))
 return false;
 
-  /* Disallow CONST_INT currently for simplicity*/
-  if (to_replace == NULL || !REG_P (*to_replace))
-return false;
+  start_sequence ();
 
-  non_zero_op = *to_replace;
+  bin_code = GET_CODE (bin_exp);
+  bin_op0 = XEXP (bin_exp, 0);
 
-  start_sequence ();
+  if (CONST_INT_P (*to_replace))
+{
+  non_zero_op = gen_reg_rtx (mode);
+  noce_emit_move_insn (non_zero_op, *to_replace);
+}
+  else
+non_zero_op = *to_replace;
 
   /* If x is used in both input and out like x = c ? x + z : x,
  use a new reg to avoid modifying x  */
@@ -3076,7 +3095,21 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   return false;
 }
 
-  *to_replace = target;
+  if (CONST_INT_P (*to_replace))
+{
+  if (noce_cond_zero_shift_op_supported (bin_code))
+   {
+ *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
+ if (GET_CODE (a) == ZERO_EXTEND && bin_code == LSHIFTRT)
+   PUT_CODE (a, SIGN_EXTEND);
+   }
+  else if (SUBREG_P (bin_op0))
+   *to_replace = gen_rtx_SUBREG (GET_MODE (bin_op0), target, 0);
+  else
+   *to_replace = target;
+}
+  else
+*to_replace = target;
   emit_insn (gen_rtx_SET (if_info->x, a));
 
   seq = end_ifcvt_sequence (if_info);
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c 
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
index 9357f26d978..c6b0518968b 100644
--- a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
+++ b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
@@ -678,5 +678,676 @@ test_RotateR_eqz_int (unsigned int x, unsigned int y, 
unsigned int z,
   return x;
 }
 
-/* { dg-final { scan-assembler-times {czero\.eqz} 39 } } */
-/* { dg-final { scan-assembler-times {czero\.nez} 28 } } */
+long
+test_ADD_ceqz_imm (long x, long y, long c)
+{
+  if (c)
+x = y + 11;
+  else
+x = y;
+  return x;
+}
+
+long
+test_ADD_ceqz_x_imm (long x, long c)
+{
+  if (c)
+x = x + 11;
+
+  return x;
+}
+
+long
+test_ADD_nez_imm (long x, long y, long c)
+{
+  if (c)
+x = y;
+  else
+x = y + 11;
+  return x;
+}
+
+long
+test_ADD_nez_x_imm (long x, long c)
+{
+  if (c)
+{
+}
+  else
+x = x + 11;
+  return x;
+}
+
+long

[PATCH 1/4] [RISC-V] prefer Zicond primitive semantics to SFB

2023-11-27 Thread Fei Gao
Move Zicond md files ahead of SFB to recognize Zicond first.

Take the following case for example.

CFLAGS: -mtune=sifive-7-series -march=rv64gc_zicond -mabi=lp64d

long primitiveSemantics_00(long a, long b) { return a == 0 ? 0 : b; }

before patch:
primitiveSemantics_00:
bne a0,zero,1f  # movcc
mv  a1,zero
1:
mv  a0,a1
ret

after patch:
primitiveSemantics_00:
czero.eqz   a0,a1,a0
ret
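
For reference, the "primitive semantics" are the select-against-zero forms that a single czero insn can express. The sketch below models them in C. Only the first mapping (a == 0 ? 0 : b to czero.eqz) is shown in the example above; the other three are what the instruction semantics imply, not compiler output quoted from this thread, and the function names are hypothetical.

```c
#include <assert.h>

/* Model of czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.  */
static long
czero_eqz (long rs1, long rc)
{
  return rc == 0 ? 0 : rs1;
}

/* Model of czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1.  */
static long
czero_nez (long rs1, long rc)
{
  return rc != 0 ? 0 : rs1;
}

/* a == 0 ? 0 : b   ->  czero.eqz a0,a1,a0  */
static long sel_00 (long a, long b) { return czero_eqz (b, a); }

/* a != 0 ? 0 : b   ->  czero.nez (assumed mapping)  */
static long sel_01 (long a, long b) { return czero_nez (b, a); }

/* a == 0 ? b : 0   ->  czero.nez (assumed mapping)  */
static long sel_02 (long a, long b) { return czero_nez (b, a); }

/* a != 0 ? b : 0   ->  czero.eqz (assumed mapping)  */
static long sel_03 (long a, long b) { return czero_eqz (b, a); }
```

Forms 00/03 and 01/02 are pairwise the same function, so two instructions cover all four source spellings.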

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* config/riscv/riscv.md (*movcc):move to sfb.md
* config/riscv/sfb.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-sfb-primitiveSemantics.c: New test.
---
 gcc/config/riscv/riscv.md | 19 +--
 gcc/config/riscv/sfb.md   | 37 ++
 .../riscv/zicond-sfb-primitiveSemantics.c | 50 +++
 3 files changed, 88 insertions(+), 18 deletions(-)
 create mode 100644 gcc/config/riscv/sfb.md
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-sfb-primitiveSemantics.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 935eeb7fd8e..d020988446f 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2711,24 +2711,6 @@
   DONE;
 })
 
-;; Patterns for implementations that optimize short forward branches.
-
-(define_insn "*movcc"
-  [(set (match_operand:GPR 0 "register_operand" "=r,r")
-   (if_then_else:GPR
-(match_operator 5 "ordered_comparison_operator"
-   [(match_operand:X 1 "register_operand" "r,r")
-(match_operand:X 2 "reg_or_0_operand" "rJ,rJ")])
-(match_operand:GPR 3 "register_operand" "0,0")
-(match_operand:GPR 4 "sfb_alu_operand" "rJ,IL")))]
-  "TARGET_SFB_ALU"
-  "@
-   b%C5\t%1,%z2,1f\t# movcc\;mv\t%0,%z4\n1:
-   b%C5\t%1,%z2,1f\t# movcc\;li\t%0,%4\n1:"
-  [(set_attr "length" "8")
-   (set_attr "type" "sfb_alu")
-   (set_attr "mode" "")])
-
 ;; Used to implement built-in functions.
 (define_expand "condjump"
   [(set (pc)
@@ -3748,5 +3730,6 @@
 (include "generic-ooo.md")
 (include "vector.md")
 (include "zicond.md")
+(include "sfb.md")
 (include "zc.md")
 (include "corev.md")
diff --git a/gcc/config/riscv/sfb.md b/gcc/config/riscv/sfb.md
new file mode 100644
index 000..52af4b17d46
--- /dev/null
+++ b/gcc/config/riscv/sfb.md
@@ -0,0 +1,37 @@
+;; Machine description for short forward branches(SFB).
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; Patterns for implementations that optimize short forward branches.
+
+(define_insn "*movcc"
+  [(set (match_operand:GPR 0 "register_operand" "=r,r")
+   (if_then_else:GPR
+(match_operator 5 "ordered_comparison_operator"
+   [(match_operand:X 1 "register_operand" "r,r")
+(match_operand:X 2 "reg_or_0_operand" "rJ,rJ")])
+(match_operand:GPR 3 "register_operand" "0,0")
+(match_operand:GPR 4 "sfb_alu_operand" "rJ,IL")))]
+  "TARGET_SFB_ALU"
+  "@
+   b%C5\t%1,%z2,1f\t# movcc\;mv\t%0,%z4\n1:
+   b%C5\t%1,%z2,1f\t# movcc\;li\t%0,%4\n1:"
+  [(set_attr "length" "8")
+   (set_attr "type" "sfb_alu")
+   (set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-sfb-primitiveSemantics.c 
b/gcc/testsuite/gcc.target/riscv/zicond-sfb-primitiveSemantics.c
new file mode 100644
index 000..2c60656d5eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-sfb-primitiveSemantics.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=sifive-7-series -march=rv64gc_zicond -mabi=lp64d" { 
target { rv64 } } } */
+/* { dg-options "-mtune=sifive-7-series -march=rv32gc_zicond -mabi=ilp32f" { 
target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Og"} } */
+
+long primitiveSemantics_00(long a, long b) { return a == 0 ? 0 : b; }
+
+long primitiveSemantics_01(long a, long b) { return a != 0 ? 0 : b; }
+
+long primitiveSemantics_02(long a, long b) { return a == 0 ? b : 0; }
+
+long primitiveSemantics_03(long a, long b) { return a != 0 ? b : 0; }
+
+long primitiveSemantics_04(long a, long b) {
+  if (a)
+b = 0;
+  return b;
+}
+
+long primitiveSemantics_05(long a, long b) {
+  if (!a)
+b = 0;
+  return b;
+}
+
+int primitiveSemantics_06(int a, int b) { return a == 0 ? 0 : b; }
+
+int primitiveSemantics_07(int a, 

Re: Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-10-30 Thread Fei Gao
On 2023-10-31 03:16  Jeff Law  wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>>  * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>  (noce_try_cond_zero_arith): handler for conditional zero op
>>  (noce_process_if_block): add noce_try_cond_zero_arith with hook 
>>control
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>> ---
>>   gcc/ifcvt.cc  | 112 +++
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c   | 130 ++
>>   2 files changed, 242 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index a0af553b9ff..4f98c1c7bf9 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> +static rtx
>> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, 
>> rtx z, rtx target)
>> +{
>> +  machine_mode mode = GET_MODE (target);
>> +  rtx cond_op0 = XEXP (if_info->cond, 0);
>> +  rtx czero_cond
>> +    = gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, 
>> const0_rtx);
>> +  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
>> +  rtx set = gen_rtx_SET (target, if_then_else);
>> +
>> +  start_sequence ();
>> +  rtx_insn *insn = emit_insn (set);
>> +
>> +  if (recog_memoized (insn) >= 0)
>> +    {
>> +  rtx_insn *seq = get_insns ();
>> +  end_sequence ();
>> +  emit_insn (seq);
>> +
>> +  return target;
>> +    }
>> +
>> +  end_sequence ();
>> +  return NULL_RTX;
>> +}
>So just a few notes to further illustrate why I'm currently looking to
>take the VRULL+Ventana implementation.  The code above would be much
>better handled by just calling noce_emit_cmove.  noce_emit_cmove will go
>through the conditional move expander.  So any improvement we make in
>the expander "just work" when called from the if-converter. 
noce_emit_czero is used here to make sure czero insns are emitted.
noce_emit_cmove includes SFB and Thead movcc, which will take precedence
over Zicond in RISC-V if enabled. Unfortunately, we have products with both
SFB and Zicond available and saw such a conflict.
That is also the reason for adding the hook TARGET_HAVE_COND_ZERO
in [PATCH 1/4]: to disallow the inefficient code emitted in the SFB-enabled,
Zicond-disabled case.

>> +
>>   /* Try only simple constants and registers here.  More complex cases
>>  are handled in noce_try_cmove_arith after noce_try_store_flag_arith
>>  has had a go at it.  */
>> @@ -2880,6 +2908,88 @@ noce_try_sign_mask (struct noce_if_info *if_info)
>> return true;
>>   }
>>  
>> +/* Convert x = c ? y + z : y or x = c ? y : y + z. */
>> +
>> +static bool
>> +noce_try_cond_zero_arith (struct noce_if_info *if_info)
>> +{
>> +  rtx target;
>> +  rtx_insn *seq;
>> +  machine_mode mode = GET_MODE (if_info->x);
>> +  rtx common = NULL_RTX;
>> +  enum rtx_code czero_code = UNKNOWN;
>> +  rtx a = if_info->a;
>> +  rtx b = if_info->b;
>> +  rtx z = NULL_RTX;
>> +  rtx cond = if_info->cond;
>> +
>> +  if (!noce_simple_bbs (if_info))
>> +    return false;
>[ ... ]
>So the internal code we have does a bit of canonicalization before the
>optimizing transformations.  In particular we may be presented with
>
>(a == 0) ? b : a which we transform into (a != 0 ? a : b) which allows
>us to pick up more cases.  (b != 0 ? b : a) gets similar handling.
>
>As I mentioned earlier, the VRULL+Ventana code handles wrapping
>extensions & subregs.  Our code also handles if-converting shifts/rotates. 
Cool, and waiting for your submission. Shifts/rotates can be added in
noce_try_cond_zero_arith.
I tried to keep noce_try_cond_zero_arith simple, without introducing SCC and
other stuff, as additional insns will be generated for greater-than-like
comparisons but may not be generated for branch-insn based SFB.
IMHO, the earlier a noce_try* function appears in noce_process_if_block, the
simpler the optimization scenarios it handles and the more efficient the code
it should generate; later functions like noce_try_cmove_arith then handle the
more general cases.
BR, 
Fei
>
>Hopefully that explains a bit more why I think cleaning up the
>VRULL+Ventana code is a better choice. 

>
>jeff

Re: Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-10-30 Thread Fei Gao
On 2023-10-31 00:36  Jeff Law  wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>>  * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>  (noce_try_cond_zero_arith): handler for conditional zero op
>>  (noce_process_if_block): add noce_try_cond_zero_arith with hook 
>>control
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>So the idea here is to improve upon the current code we generate for
>conditional arithmetic.  Right now we support conditional arithmetic
>using zicond, but the sequence is poor.
>
>Basically the if-converter knows how to generate a conditional add, but
>it does so in a way that isn't as efficient as it could be.
>
>In effect ifcvt wants to generate
>
>t = a + b
>res = cond ? t : b
>
>
>We want to change it to
>
>t = cond ? b : 0;
>res = a + t;
>
>The latter sequence expands to more efficient code trivially for risc-v. 
Exactly. Two fewer insns for the add case below:
long test_ADD_ceqz(long x, long y, long z, long c){
  if (c)
    x = y + z;  
  else
    x = y;  
  return x;
  }
  
test_ADD_ceqz(before this patch): 
  add a2,a1,a2
  czero.eqz a0,a2,a3
  czero.nez a3,a1,a3
  or a0,a3,a0
ret

test_ADD_ceqz(after this patch):
  czero.eqz a3,a2,a3
  add a0,a1,a3
  ret
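
The two assembly sequences above can be checked against each other with a small C model. This is only a sketch: the function names are hypothetical, and unsigned arithmetic is used so that the addition wraps instead of overflowing.

```c
#include <assert.h>

/* Model of czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.  */
static unsigned long
czero_eqz (unsigned long rs1, unsigned long rc)
{
  return rc == 0 ? 0 : rs1;
}

/* Model of czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1.  */
static unsigned long
czero_nez (unsigned long rs1, unsigned long rc)
{
  return rc != 0 ? 0 : rs1;
}

/* Before the patch: compute y + z, then select between it and y.  */
static unsigned long
cond_add_before (unsigned long y, unsigned long z, unsigned long c)
{
  unsigned long t = y + z;			/* add a2,a1,a2  */
  unsigned long taken = czero_eqz (t, c);	/* czero.eqz a0,a2,a3  */
  unsigned long kept = czero_nez (y, c);	/* czero.nez a3,a1,a3  */
  return kept | taken;				/* or a0,a3,a0  */
}

/* After the patch: zero out z when c is zero, then add unconditionally.  */
static unsigned long
cond_add_after (unsigned long y, unsigned long z, unsigned long c)
{
  unsigned long t = czero_eqz (z, c);	/* czero.eqz a3,a2,a3  */
  return y + t;				/* add a0,a1,a3  */
}
```

In the "before" form exactly one of taken/kept is nonzero-capable at a time, so the OR acts as a select; the "after" form folds that select into the addend.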
>
>I wandered a bit through the combine dumps to see if it would be easy to
>capture this class of cases.  We never get anything useful, and while I
>can imagine "bridge" patterns that would potentially expose enough RTL
>to allow us to rewrite without changing ifcvt, it'd just be a hack IMHO.
>
>So going back to ifcvt...
>
>In the first sequence the addition must wait for both "a" and "b" to be
>available and the conditional move can fire on the next cycle.
>
>In the second sequence the conditional move can fire when just "b" is
>available.  So that gives "a" another cycle to become ready (say if it's
>coming from memory or a multi-cycle operation like multiply).
>
>On the other hand the second sequence does keep "a" live longer.
>
>In the end I strongly suspect neither sequence is significantly better
>than the other.  Meaning I don't think we need to conditionalize using
>condzero arith at all. 
As the case above shows, using condzero arith saves two insns.

>
>
>I'll note that subsequent patches add MINUS, IOR, XOR and AND.  It's
>also possible (and important) to handle shifts.  There's a conditional
>shift-by-6 in leela's hot path. 
This series is an initial framework for simple condzero arith. Shifts may come
later, as they involve subreg stuff.

>
>Overall this looks a lot like the VRULL code, but just less complete.
>My inclination is to do a cleanup pass on the VRULL code verify it
>handles all the cases in your tests and commit the VRULL implementation
>with your tests. 
I searched but didn't find the VRULL code. Could you please provide a link at
your convenience? My colleague Zeng Xiao posted this months ago:
https://patchwork.sourceware.org/project/gcc/patch/20230719101156.21771-6-zengx...@eswincomputing.com/
But after fixing several bugs, we realized the previous implementation was quite
complex, so we came up with this patch series.

>
>I'll do some further poking at this today.  Thanks for re-submitting
>these bits.  Getting this target independent work cleaned up has been on
>my TODO for a while now. 
Thanks for your patience.

BR, 
Fei

>
>jeff

Re: Re: [PATCH 1/3] [V6] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-10-30 Thread Fei Gao


On 2023-10-28 10:35  Jeff Law  wrote:
>
>
>
>On 10/27/23 14:31, Patrick O'Neill wrote:
>> Hi Fei,
>>
>> A recent change to GCC [1] updated the registers in the cm.push and
>> cm.pop insns for these testcases:
>>
>> FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test1
>> FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test2_step1_0_size
>> FAIL: gcc.target/riscv/rv32i_zcmp.c -Os check-function-bodies test3
>[ ... ]
>Actually [1-9] looks better upon further review. 

Hi Patrick

Thanks for adapting the TCs.
I followed Jeff's advice and used {ra, s0-s[1-9]} for the {ra, s0-sx} case.

BR
Fei
>
>jeff

[PATCH 3/4] [ifcvt] if convert x=c ? y op z : y by RISC-V Zicond like insns

2023-10-30 Thread Fei Gao
op=[-, |, ^]
opcode=[sub, or, xor]

Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
opcode rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
opcode rd, rs1, rd

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): add more op=[-, |, ^]
(noce_try_cond_zero_arith): adapt for newly added op

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for op=[-, |, ^]
---
 gcc/ifcvt.cc  |  23 +-
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 378 ++
 2 files changed, 396 insertions(+), 5 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 4f98c1c7bf9..6e341fc4d4b 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2908,6 +2908,15 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   return true;
 }
 
+static bool
+noce_cond_zero_binary_op_supported (enum rtx_code op)
+{
+  if (op == PLUS || op == MINUS || op == IOR || op == XOR)
+return true;
+
+  return false;
+}
+
 /* Convert x = c ? y + z : y or x = c ? y : y + z. */
 
 static bool
@@ -2918,6 +2927,7 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   machine_mode mode = GET_MODE (if_info->x);
   rtx common = NULL_RTX;
   enum rtx_code czero_code = UNKNOWN;
+  enum rtx_code opcode = UNKNOWN;
   rtx a = if_info->a;
   rtx b = if_info->b;
   rtx z = NULL_RTX;
@@ -2933,18 +2943,21 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
 return false;
 
   /* check y + z:y*/
-  if (GET_CODE (a) == PLUS && REG_P (XEXP (a, 0)) && REG_P (XEXP (a, 1))
-  && REG_P (b) && rtx_equal_p (XEXP (a, 0), b))
+  if (noce_cond_zero_binary_op_supported (GET_CODE (a)) && REG_P (XEXP (a, 0))
+  && REG_P (XEXP (a, 1)) && REG_P (b) && rtx_equal_p (XEXP (a, 0), b))
 {
+  opcode = GET_CODE (a);
   common = b;
   z = XEXP (a, 1);
   /* x = c ? y+z : y, cond = !c --> x = cond ? y : y+z  */
   czero_code = GET_CODE (cond);
 }
   /* check y : y+z  */
-  else if (GET_CODE (b) == PLUS && REG_P (XEXP (b, 0)) && REG_P (XEXP (b, 1))
-  && REG_P (a) && rtx_equal_p (a, XEXP (b, 0)))
+  else if (noce_cond_zero_binary_op_supported (GET_CODE (b))
+  && REG_P (XEXP (b, 0)) && REG_P (XEXP (b, 1)) && REG_P (a)
+  && rtx_equal_p (a, XEXP (b, 0)))
 {
+  opcode = GET_CODE (b);
   common = a;
   z = XEXP (b, 1);
   /* x = c ? y : y+z, cond = !c --> x = !cond ? y : y+z  */
@@ -2971,7 +2984,7 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   return false;
 }
 
-  target = expand_simple_binop (mode, PLUS, common, target, if_info->x, 0,
+  target = expand_simple_binop (mode, opcode, common, target, if_info->x, 0,
OPTAB_DIRECT);
   if (!target)
 {
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c 
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
index 068c1443413..3ec01dcb135 100644
--- a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
+++ b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
@@ -128,3 +128,381 @@ long test_ADD_eqz_x_2(long x, long z, long c)
 x = x + z;
   return x;
 }
+
+/*
+**test_SUB_ceqz:
+** czero\.eqz  a3,a2,a3
+** sub a0,a1,a3
+** ret
+*/
+// x = c ? y-z : y
+long test_SUB_ceqz(long x, long y, long z, long c)
+{
+  if (c)
+x = y - z;
+  else
+x = y;
+  return x;
+}
+
+/*
+**test_SUB_ceqz_x:
+** czero\.eqz  a2,a1,a2
+** sub a0,a0,a2
+** ret
+*/
+// x = c ? x-z : x
+long test_SUB_ceqz_x(long x, long z, long c)
+{
+  if (c)
+x = x - z;
+
+  return x;
+}
+
+/*
+**test_SUB_nez:
+** czero\.nez  a3,a2,a3
+** sub a0,a1,a3
+** ret
+*/
+// x = c ? y : y-z
+long test_SUB_nez(long x, long y, long z, long c)
+{
+  if (c)
+x = y;
+  else
+x = y - z;
+  return x;
+}
+
+/*
+**test_SUB_nez_x:
+** czero\.nez  a2,a1,a2
+** sub a0,a0,a2
+** ret
+*/
+// x = c ? x : x-z
+long test_SUB_nez_x(long x, long z, long c)
+{
+  if (c)
+  {}
+  else
+x = x - z;
+  return x;
+}
+
+/*
+**test_SUB_nez_2:
+** czero\.nez  a3,a2,a3
+** sub a0,a1,a3
+** ret
+*/
+// x = !c ? y-z : y
+long test_SUB_nez_2(long x, long y, long z, long c)
+{
+  if (!c)
+x = y - z;
+  else
+x = y;
+  return x;
+}
+
+/*
+**test_SUB_nez_x_2:
+** czero\.nez  a2,a1,a2
+** sub a0,a0,a2
+** ret
+*/
+// x = !c ? x-z : x
+long test_SUB_nez_x_2(long x, long z, long c)
+{
+  if (!c)
+x = x - z;
+
+  return x;
+}
+
+/*
+**test_SUB_eqz_2:
+** czero\.eqz  a3,a2,a3
+** sub a0,a1,a3
+** ret
+*/
+// x = !c ? y : y-z
+long test_SUB_eqz_2(long x, long y, long z, long c)
+{
+  if (!c)
+x = y;
+  else
+x = y - z;
+  return x;
+}
+
+/*
+**test_SUB_eqz_x_2:
+** czero\.eqz  a2,a1,a2
+** sub a0,a0,a2
+** ret
+*/
+// x = !c ? x : x-z
+long test_SUB_eqz_x_2(long x, long z, 

[PATCH 4/4] [ifcvt] if convert x=c ? y&z : y by RISC-V Zicond like insns

2023-10-30 Thread Fei Gao
Conditional and, if zero
rd = (rc == 0) ? (rs1 & rs2) : rs1
-->
and rd, rs1, rs2
czero.eqz rtmp, rs1, rc
or rd, rd, rtmp

Conditional and, if non-zero
rd = (rc != 0) ? (rs1 & rs2) : rs1
-->
and rd, rs1, rs2
czero.nez rtmp, rs1, rc
or rd, rd, rtmp
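
AND needs a different sequence from the other operators because 0 is not its identity: rs1 & 0 is 0, not rs1. The sketch below models the three-instruction form in plain C to show why OR-ing in a conditionally zeroed copy of rs1 gives the right answer. All names (reg_t, czero_eqz, czero_nez, cond_and_*) are hypothetical and exist only for this illustration.

```c
/* Illustrative model only; every name here is hypothetical.  */
typedef unsigned long reg_t;

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1  */
static reg_t czero_eqz (reg_t rs1, reg_t rs2) { return rs2 == 0 ? 0 : rs1; }

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1  */
static reg_t czero_nez (reg_t rs1, reg_t rs2) { return rs2 != 0 ? 0 : rs1; }

/* rd = (rc == 0) ? (rs1 & rs2) : rs1  */
static reg_t cond_and_if_zero (reg_t rs1, reg_t rs2, reg_t rc)
{
  reg_t rd = rs1 & rs2;               /* and rd, rs1, rs2 */
  reg_t rtmp = czero_eqz (rs1, rc);   /* 0 when rc == 0, otherwise rs1 */
  return rd | rtmp;                   /* (rs1 & rs2) | rs1 == rs1 */
}

/* rd = (rc != 0) ? (rs1 & rs2) : rs1  */
static reg_t cond_and_if_nonzero (reg_t rs1, reg_t rs2, reg_t rc)
{
  reg_t rd = rs1 & rs2;               /* and rd, rs1, rs2 */
  reg_t rtmp = czero_nez (rs1, rc);   /* 0 when rc != 0, otherwise rs1 */
  return rd | rtmp;
}
```

When the condition selects the plain rs1 arm, the OR restores every bit that the AND cleared, since (rs1 & rs2) | rs1 equals rs1.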

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): add support for and
(noce_try_cond_zero_arith): adapt for and operation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for and operation.
---
 gcc/ifcvt.cc  |  60 +---
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 134 ++
 2 files changed, 176 insertions(+), 18 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 6e341fc4d4b..cfa9bc4b850 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2911,7 +2911,7 @@ noce_try_sign_mask (struct noce_if_info *if_info)
 static bool
 noce_cond_zero_binary_op_supported (enum rtx_code op)
 {
-  if (op == PLUS || op == MINUS || op == IOR || op == XOR)
+  if (op == PLUS || op == MINUS || op == IOR || op == XOR || op == AND)
 return true;
 
   return false;
@@ -2922,7 +2922,7 @@ noce_cond_zero_binary_op_supported (enum rtx_code op)
 static bool
 noce_try_cond_zero_arith (struct noce_if_info *if_info)
 {
-  rtx target;
+  rtx target, tmp;
   rtx_insn *seq;
   machine_mode mode = GET_MODE (if_info->x);
   rtx common = NULL_RTX;
@@ -2949,8 +2949,9 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   opcode = GET_CODE (a);
   common = b;
   z = XEXP (a, 1);
-  /* x = c ? y+z : y, cond = !c --> x = cond ? y : y+z  */
-  czero_code = GET_CODE (cond);
+  /* x = c ? y+z : y, cond = !c --> x = cond ? y : y+z, but AND differs  */
+  czero_code
+   = (opcode == AND) ? noce_reversed_cond_code (if_info) : GET_CODE (cond);
 }
   /* check y : y+z  */
   else if (noce_cond_zero_binary_op_supported (GET_CODE (b))
@@ -2960,8 +2961,9 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
   opcode = GET_CODE (b);
   common = a;
   z = XEXP (b, 1);
-  /* x = c ? y : y+z, cond = !c --> x = !cond ? y : y+z  */
-  czero_code = noce_reversed_cond_code (if_info);
+  /* x = c ? y : y+z, cond = !c --> x = !cond ? y : y+z, but AND differs  
*/
+  czero_code
+   = (opcode == AND) ? GET_CODE (cond) : noce_reversed_cond_code (if_info);
 }
   else
 return false;
@@ -2970,22 +2972,44 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
 return false;
 
   start_sequence ();
+  if (opcode == AND)
+{
+  tmp
+   = expand_simple_binop (mode, AND, common, z, NULL_RTX, 0, OPTAB_DIRECT);
+  if (!tmp)
+   {
+ end_sequence ();
+ return FALSE;
+   }
 
-  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  */
-  if (common && rtx_equal_p (common, if_info->x))
-target = gen_reg_rtx (mode);
-  else
-target = if_info->x;
+  target = noce_emit_czero (if_info, czero_code, common, if_info->x);
+  if (!target)
+   {
+ end_sequence ();
+ return FALSE;
+   }
 
-  target = noce_emit_czero (if_info, czero_code, z, target);
-  if (!target)
-{
-  end_sequence ();
-  return false;
+  target = expand_simple_binop (mode, IOR, tmp, target, if_info->x, 0,
+   OPTAB_DIRECT);
 }
+  else
+{
+  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  */
+  if (common && rtx_equal_p (common, if_info->x))
+   target = gen_reg_rtx (mode);
+  else
+   target = if_info->x;
+
+  target = noce_emit_czero (if_info, czero_code, z, target);
+  if (!target)
+   {
+ end_sequence ();
+ return false;
+   }
 
-  target = expand_simple_binop (mode, opcode, common, target, if_info->x, 0,
-   OPTAB_DIRECT);
+  target = expand_simple_binop (mode, opcode, common, target, if_info->x, 
0,
+   OPTAB_DIRECT);
+}
   if (!target)
 {
   end_sequence ();
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c 
b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
index 3ec01dcb135..bfff570edd7 100644
--- a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
+++ b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
@@ -506,3 +506,137 @@ long test_XOR_eqz_x_2(long x, long z, long c)
 x = x ^ z;
   return x;
 }
+
+/*
+**test_AND_ceqz:
+** and a2,a1,a2
+** czero\.nez  a1,a1,a3
+** or  a0,a1,a2
+** ret
+*/
+// x = c ? y&z : y
+long test_AND_ceqz(long x, long y, long z, long c)
+{
+  if (c)
+x = y & z;
+  else
+x = y;
+  return x;
+}
+
+/*
+**test_AND_ceqz_x:
+** and a1,a0,a1
+** czero\.nez  a0,a0,a2
+** or  a0,a0,a1
+** ret
+*/
+// x = c ? x&z : x
+long test_AND_ceqz_x(long x, long z, long c)
+{
+  if (c)
+x = x & z;
+
+  return x;
+}
+
+/*
+**test_AND_nez:
+** and 

[PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-10-30 Thread Fei Gao
Conditional add, if zero
rd = (rc == 0) ? (rs1 + rs2) : rs1
-->
czero.nez rd, rs2, rc
add rd, rs1, rd

Conditional add, if non-zero
rd = (rc != 0) ? (rs1 + rs2) : rs1
-->
czero.eqz rd, rs2, rc
add rd, rs1, rd

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
(noce_try_cond_zero_arith): handler for conditional zero op
(noce_process_if_block): add noce_try_cond_zero_arith with hook control

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: New test.
---
 gcc/ifcvt.cc  | 112 +++
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 130 ++
 2 files changed, 242 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..4f98c1c7bf9 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -781,12 +781,14 @@ static bool noce_try_store_flag_constants (struct 
noce_if_info *);
 static bool noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
rtx, rtx, rtx, rtx = NULL, rtx = NULL);
+static rtx noce_emit_czero (struct noce_if_info *, enum rtx_code, rtx, rtx);
 static bool noce_try_cmove (struct noce_if_info *);
 static bool noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
 static bool noce_try_minmax (struct noce_if_info *);
 static bool noce_try_abs (struct noce_if_info *);
 static bool noce_try_sign_mask (struct noce_if_info *);
+static bool noce_try_cond_zero_arith (struct noce_if_info *);
 
 /* Return the comparison code for reversed condition for IF_INFO,
or UNKNOWN if reversing the condition is not possible.  */
@@ -1831,6 +1833,32 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
enum rtx_code code,
 return NULL_RTX;
 }
 
+static rtx
+noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, rtx 
z, rtx target)
+{
+  machine_mode mode = GET_MODE (target);
+  rtx cond_op0 = XEXP (if_info->cond, 0);
+  rtx czero_cond
+= gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, const0_rtx);
+  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
+  rtx set = gen_rtx_SET (target, if_then_else);
+
+  start_sequence ();
+  rtx_insn *insn = emit_insn (set);
+
+  if (recog_memoized (insn) >= 0)
+{
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  emit_insn (seq);
+
+  return target;
+}
+
+  end_sequence ();
+  return NULL_RTX;
+}
+
 /* Try only simple constants and registers here.  More complex cases
are handled in noce_try_cmove_arith after noce_try_store_flag_arith
has had a go at it.  */
@@ -2880,6 +2908,88 @@ noce_try_sign_mask (struct noce_if_info *if_info)
   return true;
 }
 
+/* Convert x = c ? y + z : y or x = c ? y : y + z. */
+
+static bool
+noce_try_cond_zero_arith (struct noce_if_info *if_info)
+{
+  rtx target;
+  rtx_insn *seq;
+  machine_mode mode = GET_MODE (if_info->x);
+  rtx common = NULL_RTX;
+  enum rtx_code czero_code = UNKNOWN;
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx z = NULL_RTX;
+  rtx cond = if_info->cond;
+
+  if (!noce_simple_bbs (if_info))
+return false;
+
+  /* cond must be EQ or NEQ comparision of a reg and 0.  */
+  if (GET_CODE (cond) != NE && GET_CODE (cond) != EQ)
+return false;
+  if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
+return false;
+
+  /* check y + z:y*/
+  if (GET_CODE (a) == PLUS && REG_P (XEXP (a, 0)) && REG_P (XEXP (a, 1))
+  && REG_P (b) && rtx_equal_p (XEXP (a, 0), b))
+{
+  common = b;
+  z = XEXP (a, 1);
+  /* x = c ? y+z : y, cond = !c --> x = cond ? y : y+z  */
+  czero_code = GET_CODE (cond);
+}
+  /* check y : y+z  */
+  else if (GET_CODE (b) == PLUS && REG_P (XEXP (b, 0)) && REG_P (XEXP (b, 1))
+  && REG_P (a) && rtx_equal_p (a, XEXP (b, 0)))
+{
+  common = a;
+  z = XEXP (b, 1);
+  /* x = c ? y : y+z, cond = !c --> x = !cond ? y : y+z  */
+  czero_code = noce_reversed_cond_code (if_info);
+}
+  else
+return false;
+
+  if (czero_code == UNKNOWN)
+return false;
+
+  start_sequence ();
+
+  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  */
+  if (common && rtx_equal_p (common, if_info->x))
+target = gen_reg_rtx (mode);
+  else
+target = if_info->x;
+
+  target = noce_emit_czero (if_info, czero_code, z, target);
+  if (!target)
+{
+  end_sequence ();
+  return false;
+}
+
+  target = expand_simple_binop (mode, PLUS, common, target, if_info->x, 0,
+   OPTAB_DIRECT);
+  if (!target)
+{
+  end_sequence ();
+  return false;
+}
+
+  if (target != if_info->x)
+noce_emit_move_insn (if_info->x, target);
+
+  seq = end_ifcvt_sequence (if_info);
+  if (!seq 

[PATCH 0/4] add support for conditional zero operation

2023-10-30 Thread Fei Gao
RISC-V defines the Zicond extension:
czero.eqz rd, rs1, rs2: moves zero to register rd if the condition rs2 is
equal to zero; otherwise moves rs1 to rd.
czero.nez rd, rs1, rs2: moves zero to register rd if the condition rs2 is
nonzero; otherwise moves rs1 to rd.
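
As a reading aid, the two instruction descriptions can be written as one-line C functions. This is a minimal sketch, not a GCC or intrinsic API: reg_t stands in for an XLEN-wide register and the function names are hypothetical.

```c
/* Illustrative model only; every name here is hypothetical.  */
typedef unsigned long reg_t;

/* czero.eqz rd, rs1, rs2: zero when the condition rs2 is zero.  */
static reg_t czero_eqz (reg_t rs1, reg_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* czero.nez rd, rs1, rs2: zero when the condition rs2 is nonzero.  */
static reg_t czero_nez (reg_t rs1, reg_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}
```

For any inputs, exactly one of the two functions returns rs1 and the other returns 0, which is what lets a pair of them implement a select.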

With this series, the following optimizations can be achieved.

opcode=[add, sub, or, xor] case:
Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
opcode rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
opcode rd, rs1, rd

case for and:
Conditional and, if zero
rd = (rc == 0) ? (rs1 & rs2) : rs1
-->
and rd, rs1, rs2
czero.eqz rtmp, rs1, rc
or rd, rd, rtmp

Conditional and, if non-zero
rd = (rc != 0) ? (rs1 & rs2) : rs1
-->
and rd, rs1, rs2
czero.nez rtmp, rs1, rc
or rd, rd, rtmp

Fei Gao (4):
  [RISC-V]add hook to control Zicond based ifcvt opt
  [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns
  [ifcvt] if convert x=c ? y op z : y by RISC-V Zicond like insns
  [ifcvt] if convert x=c ? y&z : y by RISC-V Zicond like insns

 gcc/config/riscv/riscv.cc |  10 +
 gcc/doc/tm.texi   |   4 +
 gcc/doc/tm.texi.in|   2 +
 gcc/ifcvt.cc  | 149 
 gcc/target.def|   7 +
 .../gcc.target/riscv/zicond_ifcvt_opt.c   | 642 ++
 6 files changed, 814 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

-- 
2.17.1



[PATCH 1/4] [RISC-V]add hook to control Zicond based ifcvt opt

2023-10-30 Thread Fei Gao
TARGET_HAVE_COND_ZERO is added to control ifcvt optimization
for targets with RISC-V Zicond like insns.

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_have_cond_zero): Implement 
TARGET_HAVE_COND_ZERO
(TARGET_HAVE_COND_ZERO): define RISC-V hook
* doc/tm.texi: add TARGET_HAVE_COND_ZERO
* doc/tm.texi.in: add TARGET_HAVE_COND_ZERO
* target.def: define TARGET_HAVE_COND_ZERO
---
 gcc/config/riscv/riscv.cc | 10 ++
 gcc/doc/tm.texi   |  4 
 gcc/doc/tm.texi.in|  2 ++
 gcc/target.def|  7 +++
 4 files changed, 23 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ca9a2ca81d5..16a91713ba5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9597,6 +9597,13 @@ riscv_vectorize_create_costs (vec_info *vinfo, bool 
costing_for_scalar)
   return new vector_costs (vinfo, costing_for_scalar);
 }
 
+/* Implement TARGET_HAVE_COND_ZERO.  */
+bool
+riscv_have_cond_zero (void)
+{
+  return TARGET_ZICOND_LIKE;
+}
+
 /* Implement TARGET_PREFERRED_ELSE_VALUE.  */
 
 static tree
@@ -9884,6 +9891,9 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, 
unsigned int nops,
 #undef TARGET_DWARF_POLY_INDETERMINATE_VALUE
 #define TARGET_DWARF_POLY_INDETERMINATE_VALUE 
riscv_dwarf_poly_indeterminate_value
 
+#undef TARGET_HAVE_COND_ZERO
+#define TARGET_HAVE_COND_ZERO riscv_have_cond_zero
+
 #undef TARGET_ZERO_CALL_USED_REGS
 #define TARGET_ZERO_CALL_USED_REGS riscv_zero_call_used_regs
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f7ac806ff15..fe4f59d445e 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12080,6 +12080,10 @@ This target hook is required only when the target has 
several different
 modes and they have different conditional execution capability, such as ARM.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_HAVE_COND_ZERO (void)
+This target hook returns true if the target supports conditional zero.
+@end deftypefn
+
 @deftypefn {Target Hook} rtx TARGET_GEN_CCMP_FIRST (rtx_insn **@var{prep_seq}, 
rtx_insn **@var{gen_seq}, rtx_code @var{code}, tree @var{op0}, tree @var{op1})
 This function prepares to emit a comparison insn for the first compare in a
  sequence of conditional comparisions.  It returns an appropriate comparison
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 141027e0bb4..12aadd75a13 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7832,6 +7832,8 @@ lists.
 
 @hook TARGET_HAVE_CONDITIONAL_EXECUTION
 
+@hook TARGET_HAVE_COND_ZERO
+
 @hook TARGET_GEN_CCMP_FIRST
 
 @hook TARGET_GEN_CCMP_NEXT
diff --git a/gcc/target.def b/gcc/target.def
index 42622177ef9..f977edc3430 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2726,6 +2726,13 @@ modes and they have different conditional execution 
capability, such as ARM.",
  bool, (void),
  default_have_conditional_execution)
 
+/* Return true if the target supports conditional zero.  */
+DEFHOOK
+(have_cond_zero,
+ "This target hook returns true if the target supports conditional zero.",
+ bool, (void),
+ hook_bool_void_false)
+
 DEFHOOK
 (gen_ccmp_first,
  "This function prepares to emit a comparison insn for the first compare in 
a\n\
-- 
2.17.1



[PATCH] MAINTAINERS: Add myself to write after approval

2023-09-18 Thread Fei Gao
Signed-off-by: Fei Gao 
ChangeLog:

* MAINTAINERS: Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f2f5ed29885..e9154878517 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -424,6 +424,7 @@ Gary Funck  

 Pompapathi V Gadad 
 Eric Gallager  
 Gopalasubramanian Ganesh   

+Fei Gao

 Kaveh Ghazi
 Doug Gilmore   
 Matthew Gingell
-- 
2.17.1



[PATCH] [RISC-V] fix PR 111259 invalid zcmp mov predicate.

2023-09-14 Thread Fei Gao
The code changes are from Palmer.

root cause:
In a GCC build configured with --enable-checking=yes, REGNO (op) checks
that the rtx code of op is 'reg'. An rtx with code 'subreg' therefore
causes an internal compiler error.

solution:
Restrict predicate to allow 'reg' only.
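
The failure mode and the fix can be illustrated with a toy model in C. This is a sketch under loud assumptions: toy_rtx, toy_regno and both predicates are hypothetical stand-ins, not GCC's real rtx, REGNO or predicate machinery, and the checked accessor counts failures instead of aborting so the behaviour can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of an rtx: a code plus a register number.  */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  unsigned regno;                 /* meaningful only for TOY_REG */
  const struct toy_rtx *inner;    /* wrapped rtx for TOY_SUBREG */
};

/* Number of failed code checks; a real checking build would ICE here.  */
static int toy_check_failures;

/* Checked accessor: reading the register number of a non-reg is an error.  */
static unsigned toy_regno (const struct toy_rtx *x)
{
  if (x->code != TOY_REG)
    {
      toy_check_failures++;
      return 0;
    }
  return x->regno;
}

/* Old shape: accepts reg and subreg alike, then reads the register number.  */
static bool toy_old_predicate (const struct toy_rtx *x, unsigned lo, unsigned hi)
{
  return toy_regno (x) >= lo && toy_regno (x) <= hi;
}

/* New shape: test the code first, so the accessor only ever sees a reg.  */
static bool toy_new_predicate (const struct toy_rtx *x, unsigned lo, unsigned hi)
{
  return x->code == TOY_REG && toy_regno (x) >= lo && toy_regno (x) <= hi;
}
```

Checking the code before calling the accessor is the same ordering that (match_code "reg") followed by the match_test gives in the patched predicates.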

gcc/ChangeLog:

* config/riscv/predicates.md: Restrict predicate
to allow 'reg' only.
---
 gcc/config/riscv/predicates.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 53e7c1d03aa..4bc7ff2c9d8 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -74,6 +74,7 @@
   (ior (match_operand 0 "const_0_operand")
(match_operand 0 "register_operand")))
 
+;; ZCMP predicates
 (define_predicate "stack_push_up_to_ra_operand"
   (and (match_code "const_int")
(match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op) * -1, 
1)")))
@@ -170,13 +171,12 @@
   (and (match_code "const_int")
(match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 13)")))
 
-;; ZCMP predicates
 (define_predicate "a0a1_reg_operand"
-  (and (match_operand 0 "register_operand")
+  (and (match_code "reg")
(match_test "IN_RANGE (REGNO (op), A0_REGNUM, A1_REGNUM)")))
 
 (define_predicate "zcmp_mv_sreg_operand"
-  (and (match_operand 0 "register_operand")
+  (and (match_code "reg")
(match_test "TARGET_RVE ? IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
 : IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
 || IN_RANGE (REGNO (op), S2_REGNUM, S7_REGNUM)")))
-- 
2.17.1



Re: Re: [PATCH 2/2] [RISC-V] Enalble zcmp for -Os

2023-09-11 Thread Fei Gao
On 2023-09-06 16:06  Kito Cheng  wrote:
>
>On Wed, Sep 6, 2023 at 9:47 AM Fei Gao  wrote:
>>
>> On 2023-09-05 20:02  Kito Cheng  wrote:
>> >
>> >> @@ -5569,7 +5571,9 @@ riscv_avoid_multi_push (const struct 
>> >> riscv_frame_info *frame)
>> >>  {
>> >>    if (!TARGET_ZCMP || crtl->calls_eh_return || frame_pointer_needed
>> >>    || cfun->machine->interrupt_handler_p || 
>> >>cfun->machine->varargs_size != 0
>> >> -  || crtl->args.pretend_args_size != 0 || flag_shrink_wrap_separate
>> >> +  || crtl->args.pretend_args_size != 0
>> >> +  || (use_shrink_wrapping_separate ()
>> >> + && !riscv_avoid_shrink_wrapping_separate ())
>> >
>> >I think we should also check "!optimize_function_for_size_p (cfun)"
>> >here, otherwise that does not really match what we claim in the commit
>> >message.
>> >
>> A similar check optimize_function_for_speed_p is included in
>> use_shrink_wrapping_separate of [1/2] allow targets to check
>> shrink-wrap-separate enabled or not.
>>
>> >e.g. it still will enable with -O2 -fno-shrink-wrap-separate
>> It's intentional to enable zcmp with -O2 -fno-shrink-wrap-separate.
>> Maybe I should have given a better commit message saying
>> "enable muti push and pop for Zcmp extension when
>> shrink-wrap-separate is inactive".
>>
>> Would you like a new patch from me or agree with my
>> explanation and modify commit message in your side?
>
>Could you send a new patch with updated commit message. 
hi Kito

New patch with updated commit message:
https://patchwork.sourceware.org/project/gcc/list/?series=24300

BR, 
Fei
>
>
>>
>> BR
>> Fei
>> >
>> >>    || (frame->mask & ~MULTI_PUSH_GPR_MASK))
>> >>  return true;
>> >>
>>

Re: Re: [PATCH 3/3] [V2] [RISC-V] support cm.mva01s cm.mvsa01 in zcmp

2023-09-07 Thread Fei Gao
On 2023-09-08 04:33  Palmer Dabbelt  wrote:
>
>On Thu, 07 Sep 2023 13:16:36 PDT (-0700), dimi...@dinux.eu wrote:
>> Hi,
>>
>> This patch appears to have caused PR 111259. 
Hi Patrick 

We are also reproducing the issue.

One thing that puzzles me is why a zcmp predicate caused
a regression in an rv64gc single-lib build. The new define_insn
and define_peephole2 are all gated by TARGET_ZCMP, which
is false when building libgcc.

Could you please share more about your findings regarding
"This patch appears to have caused PR 111259"?

BR, 
Fei

>
>Thanks.  Looks like wer'e not running our tests with RTL checking,
>Patrick is going to try and see if we've got compute time left for some
>builds -- even just having builds with checking would be a good one, we
>get bit by these bugs from time to time.
>
>I'm spinning up a --enable-checking=yes build.  Maybe we just need
>something like
>
>    diff --git a/gcc/config/riscv/predicates.md 
>b/gcc/config/riscv/predicates.md
>    index 53e7c1d03aa..aa4f02c67d5 100644
>    --- a/gcc/config/riscv/predicates.md
>    +++ b/gcc/config/riscv/predicates.md
>    @@ -172,11 +172,11 @@ (define_predicate "stack_pop_up_to_s11_operand"
>
> ;; ZCMP predicates
> (define_predicate "a0a1_reg_operand"
>    -  (and (match_operand 0 "register_operand")
>    +  (and (match_code "reg")
>    (match_test "IN_RANGE (REGNO (op), A0_REGNUM, A1_REGNUM)")))
>
> (define_predicate "zcmp_mv_sreg_operand"
>    -  (and (match_operand 0 "register_operand")
>    +  (and (match_code "reg")
>    (match_test "TARGET_RVE ? IN_RANGE (REGNO (op), S0_REGNUM, 
>S1_REGNUM)
>         : IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
> || IN_RANGE (REGNO (op), S2_REGNUM, S7_REGNUM)")))
>
>> Regards,
>> Dimitar
>>
>> On Tue, Aug 29, 2023 at 08:37:46AM +, Fei Gao wrote:
>>> From: Die Li 
>>>
>>> Signed-off-by: Die Li 
>>> Co-Authored-By: Fei Gao 
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/peephole.md: New pattern.
>>> * config/riscv/predicates.md (a0a1_reg_operand): New predicate.
>>> (zcmp_mv_sreg_operand): New predicate.
>>> * config/riscv/riscv.md: New predicate.
>>> * config/riscv/zc.md (*mva01s): New pattern.
>>> (*mvsa01): New pattern.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/cm_mv_rv32.c: New test.
>>> ---
>>>  gcc/config/riscv/peephole.md    | 28 +
>>>  gcc/config/riscv/predicates.md  | 11 
>>>  gcc/config/riscv/riscv.md   |  1 +
>>>  gcc/config/riscv/zc.md  | 22 
>>>  gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c | 23 +
>>>  5 files changed, 85 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c
>>>
>>> diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md
>>> index 0ef0c04410b..92e57f9a447 100644
>>> --- a/gcc/config/riscv/peephole.md
>>> +++ b/gcc/config/riscv/peephole.md
>>> @@ -38,3 +38,31 @@
>>>  {
>>>    operands[5] = GEN_INT (INTVAL (operands[2]) - INTVAL (operands[5]));
>>>  })
>>> +
>>> +;; ZCMP
>>> +(define_peephole2
>>> +  [(set (match_operand:X 0 "a0a1_reg_operand")
>>> +    (match_operand:X 1 "zcmp_mv_sreg_operand"))
>>> +   (set (match_operand:X 2 "a0a1_reg_operand")
>>> +    (match_operand:X 3 "zcmp_mv_sreg_operand"))]
>>> +  "TARGET_ZCMP
>>> +   && (REGNO (operands[2]) != REGNO (operands[0]))"
>>> +  [(parallel [(set (match_dup 0)
>>> +   (match_dup 1))
>>> +  (set (match_dup 2)
>>> +   (match_dup 3))])]
>>> +)
>>> +
>>> +(define_peephole2
>>> +  [(set (match_operand:X 0 "zcmp_mv_sreg_operand")
>>> +    (match_operand:X 1 "a0a1_reg_operand"))
>>> +   (set (match_operand:X 2 "zcmp_mv_sreg_operand")
>>> +    (match_operand:X 3 "a0a1_reg_operand"))]
>>> +  "TARGET_ZCMP
>>> +   && (REGNO (operands[0]) != REGNO (operands[2]))
>>> +   && (REGNO (operands[1]) != REGNO (operands[3]))"
>>> +  [(parallel [(set (match_dup 0)
>>> +   

[PATCH 2/2] [V2][RISC-V] enable muti push and pop for Zcmp when shrink-wrap-separate is ineffective

2023-09-06 Thread Fei Gao
This allows zcmp to be enabled at -Os, where
shrink-wrap-separate is not effective.

To force zcmp multi push/pop when optimizing for speed,
-fno-shrink-wrap-separate has to be given explicitly.
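
The intended interaction can be sketched as a small decision function in C. This is a deliberately simplified model, not the code in riscv.cc: the struct, its fields and both functions are hypothetical, and the other conditions in the real checks (SHRINK_WRAPPING_ENABLED, eh_return, varargs, and so on) are left out.

```c
#include <stdbool.h>

/* Hypothetical per-function state; field names are illustrative only.  */
struct toy_fn_state
{
  bool has_zcmp;                    /* target has the Zcmp extension */
  bool flag_shrink_wrap_separate;   /* -fshrink-wrap-separate in effect */
  bool optimize_for_speed;          /* false at -Os */
  bool target_avoids_separate;      /* target opts out of separate wrapping */
};

/* Simplified stand-in for use_shrink_wrapping_separate ().  */
static bool toy_use_shrink_wrapping_separate (const struct toy_fn_state *s)
{
  return s->flag_shrink_wrap_separate && s->optimize_for_speed;
}

/* Multi push/pop is avoided only when separate shrink-wrapping will
   actually run for this function.  */
static bool toy_avoid_multi_push (const struct toy_fn_state *s)
{
  if (!s->has_zcmp)
    return true;
  if (toy_use_shrink_wrapping_separate (s) && !s->target_avoids_separate)
    return true;
  return false;
}
```

Under this model, -Os and -O2 -fno-shrink-wrap-separate both allow multi push/pop, while plain -O2 keeps separate shrink-wrapping and avoids it.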

gcc/ChangeLog:

* config/riscv/riscv.cc 
(riscv_avoid_shrink_wrapping_separate): wrap the condition check in
riscv_avoid_shrink_wrapping_separate.
(riscv_avoid_multi_push):avoid multi push if shrink_wrapping_separate
  is active.
(riscv_get_separate_components):call 
riscv_avoid_shrink_wrapping_separate

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: remove -fno-shrink-wrap-separate
* gcc.target/riscv/rv32i_zcmp.c: likewise
* gcc.target/riscv/zcmp_push_fpr.c: likewise
* gcc.target/riscv/zcmp_stack_alignment.c: likewise
* gcc.target/riscv/zcmp_shrink_wrap_separate.c: New test.
* gcc.target/riscv/zcmp_shrink_wrap_separate2.c: New test.
---
 gcc/config/riscv/riscv.cc | 21 -
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  2 +-
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  2 +-
 .../gcc.target/riscv/zcmp_push_fpr.c  |  2 +-
 .../riscv/zcmp_shrink_wrap_separate.c | 93 +++
 .../riscv/zcmp_shrink_wrap_separate2.c| 93 +++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |  2 +-
 7 files changed, 207 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 78600ba73b6..3f71000c88b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "cfgloop.h"
 #include "cfgrtl.h"
+#include "shrink-wrap.h"
 #include "sel-sched.h"
 #include "sched-int.h"
 #include "fold-const.h"
@@ -372,6 +373,7 @@ static const struct riscv_tune_param 
optimize_size_tune_info = {
   false,   /* use_divmod_expansion */
 };
 
+static bool riscv_avoid_shrink_wrapping_separate ();
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
@@ -5569,7 +5571,9 @@ riscv_avoid_multi_push (const struct riscv_frame_info 
*frame)
 {
   if (!TARGET_ZCMP || crtl->calls_eh_return || frame_pointer_needed
   || cfun->machine->interrupt_handler_p || cfun->machine->varargs_size != 0
-  || crtl->args.pretend_args_size != 0 || flag_shrink_wrap_separate
+  || crtl->args.pretend_args_size != 0
+  || (use_shrink_wrapping_separate ()
+ && !riscv_avoid_shrink_wrapping_separate ())
   || (frame->mask & ~MULTI_PUSH_GPR_MASK))
 return true;
 
@@ -6831,6 +6835,17 @@ riscv_epilogue_uses (unsigned int regno)
   return false;
 }
 
+static bool
+riscv_avoid_shrink_wrapping_separate ()
+{
+  if (riscv_use_save_libcall (&cfun->machine->frame)
+  || cfun->machine->interrupt_handler_p
+  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+return true;
+
+  return false;
+}
+
 /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
 
 static sbitmap
@@ -6840,9 +6855,7 @@ riscv_get_separate_components (void)
   sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
   bitmap_clear (components);
 
-  if (riscv_use_save_libcall (&cfun->machine->frame)
-  || cfun->machine->interrupt_handler_p
-  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+  if (riscv_avoid_shrink_wrapping_separate ())
 return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();
diff --git a/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c 
b/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
index 394459c4ed7..50e443573ad 100644
--- a/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
+++ b/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options " -Os -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow 
-fno-shrink-wrap-separate" } */
+/* { dg-options " -Os -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow" } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Og" "-O3" "-Oz" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c 
b/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
index f00338a9d17..ea562b7a233 100644
--- a/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
+++ b/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options " -Os -march=rv32imaf_zca_zcmp -mabi=ilp32f -mcmodel=medlow 
-fno-shrink-wrap-separate" }*/
+/* { dg-options " -Os -march=rv32imaf_zca_zcmp -mabi=ilp32f -mcmodel=medlow" 
}*/
 /* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Og" "-O3" "-Oz" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git 

[PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-09-06 Thread Fei Gao
No functional change; this restructures the condition check and exposes
use_shrink_wrapping_separate to targets.

gcc/ChangeLog:

* shrink-wrap.cc (try_shrink_wrapping_separate):call
  use_shrink_wrapping_separate.
(use_shrink_wrapping_separate): wrap the condition
  check in use_shrink_wrapping_separate.
* shrink-wrap.h (use_shrink_wrapping_separate): add to extern
---
 gcc/shrink-wrap.cc | 22 +++---
 gcc/shrink-wrap.h  |  1 +
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index b8d7b557130..28301f04f89 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -1776,16 +1776,13 @@ insert_prologue_epilogue_for_components (sbitmap 
components)
   commit_edge_insertions ();
 }
 
-/* The main entry point to this subpass.  FIRST_BB is where the prologue
-   would be normally put.  */
-void
-try_shrink_wrapping_separate (basic_block first_bb)
+bool
+use_shrink_wrapping_separate (void)
 {
-  if (!(SHRINK_WRAPPING_ENABLED
-   && flag_shrink_wrap_separate
+  if (!(SHRINK_WRAPPING_ENABLED && flag_shrink_wrap_separate
&& optimize_function_for_speed_p (cfun)
&& targetm.shrink_wrap.get_separate_components))
-return;
+return false;
 
   /* We don't handle "strange" functions.  */
   if (cfun->calls_alloca
@@ -1794,6 +1791,17 @@ try_shrink_wrapping_separate (basic_block first_bb)
   || crtl->calls_eh_return
   || crtl->has_nonlocal_goto
   || crtl->saves_all_registers)
+return false;
+
+  return true;
+}
+
+/* The main entry point to this subpass.  FIRST_BB is where the prologue
+   would be normally put.  */
+void
+try_shrink_wrapping_separate (basic_block first_bb)
+{
+  if (!use_shrink_wrapping_separate ())
 return;
 
   /* Ask the target what components there are.  If it returns NULL, don't
diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
index 161647711a3..82386c2b712 100644
--- a/gcc/shrink-wrap.h
+++ b/gcc/shrink-wrap.h
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
 extern void try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq);
 extern void try_shrink_wrapping_separate (basic_block first_bb);
+extern bool use_shrink_wrapping_separate (void);
 #define SHRINK_WRAPPING_ENABLED \
   (flag_shrink_wrap && targetm.have_simple_return ())
 
-- 
2.17.1



[PATCH 0/2] resolve conflict between zcmp multi push/pop and shrink-wrap-separate

2023-09-06 Thread Fei Gao
Enable multi push and pop for Zcmp when shrink-wrap-separate is ineffective.

Fei Gao (2):
  allow targets to check shrink-wrap-separate enabled or not
  [V2][RISC-V] enable multi push and pop for Zcmp when shrink-wrap-separate is ineffective

 gcc/config/riscv/riscv.cc | 21 -
 gcc/shrink-wrap.cc| 22 +++--
 gcc/shrink-wrap.h |  1 +
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  2 +-
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  2 +-
 .../gcc.target/riscv/zcmp_push_fpr.c  |  2 +-
 .../riscv/zcmp_shrink_wrap_separate.c | 93 +++
 .../riscv/zcmp_shrink_wrap_separate2.c| 93 +++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |  2 +-
 9 files changed, 223 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c

-- 
2.17.1



Re: Re: [PATCH 2/2] [RISC-V] Enable zcmp for -Os

2023-09-05 Thread Fei Gao
On 2023-09-05 20:02  Kito Cheng  wrote:
>
>> @@ -5569,7 +5571,9 @@ riscv_avoid_multi_push (const struct riscv_frame_info *frame)
>>  {
>>    if (!TARGET_ZCMP || crtl->calls_eh_return || frame_pointer_needed
>>    || cfun->machine->interrupt_handler_p || cfun->machine->varargs_size != 0
>> -  || crtl->args.pretend_args_size != 0 || flag_shrink_wrap_separate
>> +  || crtl->args.pretend_args_size != 0
>> +  || (use_shrink_wrapping_separate ()
>> + && !riscv_avoid_shrink_wrapping_separate ())
>
>I think we should also check "!optimize_function_for_size_p (cfun)"
>here, otherwise that does not really match what we claim in the commit
>message.
> 
A similar check, optimize_function_for_speed_p, is already included in
use_shrink_wrapping_separate, which is added by [1/2] "allow targets to check
shrink-wrap-separate enabled or not".

>e.g. it still will enable with -O2 -fno-shrink-wrap-separate 
It's intentional to enable zcmp with -O2 -fno-shrink-wrap-separate.
Maybe I should have given a better commit message saying
"enable multi push and pop for Zcmp extension when
shrink-wrap-separate is inactive".
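
To make the intent concrete, here is a minimal standalone sketch of the decision. It is not the GCC code itself: the struct and the model_* function names are hypothetical, and the booleans only stand in for the real GCC state named in the comments.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC state consulted by the patches.  */
struct fn_state
{
  bool shrink_wrap_enabled;       /* models SHRINK_WRAPPING_ENABLED */
  bool shrink_wrap_separate_flag; /* models flag_shrink_wrap_separate */
  bool optimize_for_speed;        /* models optimize_function_for_speed_p (cfun) */
  bool target_has_components;     /* models targetm.shrink_wrap.get_separate_components */
  bool strange_function;          /* models the alloca/setjmp/eh_return/... checks */
  bool target_avoids_separate;    /* models riscv_avoid_shrink_wrapping_separate () */
};

/* Models use_shrink_wrapping_separate () from [1/2].  */
bool
model_use_shrink_wrapping_separate (const struct fn_state *s)
{
  return s->shrink_wrap_enabled && s->shrink_wrap_separate_flag
         && s->optimize_for_speed && s->target_has_components
         && !s->strange_function;
}

/* Models only the shrink-wrap term added to riscv_avoid_multi_push;
   the other terms of that condition are left out of this sketch.  */
bool
model_shrink_wrap_blocks_multi_push (const struct fn_state *s)
{
  return model_use_shrink_wrapping_separate (s) && !s->target_avoids_separate;
}
```

In this model, -Os makes optimize_for_speed false and -O2 -fno-shrink-wrap-separate makes shrink_wrap_separate_flag false, so in both cases the term is false and multi push/pop is not blocked by shrink-wrapping.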

Would you like a new patch from me, or, if you agree with my
explanation, would you modify the commit message on your side?

BR
Fei
>
>>    || (frame->mask & ~MULTI_PUSH_GPR_MASK))
>>  return true;
>> 



Re: Re: [PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-31 Thread Fei Gao


On 2023-08-29 09:46  Jeff Law  wrote:
>
>
>
>On 8/28/23 19:28, Fei Gao wrote:
>> On 2023-08-29 06:54  Jeff Law  wrote:
>>>
>>>
>>>
>>> On 8/28/23 01:47, Fei Gao wrote:
>>>> no functional changes but allow targets to check shrink-wrap-separate 
>>>> enabled or not.
>>>>
>>>>      gcc/ChangeLog:
>>>>
>>>>    * shrink-wrap.cc (try_shrink_wrapping_separate):call
>>>>      use_shrink_wrapping_separate.
>>>>    (use_shrink_wrapping_separate): wrap the condition
>>>>      check in use_shrink_wrapping_separate.
>>>>    * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
>>> So as I mentioned earlier today in the older thread, can we use
>>> override_options to do this?
>>>
>>> If we look at aarch64_override_options we have this:
>>>
>>>     /* The pass to insert speculation tracking runs before
>>>    shrink-wrapping and the latter does not know how to update the
>>>    tracking status.  So disable it in this case.  */
>>>     if (aarch64_track_speculation)
>>>   flag_shrink_wrap = 0;
>>>
>>> We kind of want this instead
>>>
>>>     if (flag_shrink_wrap)
>>>   {
>>>     turn off whatever target bits enable the cm.push/cm.pop insns
>>>   }
>>>
>>>
>>> This does imply that we have a distinct target flag to enable/disable
>>> those instructions.  But that seems like a good thing to have anyway.
>> I'm afraid we cannot simply resolve the conflict based on
>> flag_shrink_wrap/flag_shrink_wrap_separate alone, as they're set to true from
>> -O1 onwards,
>> which means zcmp is almost always disabled unless
>> -fno-shrink-wrap/-fno-shrink-wrap-separate
>> are explicitly given.
>Yea, but I would generally expect that if someone is really concerned
>about code size, they're probably using -Os which (hopefully) would not
>have shrink-wrapping enabled.
>
>>
>> So after discussion with Kito, we would like to turn on zcmp for -Os and
>> shrink-wrap-separate
>> for speed-preferred optimization. use_shrink_wrapping_separate in this
>> patch provides a way
>> to perform this check. No new hook is needed.
>Seems reasonable to me if Kito is OK with it. 

Thanks Jeff and Kito for the discussion.

Could you please review the new series at your convenience?
https://patchwork.sourceware.org/project/gcc/list/?series=24065

BR, 
Fei

>
>jeff

[PATCH 2/2] [RISC-V] Enable zcmp for -Os

2023-08-31 Thread Fei Gao
Enable zcmp for -Os and shrink-wrap-separate for
speed-preferred optimization by default.

To force enabling zcmp multi push/pop in the speed-preferred case,
-fno-shrink-wrap-separate has to be given explicitly.
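
As an illustration of the trade-off, consider a function with a cheap early exit. This is a hypothetical example, not one of the new tests, and what the compiler actually emits depends on the options and the GCC version. When optimizing for speed, separate shrink-wrapping may keep the register saves off the fast path, whereas a multi push saves the whole register list in one instruction at function entry, which is the more compact choice for -Os.

```c
/* Hypothetical example, not taken from the testsuite.  */
__attribute__ ((noinline)) long
slow_helper (long x)
{
  return x * 2 + 1;
}

long
maybe_slow (long x)
{
  /* Fast path: no call is made, so nothing needs to be saved here.  */
  if (x >= 0)
    return x;

  /* Slow path: 'a' is live across the second call, so it is a natural
     candidate for a callee-saved register, and ra has to be saved too.  */
  long a = slow_helper (x);
  long b = slow_helper (a + 1);
  return a + b;
}
```

Comparing the assembly for something like -Os and -O2 (with -march=rv32imaf_zca_zcmp -mabi=ilp32f, as in the existing tests) is one way to see which strategy was picked for the prologue and epilogue.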

gcc/ChangeLog:

* config/riscv/riscv.cc 
(riscv_avoid_shrink_wrapping_separate): wrap the condition check in
riscv_avoid_shrink_wrapping_separate.
(riscv_avoid_multi_push):avoid multi push if shrink_wrapping_separate
  is active.
(riscv_get_separate_components):call 
riscv_avoid_shrink_wrapping_separate

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: remove -fno-shrink-wrap-separate
* gcc.target/riscv/rv32i_zcmp.c: likewise
* gcc.target/riscv/zcmp_push_fpr.c: likewise
* gcc.target/riscv/zcmp_stack_alignment.c: likewise
* gcc.target/riscv/zcmp_shrink_wrap_separate.c: New test.
* gcc.target/riscv/zcmp_shrink_wrap_separate2.c: New test.
---
 gcc/config/riscv/riscv.cc | 21 -
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  2 +-
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  2 +-
 .../gcc.target/riscv/zcmp_push_fpr.c  |  2 +-
 .../riscv/zcmp_shrink_wrap_separate.c | 93 +++
 .../riscv/zcmp_shrink_wrap_separate2.c| 93 +++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |  2 +-
 7 files changed, 207 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 78600ba73b6..3f71000c88b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "cfgloop.h"
 #include "cfgrtl.h"
+#include "shrink-wrap.h"
 #include "sel-sched.h"
 #include "sched-int.h"
 #include "fold-const.h"
@@ -372,6 +373,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   false,   /* use_divmod_expansion */
 };
 
+static bool riscv_avoid_shrink_wrapping_separate ();
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
@@ -5569,7 +5571,9 @@ riscv_avoid_multi_push (const struct riscv_frame_info *frame)
 {
   if (!TARGET_ZCMP || crtl->calls_eh_return || frame_pointer_needed
   || cfun->machine->interrupt_handler_p || cfun->machine->varargs_size != 0
-  || crtl->args.pretend_args_size != 0 || flag_shrink_wrap_separate
+  || crtl->args.pretend_args_size != 0
+  || (use_shrink_wrapping_separate ()
+ && !riscv_avoid_shrink_wrapping_separate ())
   || (frame->mask & ~MULTI_PUSH_GPR_MASK))
 return true;
 
@@ -6831,6 +6835,17 @@ riscv_epilogue_uses (unsigned int regno)
   return false;
 }
 
+static bool
+riscv_avoid_shrink_wrapping_separate ()
+{
+  if (riscv_use_save_libcall (&cfun->machine->frame)
+  || cfun->machine->interrupt_handler_p
+  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+return true;
+
+  return false;
+}
+
 /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
 
 static sbitmap
@@ -6840,9 +6855,7 @@ riscv_get_separate_components (void)
   sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
   bitmap_clear (components);
 
-  if (riscv_use_save_libcall (&cfun->machine->frame)
-  || cfun->machine->interrupt_handler_p
-  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+  if (riscv_avoid_shrink_wrapping_separate ())
 return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();
diff --git a/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c b/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
index 394459c4ed7..50e443573ad 100644
--- a/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
+++ b/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options " -Os -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow -fno-shrink-wrap-separate" } */
+/* { dg-options " -Os -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow" } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Og" "-O3" "-Oz" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c b/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
index f00338a9d17..ea562b7a233 100644
--- a/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
+++ b/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options " -Os -march=rv32imaf_zca_zcmp -mabi=ilp32f -mcmodel=medlow -fno-shrink-wrap-separate" }*/
+/* { dg-options " -Os -march=rv32imaf_zca_zcmp -mabi=ilp32f -mcmodel=medlow" }*/
 /* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Og" "-O3" "-Oz" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git 

[PATCH 0/2] resolve conflict between zcmp multi push/pop and shrink-wrap-separate

2023-08-31 Thread Fei Gao
Enable zcmp for -Os and shrink-wrap-separate for
speed-preferred optimization by default.

Fei Gao (2):
  allow targets to check shrink-wrap-separate enabled or not
  [RISC-V] Enable zcmp for -Os

 gcc/config/riscv/riscv.cc | 21 -
 gcc/shrink-wrap.cc| 22 +++--
 gcc/shrink-wrap.h |  1 +
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  2 +-
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  2 +-
 .../gcc.target/riscv/zcmp_push_fpr.c  |  2 +-
 .../riscv/zcmp_shrink_wrap_separate.c | 93 +++
 .../riscv/zcmp_shrink_wrap_separate2.c| 93 +++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |  2 +-
 9 files changed, 223 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c

-- 
2.17.1



[PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-31 Thread Fei Gao
No functional changes; restructure the condition check and expose
use_shrink_wrapping_separate to targets.

gcc/ChangeLog:

* shrink-wrap.cc (try_shrink_wrapping_separate):call
  use_shrink_wrapping_separate.
(use_shrink_wrapping_separate): wrap the condition
  check in use_shrink_wrapping_separate.
* shrink-wrap.h (use_shrink_wrapping_separate): add to extern
---
 gcc/shrink-wrap.cc | 22 +++---
 gcc/shrink-wrap.h  |  1 +
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index b8d7b557130..28301f04f89 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -1776,16 +1776,13 @@ insert_prologue_epilogue_for_components (sbitmap components)
   commit_edge_insertions ();
 }
 
-/* The main entry point to this subpass.  FIRST_BB is where the prologue
-   would be normally put.  */
-void
-try_shrink_wrapping_separate (basic_block first_bb)
+bool
+use_shrink_wrapping_separate (void)
 {
-  if (!(SHRINK_WRAPPING_ENABLED
-   && flag_shrink_wrap_separate
+  if (!(SHRINK_WRAPPING_ENABLED && flag_shrink_wrap_separate
&& optimize_function_for_speed_p (cfun)
&& targetm.shrink_wrap.get_separate_components))
-return;
+return false;
 
   /* We don't handle "strange" functions.  */
   if (cfun->calls_alloca
@@ -1794,6 +1791,17 @@ try_shrink_wrapping_separate (basic_block first_bb)
   || crtl->calls_eh_return
   || crtl->has_nonlocal_goto
   || crtl->saves_all_registers)
+return false;
+
+  return true;
+}
+
+/* The main entry point to this subpass.  FIRST_BB is where the prologue
+   would be normally put.  */
+void
+try_shrink_wrapping_separate (basic_block first_bb)
+{
+  if (!use_shrink_wrapping_separate ())
 return;
 
   /* Ask the target what components there are.  If it returns NULL, don't
diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
index 161647711a3..82386c2b712 100644
--- a/gcc/shrink-wrap.h
+++ b/gcc/shrink-wrap.h
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
 extern void try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq);
 extern void try_shrink_wrapping_separate (basic_block first_bb);
+extern bool use_shrink_wrapping_separate (void);
 #define SHRINK_WRAPPING_ENABLED \
   (flag_shrink_wrap && targetm.have_simple_return ())
 
-- 
2.17.1



[PATCH 2/3] [V2] [RISC-V] support cm.popretz in zcmp

2023-08-29 Thread Fei Gao
Generate cm.popretz instead of cm.popret if return value is 0.
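
For illustration, a minimal sketch of the kind of function this targets. The names are hypothetical and the testcases added by the patch may look different. The function makes a call, so ra has to be saved and restored, and it then returns the constant 0, which is the shape where the "li a0,0" before the return can be folded into the pop-and-return instruction.

```c
/* Hypothetical example; compile with something like
   -Os -march=rv32imaf_zca_zcmp -mabi=ilp32f and inspect the epilogue.  */
static int calls;

__attribute__ ((noinline)) void
note_call (void)
{
  calls++;
}

int
call_then_return_zero (void)
{
  /* The call clobbers ra, so the prologue has to save it.  */
  note_call ();
  /* Constant zero return value: a candidate for cm.popretz.  */
  return 0;
}
```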

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_zcmp_can_use_popretz): true if popretz can be used
(riscv_gen_multi_pop_insn): interface to generate cm.pop[ret][z]
(riscv_expand_epilogue): expand cm.pop[ret][z] in epilogue
* config/riscv/riscv.md: define A0_REGNUM
* config/riscv/zc.md
(@gpr_multi_popretz_up_to_ra_): md for popretz ra
(@gpr_multi_popretz_up_to_s0_): md for popretz ra, s0
(@gpr_multi_popretz_up_to_s1_): likewise
(@gpr_multi_popretz_up_to_s2_): likewise
(@gpr_multi_popretz_up_to_s3_): likewise
(@gpr_multi_popretz_up_to_s4_): likewise
(@gpr_multi_popretz_up_to_s5_): likewise
(@gpr_multi_popretz_up_to_s6_): likewise
(@gpr_multi_popretz_up_to_s7_): likewise
(@gpr_multi_popretz_up_to_s8_): likewise
(@gpr_multi_popretz_up_to_s9_): likewise
(@gpr_multi_popretz_up_to_s11_): likewise

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: add testcase for cm.popretz in rv32e
* gcc.target/riscv/rv32i_zcmp.c: add testcase for cm.popretz in rv32i
---
 gcc/config/riscv/riscv.cc   | 114 --
 gcc/config/riscv/riscv.md   |   1 +
 gcc/config/riscv/zc.md  | 393 
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c |  13 +
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c |  13 +
 5 files changed, 509 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ed4d28b2eb0..78600ba73b6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -422,6 +422,7 @@ typedef enum
   PUSH_IDX = 0,
   POP_IDX,
   POPRET_IDX,
+  POPRETZ_IDX,
   ZCMP_OP_NUM
 } riscv_zcmp_op_t;
 
@@ -6238,30 +6239,31 @@ riscv_emit_stack_tie (void)
 /*zcmp multi push and pop code_for_push_pop function ptr array  */
 const code_for_push_pop_t code_for_push_pop[ZCMP_MAX_GRP_SLOTS][ZCMP_OP_NUM]
   = {{code_for_gpr_multi_push_up_to_ra, code_for_gpr_multi_pop_up_to_ra,
-  code_for_gpr_multi_popret_up_to_ra},
+  code_for_gpr_multi_popret_up_to_ra, code_for_gpr_multi_popretz_up_to_ra},
  {code_for_gpr_multi_push_up_to_s0, code_for_gpr_multi_pop_up_to_s0,
-  code_for_gpr_multi_popret_up_to_s0},
+  code_for_gpr_multi_popret_up_to_s0, code_for_gpr_multi_popretz_up_to_s0},
  {code_for_gpr_multi_push_up_to_s1, code_for_gpr_multi_pop_up_to_s1,
-  code_for_gpr_multi_popret_up_to_s1},
+  code_for_gpr_multi_popret_up_to_s1, code_for_gpr_multi_popretz_up_to_s1},
  {code_for_gpr_multi_push_up_to_s2, code_for_gpr_multi_pop_up_to_s2,
-  code_for_gpr_multi_popret_up_to_s2},
+  code_for_gpr_multi_popret_up_to_s2, code_for_gpr_multi_popretz_up_to_s2},
  {code_for_gpr_multi_push_up_to_s3, code_for_gpr_multi_pop_up_to_s3,
-  code_for_gpr_multi_popret_up_to_s3},
+  code_for_gpr_multi_popret_up_to_s3, code_for_gpr_multi_popretz_up_to_s3},
  {code_for_gpr_multi_push_up_to_s4, code_for_gpr_multi_pop_up_to_s4,
-  code_for_gpr_multi_popret_up_to_s4},
+  code_for_gpr_multi_popret_up_to_s4, code_for_gpr_multi_popretz_up_to_s4},
  {code_for_gpr_multi_push_up_to_s5, code_for_gpr_multi_pop_up_to_s5,
-  code_for_gpr_multi_popret_up_to_s5},
+  code_for_gpr_multi_popret_up_to_s5, code_for_gpr_multi_popretz_up_to_s5},
  {code_for_gpr_multi_push_up_to_s6, code_for_gpr_multi_pop_up_to_s6,
-  code_for_gpr_multi_popret_up_to_s6},
+  code_for_gpr_multi_popret_up_to_s6, code_for_gpr_multi_popretz_up_to_s6},
  {code_for_gpr_multi_push_up_to_s7, code_for_gpr_multi_pop_up_to_s7,
-  code_for_gpr_multi_popret_up_to_s7},
+  code_for_gpr_multi_popret_up_to_s7, code_for_gpr_multi_popretz_up_to_s7},
  {code_for_gpr_multi_push_up_to_s8, code_for_gpr_multi_pop_up_to_s8,
-  code_for_gpr_multi_popret_up_to_s8},
+  code_for_gpr_multi_popret_up_to_s8, code_for_gpr_multi_popretz_up_to_s8},
  {code_for_gpr_multi_push_up_to_s9, code_for_gpr_multi_pop_up_to_s9,
-  code_for_gpr_multi_popret_up_to_s9},
- {nullptr, nullptr, nullptr},
+  code_for_gpr_multi_popret_up_to_s9, code_for_gpr_multi_popretz_up_to_s9},
+ {nullptr, nullptr, nullptr, nullptr},
  {code_for_gpr_multi_push_up_to_s11, code_for_gpr_multi_pop_up_to_s11,
-  code_for_gpr_multi_popret_up_to_s11}};
+  code_for_gpr_multi_popret_up_to_s11,
+  code_for_gpr_multi_popretz_up_to_s11}};
 
 static rtx
 riscv_gen_multi_push_pop_insn (riscv_zcmp_op_t op, HOST_WIDE_INT adj_size,
@@ -6474,6 +6476,78 @@ riscv_adjust_libcall_cfi_epilogue ()
   return dwarf;
 }
 
+/* return true if popretz pattern can be matched.
+   set (reg 10 a0) (const_int 0)
+   use (reg 10 a0)
+   NOTE_INSN_EPILOGUE_BEG  */
+static rtx_insn *
+riscv_zcmp_can_use_popretz (void)
+{
+  rtx_insn *insn = NULL, *use = NULL, *clear = NULL;
+
+  /* sequence stack for NOTE_INSN_EPILOGUE_BEG*/
+  struct 

[PATCH 3/3] [V2] [RISC-V] support cm.mva01s cm.mvsa01 in zcmp

2023-08-29 Thread Fei Gao
From: Die Li 

Signed-off-by: Die Li 
Co-Authored-By: Fei Gao 

gcc/ChangeLog:

* config/riscv/peephole.md: New pattern.
* config/riscv/predicates.md (a0a1_reg_operand): New predicate.
(zcmp_mv_sreg_operand): New predicate.
* config/riscv/riscv.md: New predicate.
* config/riscv/zc.md (*mva01s): New pattern.
(*mvsa01): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cm_mv_rv32.c: New test.
---
 gcc/config/riscv/peephole.md| 28 +
 gcc/config/riscv/predicates.md  | 11 
 gcc/config/riscv/riscv.md   |  1 +
 gcc/config/riscv/zc.md  | 22 
 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c | 23 +
 5 files changed, 85 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c

diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md
index 0ef0c04410b..92e57f9a447 100644
--- a/gcc/config/riscv/peephole.md
+++ b/gcc/config/riscv/peephole.md
@@ -38,3 +38,31 @@
 {
   operands[5] = GEN_INT (INTVAL (operands[2]) - INTVAL (operands[5]));
 })
+
+;; ZCMP
+(define_peephole2
+  [(set (match_operand:X 0 "a0a1_reg_operand")
+(match_operand:X 1 "zcmp_mv_sreg_operand"))
+   (set (match_operand:X 2 "a0a1_reg_operand")
+(match_operand:X 3 "zcmp_mv_sreg_operand"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[2]) != REGNO (operands[0]))"
+  [(parallel [(set (match_dup 0)
+   (match_dup 1))
+  (set (match_dup 2)
+   (match_dup 3))])]
+)
+
+(define_peephole2
+  [(set (match_operand:X 0 "zcmp_mv_sreg_operand")
+(match_operand:X 1 "a0a1_reg_operand"))
+   (set (match_operand:X 2 "zcmp_mv_sreg_operand")
+(match_operand:X 3 "a0a1_reg_operand"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[0]) != REGNO (operands[2]))
+   && (REGNO (operands[1]) != REGNO (operands[3]))"
+  [(parallel [(set (match_dup 0)
+   (match_dup 1))
+  (set (match_dup 2)
+   (match_dup 3))])]
+)
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 3ef09996a85..772f45df65c 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -165,6 +165,17 @@
   (and (match_code "const_int")
(match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 13)")))
 
+;; ZCMP predicates
+(define_predicate "a0a1_reg_operand"
+  (and (match_operand 0 "register_operand")
+   (match_test "IN_RANGE (REGNO (op), A0_REGNUM, A1_REGNUM)")))
+
+(define_predicate "zcmp_mv_sreg_operand"
+  (and (match_operand 0 "register_operand")
+   (match_test "TARGET_RVE ? IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
+: IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
+|| IN_RANGE (REGNO (op), S2_REGNUM, S7_REGNUM)")))
+
 ;; Only use branch-on-bit sequences when the mask is not an ANDI immediate.
 (define_predicate "branch_on_bit_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 8e09df6ff63..aa2b5b960dc 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -132,6 +132,7 @@
(S0_REGNUM  8)
(S1_REGNUM  9)
(A0_REGNUM  10)
+   (A1_REGNUM  11)
(S2_REGNUM  18)
(S3_REGNUM  19)
(S4_REGNUM  20)
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
index 8d7de97daad..77b28adde95 100644
--- a/gcc/config/riscv/zc.md
+++ b/gcc/config/riscv/zc.md
@@ -1433,3 +1433,25 @@
   "TARGET_ZCMP"
   "cm.push {ra, s0-s11}, %0"
 )
+
+;; ZCMP mv
+(define_insn "*mva01s"
+  [(set (match_operand:X 0 "a0a1_reg_operand" "=r")
+(match_operand:X 1 "zcmp_mv_sreg_operand" "r"))
+   (set (match_operand:X 2 "a0a1_reg_operand" "=r")
+(match_operand:X 3 "zcmp_mv_sreg_operand" "r"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[2]) != REGNO (operands[0]))"
+  { return (REGNO (operands[0]) == A0_REGNUM)?"cm.mva01s\t%1,%3":"cm.mva01s\t%3,%1"; }
+  [(set_attr "mode" "")])
+
+(define_insn "*mvsa01"
+  [(set (match_operand:X 0 "zcmp_mv_sreg_operand" "=r")
+(match_operand:X 1 "a0a1_reg_operand" "r"))
+   (set (match_operand:X 2 "zcmp_mv_sreg_operand" "=r")
+(match_operand:X 3 "a0a1_reg_operand" "r"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[0]) != REGNO (operands[2]))
+   &&

[PATCH 1/3] [V6] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-08-29 Thread Fei Gao
+  /* Undo the above fib.  */
+  frame->mask = mask;
+  frame->fmask = fmask;
+  unsigned regs_count = riscv_multi_push_regs_count (frame->mask);
+  if (use_multi_pop_normal)
+   insn = emit_jump_insn (riscv_gen_multi_push_pop_insn (POPRET_IDX,
+ multipop_size,
+ regs_count));
+  else
+   insn = emit_insn (
+ riscv_gen_multi_push_pop_insn (POP_IDX, multipop_size, regs_count));
+
+  rtx dwarf = riscv_adjust_multi_pop_cfi_epilogue (multipop_size);
+  RTX_FRAME_RELATED_P (insn) = 1;
+  REG_NOTES (insn) = dwarf;
+  if (use_multi_pop_normal)
+   return;
+}
+  else if (use_restore_libcall)
 {
   rtx dwarf = riscv_adjust_libcall_cfi_epilogue ();
       insn = emit_insn (gen_gpr_restore (GEN_INT (riscv_save_libcall_count (mask))));
@@ -7744,6 +8056,27 @@ riscv_gen_gpr_save_insn (struct riscv_frame_info *frame)
   return gen_rtx_PARALLEL (VOIDmode, vec);
 }
 
+static HOST_WIDE_INT
+zcmp_base_adj (int regs_num)
+{
+  return riscv_16bytes_align ((regs_num) *GET_MODE_SIZE (word_mode));
+}
+
+static HOST_WIDE_INT
+zcmp_additional_adj (HOST_WIDE_INT total, int regs_num)
+{
+  return total - zcmp_base_adj (regs_num);
+}
+
+bool
+riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT total, int regs_num)
+{
+  HOST_WIDE_INT additioanl_bytes = zcmp_additional_adj (total, regs_num);
+  return additioanl_bytes == 0 || additioanl_bytes == 1 * ZCMP_SP_INC_STEP
+|| additioanl_bytes == 2 * ZCMP_SP_INC_STEP
+|| additioanl_bytes == ZCMP_MAX_SPIMM * ZCMP_SP_INC_STEP;
+}
+
 /* Return true if it's valid gpr_save pattern.  */
 
 bool
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index e18a0081297..42b6eb784d4 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -420,6 +420,29 @@ ASM_MISA_SPEC
 #define RISCV_CALL_ADDRESS_TEMP(MODE) \
   gen_rtx_REG (MODE, RISCV_CALL_ADDRESS_TEMP_REGNUM)
 
+#define RETURN_ADDR_MASK (1 << RETURN_ADDR_REGNUM)
+#define S0_MASK (1 << S0_REGNUM)
+#define S1_MASK (1 << S1_REGNUM)
+#define S2_MASK (1 << S2_REGNUM)
+#define S3_MASK (1 << S3_REGNUM)
+#define S4_MASK (1 << S4_REGNUM)
+#define S5_MASK (1 << S5_REGNUM)
+#define S6_MASK (1 << S6_REGNUM)
+#define S7_MASK (1 << S7_REGNUM)
+#define S8_MASK (1 << S8_REGNUM)
+#define S9_MASK (1 << S9_REGNUM)
+#define S10_MASK (1 << S10_REGNUM)
+#define S11_MASK (1 << S11_REGNUM)
+
+#define MULTI_PUSH_GPR_MASK                                                    \
+  (RETURN_ADDR_MASK | S0_MASK | S1_MASK | S2_MASK | S3_MASK | S4_MASK          \
+   | S5_MASK | S6_MASK | S7_MASK | S8_MASK | S9_MASK | S10_MASK | S11_MASK)
+#define ZCMP_MAX_SPIMM 3
+#define ZCMP_SP_INC_STEP 16
+#define ZCMP_INVALID_S0S10_SREGS_COUNTS 11
+#define ZCMP_S0S11_SREGS_COUNTS 12
+#define ZCMP_MAX_GRP_SLOTS 13
+
 #define MCOUNT_NAME "_mcount"
 
 #define NO_PROFILE_COUNTERS 1
@@ -655,6 +678,8 @@ enum reg_class
   ((REGNO) >= 8 && (REGNO) <= 9 ? (REGNO) - 8 :\
(REGNO) >= 18 && (REGNO) <= 27 ? (REGNO) - 16 : -1)
 
+#define CALLEE_SAVED_FREG_NUMBER(REGNO) CALLEE_SAVED_REG_NUMBER (REGNO - 32)
+
 #define LIBCALL_VALUE(MODE) \
   riscv_function_value (NULL_TREE, NULL_TREE, MODE)
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 47d14d99903..f489646cec3 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -124,6 +124,7 @@
 
 (define_constants
   [(RETURN_ADDR_REGNUM 1)
+   (SP_REGNUM  2)
(GP_REGNUM  3)
(TP_REGNUM  4)
(T0_REGNUM  5)
@@ -3431,3 +3432,4 @@
 (include "thead.md")
 (include "vector.md")
 (include "zicond.md")
+(include "zc.md")
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
new file mode 100644
index 000..5c1bf031b8d
--- /dev/null
+++ b/gcc/config/riscv/zc.md
@@ -0,0 +1,1042 @@
+;; Machine description for RISC-V Zc extention.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Fei Gao (gao...@eswincomputing.com).
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_insn "@gpr_multi

[PATCH 0/3] [RISC-V] support zcmp extension

2023-08-29 Thread Fei Gao
Fei Gao (3):
  [RISC-V] support cm.push cm.pop cm.popret in zcmp
  [RISC-V] support cm.popretz in zcmp
  [RISC-V] support cm.mva01s cm.mvsa01 in zcmp

 gcc/config/riscv/iterators.md |   15 +
 gcc/config/riscv/peephole.md  |   28 +
 gcc/config/riscv/predicates.md|  107 ++
 gcc/config/riscv/riscv-protos.h   |2 +
 gcc/config/riscv/riscv.cc |  499 +-
 gcc/config/riscv/riscv.h  |   25 +
 gcc/config/riscv/riscv.md |4 +
 gcc/config/riscv/zc.md| 1457 +
 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c   |   23 +
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  269 +++
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  269 +++
 .../gcc.target/riscv/zcmp_push_fpr.c  |   34 +
 .../gcc.target/riscv/zcmp_stack_alignment.c   |   24 +
 13 files changed, 2705 insertions(+), 51 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_push_fpr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c

-- 
2.17.1



Re: Re: [PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-28 Thread Fei Gao
On 2023-08-29 06:54  Jeff Law  wrote:
>
>
>
>On 8/28/23 01:47, Fei Gao wrote:
>> no functional changes but allow targets to check shrink-wrap-separate 
>> enabled or not.
>>
>>    gcc/ChangeLog:
>>
>>  * shrink-wrap.cc (try_shrink_wrapping_separate):call
>>    use_shrink_wrapping_separate.
>>  (use_shrink_wrapping_separate): wrap the condition
>>    check in use_shrink_wrapping_separate.
>>  * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
>So as I mentioned earlier today in the older thread, can we use
>override_options to do this?
>
>If we look at aarch64_override_options we have this:
>
>   /* The pass to insert speculation tracking runs before
>  shrink-wrapping and the latter does not know how to update the
>  tracking status.  So disable it in this case.  */
>   if (aarch64_track_speculation)
> flag_shrink_wrap = 0;
>
>We kind of want this instead
>
>   if (flag_shrink_wrap)
> {
>   turn off whatever target bits enable the cm.push/cm.pop insns
> }
>
>
>This does imply that we have a distinct target flag to enable/disable
>those instructions.  But that seems like a good thing to have anyway. 
I'm afraid we cannot simply resolve the conflict based on
flag_shrink_wrap/flag_shrink_wrap_separate alone, as they're set to true from -O1
onwards,
which means zcmp is almost always disabled unless
-fno-shrink-wrap/-fno-shrink-wrap-separate
are explicitly given.

So after discussion with Kito, we would like to turn on zcmp for -Os and
shrink-wrap-separate
for speed-preferred optimization. use_shrink_wrapping_separate in this patch
provides a way
to perform this check. No new hook is needed.

Please let me know what you think.

BR, 
Fei

>
>jeff

Re: Re: [PATCH 0/2] support cm.push cm.pop cm.popret in zcmp and resolve conflict with shrink-wrap-separate

2023-08-28 Thread Fei Gao

On 2023-08-28 17:27  Kito Cheng  wrote:
>
>I would prefer to decouple the shrink-wrap part by checking
>flag_shrink_wrap, I mean let disable zcmp code gen if flag_shrink_wrap
>is true for now, and a follow up patch series with shrink-wrap.[cc|h]
>changes? 

OK. Some details for you to confirm:
1. flag_shrink_wrap_separate seems a better check than flag_shrink_wrap.
2. To pass the zcmp testcases, I will add the -fno-shrink-wrap-separate option.

BR, 
Fei

>
>On Mon, Aug 28, 2023 at 3:48 PM Fei Gao  wrote:
>>
>> The first is a helper patch to allow targets to check shrink-wrap-separate 
>> enabled or not.
>> The second is zcmp extension implementation in RISC-V.
>>
>> Fei Gao (2):
>>   allow target to check shrink-wrap-separate enabled or not
>>   support cm.push cm.pop cm.popret in zcmp and resolve conflict with shrink-wrap-separate
>>
>>  gcc/config/riscv/iterators.md |   15 +
>>  gcc/config/riscv/predicates.md    |   96 ++
>>  gcc/config/riscv/riscv-protos.h   |    2 +
>>  gcc/config/riscv/riscv.cc |  455 ++-
>>  gcc/config/riscv/riscv.h  |   25 +
>>  gcc/config/riscv/riscv.md |    2 +
>>  gcc/config/riscv/zc.md    | 1042 +
>>  gcc/shrink-wrap.cc    |   25 +-
>>  gcc/shrink-wrap.h |    1 +
>>  gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  256 
>>  gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  256 
>>  .../gcc.target/riscv/zcmp_push_fpr.c  |   34 +
>>  .../riscv/zcmp_shrink_wrap_separate.c |   93 ++
>>  .../riscv/zcmp_shrink_wrap_separate2.c    |   93 ++
>>  .../gcc.target/riscv/zcmp_stack_alignment.c   |   24 +
>>  15 files changed, 2357 insertions(+), 62 deletions(-)
>>  create mode 100644 gcc/config/riscv/zc.md
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_push_fpr.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c
>>
>> --
>> 2.17.1
>>

Re: Re: [PATCH 1/4][V4][RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-08-28 Thread Fei Gao
Hi Kito & Jeff

I have posted a new series for zcmp
(https://patchwork.sourceware.org/project/gcc/list/?series=23929) to:
1. solve the 2 issues Kito caught
2. rebase

The new series would be a replacement of the following:
https://patchwork.sourceware.org/project/gcc/list/?series=21577
https://patchwork.sourceware.org/project/gcc/patch/20230607055215.29332-2-gao...@eswincomputing.com/

The rest of the zcmp patches will be sent out after the new series is accepted,
to avoid rebasing again and again.

BR, 
Fei


On 2023-08-20 18:53  Fei Gao  wrote:
>
>
>Hi Kito
>
>This issue is due to a conflict between zcmp and shrink-wrap-separate,
>which is addressed by a patch under review.
>[PATCH 0/2] resolve confilct between RISC-V zcmp and shrink-wrap-separate
>https://patchwork.sourceware.org/project/gcc/list/?series=21577
>https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg311487.html
>
>I'm making  [PATCH 1/4][V5][RISC-V] support cm.push cm.pop cm.popret in zcmp 
>for the 1st issue you caught.
>Please let me know if you want me to merge 
>https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg311486.html
>into [PATCH 1/4][V5][RISC-V]. 
>
>BR, 
>Fei
>On 2023-08-16 16:38  Kito Cheng  wrote:
>>
>>Another fail case for CFI:
>>
>>$ riscv64-unknown-elf-gcc _mulhc3.i
>>-march=rv64imafd_zicsr_zifencei_zca_zcmp -mabi=lp64d -g  -O2  -o
>>_mulhc3.s
>>
>>typedef float a __attribute__((mode(HF)));
>>b, c;
>>f() {
>> a a, d, e = a + d;
>> if (g() && e)
>>   c = b;
>>}
>>
>>
>>0x10e508a maybe_record_trace_start
>>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2584
>>0x10e58fb scan_trace
>>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2784
>>0x10e5fab create_cfi_notes
>>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2938
>>0x10e6ee4 execute_dwarf2_frame
>>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3309
>>0x10e7c5a execute
>>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3797
>>
>>On Wed, Aug 16, 2023 at 4:33 PM Kito Cheng  wrote:
>>>
>>> Hi Fei:
>>>
>>> Tried to use Jiawei's patch to test this patch and found some issue:
>>>
>>>
>>> > @@ -5430,13 +5632,15 @@ riscv_expand_prologue (void)
>>> >    /* Save the registers.  */
>>> >    if ((frame->mask | frame->fmask) != 0)
>>> >  {
>>> > -  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>>> > remaining_size);
>>> > -
>>> > -  insn = gen_add3_insn (stack_pointer_rtx,
>>> > -   stack_pointer_rtx,
>>> > -   GEN_INT (-step1));
>>> > -  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>>> > -  remaining_size -= step1;
>>> > +  if (known_gt (remaining_size, frame->frame_pointer_offset))
>>> > +    {
>>> > +  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>>> > remaining_size);
>>> > +  remaining_size -= step1;
>>> > +  insn = gen_add3_insn (stack_pointer_rtx,
>>> > +    stack_pointer_rtx,
>>> > +    GEN_INT (-step1));
>>> > +  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>>> > +    }
>>> >    riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, 
>>> >false);
>>> >  }
>>> >
>>>
>>> I hit some issue here during building libgcc, I use
>>> riscv-gnu-toolchain with --with-arch=rv64gzca_zcmp
>>>
>>> And the error message is:
>>>
>>> In file included from
>>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind-dw2.c:1471:
>>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc: In
>>> function '_Unwind_Backtrace':
>>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc:330:1:
>>> internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176
>>>  330 | }
>>>  | ^
>>> 0x83753a gen_reg_rtx(machine_mode)
>>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/emit-rtl.cc:1176
>>> 0xf5566f maybe_legitimize_operand
>>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8047
>>> 0xf5566f maybe_legitimize_operands(insn_code, unsigned int, unsigned
>>> int, expand_operand*)
>>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8191
>>> 0xf511d9 maybe_gen_insn(insn_code, unsign

[PATCH 2/2][V5][RISC-V]support cm.push cm.pop cm.popret in zcmp and resolve confilct with shrink-wrap-separate

2023-08-28 Thread Fei Gao
+bool
+riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT total, int regs_num)
+{
+  HOST_WIDE_INT additioanl_bytes = zcmp_additional_adj (total, regs_num);
+  return additioanl_bytes == 0 || additioanl_bytes == 1 * ZCMP_SP_INC_STEP
+|| additioanl_bytes == 2 * ZCMP_SP_INC_STEP
+|| additioanl_bytes == ZCMP_MAX_SPIMM * ZCMP_SP_INC_STEP;
+}
+
 /* Return true if it's valid gpr_save pattern.  */
 
 bool
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index e18a0081297..42b6eb784d4 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -420,6 +420,29 @@ ASM_MISA_SPEC
 #define RISCV_CALL_ADDRESS_TEMP(MODE) \
   gen_rtx_REG (MODE, RISCV_CALL_ADDRESS_TEMP_REGNUM)
 
+#define RETURN_ADDR_MASK (1 << RETURN_ADDR_REGNUM)
+#define S0_MASK (1 << S0_REGNUM)
+#define S1_MASK (1 << S1_REGNUM)
+#define S2_MASK (1 << S2_REGNUM)
+#define S3_MASK (1 << S3_REGNUM)
+#define S4_MASK (1 << S4_REGNUM)
+#define S5_MASK (1 << S5_REGNUM)
+#define S6_MASK (1 << S6_REGNUM)
+#define S7_MASK (1 << S7_REGNUM)
+#define S8_MASK (1 << S8_REGNUM)
+#define S9_MASK (1 << S9_REGNUM)
+#define S10_MASK (1 << S10_REGNUM)
+#define S11_MASK (1 << S11_REGNUM)
+
+#define MULTI_PUSH_GPR_MASK                                                    \
+  (RETURN_ADDR_MASK | S0_MASK | S1_MASK | S2_MASK | S3_MASK | S4_MASK         \
+   | S5_MASK | S6_MASK | S7_MASK | S8_MASK | S9_MASK | S10_MASK | S11_MASK)
+#define ZCMP_MAX_SPIMM 3
+#define ZCMP_SP_INC_STEP 16
+#define ZCMP_INVALID_S0S10_SREGS_COUNTS 11
+#define ZCMP_S0S11_SREGS_COUNTS 12
+#define ZCMP_MAX_GRP_SLOTS 13
+
 #define MCOUNT_NAME "_mcount"
 
 #define NO_PROFILE_COUNTERS 1
@@ -655,6 +678,8 @@ enum reg_class
   ((REGNO) >= 8 && (REGNO) <= 9 ? (REGNO) - 8 :\
(REGNO) >= 18 && (REGNO) <= 27 ? (REGNO) - 16 : -1)
 
+#define CALLEE_SAVED_FREG_NUMBER(REGNO) CALLEE_SAVED_REG_NUMBER (REGNO - 32)
+
 #define LIBCALL_VALUE(MODE) \
   riscv_function_value (NULL_TREE, NULL_TREE, MODE)
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 47d14d99903..f489646cec3 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -124,6 +124,7 @@
 
 (define_constants
   [(RETURN_ADDR_REGNUM 1)
+   (SP_REGNUM  2)
(GP_REGNUM  3)
(TP_REGNUM  4)
(T0_REGNUM  5)
@@ -3431,3 +3432,4 @@
 (include "thead.md")
 (include "vector.md")
 (include "zicond.md")
+(include "zc.md")
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
new file mode 100644
index 000..5c1bf031b8d
--- /dev/null
+++ b/gcc/config/riscv/zc.md
@@ -0,0 +1,1042 @@
+;; Machine description for RISC-V Zc extension.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Fei Gao (gao...@eswincomputing.com).
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_insn "@gpr_multi_pop_up_to_ra_"
+  [(set (reg:X SP_REGNUM)
+(plus:X (reg:X SP_REGNUM)
+ (match_operand 0 "stack_pop_up_to_ra_operand" "I")))
+   (set (reg:X RETURN_ADDR_REGNUM)
+(mem:X (plus:X (reg:X SP_REGNUM)
+   (const_int ]
+  "TARGET_ZCMP"
+  "cm.pop  {ra}, %0"
+)
+
+(define_insn "@gpr_multi_pop_up_to_s0_"
+  [(set (reg:X SP_REGNUM)
+(plus:X (reg:X SP_REGNUM)
+ (match_operand 0 "stack_pop_up_to_s0_operand" "I")))
+   (set (reg:X S0_REGNUM)
+(mem:X (plus:X (reg:X SP_REGNUM)
+   (const_int 
+   (set (reg:X RETURN_ADDR_REGNUM)
+(mem:X (plus:X (reg:X SP_REGNUM)
+   (const_int ]
+  "TARGET_ZCMP"
+  "cm.pop  {ra, s0}, %0"
+)
+
+(define_insn "@gpr_multi_pop_up_to_s1_"
+  [(set (reg:X SP_REGNUM)
+(plus:X (reg:X SP_REGNUM)
+ (match_operand 0 "stack_pop_up_to_s1_operand" "I")))
+   (set (reg:X S1_REGNUM)
+(mem:X (plus:X (reg:X SP_REGNUM)
+   (const_int 
+   (set (reg:X S0_REGNUM)
+(mem:X (plus:X (reg:X SP_REGNUM)
+   (const_int 
+   (set (reg:X RETURN_ADDR_REGNUM)
+(mem:X 

[PATCH 0/2] support cm.push cm.pop cm.popret in zcmp and resolve confilct with shrink-wrap-separate

2023-08-28 Thread Fei Gao
The first is a helper patch that allows targets to check whether
shrink-wrap-separate is enabled.
The second is the zcmp extension implementation for RISC-V.

Fei Gao (2):
  allow target to check shrink-wrap-separate enabled or not
  support cm.push cm.pop cm.popret in zcmp and resolve confilct with 
shrink-wrap-separate

 gcc/config/riscv/iterators.md                 |   15 +
 gcc/config/riscv/predicates.md                |   96 ++
 gcc/config/riscv/riscv-protos.h               |    2 +
 gcc/config/riscv/riscv.cc                     |  455 ++-
 gcc/config/riscv/riscv.h                      |   25 +
 gcc/config/riscv/riscv.md                     |    2 +
 gcc/config/riscv/zc.md                        | 1042 +
 gcc/shrink-wrap.cc                            |   25 +-
 gcc/shrink-wrap.h                             |    1 +
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  256 
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  256 
 .../gcc.target/riscv/zcmp_push_fpr.c          |   34 +
 .../riscv/zcmp_shrink_wrap_separate.c         |   93 ++
 .../riscv/zcmp_shrink_wrap_separate2.c        |   93 ++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |   24 +
 15 files changed, 2357 insertions(+), 62 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_push_fpr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c

-- 
2.17.1



[PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-28 Thread Fei Gao
No functional changes; this only allows targets to check whether
shrink-wrap-separate is enabled.

  gcc/ChangeLog:

* shrink-wrap.cc (try_shrink_wrapping_separate):call
  use_shrink_wrapping_separate.
(use_shrink_wrapping_separate): wrap the condition
  check in use_shrink_wrapping_separate.
* shrink-wrap.h (use_shrink_wrapping_separate): add to extern
---
 gcc/shrink-wrap.cc | 25 +
 gcc/shrink-wrap.h  |  1 +
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index b8d7b557130..d534964321a 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -1776,16 +1776,14 @@ insert_prologue_epilogue_for_components (sbitmap components)
   commit_edge_insertions ();
 }
 
-/* The main entry point to this subpass.  FIRST_BB is where the prologue
-   would be normally put.  */
-void
-try_shrink_wrapping_separate (basic_block first_bb)
+bool
+use_shrink_wrapping_separate (void)
 {
   if (!(SHRINK_WRAPPING_ENABLED
-   && flag_shrink_wrap_separate
-   && optimize_function_for_speed_p (cfun)
-   && targetm.shrink_wrap.get_separate_components))
-return;
+&& flag_shrink_wrap_separate
+&& optimize_function_for_speed_p (cfun)
+&& targetm.shrink_wrap.get_separate_components))
+return false;
 
   /* We don't handle "strange" functions.  */
   if (cfun->calls_alloca
@@ -1794,6 +1792,17 @@ try_shrink_wrapping_separate (basic_block first_bb)
   || crtl->calls_eh_return
   || crtl->has_nonlocal_goto
   || crtl->saves_all_registers)
+return false;
+
+  return true;
+}
+
+/* The main entry point to this subpass.  FIRST_BB is where the prologue
+   would be normally put.  */
+void
+try_shrink_wrapping_separate (basic_block first_bb)
+{
+  if (!use_shrink_wrapping_separate ())
 return;
 
   /* Ask the target what components there are.  If it returns NULL, don't
diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
index 161647711a3..82386c2b712 100644
--- a/gcc/shrink-wrap.h
+++ b/gcc/shrink-wrap.h
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
 extern void try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq);
 extern void try_shrink_wrapping_separate (basic_block first_bb);
+extern bool use_shrink_wrapping_separate (void);
 #define SHRINK_WRAPPING_ENABLED \
   (flag_shrink_wrap && targetm.have_simple_return ())
 
-- 
2.17.1



Re: Re: [PATCH 1/4][V4][RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-08-20 Thread Fei Gao

Hi Kito

This issue is due to a conflict between zcmp and shrink-wrap-separate,
which is addressed by a patch under review.
[PATCH 0/2] resolve confilct between RISC-V zcmp and shrink-wrap-separate
https://patchwork.sourceware.org/project/gcc/list/?series=21577
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg311487.html

I'm making  [PATCH 1/4][V5][RISC-V] support cm.push cm.pop cm.popret in zcmp 
for the 1st issue you caught.
Please let me know if you want me to merge 
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg311486.html
into [PATCH 1/4][V5][RISC-V].

BR, 
Fei
On 2023-08-16 16:38  Kito Cheng  wrote:
>
>Another fail case for CFI:
>
>$ riscv64-unknown-elf-gcc _mulhc3.i
>-march=rv64imafd_zicsr_zifencei_zca_zcmp -mabi=lp64d -g  -O2  -o
>_mulhc3.s
>
>typedef float a __attribute__((mode(HF)));
>b, c;
>f() {
> a a, d, e = a + d;
> if (g() && e)
>   c = b;
>}
>
>
>0x10e508a maybe_record_trace_start
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2584
>0x10e58fb scan_trace
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2784
>0x10e5fab create_cfi_notes
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2938
>0x10e6ee4 execute_dwarf2_frame
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3309
>0x10e7c5a execute
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3797
>
>On Wed, Aug 16, 2023 at 4:33 PM Kito Cheng  wrote:
>>
>> Hi Fei:
>>
>> Tried to use Jiawei's patch to test this patch and found some issue:
>>
>>
>> > @@ -5430,13 +5632,15 @@ riscv_expand_prologue (void)
>> >    /* Save the registers.  */
>> >    if ((frame->mask | frame->fmask) != 0)
>> >  {
>> > -  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>> > remaining_size);
>> > -
>> > -  insn = gen_add3_insn (stack_pointer_rtx,
>> > -   stack_pointer_rtx,
>> > -   GEN_INT (-step1));
>> > -  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>> > -  remaining_size -= step1;
>> > +  if (known_gt (remaining_size, frame->frame_pointer_offset))
>> > +    {
>> > +  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>> > remaining_size);
>> > +  remaining_size -= step1;
>> > +  insn = gen_add3_insn (stack_pointer_rtx,
>> > +    stack_pointer_rtx,
>> > +    GEN_INT (-step1));
>> > +  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>> > +    }
>> >    riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, 
>> >false);
>> >  }
>> >
>>
>> I hit some issue here during building libgcc, I use
>> riscv-gnu-toolchain with --with-arch=rv64gzca_zcmp
>>
>> And the error message is:
>>
>> In file included from
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind-dw2.c:1471:
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc: In
>> function '_Unwind_Backtrace':
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc:330:1:
>> internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176
>>  330 | }
>>  | ^
>> 0x83753a gen_reg_rtx(machine_mode)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/emit-rtl.cc:1176
>> 0xf5566f maybe_legitimize_operand
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8047
>> 0xf5566f maybe_legitimize_operands(insn_code, unsigned int, unsigned
>> int, expand_operand*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8191
>> 0xf511d9 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8210
>> 0xf58539 expand_binop_directly
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1452
>> 0xf5 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
>> rtx_def*, int, optab_methods)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1539
>> 0xcbfdd0 force_operand(rtx_def*, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:8231
>> 0xc8fca1 force_reg(machine_mode, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/explow.cc:687
>> 0x144b8cd riscv_force_temporary
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1531
>> 0x144b8cd riscv_force_address
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1528
>> 0x144b8cd riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:2387
>> 0x1af063e gen_movdf(rtx_def*, rtx_def*)
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.md:2107
>> 0xcba503 rtx_insn* insn_gen_fn::operator()<rtx_def*, rtx_def*>(rtx_def*, rtx_def*) const
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/recog.h:411
>> 0xcba503 emit_move_insn_1(rtx_def*, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:4164
>> 0x143d6c4 riscv_emit_move(rtx_def*, rtx_def*)

Re: Re: [PATCH 1/4][V4][RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-08-17 Thread Fei Gao
Hi Kito

Root cause has been identified.

Here's the frame layout for the test case, please use courier font :)
+-------------------------------+
|                               |
|  GPR save area  112 B         |
|                               |
+-------------------------------+
|                               |<-- fs0 is beyond sp based 12-bit range
|  FPR save area  96 B          |
|                               |
+-------------------------------+
|                               |
|  local variables              |<-- stack_pointer_rtx after riscv_first_stack_step
|                               |
+-------------------------------+

During stack frame allocation:
1. cm.push reserves 160 bytes: 112 for ra and the s-registers (with 128-bit
   alignment as per the ABI), plus an additional 48 bytes for the first 6 FPRs.
2. riscv_first_stack_step reserves 2032 bytes for the remaining 6 FPRs and the
   local variables.
3. riscv_for_each_saved_reg tries to save fs0, which is beyond the sp-based
   12-bit range. This trips gcc_assert (can_create_pseudo_p ()) in gen_reg_rtx
   when forcing the address into a register, because reload has already
   completed.

I tried a solution that saves the first 6 FPRs immediately after cm.push, and
it seems to work :)
I will fix the epilogue correspondingly as well.
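
For readers following the numbers, here is a rough sketch of the arithmetic
behind the three steps above. It is only an illustration, not code from the
patch: the helper names zcmp_round_up and fits_simm12 are made up for this
example, and the slot chosen in the comments (the highest 8-byte slot of the
48 extra bytes) is an assumption used to show an offset past the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: round x up to a multiple of align
   (align must be a power of two).  */
static long
zcmp_round_up (long x, long align)
{
  return (x + align - 1) & ~(align - 1);
}

/* Hypothetical helper: RISC-V loads and stores take a signed 12-bit
   immediate, so an sp-relative offset must lie in [-2048, 2047].  */
static bool
fits_simm12 (long offset)
{
  return offset >= -2048 && offset <= 2047;
}

/* Worked numbers for the rv64 test case discussed above:
     GPR area:   ra + s0-s11 = 13 registers * 8 bytes = 104,
                 rounded up to 16 bytes  -> zcmp_round_up (104, 16) = 112
     cm.push:    112 + 3 * 16 extra bytes -> 160
     step1:      2032 (riscv_first_stack_step)
   After both adjustments, a slot 40 bytes above the bottom of the
   cm.push area sits at sp + 2032 + 40 = sp + 2072, which no longer
   fits in a 12-bit signed immediate.  */
```

With these helpers, fits_simm12 (2047) holds while fits_simm12 (2072) does
not, which is why the save needs a temporary register that cannot be created
after reload.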

Thanks again for your test. 

BR, 
Fei

On 2023-08-16 16:33  Kito Cheng  wrote:
>
>Hi Fei:
>
>Tried to use Jiawei's patch to test this patch and found some issue:
>
>
>> @@ -5430,13 +5632,15 @@ riscv_expand_prologue (void)
>>    /* Save the registers.  */
>>    if ((frame->mask | frame->fmask) != 0)
>>  {
>> -  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, remaining_size);
>> -
>> -  insn = gen_add3_insn (stack_pointer_rtx,
>> -   stack_pointer_rtx,
>> -   GEN_INT (-step1));
>> -  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>> -  remaining_size -= step1;
>> +  if (known_gt (remaining_size, frame->frame_pointer_offset))
>> +    {
>> +  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>> remaining_size);
>> +  remaining_size -= step1;
>> +  insn = gen_add3_insn (stack_pointer_rtx,
>> +    stack_pointer_rtx,
>> +    GEN_INT (-step1));
>> +  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>> +    }
>>    riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, 
>>false);
>>  }
>>
>
>I hit some issue here during building libgcc, I use
>riscv-gnu-toolchain with --with-arch=rv64gzca_zcmp
>
>And the error message is:
>
>In file included from
>../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind-dw2.c:1471:
>../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc: In
>function '_Unwind_Backtrace':
>../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc:330:1:
>internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176
> 330 | }
> | ^
>0x83753a gen_reg_rtx(machine_mode)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/emit-rtl.cc:1176
>0xf5566f maybe_legitimize_operand
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8047
>0xf5566f maybe_legitimize_operands(insn_code, unsigned int, unsigned
>int, expand_operand*)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8191
>0xf511d9 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8210
>0xf58539 expand_binop_directly
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1452
>0xf5 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
>rtx_def*, int, optab_methods)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1539
>0xcbfdd0 force_operand(rtx_def*, rtx_def*)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:8231
>0xc8fca1 force_reg(machine_mode, rtx_def*)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/explow.cc:687
>0x144b8cd riscv_force_temporary
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1531
>0x144b8cd riscv_force_address
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1528
>0x144b8cd riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:2387
>0x1af063e gen_movdf(rtx_def*, rtx_def*)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.md:2107
>0xcba503 rtx_insn* insn_gen_fn::operator()<rtx_def*, rtx_def*>(rtx_def*, rtx_def*) const
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/recog.h:411
>0xcba503 emit_move_insn_1(rtx_def*, rtx_def*)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:4164
>0x143d6c4 riscv_emit_move(rtx_def*, rtx_def*)
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1486
>0x143d6c4 riscv_save_reg
>   

Re: Re: [PATCH 1/4][V4][RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-08-16 Thread Fei Gao
Hi Kito

Thanks for reporting these 2 issues.
Let me check and get back to you soon.

BR
Fei

On 2023-08-16 16:38  Kito Cheng  wrote:
>
>Another fail case for CFI:
>
>$ riscv64-unknown-elf-gcc _mulhc3.i
>-march=rv64imafd_zicsr_zifencei_zca_zcmp -mabi=lp64d -g  -O2  -o
>_mulhc3.s
>
>typedef float a __attribute__((mode(HF)));
>b, c;
>f() {
> a a, d, e = a + d;
> if (g() && e)
>   c = b;
>}
>
>
>0x10e508a maybe_record_trace_start
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2584
>0x10e58fb scan_trace
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2784
>0x10e5fab create_cfi_notes
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2938
>0x10e6ee4 execute_dwarf2_frame
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3309
>0x10e7c5a execute
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3797
>
>On Wed, Aug 16, 2023 at 4:33 PM Kito Cheng  wrote:
>>
>> Hi Fei:
>>
>> Tried to use Jiawei's patch to test this patch and found some issue:
>>
>>
>> > @@ -5430,13 +5632,15 @@ riscv_expand_prologue (void)
>> >    /* Save the registers.  */
>> >    if ((frame->mask | frame->fmask) != 0)
>> >  {
>> > -  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>> > remaining_size);
>> > -
>> > -  insn = gen_add3_insn (stack_pointer_rtx,
>> > -   stack_pointer_rtx,
>> > -   GEN_INT (-step1));
>> > -  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>> > -  remaining_size -= step1;
>> > +  if (known_gt (remaining_size, frame->frame_pointer_offset))
>> > +    {
>> > +  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>> > remaining_size);
>> > +  remaining_size -= step1;
>> > +  insn = gen_add3_insn (stack_pointer_rtx,
>> > +    stack_pointer_rtx,
>> > +    GEN_INT (-step1));
>> > +  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>> > +    }
>> >    riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, 
>> >false);
>> >  }
>> >
>>
>> I hit some issue here during building libgcc, I use
>> riscv-gnu-toolchain with --with-arch=rv64gzca_zcmp
>>
>> And the error message is:
>>
>> In file included from
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind-dw2.c:1471:
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc: In
>> function '_Unwind_Backtrace':
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc:330:1:
>> internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176
>>  330 | }
>>  | ^
>> 0x83753a gen_reg_rtx(machine_mode)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/emit-rtl.cc:1176
>> 0xf5566f maybe_legitimize_operand
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8047
>> 0xf5566f maybe_legitimize_operands(insn_code, unsigned int, unsigned
>> int, expand_operand*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8191
>> 0xf511d9 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8210
>> 0xf58539 expand_binop_directly
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1452
>> 0xf5 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
>> rtx_def*, int, optab_methods)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1539
>> 0xcbfdd0 force_operand(rtx_def*, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:8231
>> 0xc8fca1 force_reg(machine_mode, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/explow.cc:687
>> 0x144b8cd riscv_force_temporary
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1531
>> 0x144b8cd riscv_force_address
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1528
>> 0x144b8cd riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:2387
>> 0x1af063e gen_movdf(rtx_def*, rtx_def*)
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.md:2107
>> 0xcba503 rtx_insn* insn_gen_fn::operator()<rtx_def*, rtx_def*>(rtx_def*, rtx_def*) const
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/recog.h:411
>> 0xcba503 emit_move_insn_1(rtx_def*, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:4164
>> 0x143d6c4 riscv_emit_move(rtx_def*, rtx_def*)
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1486
>> 0x143d6c4 riscv_save_reg
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:5715
>> 0x143e2b9 riscv_for_each_saved_reg
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:5904
>> 0x14480d0 riscv_expand_prologue()
>>    
>>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:6156
>> 0x1af57fb gen_prologue()
>>    

Re: Re: [PATCH 1/2] allow target to check shrink-wrap-separate enabled or not

2023-06-25 Thread Fei Gao
hi Jeff

Please see my earlier reply here.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg310656.html

Maybe you scrolled past it among so many emails :)

BR, 
Fei


On 2023-06-25 21:36  Jeff Law  wrote:
>
>
>
>On 6/20/23 03:40, Fei Gao wrote:
>> gcc/ChangeLog:
>>
>>  * shrink-wrap.cc (try_shrink_wrapping_separate):call
>>    use_shrink_wrapping_separate.
>>  (use_shrink_wrapping_separate): wrap the condition
>>    check in use_shrink_wrapping_separate.
>>  * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
>I'm still missing something here.
>
>Why doesn't the RISC-V target simply disable separate shrink wrapping by
>indicating no components are eligible in the relevant cases?  i.e., I do
>not think we need another knob here.
>
>To be more concrete:
>
>> /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
>>  
>> static sbitmap
>> riscv_get_separate_components (void)
>> {  
>>   HOST_WIDE_INT offset;
>>   sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
>>   bitmap_clear (components);
>>  
>>   if (riscv_use_save_libcall (&cfun->machine->frame)
>>   || cfun->machine->interrupt_handler_p
>>   || !cfun->machine->frame.gp_sp_offset.is_constant ())
>> return components;
>Don't we get the behavior we want if we change this code to return a
>zero'd sbitmap?
>
>jeff

[PATCH 1/2] allow target to check shrink-wrap-separate enabled or not

2023-06-20 Thread Fei Gao
gcc/ChangeLog:

* shrink-wrap.cc (try_shrink_wrapping_separate):call
  use_shrink_wrapping_separate.
(use_shrink_wrapping_separate): wrap the condition
  check in use_shrink_wrapping_separate.
* shrink-wrap.h (use_shrink_wrapping_separate): add to extern
---
 gcc/shrink-wrap.cc | 25 +
 gcc/shrink-wrap.h  |  1 +
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index b8d7b557130..d534964321a 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -1776,16 +1776,14 @@ insert_prologue_epilogue_for_components (sbitmap components)
   commit_edge_insertions ();
 }
 
-/* The main entry point to this subpass.  FIRST_BB is where the prologue
-   would be normally put.  */
-void
-try_shrink_wrapping_separate (basic_block first_bb)
+bool
+use_shrink_wrapping_separate (void)
 {
   if (!(SHRINK_WRAPPING_ENABLED
-   && flag_shrink_wrap_separate
-   && optimize_function_for_speed_p (cfun)
-   && targetm.shrink_wrap.get_separate_components))
-return;
+&& flag_shrink_wrap_separate
+&& optimize_function_for_speed_p (cfun)
+&& targetm.shrink_wrap.get_separate_components))
+return false;
 
   /* We don't handle "strange" functions.  */
   if (cfun->calls_alloca
@@ -1794,6 +1792,17 @@ try_shrink_wrapping_separate (basic_block first_bb)
   || crtl->calls_eh_return
   || crtl->has_nonlocal_goto
   || crtl->saves_all_registers)
+return false;
+
+  return true;
+}
+
+/* The main entry point to this subpass.  FIRST_BB is where the prologue
+   would be normally put.  */
+void
+try_shrink_wrapping_separate (basic_block first_bb)
+{
+  if (!use_shrink_wrapping_separate ())
 return;
 
   /* Ask the target what components there are.  If it returns NULL, don't
diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
index 161647711a3..82386c2b712 100644
--- a/gcc/shrink-wrap.h
+++ b/gcc/shrink-wrap.h
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
 extern void try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq);
 extern void try_shrink_wrapping_separate (basic_block first_bb);
+extern bool use_shrink_wrapping_separate (void);
 #define SHRINK_WRAPPING_ENABLED \
   (flag_shrink_wrap && targetm.have_simple_return ())
 
-- 
2.17.1



[PATCH 0/2] resolve confilct between RISC-V zcmp and shrink-wrap-separate

2023-06-20 Thread Fei Gao
These 2 patches resolve the conflict between zcmp multi push/pop and
shrink-wrap-separate.

As per Kito's review comment
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg310564.html,
I split the original patch into two parts: the RISC-V part and
the rest (shrink-wrap.h / shrink-wrap.cc).

Fei Gao (2):
  allow target to check shrink-wrap-separate enabled or not
  [RISC-V] resolve confilct between zcmp multi push/pop and
shrink-wrap-separate

 gcc/config/riscv/riscv.cc | 19 ---
 gcc/shrink-wrap.cc        | 25 +
 gcc/shrink-wrap.h         |  1 +
 3 files changed, 34 insertions(+), 11 deletions(-)

-- 
2.17.1



[PATCH 2/2] [RISC-V] resolve confilct between zcmp multi push/pop and shrink-wrap-separate

2023-06-20 Thread Fei Gao
Disable zcmp multi push/pop if shrink-wrap-separate is active.

With -Os, which prefers smaller code size, shrink-wrap-separate is disabled
by default while zcmp multi push/pop is enabled.

With -O2 and other levels that prefer speed, shrink-wrap-separate is enabled
by default while zcmp multi push/pop is disabled. To force zcmp multi
push/pop on in this case, -fno-shrink-wrap-separate has to be given explicitly.
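
The rule described above can be sketched as a small decision function. This
is a simplified illustration, not the patch itself: struct fn_state and its
fields are hypothetical stand-ins for what the real code reads from cfun,
crtl and the command-line flags, and the two function names below are made up
for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs for one function being compiled.  */
struct fn_state
{
  /* Stand-in for the result of use_shrink_wrapping_separate ().  */
  bool generic_separate_ok;
  /* Stand-in for the result of riscv_avoid_shrink_wrapping_separate ().  */
  bool target_avoids_separate;
  /* Stand-in for the other conditions that already block multi push
     (interrupt handler, varargs, and so on).  */
  bool other_multi_push_blocker;
};

/* Separate shrink-wrapping is active only when the generic check passes
   and the target does not opt out.  */
static bool
shrink_wrap_separate_active (const struct fn_state *s)
{
  return s->generic_separate_ok && !s->target_avoids_separate;
}

/* Multi push/pop is avoided whenever separate shrink-wrapping is active,
   in addition to the conditions that blocked it before.  */
static bool
avoid_multi_push (const struct fn_state *s)
{
  return s->other_multi_push_blocker || shrink_wrap_separate_active (s);
}
```

Under this sketch, a function compiled for size (generic check fails) keeps
multi push/pop, a function compiled for speed loses it, and passing
-fno-shrink-wrap-separate makes the generic check fail so multi push/pop is
used again.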

The following test case shows the issues at -O2 before this patch, with both
shrink-wrap-separate and zcmp multi push/pop active:
1. The s registers are stored twice.
2. cm.push pushes ra, s0-s11 in the reverse order of the normal
   prologue, causing stack corruption and failure to restore the s registers.

Test case: zcmp_shrink_wrap_separate.c, included in this patch.

output asm before this patch:
calc_func:
cm.push {ra, s0-s3}, -32
...
beq a5,zero,.L2
...
.L2:
...
sw  s1,20(sp) //issue here
sw  s3,12(sp) //issue here
...
sw  s2,16(sp) //issue here

output asm after this patch:
calc_func:
addi    sp,sp,-32
sw  s0,24(sp)
...
beq a5,zero,.L2
...
.L2:
...
sw  s1,20(sp)
sw  s3,12(sp)
...
sw  s2,16(sp)

Signed-off-by: Fei Gao 
Co-Authored-By: Zhangjin Liao 

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_avoid_shrink_wrapping_separate): wrap the condition check in
  riscv_avoid_shrink_wrapping_separate.
(riscv_avoid_multi_push): avoid multi push if shrink_wrapping_separate
  is active.
(riscv_get_separate_components): call 
riscv_avoid_shrink_wrapping_separate
---
 gcc/config/riscv/riscv.cc | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 26405b5978b..2cca5fbb62d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "cfgloop.h"
 #include "cfgrtl.h"
+#include "shrink-wrap.h"
 #include "sel-sched.h"
 #include "fold-const.h"
 #include "gimple-iterator.h"
@@ -389,6 +390,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   false,   /* use_divmod_expansion */
 };
 
+static bool riscv_avoid_shrink_wrapping_separate ();
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
@@ -5032,6 +5034,8 @@ riscv_avoid_multi_push(const struct riscv_frame_info *frame)
   || cfun->machine->interrupt_handler_p
   || cfun->machine->varargs_size != 0
   || crtl->args.pretend_args_size != 0
+  || (use_shrink_wrapping_separate ()
+  && !riscv_avoid_shrink_wrapping_separate ())
   || (frame->mask & ~ MULTI_PUSH_GPR_MASK))
 return true;
 
@@ -6199,6 +6203,17 @@ riscv_epilogue_uses (unsigned int regno)
   return false;
 }
 
+static bool
+riscv_avoid_shrink_wrapping_separate ()
+{
+  if (riscv_use_save_libcall (&cfun->machine->frame)
+  || cfun->machine->interrupt_handler_p
+  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+return true;
+
+  return false;
+}
+
 /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
 
 static sbitmap
@@ -6208,9 +6223,7 @@ riscv_get_separate_components (void)
   sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
   bitmap_clear (components);
 
-  if (riscv_use_save_libcall (&cfun->machine->frame)
-  || cfun->machine->interrupt_handler_p
-  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+  if (riscv_avoid_shrink_wrapping_separate ())
 return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();
-- 
2.17.1



Re: Re: [PATCH 3/4] [RISC-V] resolve conflict between zcmp multi push/pop and shrink-wrap-separate

2023-06-12 Thread Fei Gao
On 2023-06-13 03:26  Jeff Law  wrote:
>
>
>
>On 6/6/23 23:52, Fei Gao wrote:
>> Disable zcmp multi push/pop if shrink-wrap-separate is active.
>>
>> With -Os, which prefers smaller code size, shrink-wrap-separate is disabled
>> by default and zcmp multi push/pop is enabled.
>>
>> With -O2 and the other levels that prefer speed, shrink-wrap-separate is
>> enabled by default and zcmp multi push/pop is disabled. To force zcmp multi
>> push/pop in this case, -fno-shrink-wrap-separate has to be given explicitly.
>>
>> The following test case shows the issues at -O2 before this patch, with both
>> shrink-wrap-separate and zcmp multi push/pop active:
>> 1. Duplicated stores of s registers.
>> 2. cm.push pushes ra, s0-s11 in the reverse order of the normal
>> prologue, causing stack corruption and failure to restore the s registers.
>>
>> Test case: zcmp_shrink_wrap_separate.c, included in this patch.
>>
>> output asm before this patch:
>> calc_func:
>> cm.push  {ra, s0-s3}, -32
>> ...
>> beq  a5,zero,.L2
>> ...
>> .L2:
>> ...
>> sw   s1,20(sp) //issue here
>> sw   s3,12(sp) //issue here
>> ...
>> sw   s2,16(sp) //issue here
>>
>> output asm after this patch:
>> calc_func:
>> addi sp,sp,-32
>> sw   s0,24(sp)
>> ...
>> beq  a5,zero,.L2
>> ...
>> .L2:
>> ...
>> sw   s1,20(sp)
>> sw   s3,12(sp)
>> ...
>> sw   s2,16(sp)
>> gcc/ChangeLog:
>>
>>  * config/riscv/riscv.cc
>>  (riscv_avoid_shrink_wrapping_separate): wrap the condition check in
>>  riscv_avoid_shrink_wrapping_separate.
>>  (riscv_avoid_multi_push): avoid multi push if 
>>shrink_wrapping_separate
>>    is active.
>>  (riscv_get_separate_components): call 
>>riscv_avoid_shrink_wrapping_separate
>>  * shrink-wrap.cc (try_shrink_wrapping_separate): call
>>    use_shrink_wrapping_separate.
>>  (use_shrink_wrapping_separate):wrap the condition
>>    check in use_shrink_wrapping_separate
>>  * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zcmp_shrink_wrap_separate.c: New test.
>>  * gcc.target/riscv/zcmp_shrink_wrap_separate2.c: New test.
>I know Kito asked for this to be broken up into target dependent vs
>target independent changes, that's a good ask.
>
>Can't we utilize the get_separate_components hook to accomplish what
>you're trying to do?  ie, put the logic to avoid shrink wrapping for
>this case within the existing risc-v hook? 

Thanks, Jeff and Kito, for your comments.

My first attempt was to avoid shrink wrapping if zcmp is enabled.
But after discussing it with Kito and Andrew Pinski, I realized it is better to
disable zcmp push and pop if shrink wrapping is active.
For the detailed discussion, please see the thread below.
thread: [PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.
link: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg307203.html

I will go ahead with Kito's advice if you're fine with the current solution.
Thanks.

BR, 
Fei


>
>jeff

[PATCH 3/4] [RISC-V] resolve conflict between zcmp multi push/pop and shrink-wrap-separate

2023-06-06 Thread Fei Gao
Disable zcmp multi push/pop if shrink-wrap-separate is active.

With -Os, which prefers smaller code size, shrink-wrap-separate is disabled
by default and zcmp multi push/pop is enabled.

With -O2 and the other levels that prefer speed, shrink-wrap-separate is
enabled by default and zcmp multi push/pop is disabled. To force zcmp multi
push/pop in this case, -fno-shrink-wrap-separate has to be given explicitly.

The following test case shows the issues at -O2 before this patch, with both
shrink-wrap-separate and zcmp multi push/pop active:
1. Duplicated stores of s registers.
2. cm.push pushes ra, s0-s11 in the reverse order of the normal
   prologue, causing stack corruption and failure to restore the s registers.

Test case: zcmp_shrink_wrap_separate.c, included in this patch.

output asm before this patch:
calc_func:
cm.push {ra, s0-s3}, -32
...
beq a5,zero,.L2
...
.L2:
...
sw  s1,20(sp) //issue here
sw  s3,12(sp) //issue here
...
sw  s2,16(sp) //issue here

output asm after this patch:
calc_func:
addi sp,sp,-32
sw  s0,24(sp)
...
beq a5,zero,.L2
...
.L2:
...
sw  s1,20(sp)
sw  s3,12(sp)
...
sw  s2,16(sp)
gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_avoid_shrink_wrapping_separate): wrap the condition check in
riscv_avoid_shrink_wrapping_separate.
(riscv_avoid_multi_push): avoid multi push if shrink_wrapping_separate
  is active.
(riscv_get_separate_components): call 
riscv_avoid_shrink_wrapping_separate
* shrink-wrap.cc (try_shrink_wrapping_separate): call
  use_shrink_wrapping_separate.
(use_shrink_wrapping_separate):wrap the condition
  check in use_shrink_wrapping_separate 
* shrink-wrap.h (use_shrink_wrapping_separate): add to extern

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zcmp_shrink_wrap_separate.c: New test.
* gcc.target/riscv/zcmp_shrink_wrap_separate2.c: New test.

Signed-off-by: Fei Gao 
Co-Authored-By: Zhangjin Liao 
---
 gcc/config/riscv/riscv.cc | 19 +++-
 gcc/shrink-wrap.cc| 25 +++--
 gcc/shrink-wrap.h |  1 +
 .../riscv/zcmp_shrink_wrap_separate.c | 97 +++
 .../riscv/zcmp_shrink_wrap_separate2.c| 97 +++
 5 files changed, 228 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f60c241a526..b505cdeca34 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "cfgloop.h"
 #include "cfgrtl.h"
+#include "shrink-wrap.h"
 #include "sel-sched.h"
 #include "fold-const.h"
 #include "gimple-iterator.h"
@@ -389,6 +390,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   false,   /* use_divmod_expansion */
 };
 
+static bool riscv_avoid_shrink_wrapping_separate ();
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
@@ -4910,6 +4912,8 @@ riscv_avoid_multi_push(const struct riscv_frame_info *frame)
   || cfun->machine->interrupt_handler_p
   || cfun->machine->varargs_size != 0
   || crtl->args.pretend_args_size != 0
+  || (use_shrink_wrapping_separate ()
+  && !riscv_avoid_shrink_wrapping_separate ())
   || (frame->mask & ~ MULTI_PUSH_GPR_MASK))
 return true;
 
@@ -6077,6 +6081,17 @@ riscv_epilogue_uses (unsigned int regno)
   return false;
 }
 
+static bool
+riscv_avoid_shrink_wrapping_separate ()
+{
+  if (riscv_use_save_libcall (&cfun->machine->frame)
+  || cfun->machine->interrupt_handler_p
+  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+return true;
+
+  return false;
+}
+
 /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
 
 static sbitmap
@@ -6086,9 +6101,7 @@ riscv_get_separate_components (void)
   sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
   bitmap_clear (components);
 
-  if (riscv_use_save_libcall (&cfun->machine->frame)
-  || cfun->machine->interrupt_handler_p
-  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+  if (riscv_avoid_shrink_wrapping_separate ())
 return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();
diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index b8d7b557130..d534964321a 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ 

[PATCH 4/4] [RISC-V] support cm.mva01s cm.mvsa01 in zcmp

2023-06-06 Thread Fei Gao
From: Die Li 

Signed-off-by: Die Li 
Co-Authored-By: Fei Gao 

gcc/ChangeLog:

* config/riscv/peephole.md: New pattern.
* config/riscv/predicates.md (a0a1_reg_operand): New predicate.
(zcmp_mv_sreg_operand): New predicate.
* config/riscv/riscv.md (A1_REGNUM): New constant.
* config/riscv/zc.md (*mva01s): New pattern.
(*mvsa01): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cm_mv_rv32.c: New test.
---
 gcc/config/riscv/peephole.md| 28 +
 gcc/config/riscv/predicates.md  | 11 
 gcc/config/riscv/riscv.md   |  1 +
 gcc/config/riscv/zc.md  | 22 
 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c | 21 
 5 files changed, 83 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c

diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md
index 67e7046d7e6..e8cb1ba4838 100644
--- a/gcc/config/riscv/peephole.md
+++ b/gcc/config/riscv/peephole.md
@@ -94,3 +94,31 @@
 {
   th_mempair_order_operands (operands, true, SImode);
 })
+
+;; ZCMP
+(define_peephole2
+  [(set (match_operand:X 0 "a0a1_reg_operand")
+(match_operand:X 1 "zcmp_mv_sreg_operand"))
+   (set (match_operand:X 2 "a0a1_reg_operand")
+(match_operand:X 3 "zcmp_mv_sreg_operand"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[2]) != REGNO (operands[0]))"
+  [(parallel [(set (match_dup 0)
+   (match_dup 1))
+  (set (match_dup 2)
+   (match_dup 3))])]
+)
+
+(define_peephole2
+  [(set (match_operand:X 0 "zcmp_mv_sreg_operand")
+(match_operand:X 1 "a0a1_reg_operand"))
+   (set (match_operand:X 2 "zcmp_mv_sreg_operand")
+(match_operand:X 3 "a0a1_reg_operand"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[0]) != REGNO (operands[2]))
+   && (REGNO (operands[1]) != REGNO (operands[3]))"
+  [(parallel [(set (match_dup 0)
+   (match_dup 1))
+  (set (match_dup 2)
+   (match_dup 3))])]
+)
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index a1b9367b997..6d5e8630cb5 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -207,6 +207,17 @@
   (and (match_code "const_int")
(match_test "riscv_zcmp_valid_stack_adj_bytes_p (INTVAL (op), 13)")))
 
+;; ZCMP predicates
+(define_predicate "a0a1_reg_operand"
+  (and (match_operand 0 "register_operand")
+   (match_test "IN_RANGE (REGNO (op), A0_REGNUM, A1_REGNUM)")))
+
+(define_predicate "zcmp_mv_sreg_operand"
+  (and (match_operand 0 "register_operand")
+   (match_test "TARGET_RVE ? IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
+: IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
+|| IN_RANGE (REGNO (op), S2_REGNUM, S7_REGNUM)")))
+
 ;; Only use branch-on-bit sequences when the mask is not an ANDI immediate.
 (define_predicate "branch_on_bit_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 02802d2685d..25bc3e6ab4c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -121,6 +121,7 @@
(S0_REGNUM  8)
(S1_REGNUM  9)
(A0_REGNUM  10)
+   (A1_REGNUM  11)
(S2_REGNUM  18)
(S3_REGNUM  19)
(S4_REGNUM  20)
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
index 217e115035b..bb4975cd333 100644
--- a/gcc/config/riscv/zc.md
+++ b/gcc/config/riscv/zc.md
@@ -1433,3 +1433,25 @@
   "TARGET_ZCMP"
   "cm.push {ra, s0-s11}, %0"
 )
+
+;; ZCMP mv
+(define_insn "*mva01s"
+  [(set (match_operand:X 0 "a0a1_reg_operand" "=r")
+(match_operand:X 1 "zcmp_mv_sreg_operand" "r"))
+   (set (match_operand:X 2 "a0a1_reg_operand" "=r")
+(match_operand:X 3 "zcmp_mv_sreg_operand" "r"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[2]) != REGNO (operands[0]))"
+  { return (REGNO (operands[0]) == A0_REGNUM)?"cm.mva01s\t%1,%3":"cm.mva01s\t%3,%1"; }
+  [(set_attr "mode" "")])
+
+(define_insn "*mvsa01"
+  [(set (match_operand:X 0 "zcmp_mv_sreg_operand" "=r")
+(match_operand:X 1 "a0a1_reg_operand" "r"))
+   (set (match_operand:X 2 "zcmp_mv_sreg_operand" "=r")
+(match_operand:X 3 "a0a1_reg_operand" "r"))]
+  "TARGET_ZCMP
+   && (REGNO (operands[0]) != REGNO (operands[2]))
+   && (REGNO (operands[1])

[PATCH 1/4][V4][RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-06-06 Thread Fei Gao
Zcmp can share the same stack-allocation logic as save-restore:
pre-allocation by cm.push, then step 1 and step 2.

Please note that cm.push pushes ra, s0-s11 in the reverse order of
save-restore, so the .cfi directives have been adapted accordingly in
this patch.
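
To make the reversed order concrete, here is a minimal stand-alone sketch
of the slot arithmetic used by riscv_adjust_multi_push_cfi_prologue in the
V3 posting (offset = saved_size - UNITS_PER_WORD * (++saved_cnt), walking
registers from high to low). The function name, the plain int array and
the word_bytes parameter are assumptions made for illustration, and the
caller is assumed to pass only registers that cm.push can cover.

```c
/* Hypothetical helper, not part of the patch.  Bit N of MASK stands for
   GPR xN (ra = x1, s0 = x8, s1 = x9, s2-s11 = x18-x27).  For every saved
   register, offsets[N] receives its byte offset from the new sp; all
   other entries are set to -1.  Returns the number of saved registers.  */
static int
multi_push_slot_offsets (unsigned int mask, int saved_size, int word_bytes,
                         int offsets[32])
{
  int saved_cnt = 0;

  /* As in the patch: if s10 (x26) is saved, s11 (x27) is saved too.  */
  if (mask & (1u << 26))
    mask |= 1u << 27;

  /* Walk from the highest register down: s11 ... s0, ra end up
     from high to low addresses.  */
  for (int regno = 31; regno >= 0; regno--)
    {
      offsets[regno] = -1;
      if (mask & (1u << regno))
        offsets[regno] = saved_size - word_bytes * (++saved_cnt);
    }

  return saved_cnt;
}
```

For example, with ra, s0 and s1 saved in a 16-byte area on RV32
(word_bytes = 4), the sketch places s1 at offset 12, s0 at 8 and ra at 4.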

Signed-off-by: Fei Gao 

gcc/ChangeLog:

* config/riscv/iterators.md
slot0_offset: slot 0 offset in stack GPRs area in bytes
slot1_offset: slot 1 offset in stack GPRs area in bytes
slot2_offset: likewise
slot3_offset: likewise
slot4_offset: likewise
slot5_offset: likewise
slot6_offset: likewise
slot7_offset: likewise
slot8_offset: likewise
slot9_offset: likewise
slot10_offset: likewise
slot11_offset: likewise
slot12_offset: likewise
* config/riscv/predicates.md
(stack_push_up_to_ra_operand): predicates of stack adjust pushing ra
(stack_push_up_to_s0_operand): predicates of stack adjust pushing ra, s0
(stack_push_up_to_s1_operand): likewise
(stack_push_up_to_s2_operand): likewise
(stack_push_up_to_s3_operand): likewise
(stack_push_up_to_s4_operand): likewise
(stack_push_up_to_s5_operand): likewise
(stack_push_up_to_s6_operand): likewise
(stack_push_up_to_s7_operand): likewise
(stack_push_up_to_s8_operand): likewise
(stack_push_up_to_s9_operand): likewise
(stack_push_up_to_s11_operand): likewise
(stack_pop_up_to_ra_operand): predicates of stack adjust popping ra
(stack_pop_up_to_s0_operand): predicates of stack adjust popping ra, s0
(stack_pop_up_to_s1_operand): likewise
(stack_pop_up_to_s2_operand): likewise
(stack_pop_up_to_s3_operand): likewise
(stack_pop_up_to_s4_operand): likewise
(stack_pop_up_to_s5_operand): likewise
(stack_pop_up_to_s6_operand): likewise
(stack_pop_up_to_s7_operand): likewise
(stack_pop_up_to_s8_operand): likewise
(stack_pop_up_to_s9_operand): likewise
(stack_pop_up_to_s11_operand): likewise
* config/riscv/riscv-protos.h
(riscv_zcmp_valid_stack_adj_bytes_p): declaration
* config/riscv/riscv.cc (struct riscv_frame_info): comment change
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_multi_push_regs_count): num of regs in multi-push
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place
(riscv_save_libcall_count): no functional change
(riscv_compute_frame_info): add zcmp frame info
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_gen_multi_push_pop_insn): gen function for multi push and pop
(riscv_expand_prologue): allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): deallocate stack by cm.pop[ret]
(zcmp_base_adj): calculate stack adjustment base size
(zcmp_additional_adj): calculate stack adjustment additional size
(riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment valid
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_MAX_GRP_SLOTS): max slots of pushing and popping in zcmp
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file. machine description for zcmp

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.
---
 gcc/config/riscv/iterators.md |   15 +
 gcc/config/riscv/predicates.md|   96 ++
 gcc/config/riscv/riscv-protos.h   |1 +
 gcc/config/riscv/riscv.cc |  360 +-
 gcc/config/riscv/riscv.h  |   23 +
 gcc/config/riscv/riscv.md |2 +
 gcc/config/riscv/zc.md| 1042 +
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  239 
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  239 
 .../gcc.target/riscv/zcmp_stack_alignment.c   |   23 +
 10 files changed, 2000 insertions

[PATCH 2/4] [RISC-V] support cm.popretz in zcmp

2023-06-06 Thread Fei Gao
Generate cm.popretz instead of cm.popret if return value is 0.
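
As a rough illustration, a function of the following shape is the kind of
candidate this targets: it makes calls, so ra has to be saved, and it
returns the constant 0, so a0 is cleared right before the epilogue. The
names are hypothetical and not taken from the testsuite, and whether
cm.popretz is actually emitted depends on the options in use (a Zcmp
-march string, optimization level), which this sketch does not verify.

```c
/* Hypothetical example; the names are illustrative, not from the patch.  */
static int calls;

/* noinline keeps the calls real, so the caller below is not a leaf
   function and has to save ra.  */
__attribute__ ((noinline)) static void
record_call (void)
{
  calls++;
}

/* Returns the constant 0 after making calls: the "a0 = 0" set sits
   immediately before the epilogue, which is the shape the patch
   looks for.  */
int
reset_and_report (void)
{
  record_call ();
  record_call ();
  return 0;
}
```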

Signed-off-by: Fei Gao 

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_zcmp_can_use_popretz): true if popretz can be used
(riscv_gen_multi_pop_insn): interface to generate cm.pop[ret][z]
(riscv_expand_epilogue): expand cm.pop[ret][z] in epilogue
* config/riscv/riscv.md:
* config/riscv/zc.md
(@gpr_multi_popretz_up_to_ra_): md for popretz ra
(@gpr_multi_popretz_up_to_s0_): md for popretz ra, s0
(@gpr_multi_popretz_up_to_s1_): likewise
(@gpr_multi_popretz_up_to_s2_): likewise
(@gpr_multi_popretz_up_to_s3_): likewise
(@gpr_multi_popretz_up_to_s4_): likewise
(@gpr_multi_popretz_up_to_s5_): likewise
(@gpr_multi_popretz_up_to_s6_): likewise
(@gpr_multi_popretz_up_to_s7_): likewise
(@gpr_multi_popretz_up_to_s8_): likewise
(@gpr_multi_popretz_up_to_s9_): likewise
(@gpr_multi_popretz_up_to_s11_): likewise

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: add testcase for cm.popretz in rv32e
* gcc.target/riscv/rv32i_zcmp.c: add testcase for cm.popretz in rv32i
---
 gcc/config/riscv/riscv.cc   | 114 --
 gcc/config/riscv/riscv.md   |   1 +
 gcc/config/riscv/zc.md  | 393 
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c |  12 +
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c |  12 +
 5 files changed, 508 insertions(+), 24 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index c476c699f4c..f60c241a526 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -435,6 +435,7 @@ typedef enum
   PUSH_IDX = 0,
   POP_IDX,
   POPRET_IDX,
+  POPRETZ_IDX,
   ZCMP_OP_NUM
 } riscv_zcmp_op_t;
 
@@ -5535,30 +5536,30 @@ riscv_emit_stack_tie (void)
 /*zcmp multi push and pop code_for_push_pop function ptr array  */
 const code_for_push_pop_t code_for_push_pop [ZCMP_MAX_GRP_SLOTS][ZCMP_OP_NUM] = {
   {code_for_gpr_multi_push_up_to_ra,code_for_gpr_multi_pop_up_to_ra,
-   code_for_gpr_multi_popret_up_to_ra},
+   code_for_gpr_multi_popret_up_to_ra,  code_for_gpr_multi_popretz_up_to_ra},
   {code_for_gpr_multi_push_up_to_s0,code_for_gpr_multi_pop_up_to_s0,
-   code_for_gpr_multi_popret_up_to_s0},
+   code_for_gpr_multi_popret_up_to_s0,  code_for_gpr_multi_popretz_up_to_s0},
   {code_for_gpr_multi_push_up_to_s1,code_for_gpr_multi_pop_up_to_s1,
-   code_for_gpr_multi_popret_up_to_s1},
+   code_for_gpr_multi_popret_up_to_s1,  code_for_gpr_multi_popretz_up_to_s1},
   {code_for_gpr_multi_push_up_to_s2,code_for_gpr_multi_pop_up_to_s2,
-   code_for_gpr_multi_popret_up_to_s2},
+   code_for_gpr_multi_popret_up_to_s2,  code_for_gpr_multi_popretz_up_to_s2},
   {code_for_gpr_multi_push_up_to_s3,code_for_gpr_multi_pop_up_to_s3,
-   code_for_gpr_multi_popret_up_to_s3},
+   code_for_gpr_multi_popret_up_to_s3,  code_for_gpr_multi_popretz_up_to_s3},
   {code_for_gpr_multi_push_up_to_s4,code_for_gpr_multi_pop_up_to_s4,
-   code_for_gpr_multi_popret_up_to_s4},
+   code_for_gpr_multi_popret_up_to_s4,  code_for_gpr_multi_popretz_up_to_s4},
   {code_for_gpr_multi_push_up_to_s5,code_for_gpr_multi_pop_up_to_s5,
-   code_for_gpr_multi_popret_up_to_s5},
+   code_for_gpr_multi_popret_up_to_s5,  code_for_gpr_multi_popretz_up_to_s5},
   {code_for_gpr_multi_push_up_to_s6,code_for_gpr_multi_pop_up_to_s6,
-   code_for_gpr_multi_popret_up_to_s6},
+   code_for_gpr_multi_popret_up_to_s6,  code_for_gpr_multi_popretz_up_to_s6},
   {code_for_gpr_multi_push_up_to_s7,code_for_gpr_multi_pop_up_to_s7,
-   code_for_gpr_multi_popret_up_to_s7},
+   code_for_gpr_multi_popret_up_to_s7,  code_for_gpr_multi_popretz_up_to_s7},
   {code_for_gpr_multi_push_up_to_s8,code_for_gpr_multi_pop_up_to_s8,
-   code_for_gpr_multi_popret_up_to_s8},
+   code_for_gpr_multi_popret_up_to_s8,  code_for_gpr_multi_popretz_up_to_s8},
   {code_for_gpr_multi_push_up_to_s9,code_for_gpr_multi_pop_up_to_s9,
-   code_for_gpr_multi_popret_up_to_s9},
-  {nullptr, nullptr, nullptr},
+   code_for_gpr_multi_popret_up_to_s9,  code_for_gpr_multi_popretz_up_to_s9},
+  {nullptr, nullptr, nullptr, nullptr},
   {code_for_gpr_multi_push_up_to_s11,   code_for_gpr_multi_pop_up_to_s11,
-   code_for_gpr_multi_popret_up_to_s11}};
+   code_for_gpr_multi_popret_up_to_s11, code_for_gpr_multi_popretz_up_to_s11}};
 
 static rtx
 riscv_gen_multi_push_pop_insn (riscv_zcmp_op_t op, HOST_WIDE_INT adj_size,
@@ -5747,6 +5748,80 @@ riscv_adjust_libcall_cfi_epilogue ()
   return dwarf;
 }
 
+/* return true if popretz pattern can be matched.
+   set (reg 10 a0) (const_int 0)
+   use (reg 10 a0)
+   NOTE_INSN_EPILOGUE_BEG  */
+static rtx_insn *
+riscv_zcmp_can_use_popretz(void)
+{
+  rtx_insn *insn = NULL, *use = NULL, *clear = NULL;
+
+  /* sequence stack for NOTE_INSN_EPILOGUE_BEG*/
+  struct sequence_stack * outer_seq = get_current_sequence ()->n

[PATCH 0/4] [RISC-V] support zcmp extension

2023-06-06 Thread Fei Gao
Please note that this series depends on the zcmp switch that Jiawei posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615289.html

The first patch is a follow-up to Kito's V3 review.
The others are new.

Fei Gao (4):
  [RISC-V] support cm.push cm.pop cm.popret in zcmp
  [RISC-V] support cm.popretz in zcmp
  [RISC-V] resolve conflict between zcmp multi push/pop and shrink-wrap-separate
  [RISC-V] support cm.mva01s cm.mvsa01 in zcmp

 gcc/config/riscv/iterators.md |   15 +
 gcc/config/riscv/peephole.md  |   28 +
 gcc/config/riscv/predicates.md|  107 ++
 gcc/config/riscv/riscv-protos.h   |1 +
 gcc/config/riscv/riscv.cc |  445 -
 gcc/config/riscv/riscv.h  |   23 +
 gcc/config/riscv/riscv.md |4 +
 gcc/config/riscv/zc.md| 1457 +
 gcc/shrink-wrap.cc|   25 +-
 gcc/shrink-wrap.h |1 +
 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c   |   21 +
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  251 +++
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  251 +++
 .../riscv/zcmp_shrink_wrap_separate.c |   97 ++
 .../riscv/zcmp_shrink_wrap_separate2.c|   97 ++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |   23 +
 16 files changed, 2795 insertions(+), 51 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/cm_mv_rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c

-- 
2.17.1



[PATCH] [RISC-V] correct machine mode in save-restore cfi RTL.

2023-06-05 Thread Fei Gao
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): use Pmode 
for cfi reg/mem machmode
(riscv_adjust_libcall_cfi_epilogue): use Pmode for cfi reg machmode

gcc/testsuite/ChangeLog:

* gcc.target/riscv/save-restore-cfi-2.c: New test to check machmode for 
cfi reg/mem.
---
 gcc/config/riscv/riscv.cc|  6 +++---
 .../gcc.target/riscv/save-restore-cfi-2.c| 16 
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/save-restore-cfi-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index caa7858b864..9eafd281260 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5370,8 +5370,8 @@ riscv_adjust_libcall_cfi_prologue ()
else
  offset = saved_size - ((regno - S2_REGNUM + 4) * UNITS_PER_WORD);
 
-   reg = gen_rtx_REG (SImode, regno);
-   mem = gen_frame_mem (SImode, plus_constant (Pmode,
+   reg = gen_rtx_REG (Pmode, regno);
+   mem = gen_frame_mem (Pmode, plus_constant (Pmode,
stack_pointer_rtx,
offset));
 
@@ -5510,7 +5510,7 @@ riscv_adjust_libcall_cfi_epilogue ()
   for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
 if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
   {
-   reg = gen_rtx_REG (SImode, regno);
+   reg = gen_rtx_REG (Pmode, regno);
dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
   }
 
diff --git a/gcc/testsuite/gcc.target/riscv/save-restore-cfi-2.c b/gcc/testsuite/gcc.target/riscv/save-restore-cfi-2.c
new file mode 100644
index 000..44d805b4de8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/save-restore-cfi-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-pro_and_epilogue -O2 -march=rv64gc -mabi=lp64d -msave-restore -mcmodel=medany" } */
+/* { dg-skip-if "" { *-*-* } {"-Os" "-O1" "-O0" "-Og" "-O3" "-Oz" "-flto"} } */
+/* { dg-final { scan-rtl-dump {expr_list:REG_CFA_OFFSET \(set \(mem/c:DI} "pro_and_epilogue" } } */
+/* { dg-final { scan-rtl-dump {expr_list:REG_CFA_RESTORE \(reg:DI 8 s0\)} "pro_and_epilogue" } } */
+
+char my_getchar();
+float getf();
+
+int foo()
+{
+  int s0 = my_getchar();
+  float f0 = getf();
+  int b = my_getchar();
+  return f0 + s0 + b;
+}
-- 
2.17.1



Re: Re: [PATCH 2/2] [V3] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-06-05 Thread Fei Gao
Thanks Kito. 
I will propose V4 and also send a separate patch to fix
riscv_adjust_libcall_cfi_prologue.

BR, 
Fei

On 2023-06-05 16:31  Kito Cheng  wrote:
>
>Only a few minor comments, otherwise LGTM :)
>
>But I guess we need to wait until binutils merge zc stuff.
>
>> Zcmp can share the same stack-allocation logic as save-restore:
>> pre-allocation by cm.push, then step 1 and step 2.
>>
>> Please note that cm.push pushes ra, s0-s11 in the reverse order of
>> save-restore, so the .cfi directives have been adapted accordingly in
>> this patch.
>>
>> Signed-off-by: Fei Gao 
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/iterators.md (-8): slot offset in bytes
>> (-16): likewise
>> (-24): likewise
>> (-32): likewise
>> (-40): likewise
>> (-48): likewise
>> (-56): likewise
>> (-64): likewise
>> (-72): likewise
>> (-80): likewise
>> (-88): likewise
>> (-96): likewise
>> (-104): likewise
>
>Use slot0_offset...slot12_offset. 
>
>> @@ -422,6 +430,16 @@ static const struct riscv_tune_info 
>> riscv_tune_info_table[] = {
>>  #include "riscv-cores.def"
>>  };
>>
>> +typedef enum
>> +{
>> +  PUSH_IDX = 0,
>> +  POP_IDX,
>> +  POPRET_IDX,
>> +  ZCMP_OP_NUM
>> +} op_idx;
>
>op_idx -> riscv_zcmp_op_t 
>> @@ -5388,6 +5487,42 @@ riscv_adjust_libcall_cfi_prologue ()
>>    return dwarf;
>>  }
>>
>> +static rtx
>> +riscv_adjust_multi_push_cfi_prologue (int saved_size)
>> +{
>> +  rtx dwarf = NULL_RTX;
>> +  rtx adjust_sp_rtx, reg, mem, insn;
>> +  unsigned int mask = cfun->machine->frame.mask;
>> +  int offset;
>> +  int saved_cnt = 0;
>> +
>> +  if (mask & S10_MASK)
>> +    mask |= S11_MASK;
>> +
>> +  for (int regno = GP_REG_LAST; regno >= GP_REG_FIRST; regno--)
>> +    if (BITSET_P (mask & MULTI_PUSH_GPR_MASK, regno - GP_REG_FIRST))
>> +  {
>> +    /* The save order is s11-s0, ra
>> +   from high to low addr.  */
>> +    offset = saved_size - UNITS_PER_WORD * (++saved_cnt);
>> +
>> +    reg = gen_rtx_REG (SImode, regno);
>
>Should be Pmode rather than SImode, and seems
>riscv_adjust_libcall_cfi_prologue has same issue...could you send a
>separate patch to fix that? 
>
>> +    mem = gen_frame_mem (SImode, plus_constant (Pmode,
>
>Same here.
>
>> +    stack_pointer_rtx,
>> +    offset));
>> +
>> +    insn = gen_rtx_SET (mem, reg);
>> +    dwarf = alloc_reg_note (REG_CFA_OFFSET, insn, dwarf);
>> +  }
>> +
>> +  /* Debug info for adjust sp.  */
>> +  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
>> +   plus_constant(Pmode, stack_pointer_rtx, 
>> -saved_size));
>> +  dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
>> +  dwarf);
>> +  return dwarf;
>> +}
>> +
>>  static void
>>  riscv_emit_stack_tie (void)
>>  {
>
>
>> @@ -5493,6 +5697,32 @@ riscv_expand_prologue (void)
>>  }
>>  }
>>
>> +static rtx
>> +riscv_adjust_multi_pop_cfi_epilogue (int saved_size)
>> +{
>> +  rtx dwarf = NULL_RTX;
>> +  rtx adjust_sp_rtx, reg;
>> +  unsigned int mask = cfun->machine->frame.mask;
>> +
>> +  if (mask & S10_MASK)
>> +    mask |= S11_MASK;
>> +
>> +  /* Debug info for adjust sp.  */
>> +  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
>> +   plus_constant(Pmode, stack_pointer_rtx, 
>> saved_size));
>> +  dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
>> +  dwarf);
>> +
>> +  for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
>> +    if (BITSET_P (mask, regno - GP_REG_FIRST))
>> +  {
>> +    reg = gen_rtx_REG (SImode, regno);
>
>Pmode
>
>> +    dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>> +  }
>> +
>> +  return dwarf;
>> +}
>> +
>>  static rtx
>>  riscv_adjust_libcall_cfi_epilogue ()
>>  {
>
>> diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>> new file mode 100644
>> index 000..f2f2198598c
>> --- /dev/null
>> +++ b/gcc/config/riscv/zc.md
>> @@ -0,0 +1,1042 @@
>> +;; Machine description for RISC-V Zc extension.
>> +;; Copyright (C) 2011-2023 Free Software Foundation, Inc.
>
>2023 rather than 2011-2023

[PATCH] [RISC-V] add TC for save-restore cfi directives.

2023-06-05 Thread Fei Gao
gcc/testsuite/ChangeLog:

* gcc.target/riscv/save-restore-cfi.c: New test to check save-restore 
cfi directives.
---
 .../gcc.target/riscv/save-restore-cfi.c | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/save-restore-cfi.c

diff --git a/gcc/testsuite/gcc.target/riscv/save-restore-cfi.c b/gcc/testsuite/gcc.target/riscv/save-restore-cfi.c
new file mode 100644
index 000..a39f3060981
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/save-restore-cfi.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-g -Os -march=rv32imafc -mabi=ilp32f -msave-restore -mcmodel=medlow" } */
+/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-O3" "-Oz" "-flto"} } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 16} 2} } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 32} 1} } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1} } */
+
+char my_getchar();
+float getf();
+
+int foo()
+{
+  int s0 = my_getchar();
+  float f0 = getf();
+  int b = my_getchar();
+  return f0 + s0 + b;
+}
-- 
2.17.1



[PATCH 2/2] [V3] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-06-02 Thread Fei Gao
Zcmp can share the same logic as save-restore in stack allocation: 
pre-allocation
by cm.push, step 1 and step 2.

Please note that cm.push pushes ra, s0-s11 in the reverse order from 
save-restore.
So the .cfi directives have been adapted accordingly in my patch.

Signed-off-by: Fei Gao 

gcc/ChangeLog:

* config/riscv/iterators.md (-8): slot offset in bytes
(-16): likewise
(-24): likewise
(-32): likewise
(-40): likewise
(-48): likewise
(-56): likewise
(-64): likewise
(-72): likewise
(-80): likewise
(-88): likewise
(-96): likewise
(-104): likewise
* config/riscv/predicates.md
(stack_push_up_to_ra_operand): predicates for stack adjust of pushing ra
(stack_push_up_to_s0_operand): predicates for stack adjust of pushing 
ra, s0
(stack_push_up_to_s1_operand): likewise
(stack_push_up_to_s2_operand): likewise
(stack_push_up_to_s3_operand): likewise
(stack_push_up_to_s4_operand): likewise
(stack_push_up_to_s5_operand): likewise
(stack_push_up_to_s6_operand): likewise
(stack_push_up_to_s7_operand): likewise
(stack_push_up_to_s8_operand): likewise
(stack_push_up_to_s9_operand): likewise
(stack_push_up_to_s11_operand): likewise
(stack_pop_up_to_ra_operand): predicates for stack adjust of popping ra
(stack_pop_up_to_s0_operand): predicates for stack adjust of popping ra, 
s0
(stack_pop_up_to_s1_operand): likewise
(stack_pop_up_to_s2_operand): likewise
(stack_pop_up_to_s3_operand): likewise
(stack_pop_up_to_s4_operand): likewise
(stack_pop_up_to_s5_operand): likewise
(stack_pop_up_to_s6_operand): likewise
(stack_pop_up_to_s7_operand): likewise
(stack_pop_up_to_s8_operand): likewise
(stack_pop_up_to_s9_operand): likewise
(stack_pop_up_to_s11_operand): likewise
* config/riscv/riscv-protos.h 
(riscv_zcmp_valid_stack_adj_bytes_p):declaration
* config/riscv/riscv.cc (struct riscv_frame_info): comment change
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_multi_push_regs_count): num of regs in multi-push
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place
(riscv_save_libcall_count): no functional change
(riscv_compute_frame_info): add zcmp frame info
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_gen_multi_push_pop_insn): gen function for multi push and pop
(riscv_expand_prologue): allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): allocate stack by cm.pop[ret]
(zcmp_base_adj): calculate stack adjustment base size
(zcmp_additional_adj): calculate stack adjustment additional size
(riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment size is 
valid
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_MAX_GRP_SLOTS): max slots of pushing and popping in zcmp
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file. machine description for zcmp

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.
---
 gcc/config/riscv/iterators.md |   15 +
 gcc/config/riscv/predicates.md|   96 ++
 gcc/config/riscv/riscv-protos.h   |1 +
 gcc/config/riscv/riscv.cc |  360 +-
 gcc/config/riscv/riscv.h  |   23 +
 gcc/config/riscv/riscv.md |2 +
 gcc/config/riscv/zc.md| 1042 +
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  239 
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  239 
 .../gcc.target/riscv/zcmp_stack_alignment.c   |   23 +
 10 files changed, 2000 insertions(+), 40 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode

[PATCH 1/2] [RISC-V] fix cfi issue in save-restore.

2023-06-02 Thread Fei Gao
This patch fixes a cfi issue introduced by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20

Test code:
char my_getchar();
float getf();
int test_f0()
{
  int s0 = my_getchar();
  float f0 = getf();
  int b = my_getchar();
  return f0+s0+b;
}

cflags: -g -Os -march=rv32imafc -mabi=ilp32f -msave-restore -mcmodel=medlow

before patch:
test_f0:
...
.cfi_startproc
call t0,__riscv_save_1
.cfi_offset 8, -8
.cfi_offset 1, -4
.cfi_def_cfa_offset 16
...
addi sp,sp,-16
.cfi_def_cfa_offset 32

...

addi sp,sp,16
.cfi_def_cfa_offset 0  // issue here
...
tail __riscv_restore_1
.cfi_restore 8
.cfi_restore 1
.cfi_def_cfa_offset -16 // issue here
.cfi_endproc

after patch:
test_f0:
...
.cfi_startproc
call t0,__riscv_save_1
.cfi_offset 8, -8
.cfi_offset 1, -4
.cfi_def_cfa_offset 16
...
addi sp,sp,-16
.cfi_def_cfa_offset 32

...

addi sp,sp,16
.cfi_def_cfa_offset 16  // corrected here
...
tail __riscv_restore_1
.cfi_restore 8
.cfi_restore 1
.cfi_def_cfa_offset 0 // corrected here
.cfi_endproc

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_epilogue): fix cfi issue with 
correct offset.
---
 gcc/config/riscv/riscv.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85db1e3c86b..469af02bdf7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5652,7 +5652,7 @@ riscv_expand_epilogue (int style)
   adjust));
  rtx dwarf = NULL_RTX;
  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-GEN_INT (step2));
+GEN_INT (step2 + libcall_size));
 
  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
  RTX_FRAME_RELATED_P (insn) = 1;
@@ -5689,7 +5689,7 @@ riscv_expand_epilogue (int style)
 
   rtx dwarf = NULL_RTX;
   rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-const0_rtx);
+GEN_INT (libcall_size ));
   dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
   RTX_FRAME_RELATED_P (insn) = 1;
 
-- 
2.17.1



Re: Re: [PATCH 1/1] [V2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-06-01 Thread Fei Gao
On 2023-05-31 18:13  Kito Cheng  wrote:
>
>> >[1] 
>> >https://patchwork.sourceware.org/project/gcc/patch/20230406062118.47431-5-jia...@iscas.ac.cn/
>> Thanks for your review.
>>
>> The md file looks verbose with a bunch of *_offset_operand and 
>> stack_push_up_to_*_operand, but it significantly
>> simplifies implementation of recognizing zcmp push and pop insns and 
>> outputting assembly.  Also, the md file
>> clearly shows and checks the slot that each register is placed(different to 
>> slot order w/o save-restore before
>> zcmp is introduced). So I prefer my patch V2 to V1 or the link you attached. 
>> But ideas are welcome to make
>> it better. Appreciated if you suggest more details for the improvement.
>
>Got your point, and share an idea to simplify that:
>
>struct code_for_push_pop_t {
>   insn_code (*push)(machine_mode);
>   insn_code (*pop)(machine_mode);
>   insn_code (*pop_ret)(machine_mode);
>};
>const code_for_push_pop_t code_for_push_pop [/*ZCMP_MAX_GRP_SLOTS*/2] = {
>    {code_for_gpr_multi_pop_up_to_ra, /*FIXME*/nullptr, /*FIXME*/nullptr},
>    {code_for_gpr_multi_pop_up_to_s0, /*FIXME*/nullptr, /*FIXME*/nullptr}
>};
>
>static rtx
>riscv_gen_multi_push_pop_insn (op_idx op, HOST_WIDE_INT adj_size,
>unsigned int regs_num)
>{
>  rtx stack_adj = GEN_INT (adj_size);
>
>  return GEN_FCN (code_for_push_pop[regs_num].push(Pmode)) (stack_adj);
>}
>
>(define_mode_attr slot0_offset [(SI "0") (DI "0")])
>(define_mode_attr slot1_offset [(SI "4") (DI "8")])
>
>(define_insn "@gpr_multi_pop_up_to_ra"
>  [(set (reg:X SP_REGNUM)
>    (plus:X (reg:X SP_REGNUM)
> (match_operand 0 "stack_pop_up_to_ra_operand" "I")))
>   (set (reg:X RETURN_ADDR_REGNUM)
>    (mem:X (plus:X (reg:X SP_REGNUM)
>   (const_int ]
>  "TARGET_ZCMP"
>  "cm.pop   {ra}, %0"
>)
>
>(define_insn "@gpr_multi_pop_up_to_s0"
>  [(set (reg:X SP_REGNUM)
>    (plus:X (reg:X SP_REGNUM)
> (match_operand 0 "stack_pop_up_to_s0_operand" "I")))
>   (set (reg:X S0_REGNUM)
>    (mem:X (plus:X (reg:X SP_REGNUM)
>   (const_int 
>   (set (reg:X RETURN_ADDR_REGNUM)
>    (mem:X (plus:X (reg:X SP_REGNUM)
>   (const_int ]
>  "TARGET_ZCMP"
>  "cm.pop   {ra, s0}, %0"
>) 

Perfect. 
Working on it. 

>
>
>
>> >> @@ -5620,7 +5977,7 @@ riscv_expand_epilogue (int style)
>> >>    adjust));
>> >>   rtx dwarf = NULL_RTX;
>> >>   rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> >> -    GEN_INT (step2));
>> >> +    GEN_INT (step2 + 
>> >> libcall_size + multipop_size));
>> >
>> >Why we need `+ libcall_size` here? or...why we don't need that before?
>> It's a good catch:)
>> I should have  added `+ libcall_size` in
>> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20
>>
>> That's why I corrected the cfi issue in save-restore along with zcmp changes 
>> in this patch.
>
>I would like to have a separate patch to fix this bug instead of
>hidden in this patch. 
Sure, I will make a separate patch. 

Thanks & BR, 
Fei

Re: Re: [PATCH 1/1] [V2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-30 Thread Fei Gao
v_adjust_multi_pop_cfi_epilogue (int saved_size)
>+{
>+ rtx dwarf = NULL_RTX;
>+ rtx adjust_sp_rtx, reg;
>+ unsigned int mask = cfun->machine->frame.mask;
>+
>+ if (mask & S10_MASK)
>+ mask |= S11_MASK;
>+
>+ /* Debug info for adjust sp. */
>+ adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
>+ plus_constant(Pmode, stack_pointer_rtx, saved_size));
>+ dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
>+ dwarf);
>+
>+ for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
>+ if (BITSET_P (mask, regno - GP_REG_FIRST))
>+ {
>+ reg = gen_rtx_REG (SImode, regno);
>+ dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>+ }
>+
>+ return dwarf;
>+}
>+
> static rtx
> riscv_adjust_libcall_cfi_epilogue ()
> {
>@@ -5500,10 +5842,18 @@ riscv_expand_epilogue (int style)
> struct riscv_frame_info *frame = &cfun->machine->frame;
> unsigned mask = frame->mask;
> HOST_WIDE_INT step2 = 0;
>- bool use_restore_libcall = ((style == NORMAL_RETURN)
>- && riscv_use_save_libcall (frame));
>- unsigned libcall_size = (use_restore_libcall
>- ? frame->save_libcall_adjustment : 0);
>+ bool use_multi_pop_normal = ((style == NORMAL_RETURN)
>+ && riscv_use_multi_push (frame));
>+ bool use_multi_pop_sibcall = ((style == SIBCALL_RETURN)
>+ && riscv_use_multi_push (frame));
>+ bool use_multi_pop = use_multi_pop_normal || use_multi_pop_sibcall;
>+
>+ bool use_restore_libcall = !use_multi_pop && ((style == NORMAL_RETURN)
>+ && riscv_use_save_libcall (frame));
>+ unsigned libcall_size = use_restore_libcall && !use_multi_pop ?
>+ frame->save_libcall_adjustment : 0;
>+ unsigned multipop_size = use_multi_pop ?
>+ frame->multi_push_adj_base + frame->multi_push_adj_addi : 0;
> rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
> rtx insn;
>@@ -5574,18 +5924,25 @@ riscv_expand_epilogue (int style)
> REG_NOTES (insn) = dwarf;
> }
>- if (use_restore_libcall)
>- frame->mask = 0; /* Temporarily fib for GPRs. */
>+ if (use_restore_libcall || use_multi_pop)
>+ frame->mask = 0; /* Temporarily fib that we need not save GPRs. */
> /* If we need to restore registers, deallocate as much stack as
> possible in the second step without going out of range. */
>- if ((frame->mask | frame->fmask) != 0)
>+ if (use_multi_pop)
>+ {
>+ if (frame->fmask
>+ && known_gt (frame->total_size - multipop_size,
>+ frame->frame_pointer_offset))
>+ step2 = riscv_first_stack_step (frame, frame->total_size - multipop_size);
>+ }
>+ else if ((frame->mask | frame->fmask) != 0)
> step2 = riscv_first_stack_step (frame, frame->total_size - libcall_size);
>- if (use_restore_libcall)
>+ if (use_restore_libcall || use_multi_pop)
> frame->mask = mask; /* Undo the above fib. */
>- poly_int64 step1 = frame->total_size - step2 - libcall_size;
>+ poly_int64 step1 = frame->total_size - step2 - libcall_size - multipop_size ;
> /* Set TARGET to BASE + STEP1. */
> if (known_gt (step1, 0))
>@@ -5620,7 +5977,7 @@ riscv_expand_epilogue (int style)
> adjust));
> rtx dwarf = NULL_RTX;
> rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>- GEN_INT (step2));
>+ GEN_INT (step2 + libcall_size + multipop_size));
> dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
> RTX_FRAME_RELATED_P (insn) = 1;
>@@ -5635,15 +5992,15 @@ riscv_expand_epilogue (int style)
> epilogue_cfa_sp_offset = step2;
> }
>- if (use_restore_libcall)
>+ if (use_restore_libcall || use_multi_pop)
> frame->mask = 0; /* Temporarily fib that we need not save GPRs. */
> /* Restore the registers. */
>- riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size,
>+ riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size - 
>multipop_size,
> riscv_restore_reg,
> true, style == EXCEPTION_RETURN);
>- if (use_restore_libcall)
>+ if (use_restore_libcall || use_multi_pop)
> frame->mask = mask; /* Undo the above fib. */
> if (need_barrier_p)
>@@ -5657,14 +6014,30 @@ riscv_expand_epilogue (int style)
> rtx dwarf = NULL_RTX;
> rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>- const0_rtx);
>+ GEN_INT (libcall_size + multipop_size));
> dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
> RTX_FRAME_RELATED_P (insn) = 1;
> REG_NOTES (insn) = dwarf;
> }
>- if (use_restore_libcall)
>+ if (use_multi_pop)
>+ {
>+ unsigned regs_count = riscv_multi_push_regs_count (frame->mask);
>+ if (use_multi_pop_normal)
>+ insn = emit_jump_insn (
>+ riscv_gen_multi_push_pop_insn (POPRET_IDX, multipop_size, regs_count));
>+ else
>+ insn= emit_insn (
>+ riscv_gen_multi_push_pop_insn(POP_IDX, multipop_size, re

Re: Re: [PATCH 1/1] [V2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-30 Thread Fei Gao
On 2023-05-29 11:05  Kito Cheng  wrote:
>
>Thanks for this patch, just few minor comment, I think this is pretty
>close to accept :)
>
>Could you reference JiaWei's match_parallel[1] to prevent adding bunch
>of *_offset_operand and stack_push_up_to_*_operand?
>
>
>[1] 
>https://patchwork.sourceware.org/project/gcc/patch/20230406062118.47431-5-jia...@iscas.ac.cn/
Thanks for your review. 

The md file looks verbose with a bunch of *_offset_operand and 
stack_push_up_to_*_operand predicates, but it significantly
simplifies the implementation of recognizing zcmp push and pop insns and outputting 
assembly.  Also, the md file
clearly shows and checks the slot in which each register is placed (different from the 
slot order w/o save-restore before
zcmp was introduced). So I prefer my patch V2 to V1 or the link you attached. 
But ideas to make it better are welcome.
I would appreciate more detailed suggestions for the improvement.

>
>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 629e5e45cac..a0a2db1f594 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -117,6 +117,14 @@ struct GTY(())  riscv_frame_info {
>>    /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
>>    unsigned save_libcall_adjustment;
>>
>> +  /* the minimum number of bytes, in multiples of 16-byte address 
>> increments,
>> + required to cover the registers in a multi push & pop.  */
>> +  unsigned multi_push_adj_base;
>> +
>> +  /* the number of additional 16-byte address increments allocated for the 
>> stack frame
>> + in a multi push & pop.  */
>> +  unsigned multi_push_adj_addi;
>> +
>>    /* Offsets of fixed-point and floating-point save areas from frame bottom 
>>*/
>>    poly_int64 gp_sp_offset;
>>    poly_int64 fp_sp_offset;
>> @@ -413,6 +421,21 @@ static const struct riscv_tune_info 
>> riscv_tune_info_table[] = {
>>  #include "riscv-cores.def"
>>  };
>>
>> +typedef enum
>> +{
>> +  SI_IDX = 0,
>> +  DI_IDX,
>> +  MAX_MODE_IDX = DI_IDX
>> +} mode_idx;
>> +
>
>Didn't see any use in this version? 
It's used in defining the array below.
const insn_gen_fn gen_push_pop [MAX_OP_IDX + 1][MAX_MODE_IDX + 
1][ZCMP_MAX_GRP_SLOTS]

>
>> @@ -5574,18 +5924,25 @@ riscv_expand_epilogue (int style)
>>    REG_NOTES (insn) = dwarf;
>>  }
>>
>> -  if (use_restore_libcall)
>> -    frame->mask = 0; /* Temporarily fib for GPRs.  */
>> +  if (use_restore_libcall || use_multi_pop)
>> +    frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>>
>>    /* If we need to restore registers, deallocate as much stack as
>>   possible in the second step without going out of range.  */
>> -  if ((frame->mask | frame->fmask) != 0)
>> +  if (use_multi_pop)
>> +    {
>> +  if (frame->fmask
>> +  && known_gt (frame->total_size - multipop_size,
>> +  frame->frame_pointer_offset))
>> +    step2 = riscv_first_stack_step (frame, frame->total_size - 
>> multipop_size);
>> +    }
>> +  else if ((frame->mask | frame->fmask) != 0)
>>  step2 = riscv_first_stack_step (frame, frame->total_size - 
>>libcall_size);
>>
>> -  if (use_restore_libcall)
>> +  if (use_restore_libcall || use_multi_pop)
>>  frame->mask = mask; /* Undo the above fib.  */
>>
>> -  poly_int64 step1 = frame->total_size - step2 - libcall_size;
>> +  poly_int64 step1 = frame->total_size - step2 - libcall_size - 
>> multipop_size ;
>>
>>    /* Set TARGET to BASE + STEP1.  */
>>    if (known_gt (step1, 0))
>> @@ -5620,7 +5977,7 @@ riscv_expand_epilogue (int style)
>>    adjust));
>>   rtx dwarf = NULL_RTX;
>>   rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> -    GEN_INT (step2));
>> +    GEN_INT (step2 + libcall_size + 
>> multipop_size));
>
>Why we need `+ libcall_size` here? or...why we don't need that before? 
Good catch :)
I should have added `+ libcall_size` in 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20

That's why I corrected the cfi issue in save-restore along with the zcmp changes in 
this patch.

>
>>
>>   dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>>   RTX_FRAME_RELATED_P (insn) = 1;
>> @@ -5635,15 +5992,15 @@ riscv_expand_epilogue (int style)
>>    epilogue_cfa_sp_offset = step2;
>>  }
>>
>> -  if (use_restore_libcall)
>> +  if (use_restore_libcall || use_multi_pop)
>>  frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>>
>>    /* Restore the registers.  */
>> -  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size,
>> +  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size - 
>> multipop_size,
>> riscv_restore_reg,
>> true, style == EXCEPTION_RETURN);
>>
>> -  if (use_restore_libcall)
>> +  if (use_restore_libcall || 

Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-05-12 Thread Fei Gao
On 2023-05-12 16:12  Sinan  wrote:
>
>Hi Fei,
>Sorry for the late reply, I've been busy with moving these days :(.
>Thanks for working on it. I would prefer removing the extra pass for popretz 
>if possible ... I will test your patches ASAP.
>BR,
>Sinan 

hi Sinan

I posted V2 based on Kito's comment just now.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg307507.html

For popretz, we can discuss further offline if that's convenient for you.

BR, 
Fei
>--
>Sender:Fei Gao 
>Sent At:2023 May 6 (Sat.) 16:53
>Recipient:Sinan 
>Cc:jiawei ; gcc-patches 
>Subject:Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
>On 2023-05-05 23:57 Sinan  wrote:
>>
>>> hi Jiawei
>>>
>>> Please ignore my previous reply. I accidentally sent the email before I 
>>> finished it.
>>> Sorry for that!
>>>
>>> I downloaded the series of patches from you and found in some cases
>>> it fails to generate zcmp push and pop insns.
>>>
>>> TC:
>>>
>>> char my_getchar();
>>> int test_s0()
>>> {
>>>
>>> int a = my_getchar();
>>> int b = my_getchar();
>>> return a+b;
>>> }
>>>
>>> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
>>> -mcmodel=medlow test.c
>>>
>>> -fno-shrink-wrap-separate is used here to avoid the impact from 
>>> shrink-wrap-separate that is by default
>>> enabled in O2.
>>>
>>> As I'm also interested in Zc*, I did some changes mainly in prologue and 
>>> epilogue pass quite similar to
>>> what has been done for save and restore except the CFI directives due to 
>>> reversed order that zcmp
>>> pushes and pops ra, s regs than what save and restore do.
>>>
>>> I will refine and share the code soon for your review.
>>>
>>> BR
>>> Fei
>>Hi Fei,
>>In the current implementation, cm.push will not increase the original 
>>adjustment size of the stack pointer. As cm.push uses a minimum adjustment 
>>size of 16, and in your example, the adjustment size of sp is 12, so cm.push 
>>will not be generated.
>>you can find the check at riscv_use_push_pop
>>> > + */
>>> > + if (base_size > frame_size)
>>> > + return false;
>>> > +
>>And if this check is removed, then you can get the output that you expect.
>>```
>> cm.push {ra,s0},-16
>> call my_getchar
>> mv s0,a0
>> call my_getchar
>> add a0,s0,a0
>> cm.popret {ra,s0},16
>>```
>>In many scenarios of rv32e, cm.push cannot be generated as a result. Perhaps 
>>we can remove this check? I haven't tested if it is ok to remove this check, 
>>and CC jiawei to help test it.
>>BR,
>>Sinan
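
The check Sinan describes can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: the function names are hypothetical, and the rule "round the saved-register bytes up to 16" is taken from the statement above that cm.push uses a minimum adjustment size of 16.

```c
#include <assert.h>
#include <stdbool.h>

/* Round BYTES up to the next multiple of 16.  */
static long
align16 (long bytes)
{
  return (bytes + 15) & ~15L;
}

/* Smallest sp adjustment cm.push can make for REGS_NUM registers of
   WORD_BYTES each (hypothetical helper, not GCC's).  */
static long
push_base_size (int regs_num, int word_bytes)
{
  assert (regs_num > 0 && word_bytes > 0);
  return align16 ((long) regs_num * word_bytes);
}

/* Mirror of the quoted check: cm.push is rejected when its minimum
   adjustment exceeds the frame the function actually needs.  */
static bool
push_allowed_by_check (long frame_size, int regs_num, int word_bytes)
{
  return push_base_size (regs_num, word_bytes) <= frame_size;
}
```

For test_s0 on rv32e with {ra, s0}, push_base_size (2, 4) is 16 while the frame is 12 bytes, so push_allowed_by_check (12, 2, 4) is false and no cm.push is emitted, as described above.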
>hi Sinan
>Thanks for your reply.
>I posted my code at 
>https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg306921.html
>In the cover letter, I did some comparison.
>Could you please review?
>Thanks & BR,
>Fei
>>--
>>Sender:Fei Gao 
>>Sent At:2023 Apr. 25 (Tue.) 18:12
>>Recipient:jiawei 
>>Cc:gcc-patches 
>>Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
>>hi Jiawei
>>Please ignore my previous reply. I accidentally sent the email before I 
>>finished it.
>>Sorry for that!
>>I downloaded the series of patches from you and found in some cases
>>it fails to generate zcmp push and pop insns.
>>TC:
>>char my_getchar();
>>int test_s0()
>>{
>> int a = my_getchar();
>> int b = my_getchar();
>> return a+b;
>>}
>>cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
>>-mcmodel=medlow test.c
>>-fno-shrink-wrap-separate is used here to avoid the impact from 
>>shrink-wrap-separate that is by default
>>enabled in O2.
>>As I'm also interested in Zc*, I did some changes mainly in prologue and 
>>epilogue pass quite similar to
>>what has been done for save and restore except the CFI directives due to 
>>reversed order that zcmp
>>pushes and pops ra, s regs than what save and restore do.
>>I will refine and share the code soon for your review.
>>BR
>>Fei
>>On Thu Apr 6 06:21:17 GMT 2023 Jiawei jia...@iscas.ac.cn wrote:
>>>
>>>Add Zcmp extension instructions support. Generate push/pop
>>>with follow steps:
>>>
>>> 1. preprocessing:
>>> 1.1. if there is no push rtx, then just return. e.g.
>>> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>>> (plus:SI (reg/f:SI 2 sp)
>>> (const_int -32 [0xffe0])))
>>> (nil))
>>> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>>> 1.2. if push rtx exists, then we compute the number of
>>> pushed s-registers, n_sreg.
>>>
>>> push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>>>
>>> [2 and 3 happen simultaneously]
>>>
>>> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>>> and aN is not used the move pattern, and sN is not
>>> defined before the move pattern (from prologue to the
>>> position of move pattern).
>>>
>>> 3. analysis use and reach of every instruction from prologue
>>> to the position of move pattern.
>>> if any sN is used, then we mark the corresponding argument list
>>> candidate as invalid.
>>> e.g.
>>> push {ra,s0-s3}, {}, -32
>>> sw 

[PATCH 1/1] [V2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-12 Thread Fei Gao
else if ((frame->mask | frame->fmask) != 0)
 step2 = riscv_first_stack_step (frame, frame->total_size - libcall_size);
 
-  if (use_restore_libcall)
+  if (use_restore_libcall || use_multi_pop)
 frame->mask = mask; /* Undo the above fib.  */
 
-  poly_int64 step1 = frame->total_size - step2 - libcall_size;
+  poly_int64 step1 = frame->total_size - step2 - libcall_size - multipop_size ;
 
   /* Set TARGET to BASE + STEP1.  */
   if (known_gt (step1, 0))
@@ -5620,7 +5977,7 @@ riscv_expand_epilogue (int style)
   adjust));
  rtx dwarf = NULL_RTX;
  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-GEN_INT (step2));
+GEN_INT (step2 + libcall_size + 
multipop_size));
 
  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
  RTX_FRAME_RELATED_P (insn) = 1;
@@ -5635,15 +5992,15 @@ riscv_expand_epilogue (int style)
   epilogue_cfa_sp_offset = step2;
 }
 
-  if (use_restore_libcall)
+  if (use_restore_libcall || use_multi_pop)
 frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
 
   /* Restore the registers.  */
-  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size,
+  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size - 
multipop_size,
riscv_restore_reg,
true, style == EXCEPTION_RETURN);
 
-  if (use_restore_libcall)
+  if (use_restore_libcall || use_multi_pop)
   frame->mask = mask; /* Undo the above fib.  */
 
   if (need_barrier_p)
@@ -5657,14 +6014,30 @@ riscv_expand_epilogue (int style)
 
   rtx dwarf = NULL_RTX;
   rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-const0_rtx);
+GEN_INT (libcall_size + 
multipop_size));
   dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
   RTX_FRAME_RELATED_P (insn) = 1;
 
   REG_NOTES (insn) = dwarf;
 }
 
-  if (use_restore_libcall)
+  if (use_multi_pop)
+{
+  unsigned regs_count = riscv_multi_push_regs_count (frame->mask);
+  if (use_multi_pop_normal)
+insn = emit_jump_insn (
+  riscv_gen_multi_push_pop_insn (POPRET_IDX, multipop_size, 
regs_count));
+  else
+insn= emit_insn (
+  riscv_gen_multi_push_pop_insn(POP_IDX, multipop_size, regs_count));
+
+  rtx dwarf = riscv_adjust_multi_pop_cfi_epilogue (multipop_size);
+  RTX_FRAME_RELATED_P (insn) = 1;
+  REG_NOTES (insn) = dwarf;
+  if (use_multi_pop_normal)
+return;
+}
+  else if (use_restore_libcall)
 {
   rtx dwarf = riscv_adjust_libcall_cfi_epilogue ();
   insn = emit_insn (gen_gpr_restore (GEN_INT (riscv_save_libcall_count (mask))));
@@ -6937,6 +7310,30 @@ riscv_gen_gpr_save_insn (struct riscv_frame_info *frame)
   return gen_rtx_PARALLEL (VOIDmode, vec);
 }
 
+static HOST_WIDE_INT zcmp_base_adj(int regs_num)
+{
+  return riscv_16bytes_align ((regs_num) * GET_MODE_SIZE (word_mode));
+}
+
+static HOST_WIDE_INT zcmp_additional_adj(HOST_WIDE_INT total, int regs_num)
+{
+  return total - zcmp_base_adj(regs_num);
+}
+
+bool riscv_zcmp_valid_slot_offset_p (HOST_WIDE_INT offset, int slot_idx)
+{
+  return offset == -1 * (slot_idx + 1) * GET_MODE_SIZE (word_mode);
+}
+
+bool riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT total, int regs_num)
+{
+  HOST_WIDE_INT additioanl_bytes = zcmp_additional_adj(total, regs_num);
+  return additioanl_bytes == 0
+ || additioanl_bytes  == 1 * ZCMP_SP_INC_STEP
+ || additioanl_bytes  == 2 * ZCMP_SP_INC_STEP
+ || additioanl_bytes  == ZCMP_MAX_SPIMM * ZCMP_SP_INC_STEP;
+}
+
 /* Return true if it's valid gpr_save pattern.  */
 
 bool
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 13038a39e5c..ff210083004 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -413,6 +413,29 @@ ASM_MISA_SPEC
 #define RISCV_CALL_ADDRESS_TEMP(MODE) \
   gen_rtx_REG (MODE, RISCV_CALL_ADDRESS_TEMP_REGNUM)
 
+#define RETURN_ADDR_MASK ( 1 << RETURN_ADDR_REGNUM)
+#define S0_MASK ( 1 << S0_REGNUM)
+#define S1_MASK ( 1 << S1_REGNUM)
+#define S2_MASK ( 1 << S2_REGNUM)
+#define S3_MASK ( 1 << S3_REGNUM)
+#define S4_MASK ( 1 << S4_REGNUM)
+#define S5_MASK ( 1 << S5_REGNUM)
+#define S6_MASK ( 1 << S6_REGNUM)
+#define S7_MASK ( 1 << S7_REGNUM)
+#define S8_MASK ( 1 << S8_REGNUM)
+#define S9_MASK ( 1 << S9_REGNUM)
+#define S10_MASK ( 1 << S10_REGNUM)
+#define S11_MASK ( 1 << S11_REGNUM)
+
+#define MULTI_PUSH_GPR_MASK ( RETURN_ADDR_MASK | S0_MASK | S1_
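
The stack-adjustment check added in the riscv.cc hunk above can be tried standalone. The sketch below is not the patch's code: the names are shortened, and the values 16 for the increment step and 3 for the maximum spimm are assumptions, since the definitions of ZCMP_SP_INC_STEP and ZCMP_MAX_SPIMM are not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values; the patch names these macros but their definitions
   are not visible here.  */
#define SP_INC_STEP 16
#define MAX_SPIMM 3

/* Base adjustment: bytes for REGS_NUM registers, rounded up to 16.  */
static long
base_adj (int regs_num, int word_bytes)
{
  assert (regs_num > 0 && word_bytes > 0);
  return ((long) regs_num * word_bytes + 15) & ~15L;
}

/* TOTAL is valid when the bytes beyond the base adjustment are
   0, 1, 2 or 3 increments of SP_INC_STEP.  */
static bool
valid_stack_adj_bytes (long total, int regs_num, int word_bytes)
{
  long extra = total - base_adj (regs_num, word_bytes);
  return extra >= 0
	 && extra <= MAX_SPIMM * SP_INC_STEP
	 && extra % SP_INC_STEP == 0;
}
```

With MAX_SPIMM equal to 3, the range test here accepts the same four values as the explicit comparisons in riscv_zcmp_valid_stack_adj_bytes_p. For two 4-byte registers, totals of 16, 32, 48 and 64 are accepted and 80 is rejected.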

[PATCH 0/1] [V2] RISC-V: support Zcmp extension

2023-05-12 Thread Fei Gao
Before implementing Zcmp, I made some optimizations to save-restore and 
restructured it.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a5b2a3bff8152aa34408d8ce40add82f4d22ff87
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a782346757c54a5a3cfb9f416a7ebe3554a617d7

Then Zcmp can share the same logic as save-restore in stack allocation: 
pre-allocation
by cm.push, step 1 and step 2.

Please note that cm.push pushes ra, s0-s11 in the reverse order from 
save-restore.
So the .cfi directives have been adapted in my patch. A discussion can be found 
here:
https://github.com/riscv/riscv-code-size-reduction/issues/182
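
To make the ordering difference concrete, here is a small sketch of the CFA-relative save slots for both schemes. The save-restore layout matches the `.cfi_offset 1, -4` / `.cfi_offset 8, -8` directives that GCC emits for __riscv_save_1 on rv32. The cm.push layout is my reading of the Zcmp specification (the last register of the list is stored closest to the CFA), so please treat it as an assumption. Function names are hypothetical.

```c
#include <assert.h>

/* Register lists are written in the order ra, s0, s1, ...
   so list index 0 is ra, index 1 is s0, and so on.  */

/* save-restore: ra sits closest to the CFA, then s0, s1, ...  */
static long
save_restore_cfa_offset (int list_idx, int word_bytes)
{
  assert (list_idx >= 0 && word_bytes > 0);
  return -(long) (list_idx + 1) * word_bytes;
}

/* cm.push (assumed from the Zcmp specification): the last register of
   the list sits closest to the CFA, and ra ends up in the lowest slot.  */
static long
zcmp_cfa_offset (int list_idx, int regs_num, int word_bytes)
{
  assert (list_idx >= 0 && list_idx < regs_num && word_bytes > 0);
  return -(long) (regs_num - list_idx) * word_bytes;
}
```

For {ra, s0} on rv32, save-restore places ra at CFA-4 and s0 at CFA-8, while this model of cm.push places s0 at CFA-4 and ra at CFA-8, which is why the .cfi_offset directives cannot simply be reused.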

A few weeks ago, Jiawei also posted Zcmp support in 
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615287.html.
[PATCH 0/5] RISC-V: Support ZC* extensions.   Jiawei
[PATCH 1/5] RISC-V: Minimal support for ZC extensions.   Jiawei
[PATCH 2/5] RISC-V: Enable compressible features when use ZC* extensions.   
Jiawei
[PATCH 3/5] RISC-V: Add ZC* test for march args being passed.   Jiawei
[PATCH 4/5] RISC-V: Add Zcmp extension supports.   Jiawei
[PATCH 5/5] RISC-V: Add ZCMP push/pop testcases.   Jiawei

I tested his code and observed some issues in [PATCH 4/5],
see https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg306921.html for 
details.
So I plan to post my code as an alternative to Jiawei's [PATCH 4/5].

My Zcmp switch code is almost the same as Jiawei's,
so I avoid repeating it in my patch series.
Please pick up Jiawei's [PATCH 1/5] before picking up my patch series.

Fei Gao (1):
  [RISC-V] support  cm.push cm.pop cm.popret in zcmp

 gcc/config/riscv/predicates.md|  148 +++
 gcc/config/riscv/riscv-protos.h   |2 +
 gcc/config/riscv/riscv.cc |  477 +++-
 gcc/config/riscv/riscv.h  |   23 +
 gcc/config/riscv/riscv.md |2 +
 gcc/config/riscv/zc.md| 1042 +
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  239 
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  239 
 .../gcc.target/riscv/zcmp_stack_alignment.c   |   23 +
 9 files changed, 2155 insertions(+), 40 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c

-- 
2.17.1



Re: Re: [PATCH 2/2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-10 Thread Fei Gao
On 2023-05-08 10:48  Kito Cheng  wrote:
>
>diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>new file mode 100644
>index 000..1c2f390269e
>--- /dev/null
>+++ b/gcc/config/riscv/zc.md
>@@ -0,0 +1,55 @@
>...
>+(define_insn "gpr_multi_pop"
>+  [(unspec_volatile [(match_operand 0 "const_int_operand")
>+ (match_operand 1 "const_int_operand")]
>+    UNSPECV_GPR_MULTI_POP)]
>
>I would strongly suggest modeling the right memory and register access
>here correctly instead of using unspec,
>and same for other two patterns.
>
>That will help GCC know the semantics of this operation. 

Sure, working on it. 

BR, 
Fei

Re: Re: [PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.

2023-05-10 Thread Fei Gao
On 2023-05-08 17:20  Kito Cheng  wrote:
>
>-msave-restore is a different story; it's only enabled when the user
>requests, but `-march` describes the capability of the target
>architecture, not specify the preference of performance or size, which
>should be determined by -O1~-O3/-Ofast or -Os/-Oz.
> 

I see and fully agree. 
I'll find a better way to resolve the conflict. 
My current idea is to disable zcmp when shrink-wrap-separate is actually 
active. 
Thanks Kito and Andrew Pinski for your patience.

BR, 
Fei
>On Mon, May 8, 2023 at 4:54 PM Fei Gao  wrote:
>>
>> On 2023-05-08 16:05  Kito Cheng  wrote:
>> >
>> >> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> >> > index 45a63cab9c9..629e5e45cac 100644
>> >> > --- a/gcc/config/riscv/riscv.cc
>> >> > +++ b/gcc/config/riscv/riscv.cc
>> >> > @@ -5729,7 +5729,8 @@ riscv_get_separate_components (void)
>> >> >
>> >> >    if (riscv_use_save_libcall (&cfun->machine->frame)
>> >> >    || cfun->machine->interrupt_handler_p
>> >> > -  || !cfun->machine->frame.gp_sp_offset.is_constant ())
>> >> > +  || !cfun->machine->frame.gp_sp_offset.is_constant ()
>> >> > +  || TARGET_ZCMP)
>> >> >  return components;
>> >>
>> >> I think this is a bad idea. I have a use case where we use the C
>> >> extensions but still compile for -O2 because we want the code to be
>> >> fast as possible but still having the savings of the C extensions.
>> >
>> >Yeah, agree, so I would prefer to drop this from the patch series.
>>
>> Zcmp is a little different here than C.
>> C extension is done fully in AS.  So  we have the code to be
>> fast as possible but still having the savings of the C extensions.
>>
>> Zcmp and shrink-wrap-separate are both done in prologue/epilogue pass
>> and you can only have one switch active to direct sregs save and restore.
>> In my understanding, zcmp push and pop insns seem to
>> be mutually exclusive in functionality to shrink-wrap-separate.
>> It's not expected to see zcmp insns at the beginning/end of prologue/epilogue,
>> and also repeated store/load sregs in separate blocks.
>>
>> Same for save and restore, and I guess that's why we have
>> riscv_use_save_libcall (&cfun->machine->frame) check here.
>>
>> BR,
>> Fei
>>
>> >
>> >> Thanks,
>> >> Andrew Pinski

Re: Re: [PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.

2023-05-08 Thread Fei Gao
On 2023-05-08 16:05  Kito Cheng  wrote:
>
>> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> > index 45a63cab9c9..629e5e45cac 100644
>> > --- a/gcc/config/riscv/riscv.cc
>> > +++ b/gcc/config/riscv/riscv.cc
>> > @@ -5729,7 +5729,8 @@ riscv_get_separate_components (void)
>> >
>> >    if (riscv_use_save_libcall (&cfun->machine->frame)
>> >    || cfun->machine->interrupt_handler_p
>> > -  || !cfun->machine->frame.gp_sp_offset.is_constant ())
>> > +  || !cfun->machine->frame.gp_sp_offset.is_constant ()
>> > +  || TARGET_ZCMP)
>> >  return components;
>>
>> I think this is a bad idea. I have a use case where we use the C
>> extensions but still compile for -O2 because we want the code to be
>> fast as possible but still having the savings of the C extensions.
>
>Yeah, agree, so I would prefer to drop this from the patch series. 

Zcmp is a little different from C here.
The C extension is handled entirely in the assembler, so the code can be
as fast as possible while still getting the savings of the C extension.

Zcmp and shrink-wrap-separate are both done in the prologue/epilogue pass,
and only one of them can be active to direct the saving and restoring of sregs.
In my understanding, zcmp push and pop insns are
mutually exclusive in functionality with shrink-wrap-separate.
We don't expect to see zcmp insns at the beginning/end of the prologue/epilogue
together with repeated stores/loads of sregs in separate blocks.

The same goes for save-restore, and I guess that's why we have the
riscv_use_save_libcall (&cfun->machine->frame) check here.

BR, 
Fei

>
>> Thanks,
>> Andrew Pinski

Re: Re: [PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.

2023-05-08 Thread Fei Gao
On 2023-05-08 10:41  Kito Cheng  wrote:
>
>shrink-wrapping is already gated by -Os, so I think we don't need to add another
>gate here, unless we are trying to claim that zcmp forces optimizing for size
>whenever it is present.
> 

hi Kito

Zcmp is added here just like save-restore.

Either we keep both checks or delete them both.

BR, 
Fei

>On Sat, May 6, 2023 at 4:41 PM Fei Gao  wrote:
>>
>> zcmp aims to reduce code size, while shrink-wrap-separate prefers
>> speed to code size. So disable shrink-wrap-separate if zcmp
>> enabled, just like what save-restore has done.
>>
>> author: Zhangjin Liao liaozhang...@eswincomputing.com
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv.cc (riscv_get_separate_components):
>> ---
>>  gcc/config/riscv/riscv.cc | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 45a63cab9c9..629e5e45cac 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -5729,7 +5729,8 @@ riscv_get_separate_components (void)
>>
>>    if (riscv_use_save_libcall (&cfun->machine->frame)
>>    || cfun->machine->interrupt_handler_p
>> -  || !cfun->machine->frame.gp_sp_offset.is_constant ())
>> +  || !cfun->machine->frame.gp_sp_offset.is_constant ()
>> +  || TARGET_ZCMP)
>>  return components;
>>
>>    offset = cfun->machine->frame.gp_sp_offset.to_constant ();
>> --
>> 2.17.1
>>

Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-05-06 Thread Fei Gao
On 2023-05-05 23:57  Sinan  wrote:
>
>> hi Jiawei
>>
>> Please ignore my previous reply. I accidentally sent the email before I
>> finished it.
>> Sorry for that!
>>
>> I downloaded the series of patches from you and found in some cases
>> it fails to generate zcmp push and pop insns.
>>
>> TC:
>>
>> char my_getchar();
>> int test_s0()
>> {
>>
>> int a = my_getchar();
>> int b = my_getchar();
>> return a+b;
>> }
>>
>> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
>> -mcmodel=medlow test.c
>>
>> -fno-shrink-wrap-separate is used here to avoid the impact from 
>> shrink-wrap-separate that is by default
>> enabled in O2.
>>
>> As I'm also interested in Zc*, I made some changes, mainly in the prologue and
>> epilogue pass, quite similar to
>> what has been done for save and restore, except for the CFI directives, due to
>> the reversed order in which zcmp
>> pushes and pops ra and the s regs compared with what save and restore do.
>>
>> I will refine and share the code soon for your review.
>>
>> BR
>> Fei
>Hi Fei,
>In the current implementation, cm.push will not increase the original 
>adjustment size of the stack pointer. As cm.push uses a minimum adjustment 
>size of 16, and in your example, the adjustment size of sp is 12, so cm.push 
>will not be generated.
>You can find the check in riscv_use_push_pop:
>> > + */
>> > + if (base_size > frame_size)
>> > + return false;
>> > +
>And if this check is removed, then you can get the output that you expect.
>```
> cm.push {ra,s0},-16
> call my_getchar
> mv s0,a0
> call my_getchar
> add a0,s0,a0
> cm.popret {ra,s0},16
>```
>In many scenarios of rv32e, cm.push cannot be generated as a result. Perhaps 
>we can remove this check? I haven't tested if it is ok to remove this check, 
>and CC jiawei to help test it.
>BR,
>Sinan 

hi Sinan

Thanks for your reply. 
I posted my code at
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg306921.html
In the cover letter, I did some comparison.
Could you please review?

Thanks & BR, 
Fei
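
The arithmetic behind Sinan's explanation can be sketched in a few lines of C. This is only an illustration, not the code from either patch series: zcmp_base_adjust and use_push_pop are hypothetical names, and the rounding to a 16-byte step is taken from the "minimum adjustment size of 16" mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: bytes needed to store ra plus n_sregs s-registers,
   rounded up to the 16-byte step that cm.push/cm.pop adjust sp by.  */
static unsigned
zcmp_base_adjust (unsigned n_sregs, unsigned xlen_bytes)
{
  assert (xlen_bytes == 4 || xlen_bytes == 8);
  unsigned bytes = (1 + n_sregs) * xlen_bytes;
  return (bytes + 15u) & ~15u;
}

/* Mirrors the quoted "base_size > frame_size" check: push/pop is refused
   when it would have to grow the frame.  */
static bool
use_push_pop (unsigned base_size, unsigned frame_size)
{
  return base_size <= frame_size;
}
```

For {ra, s0} on RV32, zcmp_base_adjust (1, 4) is 16, so use_push_pop (16, 12) is false for the 12-byte frame in the test case, while a 16-byte frame would pass the check.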

>--
>Sender:Fei Gao 
>Sent At:2023 Apr. 25 (Tue.) 18:12
>Recipient:jiawei 
>Cc:gcc-patches 
>Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
>hi Jiawei
>Please ignore my previous reply. I accidentally sent the email before I finished 
>it.
>Sorry for that!
>I downloaded the series of patches from you and found in some cases
>it fails to generate zcmp push and pop insns.
>TC:
>char my_getchar();
>int test_s0()
>{
> int a = my_getchar();
> int b = my_getchar();
> return a+b;
>}
>cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
>-mcmodel=medlow test.c
>-fno-shrink-wrap-separate is used here to avoid the impact from 
>shrink-wrap-separate that is by default
>enabled in O2.
>As I'm also interested in Zc*, I made some changes, mainly in the prologue and
>epilogue pass, quite similar to
>what has been done for save and restore, except for the CFI directives, due to
>the reversed order in which zcmp
>pushes and pops ra and the s regs compared with what save and restore do.
>I will refine and share the code soon for your review.
>BR
>Fei
>On Thu Apr 6 06:21:17 GMT 2023 Jiawei jia...@iscas.ac.cn wrote:
>>
>>Add Zcmp extension instructions support. Generate push/pop
>>with follow steps:
>>
>> 1. preprocessing:
>> 1.1. if there is no push rtx, then just return. e.g.
>> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>> (plus:SI (reg/f:SI 2 sp)
>> (const_int -32 [0xffe0])))
>> (nil))
>> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>> 1.2. if push rtx exists, then we compute the number of
>> pushed s-registers, n_sreg.
>>
>> push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>>
>> [2 and 3 happened simultaneously]
>>
>> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>> and aN is not used the move pattern, and sN is not
>> defined before the move pattern (from prologue to the
>> position of move pattern).
>>
>> 3. analysis use and reach of every instruction from prologue
>> to the position of move pattern.
>> if any sN is used, then we mark the corresponding argument list
>> candidate as invalid.
>> e.g.
>> push {ra,s0-s3}, {}, -32
>> sw s0,44(sp) # s0 is used, then argument list is invalid
>> mv a0,a5 # a0 is defined, then argument list is invalid
>> ...
>> mv s0,a0
>> mv s1,a1
>> mv s2,a2
>>
>> 4. if there is a valid argument list, then replace the pop
>> push parallel insn, and delete mv pattern.
>> if not, skip.
>>
>>All "zcmpe" means Zcmp with RVE extension.
>>The push/pop instruction implement is mostly finished by Sinan Lin.
>>
>>Co-Authored by: Sinan Lin 
>>Co-Authored by: Simon Cook 
>>Co-Authored by: Shihua Liao 
>>
>>gcc/ChangeLog:
>>
>> * config.gcc: New object.
>> * config/riscv/predicates.md (riscv_stack_push_operation):
>> New predicate.
>> (riscv_stack_pop_operation): Ditto.
>> (pop_return_value_constant): Ditto.
>> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>> * config/riscv/riscv-protos.h 

[PATCH 2/2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-06 Thread Fei Gao
|| XINT (elt, 1) != UNSPECV_GPR_MULTI_PUSH)
+return false;
+}
+  else
+{
+  /* USEs, must check the order.  */
+  if (GET_CODE (elt) != USE
+  || !REG_P (XEXP (elt, 1))
+  || (REGNO (XEXP (elt, 1)) !=
+   gpr_save_reg_order[i + GPR_SAVE_REG_ORDER_SKIP_T0T1]))
+return false;
+}
+break;
+}
+  return true;
+}
+
+
 /* Return true if it's valid gpr_save pattern.  */
 
 bool
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 13038a39e5c..0da2190d04f 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -413,6 +413,32 @@ ASM_MISA_SPEC
 #define RISCV_CALL_ADDRESS_TEMP(MODE) \
   gen_rtx_REG (MODE, RISCV_CALL_ADDRESS_TEMP_REGNUM)
 
+#define RETURN_ADDR_MASK ( 1 << RETURN_ADDR_REGNUM)
+#define S0_MASK ( 1 << S0_REGNUM)
+#define S1_MASK ( 1 << S1_REGNUM)
+#define S2_MASK ( 1 << S2_REGNUM)
+#define S3_MASK ( 1 << S3_REGNUM)
+#define S4_MASK ( 1 << S4_REGNUM)
+#define S5_MASK ( 1 << S5_REGNUM)
+#define S6_MASK ( 1 << S6_REGNUM)
+#define S7_MASK ( 1 << S7_REGNUM)
+#define S8_MASK ( 1 << S8_REGNUM)
+#define S9_MASK ( 1 << S9_REGNUM)
+#define S10_MASK ( 1 << S10_REGNUM)
+#define S11_MASK ( 1 << S11_REGNUM)
+
+#define MULTI_PUSH_GPR_MASK ( RETURN_ADDR_MASK | S0_MASK | S1_MASK | S2_MASK | S3_MASK \
+   | S4_MASK | S5_MASK | S6_MASK | S7_MASK \
+   | S8_MASK | S9_MASK | S10_MASK | S11_MASK )
+#define ZCMP_MAX_SPIMM 3
+#define ZCMP_SP_INC_STEP 16
+#define ZCMP_MAX_RLIST 15
+#define ZCMP_MIN_RLIST 4
+#define ZCMP_INVALID_S0S10_SREGS_COUNTS 11
+#define ZCMP_S0S11_SREGS_COUNTS 12
+#define ZCMP_REG_LIST_RA_S0S11 15
+#define ZCMP_RLIST_OFFSET_TO_SREGS_COUNTS 4
+
 #define MCOUNT_NAME "_mcount"
 
 #define NO_PROFILE_COUNTERS 1
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 7065e68c0b7..f263d9f5513 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -106,6 +106,10 @@
   ;; Zihintpause unspec
   UNSPECV_PAUSE
 
+  ;; zc unspecs
+  UNSPECV_GPR_MULTI_PUSH
+  UNSPECV_GPR_MULTI_POP
+
   ;; XTheadFmv unspec
   UNSPEC_XTHEADFMV
   UNSPEC_XTHEADFMV_HW
@@ -135,6 +139,8 @@
    (EXCEPTION_RETURN   2)
(VL_REGNUM  66)
(VTYPE_REGNUM   67)
+   (PROLOGUE   0)
+   (EPILOGUE   1)
 ])
 
 (include "predicates.md")
@@ -3205,3 +3211,4 @@
 (include "sifive-7.md")
 (include "thead.md")
 (include "vector.md")
+(include "zc.md")
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
new file mode 100644
index 000..1c2f390269e
--- /dev/null
+++ b/gcc/config/riscv/zc.md
@@ -0,0 +1,55 @@
+;; Machine description for RISC-V Zc extention.
+;; Copyright (C) 2011-2023 Free Software Foundation, Inc.
+;; Contributed by Fei Gao (gao...@eswincomputing.com).
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_insn "gpr_multi_pop"
+  [(unspec_volatile [(match_operand 0 "const_int_operand")
+ (match_operand 1 "const_int_operand")]
+UNSPECV_GPR_MULTI_POP)]
+  "TARGET_ZCMP"
+  "*
+  riscv_output_gpr_multi_push_pop(\"cm.pop\", EPILOGUE, operands[0], operands[1]);
+  return \"\";
+  "
+)
+(define_insn "gpr_multi_popret"
+  [(unspec_volatile [(match_operand 0 "const_int_operand")
+ (match_operand 1 "const_int_operand")]
+UNSPECV_GPR_MULTI_POP)
+   (return)
+   (use (reg:SI RETURN_ADDR_REGNUM))
+   (const_int 0)]
+  "TARGET_ZCMP"
+  "*
+  riscv_output_gpr_multi_push_pop(\"cm.popret\", EPILOGUE, operands[0], operands[1]);
+  return \"\";
+  "
+)
+
+(define_insn "gpr_multi_push"
+  [(match_parallel 2 "gpr_multi_push_operation"
+ [(unspec_volatile [(match_operand 0 "const_int_operand")
+ 

[PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.

2023-05-06 Thread Fei Gao
zcmp aims to reduce code size, while shrink-wrap-separate prefers
speed to code size. So disable shrink-wrap-separate if zcmp
enabled, just like what save-restore has done.

author: Zhangjin Liao liaozhang...@eswincomputing.com

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_get_separate_components):
---
 gcc/config/riscv/riscv.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 45a63cab9c9..629e5e45cac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5729,7 +5729,8 @@ riscv_get_separate_components (void)
 
   if (riscv_use_save_libcall (&cfun->machine->frame)
   || cfun->machine->interrupt_handler_p
-  || !cfun->machine->frame.gp_sp_offset.is_constant ())
+  || !cfun->machine->frame.gp_sp_offset.is_constant ()
+  || TARGET_ZCMP)
 return components;
 
   offset = cfun->machine->frame.gp_sp_offset.to_constant ();
-- 
2.17.1



[PATCH 0/2] RISC-V: support Zcmp extension

2023-05-06 Thread Fei Gao
  
                                                        a1,12(sp)
mv      s0,a0
sw      a2,8(sp)                                        sw      a2,8(sp)
sw      a3,4(sp)                                        sw      a3,4(sp)
sw      a4,0(sp)                                        sw      a4,0(sp)
mv      s1,a5                                           mv      s1,a5
call    bar                                             call    bar
lw      a1,12(sp)                                       lw      a1,12(sp)
lw      a2,8(sp)                                        lw      a2,8(sp)
lw      a3,4(sp)                                        lw      a3,4(sp)
lw      a4,0(sp)                                        lw      a4,0(sp)
add     a0,s0,a1                                        add     a0,s0,a1
add     a2,a0,a2                                        add     a2,a0,a2
add     a3,a2,a3                                        add     a3,a2,a3
lw      a0,28(sp) //issue in accessing incoming para    lw      a0,32(sp)
add     a4,a3,a4                                        add     a4,a3,a4
add     a4,a4,s1                                        add     a4,a4,s1
add     a0,a4,a0                                        add     a0,a4,a0
cm.popret   {ra,s0-s1},32                               cm.popret   {ra, s0-s1}, 32
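
For readers less familiar with the encoding, the operands of the cm.popret {ra,s0-s1},32 in the listing above can be broken down with the constants this series adds to riscv.h. The sketch below is a hedged illustration only: zcmp_rlist and zcmp_spimm are hypothetical helpers that do not appear in the patch, and the rlist numbering follows the Zcmp specification as I understand it (4 is {ra}, each further s-register adds one, and {ra, s0-s10} cannot be encoded).

```c
#include <assert.h>

/* Constants named after the macros added to riscv.h in this series.  */
#define ZCMP_MAX_SPIMM 3
#define ZCMP_SP_INC_STEP 16
#define ZCMP_MIN_RLIST 4
#define ZCMP_REG_LIST_RA_S0S11 15

/* Hypothetical helper: rlist value for ra plus the first n_sregs
   s-registers, or -1 if that register list cannot be encoded.  */
static int
zcmp_rlist (unsigned n_sregs)
{
  if (n_sregs > 12 || n_sregs == 11)   /* {ra, s0-s10} has no encoding.  */
    return -1;
  if (n_sregs == 12)
    return ZCMP_REG_LIST_RA_S0S11;
  return ZCMP_MIN_RLIST + (int) n_sregs;
}

/* Hypothetical helper: spimm field for a total sp adjustment, given the
   base adjustment implied by the register list; -1 if not representable.  */
static int
zcmp_spimm (unsigned base_adjust, unsigned total_adjust)
{
  assert (base_adjust % ZCMP_SP_INC_STEP == 0);
  if (total_adjust < base_adjust)
    return -1;
  unsigned extra = total_adjust - base_adjust;
  if (extra % ZCMP_SP_INC_STEP != 0
      || extra / ZCMP_SP_INC_STEP > ZCMP_MAX_SPIMM)
    return -1;
  return (int) (extra / ZCMP_SP_INC_STEP);
}
```

With two s-registers, zcmp_rlist (2) gives 6, and with a 16-byte base adjustment on RV32 the 32-byte total corresponds to zcmp_spimm (16, 32) == 1, i.e. one extra 16-byte step.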

Fei Gao (2):
  [RISC-V] disable shrink-wrap-separate if zcmp enabled.
  [RISC-V] support cm.push cm.pop cm.popret in zcmp

 gcc/config/riscv/predicates.md|   6 +
 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv.cc | 403 --
 gcc/config/riscv/riscv.h  |  26 ++
 gcc/config/riscv/riscv.md |   7 +
 gcc/config/riscv/zc.md|  55 +++
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   | 239 +++
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   | 239 +++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |  23 +
 9 files changed, 960 insertions(+), 41 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c

-- 
2.17.1



[PATCH V2] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-29 Thread Fei Gao
Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without save-restore,
less stack memory can be reserved. This patch decouples stack allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.

output of testcase rv32e_stack.c
before patch:
addi    sp,sp,-16
sw      ra,12(sp)
call    getInt
sw      a0,0(sp)
lw      a0,0(sp)
call    PrintInts
lw      a5,0(sp)
mv      a0,a5
lw      ra,12(sp)
addi    sp,sp,16
jr      ra

after patch:
addi    sp,sp,-8
sw      ra,4(sp)
call    getInt
sw      a0,0(sp)
lw      a0,0(sp)
call    PrintInts
lw      a5,0(sp)
mv      a0,a5
lw      ra,4(sp)
addi    sp,sp,8
jr      ra
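
The two frame sizes above can be reproduced with a small worked calculation. This is a simplified sketch, not riscv_compute_frame_info itself: rv32e_frame_size is a hypothetical helper, the 4-byte stack alignment for ilp32e is an assumption, and the 12-byte reservation models the "3 registers at once" libcall comment in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Round bytes up to a multiple of align.  */
static unsigned
stack_align (unsigned bytes, unsigned align)
{
  assert (align != 0);
  return (bytes + align - 1) / align * align;
}

/* Hypothetical model of the frame size: locals plus the GPR save area.
   When reserve_libcall_area is true, the save area is padded to the
   12 bytes that the save/restore libcalls expect.  */
static unsigned
rv32e_frame_size (unsigned num_x_saved, unsigned local_bytes,
                  bool reserve_libcall_area)
{
  const unsigned word = 4;    /* UNITS_PER_WORD on RV32.  */
  const unsigned align = 4;   /* Assumed ilp32e stack alignment.  */
  unsigned x_save_size = stack_align (num_x_saved * word, align);

  if (reserve_libcall_area && num_x_saved != 0 && x_save_size < 3 * word)
    x_save_size = 3 * word;

  return stack_align (local_bytes, align) + x_save_size;
}
```

With only ra saved and one 4-byte local, rv32e_frame_size (1, 4, true) is 16 (the "before" listing) and rv32e_frame_size (1, 4, false) is 8 (the "after" listing).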


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function for 
riscv_use_save_libcall.
(riscv_use_save_libcall): call riscv_avoid_save_libcall.
(riscv_compute_frame_info): restructure to decouple stack allocation 
for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_stack.c: New test.
---
 gcc/config/riscv/riscv.cc| 58 
 gcc/testsuite/gcc.target/riscv/rv32e_stack.c | 14 +
 2 files changed, 50 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_stack.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5d2550871c7..8b32977e296 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4772,12 +4772,27 @@ riscv_save_reg_p (unsigned int regno)
   return false;
 }
 
+/* Return TRUE if a libcall to save/restore GPRs should be
+   avoided.  FALSE otherwise.  */
+static bool
+riscv_avoid_save_libcall (void)
+{
+  if (!TARGET_SAVE_RESTORE
+  || crtl->calls_eh_return
+  || frame_pointer_needed
+  || cfun->machine->interrupt_handler_p
+  || cfun->machine->varargs_size != 0
+  || crtl->args.pretend_args_size != 0)
+return true;
+
+  return false;
+}
+
 /* Determine whether to call GPR save/restore routines.  */
 static bool
 riscv_use_save_libcall (const struct riscv_frame_info *frame)
 {
-  if (!TARGET_SAVE_RESTORE || crtl->calls_eh_return || frame_pointer_needed
-  || cfun->machine->interrupt_handler_p)
+  if (riscv_avoid_save_libcall ())
 return false;
 
   return frame->save_libcall_adjustment != 0;
@@ -4857,7 +4872,7 @@ riscv_compute_frame_info (void)
   struct riscv_frame_info *frame;
   poly_int64 offset;
   bool interrupt_save_prologue_temp = false;
-  unsigned int regno, i, num_x_saved = 0, num_f_saved = 0;
+  unsigned int regno, i, num_x_saved = 0, num_f_saved = 0, x_save_size = 0;
 
   frame = &cfun->machine->frame;
 
@@ -4895,24 +4910,14 @@ riscv_compute_frame_info (void)
frame->fmask |= 1 << (regno - FP_REG_FIRST), num_f_saved++;
 }
 
-  /* At the bottom of the frame are any outgoing stack arguments. */
-  offset = riscv_stack_align (crtl->outgoing_args_size);
-  /* Next are local stack variables. */
-  offset += riscv_stack_align (get_frame_size ());
-  /* The virtual frame pointer points above the local variables. */
-  frame->frame_pointer_offset = offset;
-  /* Next are the callee-saved FPRs. */
-  if (frame->fmask)
-offset += riscv_stack_align (num_f_saved * UNITS_PER_FP_REG);
-  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
-  /* Next are the callee-saved GPRs. */
   if (frame->mask)
 {
-  unsigned x_save_size = riscv_stack_align (num_x_saved * UNITS_PER_WORD);
+  x_save_size = riscv_stack_align (num_x_saved * UNITS_PER_WORD);
   unsigned num_save_restore = 1 + riscv_save_libcall_count (frame->mask);
 
   /* Only use save/restore routines if they don't alter the stack size.  */
-  if (riscv_stack_align (num_save_restore * UNITS_PER_WORD) == x_save_size)
+  if (riscv_stack_align (num_save_restore * UNITS_PER_WORD) == x_save_size
+  && !riscv_avoid_save_libcall ())
{
  /* Libcall saves/restores 3 registers at once, so we need to
 allocate 12 bytes for callee-saved register.  */
@@ -4921,9 +4926,21 @@ riscv_compute_frame_info (void)
 
  frame->save_libcall_adjustment = x_save_size;
}
-
-  offset += x_save_size;
 }
+
+  /* At the bottom of the frame are any outgoing stack arguments. */
+  offset = riscv_stack_align (crtl->outgoing_args_size);
+  /* Next are local stack variables. */
+  offset += riscv_stack_align (get_frame_size ());
+  /* The virtual frame pointer points above the local variables. */
+  frame->frame_pointer_offset = offset;
+  /* Next are the callee-saved FPRs. */
+  if (frame->fmask)
+offset += riscv_stack_align (num_f_saved * UNITS_PER_FP_REG);
+  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+  /* Next are the callee-saved GPRs. */
+  if (frame->mask)
+

[PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-04-25 Thread Fei Gao


hi Jiawei

Please ignore my previous reply. I accidentally sent the email before I finished 
it.
Sorry for that!

I downloaded the series of patches from you and found in some cases
it fails to generate zcmp push and pop insns.

TC:

char my_getchar();
int test_s0()
{

        int a = my_getchar();
        int b = my_getchar();
        return a+b;
}

cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e  
-mcmodel=medlow test.c

-fno-shrink-wrap-separate is used here to avoid the impact of
shrink-wrap-separate, which is enabled by default
at -O2.

As I'm also interested in Zc*, I made some changes, mainly in the prologue and
epilogue pass, quite similar to
what has been done for save and restore, except for the CFI directives, due to
the reversed order in which zcmp
pushes and pops ra and the s regs compared with what save and restore do.

I will refine and share the code soon for your review.

BR
Fei




On Thu Apr 6 06:21:17 GMT 2023  Jiawei jia...@iscas.ac.cn wrote:
>
>Add Zcmp extension instructions support. Generate push/pop
>with follow steps:
>
>  1. preprocessing:
>    1.1. if there is no push rtx, then just return. e.g.
>    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>      (plus:SI (reg/f:SI 2 sp)
>        (const_int -32 [0xffe0])))
>    (nil))
>    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>    1.2. if push rtx exists, then we compute the number of
>    pushed s-registers, n_sreg.
>
>  push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>
>  [2 and 3 happened simultaneously]
>
>  2. find valid move pattern, mv sN, aN, where N < n_sreg,
>    and aN is not used the move pattern, and sN is not
>    defined before the move pattern (from prologue to the
>    position of move pattern).
>
>  3. analysis use and reach of every instruction from prologue
>    to the position of move pattern.
>    if any sN is used, then we mark the corresponding argument list
>    candidate as invalid.
>    e.g.
>        push  {ra,s0-s3}, {}, -32
>        sw      s0,44(sp) # s0 is used, then argument list is invalid
>        mv      a0,a5     # a0 is defined, then argument list is invalid
>        ...
>        mv      s0,a0
>        mv      s1,a1
>        mv      s2,a2
>
>  4. if there is a valid argument list, then replace the pop
>    push parallel insn, and delete mv pattern.
>     if not, skip.
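
Steps 2 and 3 above can be read as the following check, sketched here in plain C rather than on RTL. It is only an illustration of the description, not the code in riscv-zcmp-popret.cc: the register numbering, struct insn and valid_arg_moves are made up for the example, and the rule that aN must not be redefined before the move is inferred from the "mv a0,a5" comment in the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register numbering for this sketch only:
   a0..a7 are 0..7 and s0..s11 are 8..19.  */
enum { REG_NONE = -1 };
#define A(n) (n)
#define S(n) (8 + (n))

/* One instruction reduced to the single register it writes and reads.  */
struct insn
{
  int def;   /* Register written, or REG_NONE.  */
  int use;   /* Register read, or REG_NONE.  */
};

/* Scan the instructions after the push.  Bit N of the result is set when
   "mv sN,aN" is found and no earlier instruction used or defined sN or
   redefined aN.  */
static unsigned
valid_arg_moves (const struct insn *insns, size_t count, unsigned n_sreg)
{
  unsigned valid = 0;

  assert (n_sreg <= 8);   /* Only a0..a7 exist in this sketch.  */
  for (unsigned n = 0; n < n_sreg; n++)
    for (size_t i = 0; i < count; i++)
      {
        const struct insn *in = &insns[i];

        if (in->def == S (n) && in->use == A (n))
          {
            valid |= 1u << n;   /* Found a clean mv sN,aN.  */
            break;
          }
        if (in->def == S (n) || in->use == S (n) || in->def == A (n))
          break;                /* Candidate N is invalidated.  */
      }
  return valid;
}
```

Applied to the example sequence (sw s0,44(sp); mv a0,a5; mv s0,a0; mv s1,a1; mv s2,a2) with n_sreg = 4, the early use of s0 rejects candidate 0, while the moves into s1 and s2 remain valid.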
>
>All "zcmpe" means Zcmp with RVE extension.
>The push/pop instruction implement is mostly finished by Sinan Lin.
>
>Co-Authored by: Sinan Lin 
>Co-Authored by: Simon Cook 
>Co-Authored by: Shihua Liao 
>
>gcc/ChangeLog:
>
>        * config.gcc: New object.
>        * config/riscv/predicates.md (riscv_stack_push_operation):
>          New predicate.
>        (riscv_stack_pop_operation): Ditto.
>        (pop_return_value_constant): Ditto.
>        * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>        * config/riscv/riscv-protos.h (riscv_output_popret_p):
>          New routine.
>        (riscv_valid_stack_push_pop_p): Ditto.
>        (riscv_check_regno): Ditto.
>        (make_pass_zcmp_popret): New pass.
>        * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
>        (riscv_output_popret_p): New function.
>        (riscv_print_pop_size): Ditto.
>        (riscv_print_reglist): Ditto.
>        (riscv_print_operand): New case symbols.
>        (riscv_save_push_pop_count): New function.
>        (riscv_push_pop_base_sp_adjust): Ditto.
>        (riscv_use_push_pop): Ditto.
>        (riscv_compute_frame_info): Adjust frame value.
>        (riscv_emit_pop_insn): New function.
>        (riscv_check_regno): Ditto.
>        (riscv_valid_stack_push_pop_p): Ditto.
>        (riscv_emit_push_insn): Ditto.
>        (riscv_expand_prologue): Modify frame pattern.
>        (riscv_expand_epilogue): Ditto.
>        * config/riscv/riscv.h (RETURN_VALUE_REGNUM):
>        (RISCV_ZCE_PUSH_POP_MASK): New mask.
>        (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
>        * config/riscv/riscv.md: Add new reg number and include info.
>        * config/riscv/t-riscv: New object rules.
>        * config/riscv/riscv-zcmp-popret.cc: New file.
>        * config/riscv/zc.md: New file.
>---
> gcc/config.gcc                        |   2 +-
> gcc/config/riscv/predicates.md        |  16 +
> gcc/config/riscv/riscv-passes.def     |   1 +
> gcc/config/riscv/riscv-protos.h       |   4 +
> gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++
> gcc/config/riscv/riscv.cc             | 437 +-
> gcc/config/riscv/riscv.h              |   4 +
> gcc/config/riscv/riscv.md             |   3 +
> gcc/config/riscv/t-riscv              |   4 +
> gcc/config/riscv/zc.md                |  47 +++
> 10 files changed, 767 insertions(+), 11 deletions(-)
> create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
> create mode 100644 gcc/config/riscv/zc.md
>
>diff --git a/gcc/config.gcc b/gcc/config.gcc
>index 629d324b5ef..a991c5273f9 100644
>--- a/gcc/config.gcc
>+++ b/gcc/config.gcc
>@@ 

[PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-04-25 Thread Fei Gao
hi Jiawei

I downloaded the series of patches from you and found in some cases
it fails to generate zcmp push and pop insns.

test.c

char my_getchar();
int test_s0()
{

        int a = my_getchar();
        int b = my_getchar();
        return a+b;
}




On Thu Apr 6 06:21:17 GMT 2023  Jiawei jia...@iscas.ac.cn wrote:
>
>Add Zcmp extension instructions support. Generate push/pop
>with follow steps:
>
>  1. preprocessing:
>    1.1. if there is no push rtx, then just return. e.g.
>    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>      (plus:SI (reg/f:SI 2 sp)
>        (const_int -32 [0xffe0])))
>    (nil))
>    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>    1.2. if push rtx exists, then we compute the number of
>    pushed s-registers, n_sreg.
>
>  push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>
>  [2 and 3 happened simultaneously]
>
>  2. find valid move pattern, mv sN, aN, where N < n_sreg,
>    and aN is not used the move pattern, and sN is not
>    defined before the move pattern (from prologue to the
>    position of move pattern).
>
>  3. analysis use and reach of every instruction from prologue
>    to the position of move pattern.
>    if any sN is used, then we mark the corresponding argument list
>    candidate as invalid.
>    e.g.
>        push  {ra,s0-s3}, {}, -32
>        sw      s0,44(sp) # s0 is used, then argument list is invalid
>        mv      a0,a5     # a0 is defined, then argument list is invalid
>        ...
>        mv      s0,a0
>        mv      s1,a1
>        mv      s2,a2
>
>  4. if there is a valid argument list, then replace the pop
>    push parallel insn, and delete mv pattern.
>     if not, skip.
>
>All "zcmpe" means Zcmp with RVE extension.
>The push/pop instruction implement is mostly finished by Sinan Lin.
>
>Co-Authored by: Sinan Lin 
>Co-Authored by: Simon Cook 
>Co-Authored by: Shihua Liao 
>
>gcc/ChangeLog:
>
>        * config.gcc: New object.
>        * config/riscv/predicates.md (riscv_stack_push_operation):
>          New predicate.
>        (riscv_stack_pop_operation): Ditto.
>        (pop_return_value_constant): Ditto.
>        * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>        * config/riscv/riscv-protos.h (riscv_output_popret_p):
>          New routine.
>        (riscv_valid_stack_push_pop_p): Ditto.
>        (riscv_check_regno): Ditto.
>        (make_pass_zcmp_popret): New pass.
>        * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
>        (riscv_output_popret_p): New function.
>        (riscv_print_pop_size): Ditto.
>        (riscv_print_reglist): Ditto.
>        (riscv_print_operand): New case symbols.
>        (riscv_save_push_pop_count): New function.
>        (riscv_push_pop_base_sp_adjust): Ditto.
>        (riscv_use_push_pop): Ditto.
>        (riscv_compute_frame_info): Adjust frame value.
>        (riscv_emit_pop_insn): New function.
>        (riscv_check_regno): Ditto.
>        (riscv_valid_stack_push_pop_p): Ditto.
>        (riscv_emit_push_insn): Ditto.
>        (riscv_expand_prologue): Modify frame pattern.
>        (riscv_expand_epilogue): Ditto.
>        * config/riscv/riscv.h (RETURN_VALUE_REGNUM):
>        (RISCV_ZCE_PUSH_POP_MASK): New mask.
>        (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
>        * config/riscv/riscv.md: Add new reg number and include info.
>        * config/riscv/t-riscv: New object rules.
>        * config/riscv/riscv-zcmp-popret.cc: New file.
>        * config/riscv/zc.md: New file.
>---
> gcc/config.gcc                        |   2 +-
> gcc/config/riscv/predicates.md        |  16 +
> gcc/config/riscv/riscv-passes.def     |   1 +
> gcc/config/riscv/riscv-protos.h       |   4 +
> gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++
> gcc/config/riscv/riscv.cc             | 437 +-
> gcc/config/riscv/riscv.h              |   4 +
> gcc/config/riscv/riscv.md             |   3 +
> gcc/config/riscv/t-riscv              |   4 +
> gcc/config/riscv/zc.md                |  47 +++
> 10 files changed, 767 insertions(+), 11 deletions(-)
> create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
> create mode 100644 gcc/config/riscv/zc.md
>
>diff --git a/gcc/config.gcc b/gcc/config.gcc
>index 629d324b5ef..a991c5273f9 100644
>--- a/gcc/config.gcc
>+++ b/gcc/config.gcc
>@@ -529,7 +529,7 @@ pru-*-*)
>        ;;
> riscv*)
>        cpu_type=riscv
>-       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
>riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
>+       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
>riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o 
>riscv-zcmp-popret.o"
>        extra_objs="${extra_objs} riscv-vector-builtins.o 
>riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>        extra_objs="${extra_objs} thead.o"
>        d_target_objs="riscv-d.o"
>diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>index 

[PATCH] RISC-V: decouple stack allocation for rv32e w/o save-restore.

2023-04-21 Thread Fei Gao
Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without save-restore,
less stack memory can be reserved. This patch decouples stack allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.

output of testcase rv32e_stack.c
before patch:
addi    sp,sp,-16
sw      ra,12(sp)
call    getInt
sw      a0,0(sp)
lw      a0,0(sp)
call    PrintInts
lw      a5,0(sp)
mv      a0,a5
lw      ra,12(sp)
addi    sp,sp,16
jr      ra

after patch:
addi    sp,sp,-8
sw      ra,4(sp)
call    getInt
sw      a0,0(sp)
lw      a0,0(sp)
call    PrintInts
lw      a5,0(sp)
mv      a0,a5
lw      ra,4(sp)
addi    sp,sp,8
jr      ra

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_forbid_save_libcall): helper function 
for riscv_use_save_libcall.
(riscv_use_save_libcall): call riscv_forbid_save_libcall.
(riscv_compute_frame_info): restructure to decouple stack allocation 
for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_stack.c: New test.
---
 gcc/config/riscv/riscv.cc| 57 
 gcc/testsuite/gcc.target/riscv/rv32e_stack.c | 14 +
 2 files changed, 49 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_stack.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5d2550871c7..6ccdfe96fe7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4772,12 +4772,26 @@ riscv_save_reg_p (unsigned int regno)
   return false;
 }
 
+/* Determine whether to disable GPR save/restore routines.  */
+static bool
+riscv_forbid_save_libcall (void)
+{
+  if (!TARGET_SAVE_RESTORE
+      || crtl->calls_eh_return
+      || frame_pointer_needed
+      || cfun->machine->interrupt_handler_p
+      || cfun->machine->varargs_size != 0
+      || crtl->args.pretend_args_size != 0)
+    return true;
+
+  return false;
+}
+
 /* Determine whether to call GPR save/restore routines.  */
 static bool
 riscv_use_save_libcall (const struct riscv_frame_info *frame)
 {
-  if (!TARGET_SAVE_RESTORE || crtl->calls_eh_return || frame_pointer_needed
-  || cfun->machine->interrupt_handler_p)
+  if (riscv_forbid_save_libcall ())
 return false;
 
   return frame->save_libcall_adjustment != 0;
@@ -4857,7 +4871,7 @@ riscv_compute_frame_info (void)
   struct riscv_frame_info *frame;
   poly_int64 offset;
   bool interrupt_save_prologue_temp = false;
-  unsigned int regno, i, num_x_saved = 0, num_f_saved = 0;
+  unsigned int regno, i, num_x_saved = 0, num_f_saved = 0, x_save_size = 0;
 
   frame = &cfun->machine->frame;
 
@@ -4895,24 +4909,14 @@ riscv_compute_frame_info (void)
frame->fmask |= 1 << (regno - FP_REG_FIRST), num_f_saved++;
 }
 
-  /* At the bottom of the frame are any outgoing stack arguments. */
-  offset = riscv_stack_align (crtl->outgoing_args_size);
-  /* Next are local stack variables. */
-  offset += riscv_stack_align (get_frame_size ());
-  /* The virtual frame pointer points above the local variables. */
-  frame->frame_pointer_offset = offset;
-  /* Next are the callee-saved FPRs. */
-  if (frame->fmask)
-offset += riscv_stack_align (num_f_saved * UNITS_PER_FP_REG);
-  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
-  /* Next are the callee-saved GPRs. */
   if (frame->mask)
 {
-  unsigned x_save_size = riscv_stack_align (num_x_saved * UNITS_PER_WORD);
+  x_save_size = riscv_stack_align (num_x_saved * UNITS_PER_WORD);
   unsigned num_save_restore = 1 + riscv_save_libcall_count (frame->mask);
 
   /* Only use save/restore routines if they don't alter the stack size.  */
-  if (riscv_stack_align (num_save_restore * UNITS_PER_WORD) == x_save_size)
+  if (riscv_stack_align (num_save_restore * UNITS_PER_WORD) == x_save_size
+  && !riscv_forbid_save_libcall ())
{
  /* Libcall saves/restores 3 registers at once, so we need to
 allocate 12 bytes for callee-saved register.  */
@@ -4921,9 +4925,21 @@ riscv_compute_frame_info (void)
 
  frame->save_libcall_adjustment = x_save_size;
}
-
-  offset += x_save_size;
 }
+
+  /* At the bottom of the frame are any outgoing stack arguments. */
+  offset = riscv_stack_align (crtl->outgoing_args_size);
+  /* Next are local stack variables. */
+  offset += riscv_stack_align (get_frame_size ());
+  /* The virtual frame pointer points above the local variables. */
+  frame->frame_pointer_offset = offset;
+  /* Next are the callee-saved FPRs. */
+  if (frame->fmask)
+offset += riscv_stack_align (num_f_saved * UNITS_PER_FP_REG);
+  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+  /* Next are the callee-saved GPRs. */
+  if (frame->mask)
+offset += x_save_size;
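
The frame-size arithmetic behind the rv32e_stack.c output quoted in the commit message can be sketched as follows. This is only a simplified model, not the code in riscv_compute_frame_info: it assumes a 4-byte stack alignment and 4-byte GPRs for rv32e, no FPRs and no outgoing arguments, and the helper names (stack_align, frame_size_before, frame_size_after) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions for this sketch: rv32e keeps the stack 4-byte aligned and
// each GPR occupies 4 bytes.  Neither constant is taken from GCC headers.
constexpr int64_t kStackAlign = 4;
constexpr int64_t kWordSize = 4;

// The save/restore libcalls handle 3 registers at once on rv32e, so the
// pre-patch code reserved 12 bytes whenever any GPR had to be saved.
constexpr int64_t kLibcallSaveSize = 3 * kWordSize;

// Round n up to the next multiple of the stack alignment.
constexpr int64_t stack_align(int64_t n) {
  return (n + kStackAlign - 1) & -kStackAlign;
}

// Before the patch (simplified): locals plus a fixed 12-byte GPR area.
constexpr int64_t frame_size_before(int64_t local_bytes, int num_x_saved) {
  return stack_align(local_bytes) + (num_x_saved > 0 ? kLibcallSaveSize : 0);
}

// After the patch, without save-restore: locals plus only the GPRs
// that are actually saved.
constexpr int64_t frame_size_after(int64_t local_bytes, int num_x_saved) {
  return stack_align(local_bytes) + stack_align(num_x_saved * kWordSize);
}
```

With one 4-byte local and only ra saved, frame_size_before (4, 1) gives 16 and frame_size_after (4, 1) gives 8, which matches the addi sp,sp,-16 and addi sp,sp,-8 adjustments in the two listings.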
   

Re: [PING 2] [PATCH 0/3] RISC-V: optimize stack manipulation in save-restore

2023-02-15 Thread Fei Gao
ping.

BR, 
Fei

On 2023-02-03 16:52  Fei Gao  wrote:
>
>
>Gentle ping.
>
>The patch I previously submitted:
>| Date: Wed, 30 Nov 2022 00:38:08 -0800
>| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
>| Message-ID: 
>
>I split the patches as per Palmer's review comment.
>
>BR
>Fei
>
>On 2022-12-01 18:03  Fei Gao  wrote:
>>
>>The patches allow less instructions to be used in stack allocation
>>and deallocation if save-restore enabled, and also make the stack
>>manipulation codes more readable.
>>
>>Fei Gao (3):
>>  RISC-V: add a new parameter in riscv_first_stack_step.
>>  RISC-V: optimize stack manipulation in save-restore
>>  RISC-V: make the stack manipulation codes more readable.
>>
>> gcc/config/riscv/riscv.cc | 105 +-
>> .../gcc.target/riscv/stack_save_restore.c |  40 +++
>> 2 files changed, 95 insertions(+), 50 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
>>
>>--
>>2.17.1

[PING 2] [PATCH 0/3] RISC-V: optimize stack manipulation in save-restore

2023-02-08 Thread Fei Gao

ping.

BR
Fei

On 2023-02-03 16:52  Fei Gao  wrote:
>
>
>Gentle ping.
>
>The patch I previously submitted:
>| Date: Wed, 30 Nov 2022 00:38:08 -0800
>| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
>| Message-ID: 
>
>I split the patches as per Palmer's review comment.
>
>BR
>Fei
>
>On 2022-12-01 18:03  Fei Gao  wrote:
>>
>>The patches allow less instructions to be used in stack allocation
>>and deallocation if save-restore enabled, and also make the stack
>>manipulation codes more readable.
>>
>>Fei Gao (3):
>>  RISC-V: add a new parameter in riscv_first_stack_step.
>>  RISC-V: optimize stack manipulation in save-restore
>>  RISC-V: make the stack manipulation codes more readable.
>>
>> gcc/config/riscv/riscv.cc | 105 +-
>> .../gcc.target/riscv/stack_save_restore.c |  40 +++
>> 2 files changed, 95 insertions(+), 50 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
>>
>>--
>>2.17.1

[PING] [PATCH 3/3] RISC-V: make the stack manipulation codes more readable.

2023-02-03 Thread Fei Gao
Gentle ping.

The patch I previously submitted:
| Date: Wed, 30 Nov 2022 00:38:08 -0800
| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
| Message-ID: 

I split the patches as per Palmer's review comment.

BR
Fei

>gcc/ChangeLog:
>
>    * config/riscv/riscv.cc (riscv_first_stack_step): make codes more 
>readable.
>    (riscv_expand_epilogue): likewise.
>---
> gcc/config/riscv/riscv.cc | 17 ++---
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index a50f2303032..95da08ffb3b 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -4926,8 +4926,11 @@ riscv_first_stack_step (struct riscv_frame_info *frame, 
>poly_int64 remaining_siz
>   if (SMALL_OPERAND (remaining_const_size))
> return remaining_const_size;
>
>+  poly_int64 callee_saved_first_step =
>+    remaining_size - frame->frame_pointer_offset;
>+  gcc_assert(callee_saved_first_step.is_constant ());
>   HOST_WIDE_INT min_first_step =
>-    riscv_stack_align ((remaining_size - 
>frame->frame_pointer_offset).to_constant());
>+    riscv_stack_align (callee_saved_first_step.to_constant ());
>   HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
>   HOST_WIDE_INT min_second_step = remaining_const_size - max_first_step;
>   gcc_assert (min_first_step <= max_first_step);
>@@ -4935,7 +4938,7 @@ riscv_first_stack_step (struct riscv_frame_info *frame, 
>poly_int64 remaining_siz
>   /* As an optimization, use the least-significant bits of the total frame
>  size, so that the second adjustment step is just LUI + ADD.  */
>   if (!SMALL_OPERAND (min_second_step)
>-  && remaining_const_size % IMM_REACH < IMM_REACH / 2
>+  && remaining_const_size % IMM_REACH <= max_first_step
>   && remaining_const_size % IMM_REACH >= min_first_step)
> return remaining_const_size % IMM_REACH;
>
>@@ -5129,14 +5132,14 @@ riscv_adjust_libcall_cfi_epilogue ()
> void
> riscv_expand_epilogue (int style)
> {
>-  /* Split the frame into two.  STEP1 is the amount of stack we should
>- deallocate before restoring the registers.  STEP2 is the amount we
>- should deallocate afterwards.
>+  /* Split the frame into 3 steps. STEP1 is the amount of stack we should
>+ deallocate before restoring the registers. STEP2 is the amount we
>+ should deallocate afterwards including the callee saved regs. STEP3
>+ is the amount deallocated by save-restore libcall.
>
>  Start off by assuming that no registers need to be restored.  */
>   struct riscv_frame_info *frame = &cfun->machine->frame;
>   unsigned mask = frame->mask;
>-  poly_int64 step1 = frame->total_size;
>   HOST_WIDE_INT step2 = 0;
>   bool use_restore_libcall = ((style == NORMAL_RETURN)
>   && riscv_use_save_libcall (frame));
>@@ -5223,7 +5226,7 @@ riscv_expand_epilogue (int style)
>   if (use_restore_libcall)
> frame->mask = mask; /* Undo the above fib.  */
>
>-  step1 -= step2 + libcall_size;
>+  poly_int64 step1 = frame->total_size - step2 - libcall_size;
>
>   /* Set TARGET to BASE + STEP1.  */
>   if (known_gt (step1, 0))
>--
>2.17.1
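
The three-step epilogue split described in the new comment can be illustrated with plain integers. This is a sketch only: the struct and function names are hypothetical, step2 is passed in rather than computed (in GCC it comes from riscv_first_stack_step), and the total of 14352 bytes with a 32-byte libcall area is an assumption inferred from the bar listings in patch 2/3, not a value stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// STEP1: deallocated before restoring the registers.
// STEP2: deallocated afterwards, covering the callee-saved registers.
// STEP3: deallocated by the save-restore libcall itself.
struct EpilogueSteps {
  int64_t step1;
  int64_t step2;
  int64_t step3;
};

// Mirrors "step1 = frame->total_size - step2 - libcall_size" using
// ordinary integers instead of poly_int64.
constexpr EpilogueSteps split_epilogue(int64_t total_size, int64_t step2,
                                       int64_t libcall_size) {
  return {total_size - step2 - libcall_size, step2, libcall_size};
}
```

Under those assumptions, split_epilogue (14352, 32, 32) yields step1 = 14288 (12288 + 2000, so the li needs a following addi), while split_epilogue (14352, 2032, 32) yields step1 = 12288, which a single li can materialize.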

[PING] [PATCH 2/3] RISC-V: optimize stack manipulation in save-restore

2023-02-03 Thread Fei Gao

Gentle ping.

The patch I previously submitted:
| Date: Wed, 30 Nov 2022 00:38:08 -0800
| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
| Message-ID: 

I split the patches as per Palmer's review comment.

BR
Fei

>The stack that save-restore reserves is not well accumulated in stack 
>allocation and deallocation.
>This patch allows less instructions to be used in stack allocation and 
>deallocation if save-restore enabled.
>
>before patch:
>  bar:
>    call   t0,__riscv_save_4
>    addi   sp,sp,-64
>    ...
>    li     t0,-12288
>    addi   t0,t0,-1968 # optimized out after patch
>    add    sp,sp,t0 # prologue
>    ...
>    li     t0,12288 # epilogue
>    addi   t0,t0,2000 # optimized out after patch
>    add    sp,sp,t0
>    ...
>    addi   sp,sp,32
>    tail   __riscv_restore_4
>
>after patch:
>  bar:
>    call   t0,__riscv_save_4
>    addi   sp,sp,-2032
>    ...
>    li     t0,-12288
>    add    sp,sp,t0 # prologue
>    ...
>    li     t0,12288 # epilogue
>    add    sp,sp,t0
>    ...
>    addi   sp,sp,2032
>    tail   __riscv_restore_4
>
>gcc/ChangeLog:
>
>    * config/riscv/riscv.cc (riscv_expand_prologue): consider save-restore 
>in stack allocation.
>    (riscv_expand_epilogue): consider save-restore in stack deallocation.
>
>gcc/testsuite/ChangeLog:
>
>    * gcc.target/riscv/stack_save_restore.c: New test.
>---
> gcc/config/riscv/riscv.cc | 50 ++-
> .../gcc.target/riscv/stack_save_restore.c | 40 +++
> 2 files changed, 66 insertions(+), 24 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
>
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index f0bbcd6d6be..a50f2303032 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -5010,12 +5010,12 @@ void
> riscv_expand_prologue (void)
> {
>   struct riscv_frame_info *frame = &cfun->machine->frame;
>-  poly_int64 size = frame->total_size;
>+  poly_int64 remaining_size = frame->total_size;
>   unsigned mask = frame->mask;
>   rtx insn;
>
>   if (flag_stack_usage_info)
>-    current_function_static_stack_size = constant_lower_bound (size);
>+    current_function_static_stack_size = constant_lower_bound 
>(remaining_size);
>
>   if (cfun->machine->naked_p)
> return;
>@@ -5026,7 +5026,7 @@ riscv_expand_prologue (void)
>   rtx dwarf = NULL_RTX;
>   dwarf = riscv_adjust_libcall_cfi_prologue ();
>
>-  size -= frame->save_libcall_adjustment;
>+  remaining_size -= frame->save_libcall_adjustment;
>   insn = emit_insn (riscv_gen_gpr_save_insn (frame));
>   frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>
>@@ -5037,16 +5037,14 @@ riscv_expand_prologue (void)
>   /* Save the registers.  */
>   if ((frame->mask | frame->fmask) != 0)
> {
>-  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, frame->total_size);
>-  if (size.is_constant ())
>-  step1 = MIN (size.to_constant(), step1);
>+  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, remaining_size);
>
>   insn = gen_add3_insn (stack_pointer_rtx,
>     stack_pointer_rtx,
>     GEN_INT (-step1));
>   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>-  size -= step1;
>-  riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
>+  remaining_size -= step1;
>+  riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, false);
> }
>
>   frame->mask = mask; /* Undo the above fib.  */
>@@ -5055,29 +5053,29 @@ riscv_expand_prologue (void)
>   if (frame_pointer_needed)
> {
>   insn = gen_add3_insn (hard_frame_pointer_rtx, stack_pointer_rtx,
>-      GEN_INT ((frame->hard_frame_pointer_offset - size).to_constant ()));
>+      GEN_INT ((frame->hard_frame_pointer_offset - 
>remaining_size).to_constant ()));
>   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>
>   riscv_emit_stack_tie ();
> }
>
>   /* Allocate the rest of the frame.  */
>-  if (known_gt (size, 0))
>+  if (known_gt (remaining_size, 0))
> {
>   /* Two step adjustment:
> 1.scalable frame. 2.constant frame.  */
>   poly_int64 scalable_frame (0, 0);
>-  if (!size.is_constant ())
>+  if (!remaining_size.is_constant ())
> {
>   /* First for scalable frame.  */
>-    poly_int64 scalable_frame = size;
>-    scalable_frame.coeffs[0] = size.coeffs[1];
>+    poly_int64 scalable_frame = remaining_size;
>+    scalable_frame.coeffs[0] = remaining_size.coeffs[1];
>   riscv_v_adjust_scalable_frame (stack_pointer_rtx, scalable_frame, false);
>-    size -= scalable_frame;
>+    remaining_size -= scalable_frame;
> }
>
>   /* Second step for constant frame.  */
>-  HOST_WIDE_INT constant_frame = size.to_constant ();
>+  HOST_WIDE_INT constant_frame = remaining_size.to_constant ();
>   if (constant_frame == 0)
> return;
>
>@@ -5142,6 +5140,8 @@ riscv_expand_epilogue (int style)
>   HOST_WIDE_INT 

[PING][PATCH 1/3] RISC-V: add a new parameter in riscv_first_stack_step.

2023-02-03 Thread Fei Gao

Gentle ping.

The patch I previously submitted:
| Date: Wed, 30 Nov 2022 00:38:08 -0800
| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
| Message-ID: 

I split the patches as per Palmer's review comment.

BR
Fei

>frame->total_size to remaining_size conversion is done as an independent patch 
>without
>functionality change as per review comment.
>
>gcc/ChangeLog:
>
>    * config/riscv/riscv.cc (riscv_first_stack_step): add a new function 
>parameter remaining_size.
>    (riscv_compute_frame_info): adapt new riscv_first_stack_step interface.
>    (riscv_expand_prologue): likewise.
>    (riscv_expand_epilogue): likewise.
>---
> gcc/config/riscv/riscv.cc | 48 +++
> 1 file changed, 24 insertions(+), 24 deletions(-)
>
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index 05bdba5ab4d..f0bbcd6d6be 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -4634,7 +4634,7 @@ riscv_save_libcall_count (unsigned mask)
>    They decrease stack_pointer_rtx but leave frame_pointer_rtx and
>    hard_frame_pointer_rtx unchanged.  */
>
>-static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame);
>+static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame, 
>poly_int64 remaining_size);
>
> /* Handle stack align for poly_int.  */
> static poly_int64
>@@ -4663,7 +4663,7 @@ riscv_compute_frame_info (void)
>  save/restore t0.  We check for this before clearing the frame struct.  */
>   if (cfun->machine->interrupt_handler_p)
> {
>-  HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>+  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, frame->total_size);
>   if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1)))
> interrupt_save_prologue_temp = true;
> }
>@@ -4913,45 +4913,45 @@ riscv_restore_reg (rtx reg, rtx mem)
>    without adding extra instructions.  */
>
> static HOST_WIDE_INT
>-riscv_first_stack_step (struct riscv_frame_info *frame)
>+riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 
>remaining_size)
> {
>-  HOST_WIDE_INT frame_total_constant_size;
>-  if (!frame->total_size.is_constant ())
>-    frame_total_constant_size
>-  = riscv_stack_align (frame->total_size.coeffs[0])
>-  - riscv_stack_align (frame->total_size.coeffs[1]);
>+  HOST_WIDE_INT remaining_const_size;
>+  if (!remaining_size.is_constant ())
>+    remaining_const_size
>+  = riscv_stack_align (remaining_size.coeffs[0])
>+    - riscv_stack_align (remaining_size.coeffs[1]);
>   else
>-    frame_total_constant_size = frame->total_size.to_constant ();
>+    remaining_const_size = remaining_size.to_constant ();
>
>-  if (SMALL_OPERAND (frame_total_constant_size))
>-    return frame_total_constant_size;
>+  if (SMALL_OPERAND (remaining_const_size))
>+    return remaining_const_size;
>
>   HOST_WIDE_INT min_first_step =
>-    RISCV_STACK_ALIGN ((frame->total_size - 
>frame->frame_pointer_offset).to_constant());
>+    riscv_stack_align ((remaining_size - 
>frame->frame_pointer_offset).to_constant());
>   HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
>-  HOST_WIDE_INT min_second_step = frame_total_constant_size - max_first_step;
>+  HOST_WIDE_INT min_second_step = remaining_const_size - max_first_step;
>   gcc_assert (min_first_step <= max_first_step);
>
>   /* As an optimization, use the least-significant bits of the total frame
>  size, so that the second adjustment step is just LUI + ADD.  */
>   if (!SMALL_OPERAND (min_second_step)
>-  && frame_total_constant_size % IMM_REACH < IMM_REACH / 2
>-  && frame_total_constant_size % IMM_REACH >= min_first_step)
>-    return frame_total_constant_size % IMM_REACH;
>+  && remaining_const_size % IMM_REACH < IMM_REACH / 2
>+  && remaining_const_size % IMM_REACH >= min_first_step)
>+    return remaining_const_size % IMM_REACH;
>
>   if (TARGET_RVC)
> {
>   /* If we need two subtracts, and one is small enough to allow compressed
>-  loads and stores, then put that one first.  */
>+ loads and stores, then put that one first.  */
>   if (IN_RANGE (min_second_step, 0,
>-      (TARGET_64BIT ? SDSP_REACH : SWSP_REACH)))
>-  return MAX (min_second_step, min_first_step);
>+    (TARGET_64BIT ? SDSP_REACH : SWSP_REACH)))
>+   return MAX (min_second_step, min_first_step);
>
>   /* If we need LUI + ADDI + ADD for the second adjustment step, then 
>start
>-  with the minimum first step, so that we can get compressed loads and
>-  stores.  */
>+ with the minimum first step, so that we can get compressed loads and
>+ stores.  */
>   else if (!SMALL_OPERAND (min_second_step))
>-  return min_first_step;
>+   return min_first_step;
> }
>
>   return max_first_step;
>@@ -5037,7 +5037,7 @@ riscv_expand_prologue (void)
>   /* Save the registers.  */
>   if ((frame->mask | 

[PING] [PATCH 0/3] RISC-V: optimize stack manipulation in save-restore

2023-02-03 Thread Fei Gao

Gentle ping.

The patch I previously submitted:
| Date: Wed, 30 Nov 2022 00:38:08 -0800
| Subject: [PATCH] RISC-V: optimize stack manipulation in save-restore
| Message-ID: 

I split the patches as per Palmer's review comment.

BR
Fei

On 2022-12-01 18:03  Fei Gao  wrote:
>
>The patches allow less instructions to be used in stack allocation
>and deallocation if save-restore enabled, and also make the stack
>manipulation codes more readable.
>
>Fei Gao (3):
>  RISC-V: add a new parameter in riscv_first_stack_step.
>  RISC-V: optimize stack manipulation in save-restore
>  RISC-V: make the stack manipulation codes more readable.
>
> gcc/config/riscv/riscv.cc | 105 +-
> .../gcc.target/riscv/stack_save_restore.c |  40 +++
> 2 files changed, 95 insertions(+), 50 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
>
>--
>2.17.1

Re: Re: [PATCH] RISC-V: optimize stack manipulation in save-restore

2022-12-05 Thread Fei Gao
Hi Palmer and all, 

I have split the patches and triggered a new thread.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg297206.html

Could you please review at your convenience?

Thanks & BR, 
Fei

On 2022-12-01 11:07  Fei Gao  wrote:
>
>On 2022-12-01 06:50  Palmer Dabbelt  wrote:
>>
>>On Wed, 30 Nov 2022 00:37:17 PST (-0800), gao...@eswincomputing.com wrote:
>>> The stack that save-restore reserves is not well accumulated in stack 
>>> allocation and deallocation.
>>> This patch allows less instructions to be used in stack allocation and 
>>> deallocation if save-restore enabled,
>>> and also gives a much clearer logic for save-restore stack manipulation.
>>>
>>> before patch:
>>> bar:
>>> call    t0,__riscv_save_4
>>> addi    sp,sp,-64
>>> ...
>>> li      t0,-12288
>>> addi    t0,t0,-1968 # optimized out after patch
>>> add     sp,sp,t0 # prologue
>>> ...
>>> li      t0,12288 # epilogue
>>> addi    t0,t0,2000 # optimized out after patch
>>> add     sp,sp,t0
>>> ...
>>> addi    sp,sp,32
>>> tail    __riscv_restore_4
>>>
>>> after patch:
>>> bar:
>>> call    t0,__riscv_save_4
>>> addi    sp,sp,-2032
>>> ...
>>> li      t0,-12288
>>> add     sp,sp,t0 # prologue
>>> ...
>>> li      t0,12288 # epilogue
>>> add     sp,sp,t0
>>> ...
>>> addi    sp,sp,2032
>>> tail    __riscv_restore_4
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/riscv.cc (riscv_first_stack_step): add a new 
>>>function parameter remaining_size.
>>> (riscv_compute_frame_info): adapt new riscv_first_stack_step 
>>>interface.
>>> (riscv_expand_prologue): consider save-restore in stack allocation.
>>> (riscv_expand_epilogue): consider save-restore in stack 
>>>deallocation.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/stack_save_restore.c: New test.
>>> ---
>>>  gcc/config/riscv/riscv.cc | 58 ++-
>>>  .../gcc.target/riscv/stack_save_restore.c | 40 +
>>>  2 files changed, 70 insertions(+), 28 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
>>
>>I guess with the RISC-V backend still being open for things as big as
>>the V port we should probably be taking code like this as well?  I
>>wouldn't be opposed to making an exception for the V code and holding
>>everything else back, though.
>>
>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>> index 05bdba5ab4d..9e92e729a5f 100644
>>> --- a/gcc/config/riscv/riscv.cc
>>> +++ b/gcc/config/riscv/riscv.cc
>>> @@ -4634,7 +4634,7 @@ riscv_save_libcall_count (unsigned mask)
>>> They decrease stack_pointer_rtx but leave frame_pointer_rtx and
>>> hard_frame_pointer_rtx unchanged.  */
>>>
>>> -static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info 
>>> *frame);
>>> +static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info 
>>> *frame, poly_int64 remaining_size);
>>>
>>>  /* Handle stack align for poly_int.  */
>>>  static poly_int64
>>> @@ -4663,7 +4663,7 @@ riscv_compute_frame_info (void)
>>>   save/restore t0.  We check for this before clearing the frame struct. 
>>> */
>>>    if (cfun->machine->interrupt_handler_p)
>>>  {
>>> -  HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>>> +  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, 
>>> frame->total_size);
>>>    if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1)))
>>>  interrupt_save_prologue_temp = true;
>>>  }
>>> @@ -4913,31 +4913,31 @@ riscv_restore_reg (rtx reg, rtx mem)
>>> without adding extra instructions.  */
>>>
>>>  static HOST_WIDE_INT
>>> -riscv_first_stack_step (struct riscv_frame_info *frame)
>>> +riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 
>>> remaining_size)
>>>  {
>>> -  HOST_WIDE_INT frame_total_constant_size;
>>> -  if (!frame->total_size.is_constant ())
>>> -    frame_total_constant_size
>>> -  = riscv_stack_align (frame->total_size.coeffs[0])
>>> -   - riscv_stack_align (frame->total_size.coeffs[1]);
>>> +  HOST_WIDE_INT remaining_const_size;

[PATCH 0/3] RISC-V: optimize stack manipulation in save-restore

2022-12-01 Thread Fei Gao
The patches allow fewer instructions to be used in stack allocation
and deallocation if save-restore is enabled, and also make the stack
manipulation code more readable.

Fei Gao (3):
  RISC-V: add a new parameter in riscv_first_stack_step.
  RISC-V: optimize stack manipulation in save-restore
  RISC-V: make the stack manipulation codes more readable.

 gcc/config/riscv/riscv.cc | 105 +-
 .../gcc.target/riscv/stack_save_restore.c |  40 +++
 2 files changed, 95 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c

-- 
2.17.1
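
For readers following the series, the first-step selection that these patches touch can be sketched for constant frame sizes as below. It is a simplified stand-alone model, not the GCC function: poly_int64 handling is dropped, min_first_step is passed in by the caller instead of being derived from the frame, and the constants (a 12-bit immediate, 16-byte stack alignment, a 256-byte compressed store reach for RV32) are assumptions of this sketch. It uses the "<= max_first_step" bound from patch 3/3.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed constants for this sketch (not taken from GCC headers).
constexpr int64_t kImmReach = 1 << 12;   // range covered by a 12-bit immediate
constexpr int64_t kStackAlign = 16;      // PREFERRED_STACK_BOUNDARY / 8
constexpr int64_t kSwspReach = 256;      // reach of compressed sp-relative stores

// True if v fits in a signed 12-bit immediate.
constexpr bool small_operand(int64_t v) {
  return v >= -kImmReach / 2 && v < kImmReach / 2;
}

// Pick the size of the first stack adjustment for a constant frame.
inline int64_t first_stack_step(int64_t remaining_size,
                                int64_t min_first_step, bool rvc) {
  // A frame that fits in one addi is allocated in a single step.
  if (small_operand(remaining_size))
    return remaining_size;

  const int64_t max_first_step = kImmReach / 2 - kStackAlign;
  const int64_t min_second_step = remaining_size - max_first_step;
  const int64_t low_bits = remaining_size % kImmReach;

  // Use the low bits of the size first, so the second step is LUI + ADD.
  if (!small_operand(min_second_step)
      && low_bits <= max_first_step
      && low_bits >= min_first_step)
    return low_bits;

  if (rvc) {
    // A small second step goes first to allow compressed loads and stores.
    if (min_second_step >= 0 && min_second_step <= kSwspReach)
      return std::max(min_second_step, min_first_step);
    // Otherwise start with the minimum first step.
    if (!small_operand(min_second_step))
      return min_first_step;
  }
  return max_first_step;
}
```

Assuming the 14320 bytes that remain after __riscv_save_4 in the bar example, first_stack_step (14320, 32, true) returns 2032, leaving exactly 12288 for a single li. Fed the full frame instead (14352, assuming a 32-byte libcall area, with min_first_step 64), it returns 64, which corresponds to the addi sp,sp,-64 in the pre-patch output.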



[PATCH 3/3] RISC-V: make the stack manipulation codes more readable.

2022-12-01 Thread Fei Gao
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_first_stack_step): make codes more 
readable.
(riscv_expand_epilogue): likewise.
---
 gcc/config/riscv/riscv.cc | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a50f2303032..95da08ffb3b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4926,8 +4926,11 @@ riscv_first_stack_step (struct riscv_frame_info *frame, 
poly_int64 remaining_siz
   if (SMALL_OPERAND (remaining_const_size))
 return remaining_const_size;
 
+  poly_int64 callee_saved_first_step =
+remaining_size - frame->frame_pointer_offset;
+  gcc_assert(callee_saved_first_step.is_constant ());
   HOST_WIDE_INT min_first_step =
-riscv_stack_align ((remaining_size - 
frame->frame_pointer_offset).to_constant());
+riscv_stack_align (callee_saved_first_step.to_constant ());
   HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
   HOST_WIDE_INT min_second_step = remaining_const_size - max_first_step;
   gcc_assert (min_first_step <= max_first_step);
@@ -4935,7 +4938,7 @@ riscv_first_stack_step (struct riscv_frame_info *frame, 
poly_int64 remaining_siz
   /* As an optimization, use the least-significant bits of the total frame
  size, so that the second adjustment step is just LUI + ADD.  */
   if (!SMALL_OPERAND (min_second_step)
-  && remaining_const_size % IMM_REACH < IMM_REACH / 2
+  && remaining_const_size % IMM_REACH <= max_first_step
   && remaining_const_size % IMM_REACH >= min_first_step)
 return remaining_const_size % IMM_REACH;
 
@@ -5129,14 +5132,14 @@ riscv_adjust_libcall_cfi_epilogue ()
 void
 riscv_expand_epilogue (int style)
 {
-  /* Split the frame into two.  STEP1 is the amount of stack we should
- deallocate before restoring the registers.  STEP2 is the amount we
- should deallocate afterwards.
+  /* Split the frame into 3 steps. STEP1 is the amount of stack we should
+ deallocate before restoring the registers. STEP2 is the amount we
+ should deallocate afterwards including the callee saved regs. STEP3
+ is the amount deallocated by save-restore libcall.
 
  Start off by assuming that no registers need to be restored.  */
   struct riscv_frame_info *frame = &cfun->machine->frame;
   unsigned mask = frame->mask;
-  poly_int64 step1 = frame->total_size;
   HOST_WIDE_INT step2 = 0;
   bool use_restore_libcall = ((style == NORMAL_RETURN)
  && riscv_use_save_libcall (frame));
@@ -5223,7 +5226,7 @@ riscv_expand_epilogue (int style)
   if (use_restore_libcall)
 frame->mask = mask; /* Undo the above fib.  */
 
-  step1 -= step2 + libcall_size;
+  poly_int64 step1 = frame->total_size - step2 - libcall_size;
 
   /* Set TARGET to BASE + STEP1.  */
   if (known_gt (step1, 0))
-- 
2.17.1


