Re: [PATCH 1/1] [V2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-29 Thread Sinan via Gcc-patches
>> +/* Return TRUE if Zcmp push and pop insns should be
>> + avoided. FALSE otherwise.
>> + Only use multi push & pop if all GPRs masked can be covered,
>> + and stack access is SP based,
>> + and GPRs are at top of the stack frame,
>> + and no conflicts in stack allocation with other features */
>> +static bool
>> +riscv_avoid_multi_push(const struct riscv_frame_info *frame)
>> +{
>> + if (!TARGET_ZCMP
>> + || crtl->calls_eh_return
>> + || frame_pointer_needed
>> + || cfun->machine->interrupt_handler_p
>> + || cfun->machine->varargs_size != 0
>> + || crtl->args.pretend_args_size != 0
>> + || (frame->mask & ~ MULTI_PUSH_GPR_MASK))
>> + return true;
>> +
>> + return false;
>> +}
Is there a reason to skip generating push/pop in the cases where a frame pointer is
needed? IIRC, only code compiled with -O1 and above omits the frame pointer; if so,
code compiled at -O0 will never generate cm.push/cm.pop.
Same question for interrupt_handler_p. I think cm.push/pop can handle that case,
e.g. the test case zc-zcmp-push-pop-6.c from Jiawei's patch.
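For reference, a minimal sketch of the kind of interrupt-handler scenario meant here (illustrative only, not the actual zc-zcmp-push-pop-6.c; the function names are assumptions):
```c
/* Illustrative only: an interrupt handler that makes a call, so ra and
   s-registers must be saved/restored in the prologue/epilogue.  The question
   is whether cm.push/cm.popret could cover this part of the frame as well.  */
extern char my_getchar (void);
volatile int last_char;

void __attribute__ ((interrupt))
isr_handler (void)
{
  last_char = my_getchar ();
}
```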
BR,
Sinan
--
Sender:Fei Gao 
Sent At:2023 May 16 (Tue.) 17:34
Recipient:sinan.lin ; jiawei ; 
shihua ; lidie 
Cc:Fei Gao 
Subject:[PATCH 1/1] [V2] [RISC-V] support cm.push cm.pop cm.popret in zcmp
Zcmp can share the same logic as save-restore in stack allocation: pre-allocation
by cm.push, then step 1 and step 2.
Please note that cm.push pushes ra, s0-s11 in the reverse order from what
save-restore does, so the .cfi directives have been adapted accordingly in my patch.
gcc/ChangeLog:
 * config/riscv/predicates.md (slot_0_offset_operand): predicates for slot 0 
offset.
 (slot_1_offset_operand): likewise
 (slot_2_offset_operand): likewise
 (slot_3_offset_operand): likewise
 (slot_4_offset_operand): likewise
 (slot_5_offset_operand): likewise
 (slot_6_offset_operand): likewise
 (slot_7_offset_operand): likewise
 (slot_8_offset_operand): likewise
 (slot_9_offset_operand): likewise
 (slot_10_offset_operand): likewise
 (slot_11_offset_operand): likewise
 (slot_12_offset_operand): likewise
 (stack_push_up_to_ra_operand): predicates for stack adjust of pushing ra
 (stack_push_up_to_s0_operand): predicates for stack adjust of pushing ra, s0
 (stack_push_up_to_s1_operand): likewise
 (stack_push_up_to_s2_operand): likewise
 (stack_push_up_to_s3_operand): likewise
 (stack_push_up_to_s4_operand): likewise
 (stack_push_up_to_s5_operand): likewise
 (stack_push_up_to_s6_operand): likewise
 (stack_push_up_to_s7_operand): likewise
 (stack_push_up_to_s8_operand): likewise
 (stack_push_up_to_s9_operand): likewise
 (stack_push_up_to_s11_operand): likewise
 (stack_pop_up_to_ra_operand): predicates for stack adjust of popping ra
 (stack_pop_up_to_s0_operand): predicates for stack adjust of popping ra, s0
 (stack_pop_up_to_s1_operand): likewise
 (stack_pop_up_to_s2_operand): likewise
 (stack_pop_up_to_s3_operand): likewise
 (stack_pop_up_to_s4_operand): likewise
 (stack_pop_up_to_s5_operand): likewise
 (stack_pop_up_to_s6_operand): likewise
 (stack_pop_up_to_s7_operand): likewise
 (stack_pop_up_to_s8_operand): likewise
 (stack_pop_up_to_s9_operand): likewise
 (stack_pop_up_to_s11_operand): likewise
 * config/riscv/riscv-protos.h (riscv_zcmp_valid_slot_offset_p): declaration
 (riscv_zcmp_valid_stack_adj_bytes_p): declaration
 * config/riscv/riscv.cc (struct riscv_frame_info): comment change
 (riscv_avoid_multi_push): helper function of riscv_use_multi_push
 (riscv_use_multi_push): true if multi push is used
 (riscv_multi_push_sregs_count): num of sregs in multi-push
 (riscv_multi_push_regs_count): num of regs in multi-push
 (riscv_16bytes_align): align to 16 bytes
 (riscv_stack_align): moved to a better place
 (riscv_save_libcall_count): no functional change
 (riscv_compute_frame_info): add zcmp frame info
 (riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
 (get_slot_offset_rtx): get the rtx of slot to push or pop
 (riscv_gen_multi_push_pop_insn): gen function for multi push and pop
 (riscv_expand_prologue): allocate stack by cm.push
 (riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
 (riscv_expand_epilogue): allocate stack by cm.pop[ret]
 (zcmp_base_adj): calculate stack adjustment base size
 (zcmp_additional_adj): calculate stack adjustment additional size
 (riscv_zcmp_valid_slot_offset_p): check if offset is valid for a slot
 (riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment size is valid
 * config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
 (S0_MASK): likewise
 (S1_MASK): likewise
 (S2_MASK): likewise
 (S3_MASK): likewise
 (S4_MASK): likewise
 (S5_MASK): likewise
 (S6_MASK): likewise
 (S7_MASK): likewise
 (S8_MASK): likewise
 (S9_MASK): likewise
 (S10_MASK): likewise
 (S11_MASK): likewise
 (MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
 (ZCMP_MAX_SPIMM): max spimm value
 (ZCMP_SP_INC_STEP): zcmp sp increment step
 (ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
 (ZCMP_S0S11_SREGS_COUNTS): num of s0-s11

Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-05-12 Thread Sinan via Gcc-patches
Hi Kito and Jiawei,
I have noticed that several comments are not accurate or no longer valid (e.g. they
only apply to Zc 0.5), and they need an update or improvement.
> +
>> +namespace {
>> +
>> +/*
>> + 1. preprocessing:
>> + 1.1. if there is no push rtx, then just return. e.g.
>> + (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> + (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>> + (plus:SI (reg/f:SI 2 sp)
>> + (const_int -32 [0xffe0])))
>> + (nil))
>> + (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>> + 1.2. if push rtx exists, then we compute the number of
>> + pushed s-registers, n_sreg.
>> +
>> + push rtx should be find before NOTE_INSN_PROLOGUE_END tag
>> +
>> + [2 and 3 happend simultaneously]
>> + 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>> + and aN is not used the move pattern, and sN is not
>> + defined before the move pattern (from prologue to the
>> + position of move pattern).
>> + 3. analysis use and reach of every instruction from prologue
>> + to the position of move pattern.
>> + if any sN is used, then we mark the corresponding argument list
>> + candidate as invalid.
>> + e.g.
>> + push {ra,s0-s3}, {}, -32
>> + sw s0,44(sp) # s0 is used, then argument list is invalid
>> + mv a0,a5 # a0 is defined, then argument list is invalid
>> + ...
>> + mv s0,a0
>> + mv s1,a1
>> + mv s2,a2
>> +
>> + 4. if there is a valid argument list, then replace the pop
>> + push parallel insn, and delete mv pattern.
>> + if not, skip.
>> +*/
>
>I am not sure I understand this optimization pass correctly,
>could you give more example or indicate which testcase can demonstrate
>this pass?
>
>And I would prefer this pass split from this patch, let it become a separated
>patch including testcase.
This comment is incorrect.
This pass searches for `ret`, `cm.pop` and `mv a0, 0` and tries to combine them
into cm.popretz; you can find the relevant cm.popretz testcases at
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg304545.html
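For illustration, a minimal sketch of the kind of sequence the pass targets (the function and the assembly in the comments are hypothetical, not taken from the actual testcases):
```c
/* Hypothetical example: a function whose return value is the constant 0.
   Near the end of the function the generated code contains, roughly:
       li      a0, 0
       cm.pop  {ra}, 16
       ret
   which the pass is expected to combine into a single
       cm.popretz  {ra}, 16
   (cm.popretz pops the registers, zeroes a0 and returns).  */
extern char my_getchar (void);

int
consume_two (void)
{
  my_getchar ();
  my_getchar ();
  return 0;
}
```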
> @@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info 
> *frame)
>> return frame->save_libcall_adjustment != 0;
>> }
>>
>> +/* Determine how many instructions related to push/pop instructions. */
>> +
>> +static unsigned
>> +riscv_save_push_pop_count (unsigned mask)
>> +{
>> + if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
>> + return 0;
>> + for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
>> + if (BITSET_P (mask, n)
>> + && !call_used_regs [n])
>> + /* add ra saving and sp adjust. */
>> + return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
>
>What the magic number of `+ 1 + 2`?
Well, it is really misleading here, and it would be better to make it clearer.
`riscv_save_push_pop_count` calculates the expected size of the push/pop
parallel pattern (the number of saved/restored registers plus one sp-adjust
pattern): the number of s-registers saved/restored is CALLEE_SAVED_REG_NUMBER (n) + 1,
and the `+2` accounts for the ra save/restore and the sp adjustment patterns.
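As a concrete worked example (hypothetical register set, for illustration only), assume frame->mask contains ra and s0-s2:
```c
/* Illustrative arithmetic only, assuming the mask above:
   highest saved s-register n = s2, so CALLEE_SAVED_REG_NUMBER (s2) = 2
     s-registers saved/restored : 2 + 1 = 3   (s0, s1, s2)
     + 1 for ra, + 1 for the sp adjustment
     -> riscv_save_push_pop_count () = 3 + 2 = 5 elements in the parallel.  */
```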
>> +riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
>> +{
>> + unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>> + unsigned int n_reg = veclen - 1;
>
> Need comment to explain why `- 1` here.
Since veclen also counts the sp-adjustment element, subtracting 1 from it gives the number of registers saved/restored here.
BR,
Sinan
--
Sender:Kito Cheng 
Sent At:2023 May 4 (Thu.) 17:04
Recipient:Jiawei 
Cc:gcc-patches ; kito.cheng ; 
palmer ; christoph.muellner ; 
jeremy.bennett ; mary.bennett 
; nandni.jamnadas ; 
charlie.keaney ; simon.cook 
; tariq.kurd ; 
ibrahim.abu.kharmeh1 ; sinan.lin 
; wuwei2016 ; shihua 
; shiyulong ; chenyixuan 

Subject:Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
Could you rebase this patch? We have some changes on
> All "zcmpe" means Zcmp with RVE extension.
Use zcmp_rve instead; zcmpe looks like a new extension name.
> diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc 
> b/gcc/config/riscv/riscv-zcmp-popret.cc
> new file mode 100644
> index 000..d7b40f6a3e2
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-zcmp-popret.cc
> @@ -0,0 +1,260 @@
Need a header here, like "^#$% for RISC-V. Copyright (C) 2023 Free
Software Foundation, Inc."
> +#include "config.h"
...
> +#include "cfgrtl.h"
> +
> +#define IN_TARGET_CODE 1
This should appear before anything is included.
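For reference, a sketch of the conventional ordering in GCC target files (the include list here is abbreviated and illustrative):
```c
/* IN_TARGET_CODE is conventionally defined before any GCC header is included.  */
#define IN_TARGET_CODE 1

#include "config.h"
#include "system.h"
#include "coretypes.h"
/* ...the rest of the includes, e.g. "cfgrtl.h", follow here...  */
```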
> +
> +namespace {
> +
> +/*
> + 1. preprocessing:
> + 1.1. if there is no push rtx, then just return. e.g.
> + (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> + (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
> + (plus:SI (reg/f:SI 2 sp)
> + (const_int -32 [0xffe0])))
> + (nil))
> + (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
> + 1.2. if push rtx exists, then we compute the number of
> + pushed s-registers, n_sreg.
> +
> + push rtx should be find before NOTE_INSN_PROLOGUE_END tag
> +
> + [2 and 3 happend simultaneously]
> + 2. find valid move pattern, mv sN, aN, where N < n_sreg,
> + and aN is not used the move pattern, and sN is not
> + defined before the move patte

Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-05-12 Thread Sinan via Gcc-patches
Hi Fei,
Sorry for the late reply, I've been busy with moving these days :(.
Thanks for working on it. I would prefer to remove the extra pass for popretz if
possible. I will test your patches ASAP.
BR,
Sinan
--
Sender:Fei Gao 
Sent At:2023 May 6 (Sat.) 16:53
Recipient:Sinan 
Cc:jiawei ; gcc-patches 
Subject:Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
On 2023-05-05 23:57 Sinan  wrote:
>
>> hi Jiawei
>>
>> Please ignore my previous reply. I accidentally sent the email before I
>> finished it.
>> Sorry for that!
>>
>> I downloaded the series of patches from you and found in some cases
>> it fails to generate zcmp push and pop insns.
>>
>> TC:
>>
>> char my_getchar();
>> int test_s0()
>> {
>>
>> int a = my_getchar();
>> int b = my_getchar();
>> return a+b;
>> }
>>
>> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
>> -mcmodel=medlow test.c
>>
>> -fno-shrink-wrap-separate is used here to avoid the impact from 
>> shrink-wrap-separate that is by default
>> enabled in O2.
>>
>> As I'm also interested in Zc*, I did some changes, mainly in the prologue and
>> epilogue pass, quite similar to what has been done for save-restore, except for
>> the CFI directives, due to the reversed order in which zcmp pushes and pops ra
>> and the s-registers compared to save-restore.
>>
>> I will refine and share the code soon for your review.
>>
>> BR
>> Fei
>Hi Fei,
>In the current implementation, cm.push will not increase the original
>adjustment size of the stack pointer. Since cm.push uses a minimum adjustment
>size of 16 and, in your example, the sp adjustment is only 12, cm.push
>will not be generated.
>You can find the check in riscv_use_push_pop:
>> > + */
>> > + if (base_size > frame_size)
>> > + return false;
>> > +
>And if this check is removed, then you can get the output that you expect.
>```
> cm.push {ra,s0},-16
> call my_getchar
> mv s0,a0
> call my_getchar
> add a0,s0,a0
> cm.popret {ra,s0},16
>```
>As a result, cm.push cannot be generated in many rv32e scenarios. Perhaps
>we can remove this check? I haven't tested whether it is OK to remove it,
>and I am CCing Jiawei to help test it.
>BR,
>Sinan 
Hi Sinan,
Thanks for your reply.
I posted my code at
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg306921.html
In the cover letter, I did some comparison.
Could you please review?
Thanks & BR,
Fei
>--
>Sender:Fei Gao 
>Sent At:2023 Apr. 25 (Tue.) 18:12
>Recipient:jiawei 
>Cc:gcc-patches 
>Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
>hi Jiawei
>Please ignore my previous reply. I accidentally sent the email before I finished
>it.
>Sorry for that!
>I downloaded the series of patches from you and found in some cases
>it fails to generate zcmp push and pop insns.
>TC:
>char my_getchar();
>int test_s0()
>{
> int a = my_getchar();
> int b = my_getchar();
> return a+b;
>}
>cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
>-mcmodel=medlow test.c
>-fno-shrink-wrap-separate is used here to avoid the impact from 
>shrink-wrap-separate that is by default
>enabled in O2.
>As I'm also interested in Zc*, I did some changes, mainly in the prologue and
>epilogue pass, quite similar to what has been done for save-restore, except for
>the CFI directives, due to the reversed order in which zcmp pushes and pops ra
>and the s-registers compared to save-restore.
>I will refine and share the code soon for your review.
>BR
>Fei
>On Thu Apr 6 06:21:17 GMT 2023 Jiawei jia...@iscas.ac.cn wrote:
>>
>>Add Zcmp extension instructions support. Generate push/pop
>>with follow steps:
>>
>> 1. preprocessing:
>> 1.1. if there is no push rtx, then just return. e.g.
>> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>> (plus:SI (reg/f:SI 2 sp)
>> (const_int -32 [0xffe0])))
>> (nil))
>> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>> 1.2. if push rtx exists, then we compute the number of
>> pushed s-registers, n_sreg.
>>
>> push rtx should be find before NOTE_INSN_PROLOGUE_END tag
>>
>> [2 and 3 happend simultaneously]
>>
>> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>> and aN is not used the move pattern, and sN is not
>> defined before the move pattern (from prologue to the
>> position of move pattern).
>>
>> 3. analysis use and reach of every instruction from prologue
>> to the position of move pattern.
>> if any sN is used, then we mark the corresponding argument list
>> candidate as invalid.
>> e.g.
>> push {ra,s0-s3}, {}, -32
>> sw s0,44(sp) # s0 is used, then argument list is invalid
>> mv a0,a5 # a0 is defined, then argument list is invalid
>> ...
>> mv s0,a0
>> mv s1,a1
>> mv s2,a2
>>
>> 4. if there is a valid argument list, then replace the pop
>> push parallel insn, and delete mv pattern.
>> if not, skip.
>>
>>All "zcmpe" means Zcmp with RVE extension.
>>The push/pop instruction implementation is mostly finished by Sinan Lin.

Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.

2023-05-05 Thread Sinan via Gcc-patches
> hi Jiawei
> 
> Please ignore my previous reply. I accidentally sent the email before I
> finished it.
> Sorry for that!
> 
> I downloaded the series of patches from you and found in some cases
> it fails to generate zcmp push and pop insns.
> 
> TC:
> 
> char my_getchar();
> int test_s0()
> {
> 
> int a = my_getchar();
> int b = my_getchar();
> return a+b;
> }
> 
> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
> -mcmodel=medlow test.c
> 
> -fno-shrink-wrap-separate is used here to avoid the impact from 
> shrink-wrap-separate that is by default
> enabled in O2.
> 
> As I'm also interested in Zc*, I did some changes, mainly in the prologue and
> epilogue pass, quite similar to what has been done for save-restore, except for
> the CFI directives, due to the reversed order in which zcmp pushes and pops ra
> and the s-registers compared to save-restore.
> 
> I will refine and share the code soon for your review.
> 
> BR
> Fei
Hi Fei,
In the current implementation, cm.push will not increase the original
adjustment size of the stack pointer. Since cm.push uses a minimum adjustment size
of 16 and, in your example, the sp adjustment is only 12, cm.push will
not be generated.
You can find the check in riscv_use_push_pop:
> > + */
> > + if (base_size > frame_size)
> > + return false;
> > +
And if this check is removed, then you can get the output that you expect. 
```
 cm.push {ra,s0},-16
 call my_getchar
 mv s0,a0
 call my_getchar
 add a0,s0,a0
 cm.popret {ra,s0},16
```
As a result, cm.push cannot be generated in many rv32e scenarios. Perhaps we
can remove this check? I haven't tested whether it is OK to remove it, and
I am CCing Jiawei to help test it.
BR,
Sinan
--
Sender:Fei Gao 
Sent At:2023 Apr. 25 (Tue.) 18:12
Recipient:jiawei 
Cc:gcc-patches 
Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
hi Jiawei
Please ignore my previous reply. I accidentally sent the email before I finished
it.
Sorry for that!
I downloaded the series of patches from you and found in some cases
it fails to generate zcmp push and pop insns.
TC:
char my_getchar();
int test_s0()
{
 int a = my_getchar();
 int b = my_getchar();
 return a+b;
}
cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e 
-mcmodel=medlow test.c
-fno-shrink-wrap-separate is used here to avoid the impact from 
shrink-wrap-separate that is by default
enabled in O2.
As I'm also interested in Zc*, I did some changes, mainly in the prologue and
epilogue pass, quite similar to what has been done for save-restore, except for
the CFI directives, due to the reversed order in which zcmp pushes and pops ra
and the s-registers compared to save-restore.
I will refine and share the code soon for your review.
BR
Fei
On Thu Apr 6 06:21:17 GMT 2023 Jiawei jia...@iscas.ac.cn wrote:
>
>Add Zcmp extension instructions support. Generate push/pop
>with follow steps:
>
> 1. preprocessing:
> 1.1. if there is no push rtx, then just return. e.g.
> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
> (plus:SI (reg/f:SI 2 sp)
> (const_int -32 [0xffe0])))
> (nil))
> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
> 1.2. if push rtx exists, then we compute the number of
> pushed s-registers, n_sreg.
>
> push rtx should be find before NOTE_INSN_PROLOGUE_END tag
>
> [2 and 3 happend simultaneously]
>
> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
> and aN is not used the move pattern, and sN is not
> defined before the move pattern (from prologue to the
> position of move pattern).
>
> 3. analysis use and reach of every instruction from prologue
> to the position of move pattern.
> if any sN is used, then we mark the corresponding argument list
> candidate as invalid.
> e.g.
> push {ra,s0-s3}, {}, -32
> sw s0,44(sp) # s0 is used, then argument list is invalid
> mv a0,a5 # a0 is defined, then argument list is invalid
> ...
> mv s0,a0
> mv s1,a1
> mv s2,a2
>
> 4. if there is a valid argument list, then replace the pop
> push parallel insn, and delete mv pattern.
> if not, skip.
>
>All "zcmpe" means Zcmp with RVE extension.
>The push/pop instruction implementation is mostly finished by Sinan Lin.
>
>Co-Authored by: Sinan Lin 
>Co-Authored by: Simon Cook 
>Co-Authored by: Shihua Liao 
>
>gcc/ChangeLog:
>
> * config.gcc: New object.
> * config/riscv/predicates.md (riscv_stack_push_operation):
> New predicate.
> (riscv_stack_pop_operation): Ditto.
> (pop_return_value_constant): Ditto.
> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
> * config/riscv/riscv-protos.h (riscv_output_popret_p):
> New routine.
> (riscv_valid_stack_push_pop_p): Ditto.
> (riscv_check_regno): Ditto.
> (make_pass_zcmp_popret): New pass.
> * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
> (riscv_output_popret_p): New function.
> (riscv_print_pop_size): Ditto.
> (riscv_print_reglist): Ditto.
> (riscv_print_operand): New case symbols.
> (riscv_save_push_pop_count): New function.
> 

[PATCH] RISC-V: add TARGET_ZBKB to the condition of bswapsi2, bswapdi2 and rotr3 patterns

2023-04-10 Thread Lin Sinan via Gcc-patches
From: Sinan Lin 

Tell GCC that Zbkb provides these two standard pattern names (SPNs) to enable some
optimizations, e.g. 1) an RROTATE_EXPR can match rotr<mode>3 during expand;
2) __builtin_bswap64 hooks up with `rev8` when Zbkb is enabled on RV64.
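A minimal sketch of the kind of C input this enables (the functions, options, and expected instructions here are illustrative assumptions, not part of the patch or its testsuite):
```c
/* Illustrative only: with something like -O2 -march=rv64gc_zbkb -mabi=lp64,
   the byte swap below is expected to become a single rev8, and the rotate
   idiom is expected to go through the rotr<mode>3 expander (ror/rori).  */
unsigned long long
swap64 (unsigned long long x)
{
  return __builtin_bswap64 (x);
}

unsigned int
rotr32 (unsigned int x, unsigned int n)
{
  return (x >> (n & 31)) | (x << (-n & 31));   /* recognized as a rotate */
}
```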
---
 gcc/config/riscv/bitmanip.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7aa591689ba..3ed9f5d403a 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -297,7 +297,7 @@
   [(set (match_operand:GPR 0 "register_operand")
(rotatert:GPR (match_operand:GPR 1 "register_operand")
 (match_operand:QI 2 "arith_operand")))]
-  "TARGET_ZBB || TARGET_XTHEADBB"
+  "TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB"
 {
   if (TARGET_XTHEADBB && !immediate_operand (operands[2], VOIDmode))
 FAIL;
@@ -362,12 +362,12 @@
 (define_expand "bswapdi2"
   [(set (match_operand:DI 0 "register_operand")
(bswap:DI (match_operand:DI 1 "register_operand")))]
-  "TARGET_64BIT && (TARGET_ZBB || TARGET_XTHEADBB)")
+  "TARGET_64BIT && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_ZBKB)")
 
 (define_expand "bswapsi2"
   [(set (match_operand:SI 0 "register_operand")
(bswap:SI (match_operand:SI 1 "register_operand")))]
-  "(!TARGET_64BIT && TARGET_ZBB) || TARGET_XTHEADBB")
+  "(!TARGET_64BIT && (TARGET_ZBB || TARGET_ZBKB)) || TARGET_XTHEADBB")
 
 (define_insn "*bswap2"
   [(set (match_operand:X 0 "register_operand" "=r")
-- 
2.19.1.6.gb485710b



[PATCH] RISC-V: avoid splitting small constant in i_extrabit pattern

2023-04-09 Thread Lin Sinan via Gcc-patches
From: Sinan Lin 

There is no need to split an xori/ori with a small constant. Take the test
case `int foo(int idx) { return idx|3; }` as an example:

rv64im_zba generates:
ori a0,a0,3
ret
but, rv64im_zba_zbs generates:
ori a0,a0,1
ori a0,a0,2
ret

With this change, the insn `ori r2,r1,3` will not be split when Zbs is enabled.
---
 gcc/config/riscv/predicates.md |  2 +-
 .../gcc.target/riscv/zbs-extra-bit-or-twobits.c| 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-extra-bit-or-twobits.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 0d9d7701c7e..8654dbc5943 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -399,7 +399,7 @@
 (define_predicate "uimm_extra_bit_or_twobits"
   (and (match_code "const_int")
(ior (match_operand 0 "uimm_extra_bit_operand")
-   (match_operand 0 "const_twobits_operand"
+   (match_operand 0 "const_twobits_not_arith_operand"
 
 ;; A CONST_INT operand that fits into the negative half of a
 ;; signed-immediate after a single cleared top bit has been
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-extra-bit-or-twobits.c 
b/gcc/testsuite/gcc.target/riscv/zbs-extra-bit-or-twobits.c
new file mode 100644
index 000..ef7ed60461a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-extra-bit-or-twobits.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int or_two_bit(int idx) {
+return idx|3;
+}
+
+int xor_two_bit(int idx) {
+return idx^3;
+}
+
+/* { dg-final { scan-assembler-times "\tori\t" 1 } } */
+/* { dg-final { scan-assembler-times "\txori\t" 1 } } */
-- 
2.19.1.6.gb485710b



[PATCH] RISC-V: Fix wrong partial subreg check for bsetidisi

2023-02-27 Thread Lin Sinan via Gcc-patches
From: Lin Sinan 

The partial subreg check should be applied to the subreg operand (operand 1) instead of
the immediate operand (operand 2). This change also fixes pr68648.c with Zbs.
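For context, a hypothetical C snippet of the general shape the bsetidisi pattern targets (this is not pr68648.c; the constant and function name are illustrative assumptions):
```c
/* Illustrative only: OR of a sign-extended 32-bit value with a single-bit
   constant above bit 31, which is expected to be emitted as a single bseti
   on rv64 with Zbs enabled.  */
long long
set_bit_40 (int x)
{
  return x | (1LL << 40);
}
```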

gcc/ChangeLog:

* config/riscv/bitmanip.md: Fix wrong index in the check.

---
 gcc/config/riscv/bitmanip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 14d18edbe62..58a86bd929f 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -442,7 +442,7 @@
(ior:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
(match_operand 2 "single_bit_mask_operand" "i")))]
   "TARGET_ZBS && TARGET_64BIT
-   && !partial_subreg_p (operands[2])"
+   && !partial_subreg_p (operands[1])"
   "bseti\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
-- 
2.34.1



[PATCH] RISC-V: Allow const0_rtx operand in max/min

2023-02-27 Thread Sinan via Gcc-patches
From 73e743348a49a7fffcf2e328b8179e8dbbc3b2b4 Mon Sep 17 00:00:00 2001
From: Lin Sinan 
Date: Tue, 28 Feb 2023 00:44:55 +0800
Subject: [PATCH] RISC-V: Allow const0_rtx operand in max/min
Optimize cases that use max[u]/min[u] against a zero constant.
E.g., for the case int f(int x) { return x >= 0 ? x : 0; },
the current asm output with rv64gc_zba_zbb,
 li rtmp,0
 max a0,a0,rtmp
could be optimized into
 max a0,a0,zero
gcc/ChangeLog:
 * config/riscv/bitmanip.md: allow 0 constant in max/min
 pattern. 
gcc/testsuite/ChangeLog:
 * gcc.target/riscv/zbb-min-max-03.c: New test.
---
 gcc/config/riscv/bitmanip.md | 4 ++--
 gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c | 10 ++
 2 files changed, 12 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 58a86bd929f..f771835369c 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -363,9 +363,9 @@
 (define_insn "3"
 [(set (match_operand:X 0 "register_operand" "=r")
 (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
- (match_operand:X 2 "register_operand" "r")))]
+ (match_operand:X 2 "reg_or_0_operand" "rJ")))]
 "TARGET_ZBB"
- "\t%0,%1,%2"
+ "\t%0,%1,%z2"
 [(set_attr "type" "bitmanip")])
 ;; Optimize the common case of a SImode min/max against a constant
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
new file mode 100644
index 000..947300d599d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int f(int x) {
+ return x >= 0 ? x : 0;
+}
+
+/* { dg-final { scan-assembler-times "max\t" 1 } } */
+/* { dg-final { scan-assembler-not "li\t" } } */
-- 
2.34.1


Re: [PING] [PATCH RESEND] riscv: improve the cost model for loading a 64bit constant in rv32.

2022-11-24 Thread Sinan via Gcc-patches
>> The motivation of this patch is to correct the wrong estimation of
>> the number of instructions needed for loading a 64-bit constant in
>> rv32 in the current cost model (riscv_integer_cost). According to
>> the current implementation, if a constant requires more than 3
>> instructions (riscv_const_insns and riscv_legitimate_constant_p),
>> then the constant will be put into the constant pool when expanding
>> gimple to RTL (the legitimate_constant_p hook and emit_move_insn).
>> So the inaccurate cost model leads to suboptimal codegen
>> in rv32, and the wrong estimation part could be corrected through
>> this fix.
>>
>> e.g. the current codegen for loading 0x839290001 in rv32
>>
>> lui a5,%hi(.LC0)
>> lw a0,%lo(.LC0)(a5)
>> lw a1,%lo(.LC0+4)(a5)
>> .LC0:
>> .word 958988289
>> .word 8
>>
>> output after this patch
>>
>> li a0,958988288
>> addi a0,a0,1
>> li a1,8
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv.cc (riscv_build_integer): Handle the case of loading 
>> 64bit constant in rv32.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/rv32-load-64bit-constant.c: New test.
>>
>> Signed-off-by: Lin Sinan 
>> ---
>> gcc/config/riscv/riscv.cc | 23 +++
>> .../riscv/rv32-load-64bit-constant.c | 38 +++
>> 2 files changed, 61 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 32f9ef9ade9..9dffabdc5e3 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -618,6 +618,29 @@ riscv_build_integer (struct riscv_integer_op *codes, 
>> HOST_WIDE_INT value,
>> }
>> }
>> 
>> + if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)
>
> Nit. It's common practice to have the TARGET test first in a series of 
> tests. It may also be advisable to break this into two lines. 
> Something like this:
>
>
> if ((!TARGET_64BIT)
> || value > INT32_MAX || value < INT32_MIN)
>
>
> That's the style most GCC folks are more accustomed to reading.
Thanks for the tips and I will change it then.
>> + {
>> + unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
>> + unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
>> + struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
>> + hicode[RISCV_MAX_INTEGER_OPS];
>> + int hi_cost, lo_cost;
>> +
>> + hi_cost = riscv_build_integer_1 (hicode, hival, mode);
>> + if (hi_cost < cost)
>> + {
>> + lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
>> + if (lo_cost + hi_cost < cost)
>
> Just so I'm sure. "cost" here refers strictly to other synthesized 
> forms? If so, then ISTM that we'd want to generate the new style when 
> lo_cost + hi_cost < cost OR when lo_cost + hi_cost is less than loading 
> the constant from memory -- which is almost certainly more than "3" 
> since the sequence from memory will be at least 3 instructions, two of 
> which will hit memory.
>
>
> Jeff
>
Yes, almost right. The basic idea of this patch is to improve the cost
calculation for loading a 64-bit constant in rv32, instead of adding a new
way to load constants.
gcc now loads 0x739290001LL in rv32gc with three instructions,
 li a0,958988288
 addi a0,a0,1
 li a1,7
However, when it loads 0x839290001LL, the output assembly becomes
 lui a5,%hi(.LC0)
 lw a0,%lo(.LC0)(a5)
 lw a1,%lo(.LC0+4)(a5)
 .LC0:
 .word 958988289
 .word 8
The cost calculation is inaccurate in such cases, since loading these
two constants should be no different in rv32 (just change `li a1,7`
to `li a1,8` to load the high part). This patch takes these cases
into consideration.
BR,
Sinan


[PATCH RESEND] riscv: improve the cost model for loading a 64bit constant in rv32.

2022-11-10 Thread Lin Sinan via Gcc-patches
The motivation of this patch is to correct the wrong estimation of the number
of instructions needed for loading a 64-bit constant in rv32 in the current cost
model (riscv_integer_cost). According to the current implementation, if a
constant requires more than 3 instructions (riscv_const_insns and
riscv_legitimate_constant_p), then the constant will be put into the constant pool
when expanding gimple to RTL (the legitimate_constant_p hook and emit_move_insn).
So the inaccurate cost model leads to suboptimal codegen in rv32, and the wrong
estimation part could be corrected through this fix.

e.g. the current codegen for loading 0x839290001 in rv32

  lui a5,%hi(.LC0)
  lw  a0,%lo(.LC0)(a5)
  lw  a1,%lo(.LC0+4)(a5)
.LC0:
  .word   958988289
  .word   8

output after this patch

  li a0,958988288
  addi a0,a0,1
  li a1,8

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer): Handle the case of 
loading 64bit constant in rv32.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32-load-64bit-constant.c: New test.

Signed-off-by: Lin Sinan 
---
 gcc/config/riscv/riscv.cc | 23 +++
 .../riscv/rv32-load-64bit-constant.c  | 38 +++
 2 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32f9ef9ade9..9dffabdc5e3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -618,6 +618,29 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
}
 }
 
+  if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)
+{
+  unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
+   hicode[RISCV_MAX_INTEGER_OPS];
+  int hi_cost, lo_cost;
+
+  hi_cost = riscv_build_integer_1 (hicode, hival, mode);
+  if (hi_cost < cost)
+   {
+ lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
+ if (lo_cost + hi_cost < cost)
+   {
+ memcpy (codes, alt_codes,
+ lo_cost * sizeof (struct riscv_integer_op));
+ memcpy (codes + lo_cost, hicode,
+ hi_cost * sizeof (struct riscv_integer_op));
+ cost = lo_cost + hi_cost;
+   }
+   }
+}
+
   return cost;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c 
b/gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
new file mode 100644
index 000..61d482fb283
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32im -mabi=ilp32 -O1" } */
+
+/* This test only applies to RV32. Some of 64bit constants in this test will 
be put
+into the constant pool in RV64, since RV64 might need one extra instruction to 
load
+64bit constant. */
+
+unsigned long long
+rv32_mov_64bit_int1 (void)
+{
+  return 0x739290001LL;
+}
+
+unsigned long long
+rv32_mov_64bit_int2 (void)
+{
+  return 0x839290001LL;
+}
+
+unsigned long long
+rv32_mov_64bit_int3 (void)
+{
+  return 0x392900013929LL;
+}
+
+unsigned long long
+rv32_mov_64bit_int4 (void)
+{
+  return 0x392900113929LL;
+}
+
+unsigned long long
+rv32_mov_64bit_int5 (void)
+{
+  return 0x14736def3929LL;
+}
+
+/* { dg-final { scan-assembler-not "lw\t" } } */
-- 
2.36.0



RE: [PATCH] RISC-V: cost model for loading 64bit constant in rv32

2022-11-09 Thread Sinan via Gcc-patches
>> comparison with clang:
>> https://godbolt.org/z/v5nxTbKe9 
>
> IIUC the rules are generally no compiler explorer links (so we can
> preserve history) and no attachment patches (ie, inline them like
> git-send-email does). There's some documentation on sending patches at
> .
>
> Given those usually show up for first-time contributors there's also
> some copyright/DCO procedures to bo followed. I see some other
> linux.alibaba.com emails called out explicitly, but then also a generic
> "GCC Alibaba Group Holding Limited". I think that means we're OK for
> copyright assignment here? There's still the "you wrote the code" bits
> that are worth reading, though.
Thanks for your info. I will resend this patch according to your suggestions. As
for the copyright, my colleagues and I with linux.alibaba.com email addresses have an
FSF copyright assignment, so copyright won't be a problem.
> On Tue, 08 Nov 2022 19:26:01 PST (-0800), gcc-patches@gcc.gnu.org wrote:
>> loading constant 0x739290001LL in rv32 can be done with three instructions
>> output:
>> li a1, 7
>> lui a1, 234128
>> addi a1, a1, 1
>
> Probably more useful to load the constant into two different registers? 
> The point is the same, though, so just a commit message issue.
>
>> Similarly, loading 0x839290001LL in rv32 can be done within three 
>> instructions
>> expected output:
>> li a1, 8
>> lui a1, 234128
>> addi a1, a1, 1
> However, riscv_build_integer does not handle this case well and makes a wrong
> prediction about the number of instructions needed, and then the constant is
> forced into memory via riscv_const_insns and emit_move_insn.
>> real output:
>> lui a4,%hi(.LC0)
>> lw a2,%lo(.LC0)(a4)
>> lw a3,%lo(.LC0+4)(a4)
>> .LC0:
>> .word 958988289
>> .word 8
>
> That's still 3 instructions, but having loads and a constant is worse so
> that's just another commit message issue.
>
>
> Looking at the attached patch:
>
>> + if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)
>> + {
>> + unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
>> + unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
>> + struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
>> + hicode[RISCV_MAX_INTEGER_OPS];
>> + int hi_cost, lo_cost;
>> +
>> + hi_cost = riscv_build_integer_1 (hicode, hival, mode);
>> + if (hi_cost < cost)
>> + {
>> + lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
>> + if (lo_cost + hi_cost < cost)
>> + {
>> + memcpy (codes, alt_codes,
>> + lo_cost * sizeof (struct riscv_integer_op));
>> + memcpy (codes + lo_cost, hicode,
>> + hi_cost * sizeof (struct riscv_integer_op));
>> + cost = lo_cost + hi_cost;
>> + }
>> + }
>> + }
>
> This kind of stuff always seems like it should be possible to handle
> with generic middle-end optimizations: it's essentially just splitting
> the 64-bit constant into two 32-bit constants, which is free because
> it's going into two registers anyway. That's not a RISC-V specific
> problem, it's just the case any time a constant is going to be split
> between two registers -- it'd even apply for 128-bit constants on rv64,
> though those are probably rare enough they don't matter all that much.
>
> I'm not opposed to doing this in the backend, but maybe it's a sign
> we've just screwed something else up and can avoid adding the code?
Sorry for not making it clear. The motivation of this patch is to correct the
wrong estimation of the number of instructions needed for loading a 64-bit
constant in rv32 in the current cost model (riscv_integer_cost). According to
the current implementation, if a constant requires more than 3
instructions (riscv_const_insns and riscv_legitimate_constant_p), then the
constant will be put into the constant pool when expanding gimple to
RTL (the legitimate_constant_p hook and emit_move_insn). So at least the wrong
estimation part in rv32 could be improved.
Strategies for loading a big constant vary from backend to backend, and I think
it makes sense to let the backend make the decision. As for RV64 and int128,
I did not investigate them, and they seem a bit different from the rv32 case. For
example, loading the 64-bit constant 0x839290001LL in rv64 actually requires one
more instruction than in rv32:
```
lui a1, 132
addiw a1,a1,-1751
slli a1,a1,16
addi a1,a1,1
```
>> +++ b/gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
>> @@ -0,0 +1,35 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv32gc -mabi=ilp32 -Os" } */
>
> This has the same library/abi problems we've had before, but in this
> case I think it's fine to just leave the -march/-mabi out: the test
> cases won't be that exciting on rv64 (unless we add the 128-bit splits
> too?), but they should still stay away from the constant pool.
>
> IIUC this should work on more than Os, at least O2/above? Not that
> exciting for the test case, though.
>
>> +/* { dg-final { scan-assembler-not "\.LC\[0-9\]" } } */
>
> That's a bit fragile, maybe just ma

[PATCH] Fix doc typo

2022-11-08 Thread Sinan via Gcc-patches
Add a missing variable name.


0001-doc-FixDocTypo.patch
Description: Binary data


[PATCH] RISC-V: cost model for loading 64bit constant in rv32

2022-11-08 Thread Sinan via Gcc-patches
loading constant 0x739290001LL in rv32 can be done with three instructions
output:
li a1, 7
lui a1, 234128
addi a1, a1, 1
Similarly, loading 0x839290001LL in rv32 can be done within three instructions
expected output:
li a1, 8
lui a1, 234128
addi a1, a1, 1
However, riscv_build_integer does not handle this case well and makes a wrong
prediction about the number of instructions needed, and then the constant is
forced into memory via riscv_const_insns and emit_move_insn.
real output:
lui a4,%hi(.LC0)
lw a2,%lo(.LC0)(a4)
lw a3,%lo(.LC0+4)(a4)
.LC0:
 .word 958988289
 .word 8
comparison with clang:
https://godbolt.org/z/v5nxTbKe9 


0001-riscv-improve-cost-model-rv32-load-64bit-constant.patch
Description: Binary data