Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Kito Cheng via Gcc-patches
pushed v1 to trunk

On Fri, May 5, 2023 at 8:46 PM Li, Pan2 via Gcc-patches
 wrote:
>
> Ok, sounds good. Thank you!
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Friday, May 5, 2023 8:37 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
> 
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
> I will take V1 and commit to trunk after my local test is done :)
>
> On Fri, May 5, 2023 at 8:30 PM Li, Pan2  wrote:
> >
> > Hi kito,
> >
> > Could you please help to share any suggestion about the PATCH? Comparing 
> > the V1 and V2.
> >
> > Pan
> >
> >
> > -Original Message-
> > From: Li, Pan2
> > Sent: Wednesday, May 3, 2023 7:18 PM
> > To: Jeff Law ; Kito Cheng
> > 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> > ; Andrew Waterman 
> > Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> > Thanks all for comments, will work with kito to make it happen.
> >
> > Pan
> >
> > -Original Message-
> > From: Jeff Law 
> > Sent: Wednesday, May 3, 2023 12:28 AM
> > To: Kito Cheng 
> > Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org;
> > juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew
> > Waterman 
> > Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> >
> >
> > On 4/29/23 19:40, Kito Cheng wrote:
> > > Hi Jeff:
> > >
> > > The RTL pattern already models tail element and vector length well,
> > > so I don't feel the first version of Pan's patch has any problem?
> > >
> > > Input RTL pattern:
> > >
> > > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > > #(if_then_else:VNx2BI (unspec:VNx2BI [
> > > #(const_vector:VNx2BI repeat [
> > > #(const_int 1 [0x1])
> > > #])  # all-1 mask
> > > #(reg:DI 143)  # AVL reg, or vector length
> > > #(const_int 2 [0x2]) # mask policy
> > > #(const_int 0 [0])   # avl type
> > > #(reg:SI 66 vl)
> > > #(reg:SI 67 vtype)
> > > #] UNSPEC_VPREDICATE)
> > > #(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > > #(reg/v:VNx2QI 137 [ v1 ]))
> > > #(unspec:VNx2BI [
> > > #(reg:SI 0 zero)
> > > #] UNSPEC_VUNDEF))) # maskoff and tail operand
> > > # (expr_list:REG_DEAD (reg:DI 143)
> > > #(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > > #(nil
> > >
> > > And the split pattern, only did on tail/maskoff element with undefined 
> > > value:
> > >
> > > (define_split
> > >   [(set (match_operand:VB  0 "register_operand")
> > > (if_then_else:VB
> > >   (unspec:VB
> > > [(match_operand:VB 1 "vector_all_trues_mask_operand")
> > >  (match_operand4 "vector_length_operand")
> > >  (match_operand5 "const_int_operand")
> > >  (match_operand6 "const_int_operand")
> > >  (reg:SI VL_REGNUM)
> > >  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> > >   (match_operand:VB3 "vector_move_operand")
> > >   (match_operand:VB2 "vector_undef_operand")))] # maskoff
> > > and tail operand, only match undef value
> > >
> > > Then it turns into vmset, and also discard mask policy operand
> > > (since maskoff is undef means don't care IMO):
> > >
> > > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > > (if_then_else:VNx2BI (unspec:VNx2BI [
> > > (const_vector:VNx2BI repeat [
> > > (const_int 1 [0x1])
> > > ])  # all-1 mask
> > > (reg:DI 143) # AVL reg, or vector length
> > > (const_int 2 [0x2]) # mask policy
> > > (reg:SI 66 vl)
> > > (reg:SI 67 vtype)
> > > ] UNSPEC_VPREDICATE)
> > > (const_vector:VNx2BI repeat [
> > > (const_int 1 [0x1])
> > > ])  

RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Li, Pan2 via Gcc-patches
Ok, sounds good. Thank you!

Pan

-Original Message-
From: Kito Cheng  
Sent: Friday, May 5, 2023 8:37 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

I will take V1 and commit to trunk after my local test is done :)

On Fri, May 5, 2023 at 8:30 PM Li, Pan2  wrote:
>
> Hi kito,
>
> Could you please help to share any suggestion about the PATCH? Comparing the 
> V1 and V2.
>
> Pan
>
>
> -Original Message-
> From: Li, Pan2
> Sent: Wednesday, May 3, 2023 7:18 PM
> To: Jeff Law ; Kito Cheng 
> 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
> ; Andrew Waterman 
> Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify 
> to VMSET
>
> Thanks all for comments, will work with kito to make it happen.
>
> Pan
>
> -Original Message-
> From: Jeff Law 
> Sent: Wednesday, May 3, 2023 12:28 AM
> To: Kito Cheng 
> Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew 
> Waterman 
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify 
> to VMSET
>
>
>
> On 4/29/23 19:40, Kito Cheng wrote:
> > Hi Jeff:
> >
> > The RTL pattern already models tail element and vector length well, 
> > so I don't feel the first version of Pan's patch has any problem?
> >
> > Input RTL pattern:
> >
> > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > #(if_then_else:VNx2BI (unspec:VNx2BI [
> > #(const_vector:VNx2BI repeat [
> > #(const_int 1 [0x1])
> > #])  # all-1 mask
> > #(reg:DI 143)  # AVL reg, or vector length
> > #(const_int 2 [0x2]) # mask policy
> > #(const_int 0 [0])   # avl type
> > #(reg:SI 66 vl)
> > #(reg:SI 67 vtype)
> > #] UNSPEC_VPREDICATE)
> > #(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > #(reg/v:VNx2QI 137 [ v1 ]))
> > #(unspec:VNx2BI [
> > #(reg:SI 0 zero)
> > #] UNSPEC_VUNDEF))) # maskoff and tail operand
> > # (expr_list:REG_DEAD (reg:DI 143)
> > #(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > #(nil
> >
> > And the split pattern, only did on tail/maskoff element with undefined 
> > value:
> >
> > (define_split
> >   [(set (match_operand:VB  0 "register_operand")
> > (if_then_else:VB
> >   (unspec:VB
> > [(match_operand:VB 1 "vector_all_trues_mask_operand")
> >  (match_operand4 "vector_length_operand")
> >  (match_operand5 "const_int_operand")
> >  (match_operand6 "const_int_operand")
> >  (reg:SI VL_REGNUM)
> >  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> >   (match_operand:VB3 "vector_move_operand")
> >   (match_operand:VB2 "vector_undef_operand")))] # maskoff
> > and tail operand, only match undef value
> >
> > Then it turns into vmset, and also discard mask policy operand 
> > (since maskoff is undef means don't care IMO):
> >
> > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > (if_then_else:VNx2BI (unspec:VNx2BI [
> > (const_vector:VNx2BI repeat [
> > (const_int 1 [0x1])
> > ])  # all-1 mask
> > (reg:DI 143) # AVL reg, or vector length
> > (const_int 2 [0x2]) # mask policy
> > (reg:SI 66 vl)
> > (reg:SI 67 vtype)
> > ] UNSPEC_VPREDICATE)
> > (const_vector:VNx2BI repeat [
> > (const_int 1 [0x1])
> > ])# all-1
> > (unspec:VNx2BI [
> > (reg:SI 0 zero)
> > ] UNSPEC_VUNDEF))) # still vundef
> >  (expr_list:REG_DEAD (reg:DI 143)
> > (nil)))
> Right.  My concern is that when we call relational_result it's going to 
> return -1 (as a vector of bools) which bubbles up through the call
> chain.   If that doesn't match the actual register state after the
> instruction (irrespective of the tail policy), then we have the potential to 
> generate incorrect code.
>
> For example, if there's a subsequent instruction that tried to set a vector 
> register to -1, it could just copy from the destination of the vmset to the 
> new target.  But if the vmset didn't set all the bits to 1, then the code is 
> wrong.
>
> With all the UNSPECs in place, this may not be a problem in practice.
> Unsure.  I'm willing to defer to you on this Kito.
>
> Jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Kito Cheng via Gcc-patches
I will take V1 and commit to trunk after my local test is done :)

On Fri, May 5, 2023 at 8:30 PM Li, Pan2  wrote:
>
> Hi kito,
>
> Could you please help to share any suggestion about the PATCH? Comparing the 
> V1 and V2.
>
> Pan
>
>
> -Original Message-
> From: Li, Pan2
> Sent: Wednesday, May 3, 2023 7:18 PM
> To: Jeff Law ; Kito Cheng 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
> ; Andrew Waterman 
> Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
> Thanks all for comments, will work with kito to make it happen.
>
> Pan
>
> -Original Message-
> From: Jeff Law 
> Sent: Wednesday, May 3, 2023 12:28 AM
> To: Kito Cheng 
> Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew 
> Waterman 
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
>
>
> On 4/29/23 19:40, Kito Cheng wrote:
> > Hi Jeff:
> >
> > The RTL pattern already models tail element and vector length well, so
> > I don't feel the first version of Pan's patch has any problem?
> >
> > Input RTL pattern:
> >
> > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > #(if_then_else:VNx2BI (unspec:VNx2BI [
> > #(const_vector:VNx2BI repeat [
> > #(const_int 1 [0x1])
> > #])  # all-1 mask
> > #(reg:DI 143)  # AVL reg, or vector length
> > #(const_int 2 [0x2]) # mask policy
> > #(const_int 0 [0])   # avl type
> > #(reg:SI 66 vl)
> > #(reg:SI 67 vtype)
> > #] UNSPEC_VPREDICATE)
> > #(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > #(reg/v:VNx2QI 137 [ v1 ]))
> > #(unspec:VNx2BI [
> > #(reg:SI 0 zero)
> > #] UNSPEC_VUNDEF))) # maskoff and tail operand
> > # (expr_list:REG_DEAD (reg:DI 143)
> > #(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > #(nil
> >
> > And the split pattern, only did on tail/maskoff element with undefined 
> > value:
> >
> > (define_split
> >   [(set (match_operand:VB  0 "register_operand")
> > (if_then_else:VB
> >   (unspec:VB
> > [(match_operand:VB 1 "vector_all_trues_mask_operand")
> >  (match_operand4 "vector_length_operand")
> >  (match_operand5 "const_int_operand")
> >  (match_operand6 "const_int_operand")
> >  (reg:SI VL_REGNUM)
> >  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> >   (match_operand:VB3 "vector_move_operand")
> >   (match_operand:VB2 "vector_undef_operand")))] # maskoff
> > and tail operand, only match undef value
> >
> > Then it turns into vmset, and also discard mask policy operand (since
> > maskoff is undef means don't care IMO):
> >
> > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > (if_then_else:VNx2BI (unspec:VNx2BI [
> > (const_vector:VNx2BI repeat [
> > (const_int 1 [0x1])
> > ])  # all-1 mask
> > (reg:DI 143) # AVL reg, or vector length
> > (const_int 2 [0x2]) # mask policy
> > (reg:SI 66 vl)
> > (reg:SI 67 vtype)
> > ] UNSPEC_VPREDICATE)
> > (const_vector:VNx2BI repeat [
> > (const_int 1 [0x1])
> > ])# all-1
> > (unspec:VNx2BI [
> > (reg:SI 0 zero)
> > ] UNSPEC_VUNDEF))) # still vundef
> >  (expr_list:REG_DEAD (reg:DI 143)
> > (nil)))
> Right.  My concern is that when we call relational_result it's going to 
> return -1 (as a vector of bools) which bubbles up through the call
> chain.   If that doesn't match the actual register state after the
> instruction (irrespective of the tail policy), then we have the potential to 
> generate incorrect code.
>
> For example, if there's a subsequent instruction that tried to set a vector 
> register to -1, it could just copy from the destination of the vmset to the 
> new target.  But if the vmset didn't set all the bits to 1, then the code is 
> wrong.
>
> With all the UNSPECs in place, this may not be a problem in practice.
> Unsure.  I'm willing to defer to you on this Kito.
>
> Jeff


RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-05 Thread Li, Pan2 via Gcc-patches
Hi kito,

Could you please help to share any suggestion about the PATCH? Comparing the V1 
and V2.

Pan


-Original Message-
From: Li, Pan2 
Sent: Wednesday, May 3, 2023 7:18 PM
To: Jeff Law ; Kito Cheng 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; Andrew Waterman 
Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Thanks all for comments, will work with kito to make it happen.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, May 3, 2023 12:28 AM
To: Kito Cheng 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew Waterman 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well, so 
> I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #(if_then_else:VNx2BI (unspec:VNx2BI [
> #(const_vector:VNx2BI repeat [
> #(const_int 1 [0x1])
> #])  # all-1 mask
> #(reg:DI 143)  # AVL reg, or vector length
> #(const_int 2 [0x2]) # mask policy
> #(const_int 0 [0])   # avl type
> #(reg:SI 66 vl)
> #(reg:SI 67 vtype)
> #] UNSPEC_VPREDICATE)
> #(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #(reg/v:VNx2QI 137 [ v1 ]))
> #(unspec:VNx2BI [
> #(reg:SI 0 zero)
> #] UNSPEC_VUNDEF))) # maskoff and tail operand
> # (expr_list:REG_DEAD (reg:DI 143)
> #(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #(nil
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB  0 "register_operand")
> (if_then_else:VB
>   (unspec:VB
> [(match_operand:VB 1 "vector_all_trues_mask_operand")
>  (match_operand4 "vector_length_operand")
>  (match_operand5 "const_int_operand")
>  (match_operand6 "const_int_operand")
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (match_operand:VB3 "vector_move_operand")
>   (match_operand:VB2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since 
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> (if_then_else:VNx2BI (unspec:VNx2BI [
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])  # all-1 mask
> (reg:DI 143) # AVL reg, or vector length
> (const_int 2 [0x2]) # mask policy
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])# all-1
> (unspec:VNx2BI [
> (reg:SI 0 zero)
> ] UNSPEC_VUNDEF))) # still vundef
>  (expr_list:REG_DEAD (reg:DI 143)
> (nil)))
Right.  My concern is that when we call relational_result it's going to return 
-1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the potential to 
generate incorrect code.

For example, if there's a subsequent instruction that tried to set a vector 
register to -1, it could just copy from the destination of the vmset to the new 
target.  But if the vmset didn't set all the bits to 1, then the code is wrong.

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff


RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-03 Thread Li, Pan2 via Gcc-patches
Thanks all for comments, will work with kito to make it happen.

Pan

-Original Message-
From: Jeff Law  
Sent: Wednesday, May 3, 2023 12:28 AM
To: Kito Cheng 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew Waterman 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well, so 
> I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #(if_then_else:VNx2BI (unspec:VNx2BI [
> #(const_vector:VNx2BI repeat [
> #(const_int 1 [0x1])
> #])  # all-1 mask
> #(reg:DI 143)  # AVL reg, or vector length
> #(const_int 2 [0x2]) # mask policy
> #(const_int 0 [0])   # avl type
> #(reg:SI 66 vl)
> #(reg:SI 67 vtype)
> #] UNSPEC_VPREDICATE)
> #(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #(reg/v:VNx2QI 137 [ v1 ]))
> #(unspec:VNx2BI [
> #(reg:SI 0 zero)
> #] UNSPEC_VUNDEF))) # maskoff and tail operand
> # (expr_list:REG_DEAD (reg:DI 143)
> #(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #(nil
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB  0 "register_operand")
> (if_then_else:VB
>   (unspec:VB
> [(match_operand:VB 1 "vector_all_trues_mask_operand")
>  (match_operand4 "vector_length_operand")
>  (match_operand5 "const_int_operand")
>  (match_operand6 "const_int_operand")
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>   (match_operand:VB3 "vector_move_operand")
>   (match_operand:VB2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since 
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> (if_then_else:VNx2BI (unspec:VNx2BI [
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])  # all-1 mask
> (reg:DI 143) # AVL reg, or vector length
> (const_int 2 [0x2]) # mask policy
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)
> (const_vector:VNx2BI repeat [
> (const_int 1 [0x1])
> ])# all-1
> (unspec:VNx2BI [
> (reg:SI 0 zero)
> ] UNSPEC_VUNDEF))) # still vundef
>  (expr_list:REG_DEAD (reg:DI 143)
> (nil)))
Right.  My concern is that when we call relational_result it's going to return 
-1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the potential to 
generate incorrect code.

For example, if there's a subsequent instruction that tried to set a vector 
register to -1, it could just copy from the destination of the vmset to the new 
target.  But if the vmset didn't set all the bits to 1, then the code is wrong.

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-02 Thread Jeff Law via Gcc-patches




On 4/29/23 19:40, Kito Cheng wrote:

Hi Jeff:

The RTL pattern already models tail element and vector length well,
so I don't feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#(if_then_else:VNx2BI (unspec:VNx2BI [
#(const_vector:VNx2BI repeat [
#(const_int 1 [0x1])
#])  # all-1 mask
#(reg:DI 143)  # AVL reg, or vector length
#(const_int 2 [0x2]) # mask policy
#(const_int 0 [0])   # avl type
#(reg:SI 66 vl)
#(reg:SI 67 vtype)
#] UNSPEC_VPREDICATE)
#(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#(reg/v:VNx2QI 137 [ v1 ]))
#(unspec:VNx2BI [
#(reg:SI 0 zero)
#] UNSPEC_VUNDEF))) # maskoff and tail operand
# (expr_list:REG_DEAD (reg:DI 143)
#(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#(nil

And the split pattern, only did on tail/maskoff element with undefined value:

(define_split
  [(set (match_operand:VB  0 "register_operand")
(if_then_else:VB
  (unspec:VB
[(match_operand:VB 1 "vector_all_trues_mask_operand")
 (match_operand4 "vector_length_operand")
 (match_operand5 "const_int_operand")
 (match_operand6 "const_int_operand")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (match_operand:VB3 "vector_move_operand")
  (match_operand:VB2 "vector_undef_operand")))] # maskoff
and tail operand, only match undef value

Then it turns into vmset, and also discard mask policy operand (since
maskoff is undef means don't care IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
(if_then_else:VNx2BI (unspec:VNx2BI [
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])  # all-1 mask
(reg:DI 143) # AVL reg, or vector length
(const_int 2 [0x2]) # mask policy
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])# all-1
(unspec:VNx2BI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) # still vundef
 (expr_list:REG_DEAD (reg:DI 143)
(nil)))
Right.  My concern is that when we call relational_result it's going to 
return -1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the 
potential to generate incorrect code.


For example, if there's a subsequent instruction that tried to set a 
vector register to -1, it could just copy from the destination of the 
vmset to the new target.  But if the vmset didn't set all the bits to 1, 
then the code is wrong.


With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.


Jeff


RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-30 Thread Li, Pan2 via Gcc-patches
Thanks all for comments. Summary what I have learned from the mail thread as 
below. Please feel free to correct me if any mistake.

1. The RVV VMSET has tail policy and the high bits of target register can be 
overridden to 1 or retain the value they held according to the ISA.
2. The semantics of tail policy is different with s390 according the macro 
comment " /* The truth element value for vector comparisons.  Our instructions 
always generate -1 in that case.  */ ".
3. We still have a lot of work to do for the RISC-V besides compiler.
4. The RTL pattern of PATCH v1 models tail policy and vector length as well.

Pan

-Original Message-
From: Kito Cheng  
Sent: Sunday, April 30, 2023 9:40 AM
To: Jeff Law 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang ; Andrew Waterman 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Hi Jeff:

The RTL pattern already models tail element and vector length well, so I don't 
feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#(if_then_else:VNx2BI (unspec:VNx2BI [
#(const_vector:VNx2BI repeat [
#(const_int 1 [0x1])
#])  # all-1 mask
#(reg:DI 143)  # AVL reg, or vector length
#(const_int 2 [0x2]) # mask policy
#(const_int 0 [0])   # avl type
#(reg:SI 66 vl)
#(reg:SI 67 vtype)
#] UNSPEC_VPREDICATE)
#(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#(reg/v:VNx2QI 137 [ v1 ]))
#(unspec:VNx2BI [
#(reg:SI 0 zero)
#] UNSPEC_VUNDEF))) # maskoff and tail operand
# (expr_list:REG_DEAD (reg:DI 143)
#(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#(nil

And the split pattern, only did on tail/maskoff element with undefined value:

(define_split
 [(set (match_operand:VB  0 "register_operand")
   (if_then_else:VB
 (unspec:VB
   [(match_operand:VB 1 "vector_all_trues_mask_operand")
(match_operand4 "vector_length_operand")
(match_operand5 "const_int_operand")
(match_operand6 "const_int_operand")
(reg:SI VL_REGNUM)
(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 (match_operand:VB3 "vector_move_operand")
 (match_operand:VB2 "vector_undef_operand")))] # maskoff
and tail operand, only match undef value

Then it turns into vmset, and also discard mask policy operand (since maskoff 
is undef means don't care IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
   (if_then_else:VNx2BI (unspec:VNx2BI [
   (const_vector:VNx2BI repeat [
   (const_int 1 [0x1])
   ])  # all-1 mask
   (reg:DI 143) # AVL reg, or vector length
   (const_int 2 [0x2]) # mask policy
   (reg:SI 66 vl)
   (reg:SI 67 vtype)
   ] UNSPEC_VPREDICATE)
   (const_vector:VNx2BI repeat [
   (const_int 1 [0x1])
   ])# all-1
   (unspec:VNx2BI [
   (reg:SI 0 zero)
   ] UNSPEC_VUNDEF))) # still vundef
(expr_list:REG_DEAD (reg:DI 143)
   (nil)))



On Sat, Apr 29, 2023 at 11:05 PM Jeff Law  wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for 
> instructions like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to 
> understand precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't 
> set them to all ones in this case, then that would mean that defining 
> that macro is simply wrong for risc-v.
>
> jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Kito Cheng via Gcc-patches
Hi Jeff:

The RTL pattern already models tail element and vector length well,
so I don't feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#(if_then_else:VNx2BI (unspec:VNx2BI [
#(const_vector:VNx2BI repeat [
#(const_int 1 [0x1])
#])  # all-1 mask
#(reg:DI 143)  # AVL reg, or vector length
#(const_int 2 [0x2]) # mask policy
#(const_int 0 [0])   # avl type
#(reg:SI 66 vl)
#(reg:SI 67 vtype)
#] UNSPEC_VPREDICATE)
#(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#(reg/v:VNx2QI 137 [ v1 ]))
#(unspec:VNx2BI [
#(reg:SI 0 zero)
#] UNSPEC_VUNDEF))) # maskoff and tail operand
# (expr_list:REG_DEAD (reg:DI 143)
#(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#(nil

And the split pattern, only did on tail/maskoff element with undefined value:

(define_split
 [(set (match_operand:VB  0 "register_operand")
   (if_then_else:VB
 (unspec:VB
   [(match_operand:VB 1 "vector_all_trues_mask_operand")
(match_operand4 "vector_length_operand")
(match_operand5 "const_int_operand")
(match_operand6 "const_int_operand")
(reg:SI VL_REGNUM)
(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 (match_operand:VB3 "vector_move_operand")
 (match_operand:VB2 "vector_undef_operand")))] # maskoff
and tail operand, only match undef value

Then it turns into vmset, and also discard mask policy operand (since
maskoff is undef means don't care IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
   (if_then_else:VNx2BI (unspec:VNx2BI [
   (const_vector:VNx2BI repeat [
   (const_int 1 [0x1])
   ])  # all-1 mask
   (reg:DI 143) # AVL reg, or vector length
   (const_int 2 [0x2]) # mask policy
   (reg:SI 66 vl)
   (reg:SI 67 vtype)
   ] UNSPEC_VPREDICATE)
   (const_vector:VNx2BI repeat [
   (const_int 1 [0x1])
   ])# all-1
   (unspec:VNx2BI [
   (reg:SI 0 zero)
   ] UNSPEC_VUNDEF))) # still vundef
(expr_list:REG_DEAD (reg:DI 143)
   (nil)))



On Sat, Apr 29, 2023 at 11:05 PM Jeff Law  wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.
>
> jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Palmer Dabbelt

On Sat, 29 Apr 2023 10:52:50 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/29/23 11:48, Palmer Dabbelt wrote:


Yea.  And taking advantage of that behavior is definitely a performance
issue for QEMU.  There's still work to do though.  QEMU on vector code
is running crazy slow.


I guess we're kind of off the rails for a GCC patch, but that's
definately true.  Across the board RVV is going to just need a lot of
work, it's very different than SVE or AVX.

Unfortunately QEMU performance isn't really a priority on our end, but
it's great to see folks digging into it.

Well, when a user mode SPEC run goes from ~15 minutes to multiple hours
for a single input workload within specint it becomes a development
problem.  Daniel is loosely affiliated with my group in Ventana, so I
can bug him with this kind of stuff.


We've got another team actually doing the mechanics of the SPEC runs, we 
just do the compiler.  So while I guess it is a problem, it's not my 
problem ;)


Maybe not the best way to go about things, but there's only so much that 
can be done...


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Jeff Law via Gcc-patches




On 4/29/23 11:48, Palmer Dabbelt wrote:


Yea.  And taking advantage of that behavior is definitely a performance
issue for QEMU.  There's still work to do though.  QEMU on vector code
is running crazy slow.


I guess we're kind of off the rails for a GCC patch, but that's 
definately true.  Across the board RVV is going to just need a lot of 
work, it's very different than SVE or AVX.


Unfortunately QEMU performance isn't really a priority on our end, but 
it's great to see folks digging into it.
Well, when a user mode SPEC run goes from ~15 minutes to multiple hours 
for a single input workload within specint it becomes a development 
problem.  Daniel is loosely affiliated with my group in Ventana, so I 
can bug him with this kind of stuff.


jeff




Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Jeff Law via Gcc-patches




On 4/29/23 11:21, Andrew Waterman wrote:



The relevant statement in the spec is that "the tail elements are always 
updated with a tail-agnostic policy".  The vmset.m instruction will 
cause mask register bits [0, vl-1] to be set to 1; elements [vl, 
VLMAX-1] will either be undisturbed or set to 1, i.e., effectively 
unspecified.
Makes sense.  Just have to stitch together bits from different locations 
in the manual.


The net being that I can't think we can define that macro for RISC-V in 
the way that Pan wants, the semantics just don't line up correctly.


jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Palmer Dabbelt

On Sat, 29 Apr 2023 10:46:37 PDT (-0700), jeffreya...@gmail.com wrote:



On 4/29/23 11:28, Palmer Dabbelt wrote:

On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:




On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
>
> It makes sense to me. For the EQ operator we should have CONSTM1.
That's not the way I interpret the RVV documentation.  Of course it's
not terribly clear.    I guess one could do some experiments with qemu
or try to dig into the sail code and figure out the intent from those.


QEMU specifically takes advantage of the behavior Andrew is pointing out
it the spec, and will soon do so more aggressively (assuming the patches
Daniel just sent out get merged).

Yea.  And taking advantage of that behavior is definitely a performance
issue for QEMU.  There's still work to do though.  QEMU on vector code
is running crazy slow.


I guess we're kind of off the rails for a GCC patch, but that's 
definately true.  Across the board RVV is going to just need a lot of 
work, it's very different than SVE or AVX.


Unfortunately QEMU performance isn't really a priority on our end, but 
it's great to see folks digging into it.


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Jeff Law via Gcc-patches




On 4/29/23 11:28, Palmer Dabbelt wrote:

On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:




On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
>
> It makes sense to me. For the EQ operator we should have CONSTM1.
That's not the way I interpret the RVV documentation.  Of course it's
not terribly clear.    I guess one could do some experiments with qemu
or try to dig into the sail code and figure out the intent from those.


QEMU specifically takes advantage of the behavior Andrew is pointing out 
it the spec, and will soon do so more aggressively (assuming the patches 
Daniel just sent out get merged).
Yea.  And taking advantage of that behavior is definitely a performance 
issue for QEMU.  There's still work to do though.  QEMU on vector code 
is running crazy slow.


jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Palmer Dabbelt

On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:




On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
>
> It makes sense to me. For the EQ operator we should have CONSTM1.
That's not the way I interpret the RVV documentation.  Of course it's
not terribly clear.I guess one could do some experiments with qemu
or try to dig into the sail code and figure out the intent from those.


QEMU specifically takes advantage of the behavior Andrew is pointing out 
it the spec, and will soon do so more aggressively (assuming the patches 
Daniel just sent out get merged).



Does this mean s390 parts has similar issue here? Then for instructions
like VMSEQ, we need to adjust the simplify_rtx up to a point.
You'd have to refer to the s390 instruction set reference to understand
precisely how the vector compares work.

But as it stands this really isn't a simplify-rtx question, but a
question of the semantics of risc-v.   What happens with the high bits
in the destination mask register is critical -- and if risc-v doesn't
set them to all ones in this case, then that would mean that defining
that macro is simply wrong for risc-v.


The relevant statement in the spec is that "the tail elements are always
updated with a tail-agnostic policy".  The vmset.m instruction will cause
mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
either be undisturbed or set to 1, i.e., effectively unspecified.



jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Andrew Waterman via Gcc-patches
On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.

The relevant statement in the spec is that "the tail elements are always
updated with a tail-agnostic policy".  The vmset.m instruction will cause
mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
either be undisturbed or set to 1, i.e., effectively unspecified.

>
> jeff


Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Jeff Law via Gcc-patches




On 4/28/23 20:55, Li, Pan2 wrote:

Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. 
That's not the way I interpret the RVV documentation.  Of course it's 
not terribly clear.I guess one could do some experiments with qemu 
or try to dig into the sail code and figure out the intent from those.




Does this mean s390 parts has similar issue here? Then for instructions 
like VMSEQ, we need to adjust the simplify_rtx up to a point.
You'd have to refer to the s390 instruction set reference to understand 
precisely how the vector compares work.


But as it stands this really isn't a simplify-rtx question, but a 
question of the semantics of risc-v.   What happens with the high bits 
in the destination mask register is critical -- and if risc-v doesn't 
set them to all ones in this case, then that would mean that defining 
that macro is simply wrong for risc-v.


jeff


RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Li, Pan2 via Gcc-patches
Hi Jeff

Just have a try in simplify_rtx for this optimization in PATCH v2. Could you 
please help to share any idea about this when you free? Thank you!

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617117.html

Pan

-Original Message-
From: Li, Pan2 
Sent: Saturday, April 29, 2023 10:55 AM
To: Jeff Law ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. Does this 
mean s390 parts has similar issue here? Then for instructions like VMSEQ, we 
need to adjust the simplify_rtx up to a point.

Please help to correct me if any mistake. Thank you again.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li 
> 
> When some RVV integer compare operators act on the same vector 
> registers without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the 
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) 
> {
>return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl); }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.vv8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less 
> vector registers.
> 
> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding 
is we have one output bit per input element in the comparison.  So unless the 
number of elements matches the bit width of the mask register, this isn't going 
to work.

Am I missing something?

Jeff




RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-28 Thread Li, Pan2 via Gcc-patches
Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. Does this 
mean s390 parts has similar issue here? Then for instructions like VMSEQ, we 
need to adjust the simplify_rtx up to a point.

Please help to correct me if any mistake. Thank you again.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li 
> 
> When some RVV integer compare operators act on the same vector 
> registers without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the 
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) 
> {
>return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl); }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.vv8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less 
> vector registers.
> 
> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding 
is we have one output bit per input element in the comparison.  So unless the 
number of elements matches the bit width of the mask register, this isn't going 
to work.

Am I missing something?

Jeff




Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-28 Thread Jeff Law via Gcc-patches




On 4/28/23 09:21, Pan Li via Gcc-patches wrote:

From: Pan Li 

When some RVV integer compare operators act on the same vector registers
without mask. They can be simplified to VMSET.

This PATCH allows the eq, le, leu, ge, geu to perform such kind of the
simplification by adding one macro in riscv for simplify rtx.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
   return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.vv8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require less vector
registers.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
  consumed by simplify_rtx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
  Adjust test check condition.

I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My 
understanding is we have one output bit per input element in the 
comparison.  So unless the number of elements matches the bit width of 
the mask register, this isn't going to work.


Am I missing something?

Jeff




[PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-28 Thread Pan Li via Gcc-patches
From: Pan Li 

When some RVV integer compare operators act on the same vector registers
without mask. They can be simplified to VMSET.

This PATCH allows the eq, le, leu, ge, geu to perform such kind of the
simplification by adding one macro in riscv for simplify rtx.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.vv8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require less vector
registers.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
  consumed by simplify_rtx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
  Adjust test check condition.
---
 gcc/config/riscv/riscv.h| 5 +
 .../riscv/rvv/base/integer_compare_insn_shortcut.c  | 6 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 13038a39e5c..4473115d3a9 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1096,4 +1096,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
 #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
   ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
 
+/* Like s390, riscv also defined this macro for the vector comparision.  Then
+   the simplify-rtx relational_result will canonicalize the result to the
+   CONST1_RTX for the simplification.  */
+#define VECTOR_STORE_FLAG_VALUE(MODE) CONSTM1_RTX (GET_MODE_INNER (MODE))
+
 #endif /* ! GCC_RISCV_H */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
index 8954adad09d..1bca8467a16 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -283,9 +283,5 @@ vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t 
v1, size_t vl) {
   return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
 }
 
-/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 
7 } } */
-/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 
} } */
-/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 
7 } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\sv[0-9]} 35 } } */
-- 
2.34.1