Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

2023-06-05 Thread Max Filippov via Gcc-patches
On Mon, Jun 5, 2023 at 8:15 AM Max Filippov  wrote:
>
> Hi Suwa-san,
>
> On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
>  wrote:
> >
> > This patch optimizes the boolean evaluation of EQ/NE against zero
> > by adding two insn_and_split patterns similar to SImode conditional
> > store:
> >
> > "eq_zero":
> > op0 = (op1 == 0) ? 1 : 0;
> > op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
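The "eq_zero" identity relies on the Xtensa NSAU instruction returning 32 for a zero input and a value in 0..31 for any nonzero input, so a right shift by 5 yields exactly 1 or 0. Below is a minimal C model of that reasoning. The helper names (nsau32, eq_zero, check_eq_zero) are hypothetical, and __builtin_clz is deliberately avoided because it is undefined for a zero argument:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of Xtensa NSAU: the number of leading zero bits
   of a 32-bit value, defined as 32 when the value is zero.
   (__builtin_clz(0) is undefined in GCC, so it is not used here.) */
unsigned nsau32(uint32_t v)
{
    unsigned n = 0;

    if (v == 0)
        return 32;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/* "eq_zero": op0 = (op1 == 0) ? 1 : 0, computed without a branch.
   32 >> 5 == 1, while any count in 0..31 shifted right by 5 is 0. */
int eq_zero(uint32_t op1)
{
    return (int)(nsau32(op1) >> 5);
}

/* Compare the branch-free form against the plain C expression. */
void check_eq_zero(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, 0x7fffffffu, 0x80000000u,
                                 0xffffffffu, 12345u };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(eq_zero(samples[i]) == (samples[i] == 0));
}
```

On hardware the loop in nsau32 is a single NSAU instruction, which is why this form is only used when TARGET_NSA is set.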
> >
> > "movsicc_ne0_reg_zero":
> > op0 = (op1 != 0) ? op2 : 0;
> > op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */
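The conditional-move rewrite above is valid because, whenever op1 is zero, copying op1 into op0 stores exactly the required 0, so no separate zero constant is needed; a single MOVEQZ does this in bool_neSI below (with op2 = 1). A small C sketch of the equivalence, using hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference semantics: op0 = (op1 != 0) ? op2 : 0. */
int32_t ne0_select_ref(int32_t op1, int32_t op2)
{
    return (op1 != 0) ? op2 : 0;
}

/* Optimized form: take op2 unconditionally, then overwrite it with op1
   when op1 is zero (one MOVEQZ on Xtensa). Because op1 is 0 in that
   case, the result is 0 without loading a zero constant. */
int32_t ne0_select_moveqz(int32_t op1, int32_t op2)
{
    int32_t op0 = op2;

    if (op1 == 0)
        op0 = op1;
    return op0;
}

/* Exhaustively compare both forms over a few edge values. */
void check_ne0_select(void)
{
    const int32_t vals[] = { 0, 1, -1, 42, INT32_MIN, INT32_MAX };
    const size_t n = sizeof vals / sizeof vals[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(ne0_select_moveqz(vals[i], vals[j])
                   == ne0_select_ref(vals[i], vals[j]));
}
```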
> >
> > /* example #1 */
> > int bool_eqSI(int x) {
> >   return x == 0;
> > }
> > int bool_neSI(int x) {
> >   return x != 0;
> > }
> >
> > ;; after (TARGET_NSA)
> > bool_eqSI:
> > nsau    a2, a2
> > srli    a2, a2, 5
> > ret.n
> > bool_neSI:
> > mov.n   a9, a2
> > movi.n  a2, 1
> > moveqz  a2, a9, a9
> > ret.n
> >
> > These patterns also work in SFmode by ignoring the sign bit (it is
> > shifted out by doubling the raw bits, e.g. "add.n a2, a2, a2" in the
> > listings below); furthermore, branches on EQ/NE against zero in
> > SFmode are handled in the same manner.
> >
> > The reasons for this optimization in SFmode are:
> >
> >   - Only zero values (positive or negative) have all of their
> > exponent and mantissa bits clear; every other value, including
> > NaNs and infinities, has at least one of those bits set.
> >   - EQ/NE comparisons involving NaNs produce no signal, even for
> > signaling NaNs.
> >   - Even when the IEEE 754 single-precision floating-point
> > coprocessor is configured (TARGET_HARD_FLOAT is true), an FP
> > comparison against zero still takes a considerable number of
> > instructions:
> > 1. Load the zero value into an FP register
> > 2. Possibly, an additional FP move if the comparison operand is
> >    in an address register
> > 3. The FP equality check instruction
> > 4. Read the boolean register containing the result, or a
> >    conditional branch
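The first point above can be checked with a small C sketch that compares the integer bit test against C's own x == 0.0f on zeros, denormals, normals, infinities and NaNs. It assumes float is IEEE 754 binary32 and a default floating-point environment (no flush-to-zero, no -ffast-math); the helper names are hypothetical. Shifting the bits left by one mirrors the "add.n a2, a2, a2" step that discards the sign bit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");

/* Hypothetical helper: build a float from its binary32 bit pattern. */
static float sf_from_bits(uint32_t bits)
{
    float x;

    memcpy(&x, &bits, sizeof x);
    return x;
}

/* EQ-against-zero as a pure integer test: drop the sign bit and check
   that the remaining exponent and mantissa bits are all clear. */
int sf_eq_zero_bits(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits << 1) == 0;
}

/* Compare the bit test with the IEEE comparison on special values. */
void check_sf_eq_zero(void)
{
    const uint32_t patterns[] = {
        0x00000000u, /* +0.0 */
        0x80000000u, /* -0.0 */
        0x00000001u, /* smallest positive denormal */
        0x80000001u, /* smallest negative denormal */
        0x00800000u, /* FLT_MIN */
        0x3f800000u, /* 1.0 */
        0x7f800000u, /* +infinity */
        0xff800000u, /* -infinity */
        0x7fc00000u, /* quiet NaN */
        0x7fa00000u, /* signaling NaN */
    };

    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        float x = sf_from_bits(patterns[i]);
        assert(sf_eq_zero_bits(x) == (x == 0.0f));
    }
}
```

Because every NaN has an all-ones exponent, the bit test reports NaN as unequal to zero, which matches the IEEE result for both EQ and NE.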
> >
> > /* example #2 */
> > int bool_eqSF(float x) {
> >   return x == 0;
> > }
> > int bool_neSF(float x) {
> >   return x != 0;
> > }
> > int bool_ltSF(float x) {
> >   return x < 0;
> > }
> > extern void foo(void);
> > void cb_eqSF(float x) {
> >   if(x != 0)
> >     foo();
> > }
> > void cb_neSF(float x) {
> >   if(x == 0)
> >     foo();
> > }
> > void cb_geSF(float x) {
> >   if(x < 0)
> >     foo();
> > }
> >
> > ;; after
> > ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
> > bool_eqSF:
> > add.n   a2, a2, a2
> > nsau    a2, a2
> > srli    a2, a2, 5
> > ret.n
> > bool_neSF:
> > add.n   a9, a2, a2
> > movi.n  a2, 1
> > moveqz  a2, a9, a9
> > ret.n
> > bool_ltSF:
> > movi.n  a9, 0
> > wfr f0, a2
> > wfr f1, a9
> > olt.s   b0, f0, f1
> > movi.n  a9, 0
> > movi.n  a2, 1
> > movf    a2, a9, b0
> > ret.n
> > cb_eqSF:
> > add.n   a2, a2, a2
> > beqz.n  a2, .L6
> > j.l foo, a9
> > .L6:
> > ret.n
> > cb_neSF:
> > add.n   a2, a2, a2
> > bnez.n  a2, .L8
> > j.l foo, a9
> > .L8:
> > ret.n
> > cb_geSF:
> > addi    sp, sp, -16
> > movi.n  a3, 0
> > s32i.n  a12, sp, 8
> > s32i.n  a0, sp, 12
> > mov.n   a12, a2
> > call0   __unordsf2
> > bnez.n  a2, .L10
> > movi.n  a3, 0
> > mov.n   a2, a12
> > call0   __gesf2
> > bnei    a2, -1, .L10
> > l32i.n  a0, sp, 12
> > l32i.n  a12, sp, 8
> > addi    sp, sp, 16
> > j.l foo, a9
> > .L10:
> > l32i.n  a0, sp, 12
> > l32i.n  a12, sp, 8
> > addi    sp, sp, 16
> > ret.n
> >
> > gcc/ChangeLog:
> >
> > * config/xtensa/predicates.md (const_float_0_operand):
> > Rename from obsolete "const_float_1_operand" and change the
> > constant to compare.
> > (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
> > New.
> > * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
> > Add code for EQ/NE comparison with constant zero in SFmode.
> > (xtensa_expand_scc): Add code to derive boolean evaluation
> > of EQ/NE with constant zero for comparison in SFmode.
> > (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
> > zero inside "cbranchsf4" to 0.
> > * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
> > Change "match_operator" and the third "match_operand" to the
> > ones mentioned above.
> > (movsicc_ne0_reg_zero, eq_zero): New.
> > ---
> >  gcc/config/xtensa/predicates.md | 17 +--
> >  gcc/config/xtensa/xtensa.cc | 45 
> >  gcc/config/xtensa/xtensa.md | 53 +
> >  3 files changed, 106 insertions(+), 9 deletions(-)

Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

2023-06-05 Thread Takayuki 'January June' Suwa via Gcc-patches
On 2023/06/06 0:15, Max Filippov wrote:
> Hi Suwa-san,
Hi!  Thanks for your regtest every time.

> 
> On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
>  wrote:
>>
>> [...]
> 
> This version performs much better than v1, but there's still new
> testsuit

Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode

2023-06-05 Thread Max Filippov via Gcc-patches
Hi Suwa-san,

On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
 wrote:
>
> [...]

This version performs much better than v1, but there's still a new
testsuite failure in gcc.c-torture/execute/bitfld-3.c
and the following change in the generated code
from:

   l32i.n  a11, a7, 8
   l8ui    a9, a7, 12
   movi    a10, 0xff
   add.n   a9, a9, a10
   addi.n  a7, a11, -1
   movi.n