Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode
On Mon, Jun 5, 2023 at 8:15 AM Max Filippov wrote:
>
> Hi Suwa-san,
>
> On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
> wrote:
> >
> > This patch optimizes the boolean evaluation of EQ/NE against zero
> > by adding two insn_and_split patterns similar to SImode conditional
> > store:
> >
> > "eq_zero":
> >         op0 = (op1 == 0) ? 1 : 0;
> >         op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
> >
> > "movsicc_ne0_reg_0":
> >         op0 = (op1 != 0) ? op2 : 0;
> >         op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */
> >
> >     /* example #1 */
> >     int bool_eqSI(int x) {
> >       return x == 0;
> >     }
> >     int bool_neSI(int x) {
> >       return x != 0;
> >     }
> >
> >     ;; after (TARGET_NSA)
> >     bool_eqSI:
> >         nsau    a2, a2
> >         srli    a2, a2, 5
> >         ret.n
> >     bool_neSI:
> >         mov.n   a9, a2
> >         movi.n  a2, 1
> >         moveqz  a2, a9, a9
> >         ret.n
> >
> > These also work in SFmode by ignoring their sign bits, and furthermore,
> > the branch if EQ/NE against zero in SFmode is also done in the same
> > manner.
> >
> > The reasons for this optimization in SFmode are:
> >
> >   - Only zero values (negative or non-negative) contain no bits of 1
> >     with both the exponent and the mantissa.
> >   - EQ/NE comparisons involving NaNs produce no signal even if they
> >     are signaling.
> >   - Even if the use of IEEE 754 single-precision floating-point
> >     coprocessor is configured (TARGET_HARD_FLOAT is true):
> >         1. Load zero value to FP register
> >         2. Possibly, additional FP move if the comparison target is
> >            an address register
> >         3. FP equality check instruction
> >         4. Read the boolean register containing the result, or
> >            conditional branch
> >     As noted above, a considerable number of instructions are still
> >     generated.
> >
> >     /* example #2 */
> >     int bool_eqSF(float x) {
> >       return x == 0;
> >     }
> >     int bool_neSF(float x) {
> >       return x != 0;
> >     }
> >     int bool_ltSF(float x) {
> >       return x < 0;
> >     }
> >     extern void foo(void);
> >     void cb_eqSF(float x) {
> >       if(x != 0)
> >         foo();
> >     }
> >     void cb_neSF(float x) {
> >       if(x == 0)
> >         foo();
> >     }
> >     void cb_geSF(float x) {
> >       if(x < 0)
> >         foo();
> >     }
> >
> >     ;; after
> >     ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
> >     bool_eqSF:
> >         add.n   a2, a2, a2
> >         nsau    a2, a2
> >         srli    a2, a2, 5
> >         ret.n
> >     bool_neSF:
> >         add.n   a9, a2, a2
> >         movi.n  a2, 1
> >         moveqz  a2, a9, a9
> >         ret.n
> >     bool_ltSF:
> >         movi.n  a9, 0
> >         wfr     f0, a2
> >         wfr     f1, a9
> >         olt.s   b0, f0, f1
> >         movi.n  a9, 0
> >         movi.n  a2, 1
> >         movf    a2, a9, b0
> >         ret.n
> >     cb_eqSF:
> >         add.n   a2, a2, a2
> >         beqz.n  a2, .L6
> >         j.l     foo, a9
> >     .L6:
> >         ret.n
> >     cb_neSF:
> >         add.n   a2, a2, a2
> >         bnez.n  a2, .L8
> >         j.l     foo, a9
> >     .L8:
> >         ret.n
> >     cb_geSF:
> >         addi    sp, sp, -16
> >         movi.n  a3, 0
> >         s32i.n  a12, sp, 8
> >         s32i.n  a0, sp, 12
> >         mov.n   a12, a2
> >         call0   __unordsf2
> >         bnez.n  a2, .L10
> >         movi.n  a3, 0
> >         mov.n   a2, a12
> >         call0   __gesf2
> >         bnei    a2, -1, .L10
> >         l32i.n  a0, sp, 12
> >         l32i.n  a12, sp, 8
> >         addi    sp, sp, 16
> >         j.l     foo, a9
> >     .L10:
> >         l32i.n  a0, sp, 12
> >         l32i.n  a12, sp, 8
> >         addi    sp, sp, 16
> >         ret.n
> >
> > gcc/ChangeLog:
> >
> >         * config/xtensa/predicates.md (const_float_0_operand):
> >         Rename from obsolete "const_float_1_operand" and change the
> >         constant to compare.
> >         (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
> >         New.
> >         * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
> >         Add code for EQ/NE comparison with constant zero in SFmode.
> >         (xtensa_expand_scc): Added code to derive boolean evaluation
> >         of EQ/NE with constant zero for comparison in SFmode.
> >         (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
> >         zero inside "cbranchsf4" to 0.
> >         * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
> >         Change "match_operator" and the third "match_operand" to the
> >         ones mentioned above.
> >         (movsicc_ne0_reg_zero, eq_zero): New.
> > ---
> >  gcc/config/xtensa/predicates.md | 17 +--
> >  gcc/config/xtensa/xtensa.cc     | 45
> >  gcc/config/xtensa/xtensa.md     | 53 +
> >  3 file
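
For reference, the two SImode rewrites described in the patch can be checked on a host machine with a small C sketch. This is only an illustration, not the GCC implementation: nsau32 is a hypothetical portable stand-in for the NSAU instruction (it returns 32 for a zero input, which is what the clz(op1) >> 5 idiom relies on), and eq_zero / movsicc_ne0_reg_0 merely borrow the pattern names from the description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Xtensa NSAU: count of leading zero bits,
   defined as 32 for an input of 0 (unlike __builtin_clz, whose result
   is undefined for 0). */
static unsigned nsau32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* "eq_zero": op0 = (op1 == 0) ? 1 : 0, computed as nsau(op1) >> 5.
   Only the count 32 has bit 5 set, so the shift yields 1 exactly
   when op1 is 0. */
static int eq_zero(uint32_t op1)
{
    return (int)(nsau32(op1) >> 5);
}

/* "movsicc_ne0_reg_0": op0 = (op1 != 0) ? op2 : 0, computed the way the
   mov/moveqz pair does it: start from op2, then overwrite with op1
   (which is 0 at that point) when op1 == 0. */
static uint32_t movsicc_ne0_reg_0(uint32_t op1, uint32_t op2)
{
    uint32_t op0 = op2;
    if (op1 == 0)
        op0 = op1;
    return op0;
}

/* Compare both idioms against the plain C expressions they replace. */
static void check_simode_idioms(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, 0x7fffffffu, 0x80000000u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t x = samples[i];
        assert(eq_zero(x) == (x == 0 ? 1 : 0));
        assert(movsicc_ne0_reg_0(x, 1u) == (x != 0 ? 1u : 0u));
    }
}
```

Calling movsicc_ne0_reg_0(x, 1) corresponds to the bool_neSI listing, where op2 is the constant 1 loaded by movi.n.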
Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode
On 2023/06/06 0:15, Max Filippov wrote:
> Hi Suwa-san,

Hi!  Thanks for your regtest every time.

>
> On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
> wrote:
>>
>> This patch optimizes the boolean evaluation of EQ/NE against zero
>> by adding two insn_and_split patterns similar to SImode conditional
>> store:
>>
>> "eq_zero":
>>         op0 = (op1 == 0) ? 1 : 0;
>>         op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
>>
>> "movsicc_ne0_reg_0":
>>         op0 = (op1 != 0) ? op2 : 0;
>>         op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */
>>
>>     /* example #1 */
>>     int bool_eqSI(int x) {
>>       return x == 0;
>>     }
>>     int bool_neSI(int x) {
>>       return x != 0;
>>     }
>>
>>     ;; after (TARGET_NSA)
>>     bool_eqSI:
>>         nsau    a2, a2
>>         srli    a2, a2, 5
>>         ret.n
>>     bool_neSI:
>>         mov.n   a9, a2
>>         movi.n  a2, 1
>>         moveqz  a2, a9, a9
>>         ret.n
>>
>> These also work in SFmode by ignoring their sign bits, and furthermore,
>> the branch if EQ/NE against zero in SFmode is also done in the same
>> manner.
>>
>> The reasons for this optimization in SFmode are:
>>
>>   - Only zero values (negative or non-negative) contain no bits of 1
>>     with both the exponent and the mantissa.
>>   - EQ/NE comparisons involving NaNs produce no signal even if they
>>     are signaling.
>>   - Even if the use of IEEE 754 single-precision floating-point
>>     coprocessor is configured (TARGET_HARD_FLOAT is true):
>>         1. Load zero value to FP register
>>         2. Possibly, additional FP move if the comparison target is
>>            an address register
>>         3. FP equality check instruction
>>         4. Read the boolean register containing the result, or
>>            conditional branch
>>     As noted above, a considerable number of instructions are still
>>     generated.
>>
>>     /* example #2 */
>>     int bool_eqSF(float x) {
>>       return x == 0;
>>     }
>>     int bool_neSF(float x) {
>>       return x != 0;
>>     }
>>     int bool_ltSF(float x) {
>>       return x < 0;
>>     }
>>     extern void foo(void);
>>     void cb_eqSF(float x) {
>>       if(x != 0)
>>         foo();
>>     }
>>     void cb_neSF(float x) {
>>       if(x == 0)
>>         foo();
>>     }
>>     void cb_geSF(float x) {
>>       if(x < 0)
>>         foo();
>>     }
>>
>>     ;; after
>>     ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
>>     bool_eqSF:
>>         add.n   a2, a2, a2
>>         nsau    a2, a2
>>         srli    a2, a2, 5
>>         ret.n
>>     bool_neSF:
>>         add.n   a9, a2, a2
>>         movi.n  a2, 1
>>         moveqz  a2, a9, a9
>>         ret.n
>>     bool_ltSF:
>>         movi.n  a9, 0
>>         wfr     f0, a2
>>         wfr     f1, a9
>>         olt.s   b0, f0, f1
>>         movi.n  a9, 0
>>         movi.n  a2, 1
>>         movf    a2, a9, b0
>>         ret.n
>>     cb_eqSF:
>>         add.n   a2, a2, a2
>>         beqz.n  a2, .L6
>>         j.l     foo, a9
>>     .L6:
>>         ret.n
>>     cb_neSF:
>>         add.n   a2, a2, a2
>>         bnez.n  a2, .L8
>>         j.l     foo, a9
>>     .L8:
>>         ret.n
>>     cb_geSF:
>>         addi    sp, sp, -16
>>         movi.n  a3, 0
>>         s32i.n  a12, sp, 8
>>         s32i.n  a0, sp, 12
>>         mov.n   a12, a2
>>         call0   __unordsf2
>>         bnez.n  a2, .L10
>>         movi.n  a3, 0
>>         mov.n   a2, a12
>>         call0   __gesf2
>>         bnei    a2, -1, .L10
>>         l32i.n  a0, sp, 12
>>         l32i.n  a12, sp, 8
>>         addi    sp, sp, 16
>>         j.l     foo, a9
>>     .L10:
>>         l32i.n  a0, sp, 12
>>         l32i.n  a12, sp, 8
>>         addi    sp, sp, 16
>>         ret.n
>>
>> gcc/ChangeLog:
>>
>>         * config/xtensa/predicates.md (const_float_0_operand):
>>         Rename from obsolete "const_float_1_operand" and change the
>>         constant to compare.
>>         (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
>>         New.
>>         * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
>>         Add code for EQ/NE comparison with constant zero in SFmode.
>>         (xtensa_expand_scc): Added code to derive boolean evaluation
>>         of EQ/NE with constant zero for comparison in SFmode.
>>         (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
>>         zero inside "cbranchsf4" to 0.
>>         * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
>>         Change "match_operator" and the third "match_operand" to the
>>         ones mentioned above.
>>         (movsicc_ne0_reg_zero, eq_zero): New.
>> ---
>>  gcc/config/xtensa/predicates.md | 17 +--
>>  gcc/config/xtensa/xtensa.cc     | 45
>>  gcc/config/xtensa/xtensa.md     | 53 +
>>  3 files changed, 106 insertions(+), 9 deletions(-)
>
> This version performs much better than v1, but there's still a new
> testsuit
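
The SFmode variant can be sketched on a host machine in the same spirit, assuming float is IEEE 754 binary32 there. Adding the bit pattern to itself mirrors the add.n a2, a2, a2 in the listings: it shifts the sign bit out, so the result is zero only for +0.0 and -0.0, and NaNs never compare equal. The helper names below are made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (assumes IEEE 754 binary32). */
static uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* x == 0 without any FP instruction: doubling the pattern discards the
   sign bit, leaving only exponent and mantissa, which are all zero
   exactly for +0.0 and -0.0. */
static int sf_eq_zero(float x)
{
    uint32_t bits = float_bits(x);
    return (uint32_t)(bits + bits) == 0;
}

/* x != 0 is simply the complement; NaN patterns have a nonzero
   exponent field, so they report "not equal" as IEEE 754 requires. */
static int sf_ne_zero(float x)
{
    return !sf_eq_zero(x);
}

/* Compare the bit trick against the ordinary FP comparisons. */
static void check_sfmode_idiom(void)
{
    const float samples[] = { 0.0f, -0.0f, 1.0f, -1.0f, 0.5f,
                              INFINITY, -INFINITY, NAN };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        float x = samples[i];
        assert(sf_eq_zero(x) == (x == 0.0f));
        assert(sf_ne_zero(x) == (x != 0.0f));
    }
}
```

Ordered comparisons such as x < 0 cannot be handled this way, which matches the listings: bool_ltSF and cb_geSF still go through the FP compare or the library calls.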
Re: [PATCH v2] xtensa: Optimize boolean evaluation or branching when EQ/NE to zero in S[IF]mode
Hi Suwa-san,

On Mon, Jun 5, 2023 at 2:37 AM Takayuki 'January June' Suwa
wrote:
>
> This patch optimizes the boolean evaluation of EQ/NE against zero
> by adding two insn_and_split patterns similar to SImode conditional
> store:
>
> "eq_zero":
>         op0 = (op1 == 0) ? 1 : 0;
>         op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */
>
> "movsicc_ne0_reg_0":
>         op0 = (op1 != 0) ? op2 : 0;
>         op0 = op2; if (op1 == 0) op0 = op1;  /* optimized */
>
>     /* example #1 */
>     int bool_eqSI(int x) {
>       return x == 0;
>     }
>     int bool_neSI(int x) {
>       return x != 0;
>     }
>
>     ;; after (TARGET_NSA)
>     bool_eqSI:
>         nsau    a2, a2
>         srli    a2, a2, 5
>         ret.n
>     bool_neSI:
>         mov.n   a9, a2
>         movi.n  a2, 1
>         moveqz  a2, a9, a9
>         ret.n
>
> These also work in SFmode by ignoring their sign bits, and furthermore,
> the branch if EQ/NE against zero in SFmode is also done in the same
> manner.
>
> The reasons for this optimization in SFmode are:
>
>   - Only zero values (negative or non-negative) contain no bits of 1
>     with both the exponent and the mantissa.
>   - EQ/NE comparisons involving NaNs produce no signal even if they
>     are signaling.
>   - Even if the use of IEEE 754 single-precision floating-point
>     coprocessor is configured (TARGET_HARD_FLOAT is true):
>         1. Load zero value to FP register
>         2. Possibly, additional FP move if the comparison target is
>            an address register
>         3. FP equality check instruction
>         4. Read the boolean register containing the result, or
>            conditional branch
>     As noted above, a considerable number of instructions are still
>     generated.
>
>     /* example #2 */
>     int bool_eqSF(float x) {
>       return x == 0;
>     }
>     int bool_neSF(float x) {
>       return x != 0;
>     }
>     int bool_ltSF(float x) {
>       return x < 0;
>     }
>     extern void foo(void);
>     void cb_eqSF(float x) {
>       if(x != 0)
>         foo();
>     }
>     void cb_neSF(float x) {
>       if(x == 0)
>         foo();
>     }
>     void cb_geSF(float x) {
>       if(x < 0)
>         foo();
>     }
>
>     ;; after
>     ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
>     bool_eqSF:
>         add.n   a2, a2, a2
>         nsau    a2, a2
>         srli    a2, a2, 5
>         ret.n
>     bool_neSF:
>         add.n   a9, a2, a2
>         movi.n  a2, 1
>         moveqz  a2, a9, a9
>         ret.n
>     bool_ltSF:
>         movi.n  a9, 0
>         wfr     f0, a2
>         wfr     f1, a9
>         olt.s   b0, f0, f1
>         movi.n  a9, 0
>         movi.n  a2, 1
>         movf    a2, a9, b0
>         ret.n
>     cb_eqSF:
>         add.n   a2, a2, a2
>         beqz.n  a2, .L6
>         j.l     foo, a9
>     .L6:
>         ret.n
>     cb_neSF:
>         add.n   a2, a2, a2
>         bnez.n  a2, .L8
>         j.l     foo, a9
>     .L8:
>         ret.n
>     cb_geSF:
>         addi    sp, sp, -16
>         movi.n  a3, 0
>         s32i.n  a12, sp, 8
>         s32i.n  a0, sp, 12
>         mov.n   a12, a2
>         call0   __unordsf2
>         bnez.n  a2, .L10
>         movi.n  a3, 0
>         mov.n   a2, a12
>         call0   __gesf2
>         bnei    a2, -1, .L10
>         l32i.n  a0, sp, 12
>         l32i.n  a12, sp, 8
>         addi    sp, sp, 16
>         j.l     foo, a9
>     .L10:
>         l32i.n  a0, sp, 12
>         l32i.n  a12, sp, 8
>         addi    sp, sp, 16
>         ret.n
>
> gcc/ChangeLog:
>
>         * config/xtensa/predicates.md (const_float_0_operand):
>         Rename from obsolete "const_float_1_operand" and change the
>         constant to compare.
>         (cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
>         New.
>         * config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
>         Add code for EQ/NE comparison with constant zero in SFmode.
>         (xtensa_expand_scc): Added code to derive boolean evaluation
>         of EQ/NE with constant zero for comparison in SFmode.
>         (xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
>         zero inside "cbranchsf4" to 0.
>         * config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
>         Change "match_operator" and the third "match_operand" to the
>         ones mentioned above.
>         (movsicc_ne0_reg_zero, eq_zero): New.
> ---
>  gcc/config/xtensa/predicates.md | 17 +--
>  gcc/config/xtensa/xtensa.cc     | 45
>  gcc/config/xtensa/xtensa.md     | 53 +
>  3 files changed, 106 insertions(+), 9 deletions(-)

This version performs much better than v1, but there's still a new
testsuite failure in gcc.c-torture/execute/bitfld-3.c
and the following change in the generated code from:

        l32i.n  a11, a7, 8
        l8ui    a9, a7, 12
        movi    a10, 0xff
        add.n   a9, a9, a10
        addi.n  a7, a11, -1
        movi.n