Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
Missing CC to Jeff Law. Sorry.

On 2024/6/12 10:41, HAO CHEN GUI wrote:
> Hi,
>   This patch replaces rtx_cost with insn_cost in forward propagation.
> In the PR, one constant vector should be propagated to replace a
> pseudo in a store insn when we know it's a duplicated constant vector.
> It reduces the insn cost but not the rtx cost. In this case, the cost
> is determined by the destination operand (memory or pseudo);
> unfortunately, rtx cost can't help.
>
> The test case is added in the second, rs6000-specific patch.
>
> Compared to the previous version, the main changes are:
> 1. Invoke change_is_worthwhile to judge whether the cost is reduced
> and the replacement is worthwhile.
> 2. Invalidate recog data before getting the insn cost of the new
> rtl, as insn_cost might call extract_constrain_insn_cached and
> extract_insn_cached to cache the recog data. The cached data is
> invalid for the new rtl and causes an ICE.
> 3. Check whether the insn cost of the new rtl is zero, which means
> the cost is unknown. The replacement should be rejected in this
> situation.
>
> Previous version
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651233.html
>
> The patch caused a regression case on i386 because the pattern cost
> calculation has a bug. Please refer to the patch and discussion here.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html
>
> Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
>
> ChangeLog
> fwprop: Invoke change_is_worthwhile to judge whether a replacement is worthwhile
>
> gcc/
> 	* fwprop.cc (try_fwprop_subst_pattern): Invoke change_is_worthwhile
> 	to judge if a replacement is worthwhile.
> 	* rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Invalidate
> 	recog data before getting the insn cost for the new rtl.  Check if
> 	the insn cost of the new rtl is unknown and fail the replacement.
>
> patch.diff
> diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
> index de543923b92..975de0eec7f 100644
> --- a/gcc/fwprop.cc
> +++ b/gcc/fwprop.cc
> @@ -471,29 +471,19 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
>        redo_changes (0);
>      }
>
> -  /* ??? In theory, it should be better to use insn costs rather than
> -     set_src_costs here.  That would involve replacing this code with
> -     change_is_worthwhile.  */
>    bool ok = recog (attempt, use_change);
> -  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
> -    if (rtx use_set = single_set (use_rtl))
> -      {
> -        bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
> -        temporarily_undo_changes (0);
> -        auto old_cost = set_src_cost (SET_SRC (use_set),
> -                                      GET_MODE (SET_DEST (use_set)), speed);
> -        redo_changes (0);
> -        auto new_cost = set_src_cost (SET_SRC (use_set),
> -                                      GET_MODE (SET_DEST (use_set)), speed);
> -        if (new_cost > old_cost
> -            || (new_cost == old_cost && !prop.likely_profitable_p ()))
> -          {
> -            if (dump_file)
> -              fprintf (dump_file, "change not profitable"
> -                       " (cost %d -> cost %d)\n", old_cost, new_cost);
> -            ok = false;
> -          }
> -      }
> +  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()
> +      && single_set (use_rtl))
> +    {
> +      if (!change_is_worthwhile (use_change, false)
> +          || (!prop.likely_profitable_p ()
> +              && !change_is_worthwhile (use_change, true)))
> +        {
> +          if (dump_file)
> +            fprintf (dump_file, "change not profitable");
> +          ok = false;
> +        }
> +    }
>
>    if (!ok)
>      {
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index 11639e81bb7..9bad6c2070c 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -185,7 +185,18 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *const> changes,
>                               * change->old_cost ());
>        if (!change->is_deletion ())
>          {
> +          /* Invalidate recog data as insn_cost may call
> +             extract_insn_cached.  */
> +          INSN_CODE (change->rtl ()) = -1;
>            change->new_cost = insn_cost (change->rtl (), for_speed);
> +          /* If the cost is unknown, the replacement is not worthwhile.  */
> +          if (!change->new_cost)
> +            {
> +              if (dump_file && (dump_flags & TDF_DETAILS))
> +                fprintf (dump_file,
> +                         "Reject replacement due to unknown insn cost.\n");
> +              return false;
> +            }
>            new_cost += change->new_cost;
>            if (for_speed)
>              weighted_new_cost += (cfg_bb->count.to_sreal_scale (entry_count)
[Patch-2v2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]
Hi,
  This patch creates an insn_and_split pattern which lets the duplicated constant vector replace the source pseudo of a store insn in the fwprop pass. The store can then be implemented by a single stxvd2x, which eliminates the unnecessary byte-swap insn on P8 LE. The test case shows the optimization.

The patch depends on the first, generic patch, which uses insn cost in fwprop.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654276.html

Compared to the previous version, the main change is to remove the predicate and put the checks in the insn condition and a gcc assertion.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
	PR target/113325
	* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.

gcc/testsuite/
	PR target/113325
	* gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..89eb32a0758 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3368,6 +3368,32 @@ (define_insn "*vsx_stxvd2x4_le_<mode>"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])

+(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+        (match_operand:VSX_W 1 "immediate_operand" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && !TARGET_P9_VECTOR
+   && const_vec_duplicate_p (operands[1])"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+        (match_dup 1))
+   (set (match_dup 0)
+        (vec_select:VSX_W
+          (match_dup 2)
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+{
+  /* Here all the constants must be loaded without memory.  */
+  gcc_assert (easy_altivec_constant (operands[1], <MODE>mode));
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+                                       : operands[1];
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
         (vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..3ca1fcbc9ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}
[PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
Hi,
  This patch replaces rtx_cost with insn_cost in forward propagation. In the PR, one constant vector should be propagated to replace a pseudo in a store insn when we know it's a duplicated constant vector. It reduces the insn cost but not the rtx cost. In this case, the cost is determined by the destination operand (memory or pseudo); unfortunately, rtx cost can't help.

The test case is added in the second, rs6000-specific patch.

Compared to the previous version, the main changes are:
1. Invoke change_is_worthwhile to judge whether the cost is reduced and the replacement is worthwhile.
2. Invalidate recog data before getting the insn cost of the new rtl, as insn_cost might call extract_constrain_insn_cached and extract_insn_cached to cache the recog data. The cached data is invalid for the new rtl and causes an ICE.
3. Check whether the insn cost of the new rtl is zero, which means the cost is unknown. The replacement should be rejected in this situation.

Previous version
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651233.html

The patch caused a regression case on i386 because the pattern cost calculation has a bug. Please refer to the patch and discussion here.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html

Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk?

ChangeLog
fwprop: Invoke change_is_worthwhile to judge whether a replacement is worthwhile

gcc/
	* fwprop.cc (try_fwprop_subst_pattern): Invoke change_is_worthwhile
	to judge if a replacement is worthwhile.
	* rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Invalidate
	recog data before getting the insn cost for the new rtl.  Check if
	the insn cost of the new rtl is unknown and fail the replacement.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index de543923b92..975de0eec7f 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -471,29 +471,19 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
       redo_changes (0);
     }

-  /* ??? In theory, it should be better to use insn costs rather than
-     set_src_costs here.  That would involve replacing this code with
-     change_is_worthwhile.  */
   bool ok = recog (attempt, use_change);
-  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
-    if (rtx use_set = single_set (use_rtl))
-      {
-        bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
-        temporarily_undo_changes (0);
-        auto old_cost = set_src_cost (SET_SRC (use_set),
-                                      GET_MODE (SET_DEST (use_set)), speed);
-        redo_changes (0);
-        auto new_cost = set_src_cost (SET_SRC (use_set),
-                                      GET_MODE (SET_DEST (use_set)), speed);
-        if (new_cost > old_cost
-            || (new_cost == old_cost && !prop.likely_profitable_p ()))
-          {
-            if (dump_file)
-              fprintf (dump_file, "change not profitable"
-                       " (cost %d -> cost %d)\n", old_cost, new_cost);
-            ok = false;
-          }
-      }
+  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()
+      && single_set (use_rtl))
+    {
+      if (!change_is_worthwhile (use_change, false)
+          || (!prop.likely_profitable_p ()
+              && !change_is_worthwhile (use_change, true)))
+        {
+          if (dump_file)
+            fprintf (dump_file, "change not profitable");
+          ok = false;
+        }
+    }

   if (!ok)
     {
diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 11639e81bb7..9bad6c2070c 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -185,7 +185,18 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *const> changes,
                              * change->old_cost ());
       if (!change->is_deletion ())
         {
+          /* Invalidate recog data as insn_cost may call
+             extract_insn_cached.  */
+          INSN_CODE (change->rtl ()) = -1;
           change->new_cost = insn_cost (change->rtl (), for_speed);
+          /* If the cost is unknown, the replacement is not worthwhile.  */
+          if (!change->new_cost)
+            {
+              if (dump_file && (dump_flags & TDF_DETAILS))
+                fprintf (dump_file,
+                         "Reject replacement due to unknown insn cost.\n");
+              return false;
+            }
           new_cost += change->new_cost;
           if (for_speed)
             weighted_new_cost += (cfg_bb->count.to_sreal_scale (entry_count)
Re: [Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]
Hi Kewen,

On 2024/6/5 17:00, Kewen.Lin wrote:
> This predicate can be moved to its only use (define_insn part condition).
> The const_vector match_code check is redundant as const_vec_duplicate_p
> already checks that. I wonder if we really need easy_altivec_constant?
> Even if one vector constant doesn't meet easy_altivec_constant, but if
> it matches the desired duplicated pattern, it doesn't need the swapping
> either, no?

Thanks for your comments. I think we do need easy_altivec_constant, as the constant will be directly moved to a vector register after the split. That move might fail if it's not an easy altivec constant?

  [(set (match_dup 2)
        (match_dup 1))

Thanks
Gui Haochen
Ping [Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]
Hi,
  Gently ping the patch.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643995.html

Thanks
Gui Haochen

On 2024/1/26 9:17, HAO CHEN GUI wrote:
> Hi,
>   This patch creates an insn_and_split pattern which lets the duplicated
> constant vector replace the source pseudo of a store insn in the fwprop
> pass. The store can then be implemented by a single stxvd2x, which
> eliminates the unnecessary byte-swap insn on P8 LE. The test case shows
> the optimization.
>
> The patch depends on the first generic patch which uses insn cost in fwprop.
>
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions.
>
> Thanks
> Gui Haochen
>
>
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store
>
> gcc/
> 	PR target/113325
> 	* config/rs6000/predicates.md (duplicate_easy_altivec_constant): New.
> 	* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.
>
> gcc/testsuite/
> 	PR target/113325
> 	* gcc.target/powerpc/pr113325.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index ef7d3f214c4..8ab6db630b7 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -759,6 +759,14 @@ (define_predicate "easy_vector_constant"
>    return false;
> })
>
> +;; Return 1 if it's a duplicated easy_altivec_constant.
> +(define_predicate "duplicate_easy_altivec_constant"
> +  (and (match_code "const_vector")
> +       (match_test "easy_altivec_constant (op, mode)"))
> +{
> +  return const_vec_duplicate_p (op);
> +})
> +
> ;; Same as easy_vector_constant but only for EASY_VECTOR_15_ADD_SELF.
> (define_predicate "easy_vector_constant_add_self"
>    (and (match_code "const_vector")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 26fa32829af..98e4be26f64 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3362,6 +3362,29 @@ (define_insn "*vsx_stxvd2x4_le_<mode>"
>    "stxvd2x %x1,%y0"
>    [(set_attr "type" "vecstore")])
>
> +(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
> +  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> +        (match_operand:VSX_W 1 "duplicate_easy_altivec_constant" "W"))]
> +  "!BYTES_BIG_ENDIAN
> +   && VECTOR_MEM_VSX_P (<MODE>mode)
> +   && !TARGET_P9_VECTOR"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 2)
> +        (match_dup 1))
> +   (set (match_dup 0)
> +        (vec_select:VSX_W
> +          (match_dup 2)
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
> +                                       : operands[1];
> +}
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "8")])
> +
> (define_insn "*vsx_stxvd2x8_le_V8HI"
>    [(set (match_operand:V8HI 0 "memory_operand" "=Z")
>         (vec_select:V8HI
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 000..dff68ac0a51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> +  return __builtin_memset (s1, 0, 32);
> +}
Re: [PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
Hi Jeff,

On 2024/6/4 22:14, Jeff Law wrote:
>
> On 1/25/24 6:16 PM, HAO CHEN GUI wrote:
>> Hi,
>>   This patch replaces rtx_cost with insn_cost in forward propagation.
>> In the PR, one constant vector should be propagated to replace a
>> pseudo in a store insn when we know it's a duplicated constant vector.
>> It reduces the insn cost but not the rtx cost. In this case, the kind
>> of destination operand (memory or pseudo) decides the cost, and rtx
>> cost can't reflect it.
>>
>> The test case is added in the second, target-specific patch.
>>
>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is it OK for next stage 1?
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern
>>
>> gcc/
>> 	PR target/113325
>> 	* fwprop.cc (try_fwprop_subst_pattern): Replace rtx_cost with
>> 	insn_cost.
> Testcase?  I don't care if it's ppc specific.
>
> I think we generally want to move from rtx_cost to insn_cost, so I
> think the change itself is fine. We just want to make sure a test
> covers the change in some manner.
>
> Also note this is a change to generic code and could likely trigger
> failures on various targets that have assembler-scanning tests. So
> once you've got a testcase and the full patch is ack'd, we'll need to
> watch closely for regressions reported on other targets.
>
>
> So ACK'd once you add a testcase.
>
> Jeff

Thanks for your comments. The test case is in this rs6000 patch, which is still under review.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643995.html

I have sent the second version of the patch. The main change is to detect the zero cost returned by insn_cost, as it means the cost is unknown.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651233.html

I have already tested the patch on other targets. I found some regressions on x86 due to the wrong cost conversion from set_src_cost to pattern_cost, and have sent another patch for that issue. Reviewers have different thoughts on it, so it's pending now.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html
Ping [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi,
  Gently ping the series of patches.

[PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html
[PATCH-2v3, rs6000] Implement optab_isfinite for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html
[PATCH-3v3, rs6000] Implement optab_isnormal for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html

Thanks
Gui Haochen

On 2024/5/24 14:02, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab_isinf for SFDF and IEEE128 by the test
> data class instructions.
>
> Compared with the previous version, the main change is to narrow
> down the predicate for the float operand according to the reviewer's
> advice.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652128.html
>
> Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: Implement optab_isinf for SFDF and IEEE128
>
> gcc/
> 	PR target/97786
> 	* config/rs6000/vsx.md (isinf<mode>2 for SFDF): New expand.
> 	(isinf<mode>2 for IEEE128): New expand.
>
> gcc/testsuite/
> 	PR target/97786
> 	* gcc.target/powerpc/pr97786-1.c: New test.
> 	* gcc.target/powerpc/pr97786-2.c: New test.
>
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..08cce11da60 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5313,6 +5313,24 @@ (define_expand "xststdc<sd>p"
>    operands[4] = CONST0_RTX (SImode);
> })
>
> +(define_expand "isinf<mode>2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdc<sd>p (operands[0], operands[1], GEN_INT (0x30)));
> +  DONE;
> +})
> +
> +(define_expand "isinf<mode>2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcqp_<mode> (operands[0], operands[1], GEN_INT (0x30)));
> +  DONE;
> +})
> +
> ;; The VSX Scalar Test Negative Quad-Precision
> (define_expand "xststdcnegqp_<mode>"
>    [(set (match_dup 2)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> new file mode 100644
> index 000..c1c4f64ee8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +
> +int test1 (double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test3 (float x)
> +{
> +  return __builtin_isinff (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> new file mode 100644
> index 000..ed305e8572e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
> +
> +int test1 (long double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (long double x)
> +{
> +  return __builtin_isinfl (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
Ping [PATCHv5] Optab: add isnormal_optab for __builtin_isnormal
Hi,
  All issues were addressed. Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html

Thanks
Gui Haochen

On 2024/5/29 14:36, HAO CHEN GUI wrote:
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can
> be implemented on rs6000 by a single instruction. It needs an optab to
> be expanded to a certain sequence of instructions.
>
> The subsequent patches will implement the expand on rs6000.
>
> Compared to the previous version, the main change is to specify that
> the return value of the optab should be either 0 or 1.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html
>
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
>
> gcc/
> 	* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
> 	for isnormal builtin.
> 	* optabs.def (isnormal_optab): New.
> 	* doc/md.texi (isnormal): Document.
>
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 53e9d210541..89ba56abf17 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>        builtin_optab = isfinite_optab;
>        break;
>      case BUILT_IN_ISNORMAL:
> +      builtin_optab = isnormal_optab;
> +      break;
>      CASE_FLT_FN (BUILT_IN_FINITE):
>      case BUILT_IN_FINITED32:
>      case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 3eb4216141e..4fd7da095fe 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point number and 0
> otherwise.  @var{m} is a scalar floating point mode.  Operand 0
> has mode @code{SImode}, and operand 1 has mode @var{m}.
>
> +@cindex @code{isnormal@var{m}2} instruction pattern
> +@item @samp{isnormal@var{m}2}
> +Return 1 if operand 1 is a normal floating point number and 0
> +otherwise.  @var{m} is a scalar floating point mode.  Operand 0
> +has mode @code{SImode}, and operand 1 has mode @var{m}.
> +
> @end table
>
> @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
> OPTAB_D (ilogb_optab, "ilogb$a2")
> OPTAB_D (isinf_optab, "isinf$a2")
> OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
> OPTAB_D (issignaling_optab, "issignaling$a2")
> OPTAB_D (ldexp_optab, "ldexp$a3")
> OPTAB_D (log10_optab, "log10$a2")
Ping [PATCHv5] Optab: add isfinite_optab for __builtin_isfinite
Hi,
  All issues were addressed. Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html

Thanks
Gui Haochen

On 2024/5/29 14:36, HAO CHEN GUI wrote:
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can
> be implemented on rs6000 by a single instruction. It needs an optab to
> be expanded to a certain sequence of instructions.
>
> The subsequent patches will implement the expand on rs6000.
>
> Compared to the previous version, the main change is to specify that
> the return value of the optab should be either 0 or 1.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652864.html
>
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
> 	* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
> 	for isfinite builtin.
> 	* optabs.def (isfinite_optab): New.
> 	* doc/md.texi (isfinite): Document.
>
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..53e9d210541 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,10 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>        errno_set = true; builtin_optab = ilogb_optab; break;
>      CASE_FLT_FN (BUILT_IN_ISINF):
>        builtin_optab = isinf_optab; break;
> -    case BUILT_IN_ISNORMAL:
>      case BUILT_IN_ISFINITE:
> +      builtin_optab = isfinite_optab;
> +      break;
> +    case BUILT_IN_ISNORMAL:
>      CASE_FLT_FN (BUILT_IN_FINITE):
>      case BUILT_IN_FINITED32:
>      case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..3eb4216141e 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered with
> operand 2.
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Return 1 if operand 1 is a finite floating point number and 0
> +otherwise.  @var{m} is a scalar floating point mode.  Operand 0
> +has mode @code{SImode}, and operand 1 has mode @var{m}.
> +
> @end table
>
> @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
> OPTAB_D (hypot_optab, "hypot$a3")
> OPTAB_D (ilogb_optab, "ilogb$a2")
> OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
> OPTAB_D (issignaling_optab, "issignaling$a2")
> OPTAB_D (ldexp_optab, "ldexp$a3")
> OPTAB_D (log10_optab, "log10$a2")
[PATCHv2, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]
Hi,
  This patch optimizes vector construction with two vector doubleword loads. It generates an optimal insn sequence, as "xxlor" has lower latency than "mtvsrdd" on Power10.

Compared with the previous version, the main change is to use the "isa" attribute to guard "lxsd" and "lxsdx".
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653103.html

Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Optimize vector construction with two vector doubleword loads

When constructing a vector by two doublewords from memory, originally it does
  ld 10,0(3)
  ld 9,0(4)
  mtvsrdd 34,9,10
An optimal sequence on Power10 should be
  lxsd 0,0(4)
  lxvrdx 1,0,3
  xxlor 34,1,32
This patch does this optimization by insn combine and split.

gcc/
	PR target/103568
	* config/rs6000/vsx.md (vsx_ld_lowpart_zero_<mode>): New insn pattern.
	(vsx_ld_highpart_zero_<mode>): New insn pattern.
	(vsx_concat_mem_<mode>): New insn_and_split pattern.

gcc/testsuite/
	PR target/103568
	* gcc.target/powerpc/pr103568.c: New test.
patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..f9a2a260e89 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1395,6 +1395,27 @@ (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])

+(define_insn "vsx_ld_lowpart_zero_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+        (vec_concat:VSX_D
+          (match_operand:<VS_scalar> 1 "memory_operand" "wY,Z")
+          (match_operand:<VS_scalar> 2 "zero_constant" "j,j")))]
+  ""
+  "@
+   lxsd %0,%1
+   lxsdx %x0,%y1"
+  [(set_attr "type" "vecload,vecload")
+   (set_attr "isa" "p9v,p7v")])
+
+(define_insn "vsx_ld_highpart_zero_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+        (vec_concat:VSX_D
+          (match_operand:<VS_scalar> 1 "zero_constant" "j")
+          (match_operand:<VS_scalar> 2 "memory_operand" "Z")))]
+  "TARGET_POWER10"
+  "lxvrdx %x0,%y2"
+  [(set_attr "type" "vecload")])
+
 (define_insn "vsx_ld_elemrev_v1ti"
   [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
         (vec_select:V1TI
@@ -3063,6 +3084,26 @@ (define_insn "vsx_concat_<mode>"
 }
   [(set_attr "type" "vecperm,vecmove")])

+(define_insn_and_split "vsx_concat_mem_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+        (vec_concat:VSX_D
+          (match_operand:<VS_scalar> 1 "memory_operand" "wY,Z")
+          (match_operand:<VS_scalar> 2 "memory_operand" "Z,Z")))]
+  "TARGET_POWER10 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp1 = gen_reg_rtx (<MODE>mode);
+  rtx tmp2 = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_vsx_ld_highpart_zero_<mode> (tmp1, CONST0_RTX (<VS_scalar>mode),
+                                              operands[1]));
+  emit_insn (gen_vsx_ld_lowpart_zero_<mode> (tmp2, operands[2],
+                                             CONST0_RTX (<VS_scalar>mode)));
+  emit_insn (gen_ior<mode>3 (operands[0], tmp1, tmp2));
+  DONE;
+})
+
 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.
 (define_insn "*vsx_concat_<mode>_1"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c b/gcc/testsuite/gcc.target/powerpc/pr103568.c
new file mode 100644
index 000..b2a06fb2162
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+vector double test (double *a, double *b)
+{
+  return (vector double) {*a, *b};
+}
+
+vector long long test1 (long long *a, long long *b)
+{
+  return (vector long long) {*a, *b};
+}
+
+/* { dg-final { scan-assembler-times {\mlxsd} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
[PATCH, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]
Hi,
  This patch optimizes vector construction with two vector doubleword loads. It generates an optimal insn sequence, as "xxlor" has lower latency than "mtvsrdd" on Power10.

Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Optimize vector construction with two vector doubleword loads

When constructing a vector by two doublewords from memory, originally it does
  ld 10,0(3)
  ld 9,0(4)
  mtvsrdd 34,9,10
An optimal sequence on Power10 should be
  lxsd 0,0(4)
  lxvrdx 1,0,3
  xxlor 34,1,32
This patch does this optimization by insn combine and split.

gcc/
	PR target/103568
	* config/rs6000/vsx.md (vsx_ld_lowpart_zero_<mode>): New insn pattern.
	(vsx_ld_highpart_zero_<mode>): New insn pattern.
	(vsx_concat_mem_<mode>): New insn_and_split pattern.

gcc/testsuite/
	PR target/103568
	* gcc.target/powerpc/pr103568.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..3c98e3d4e13 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1395,6 +1395,26 @@ (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])

+(define_insn "vsx_ld_lowpart_zero_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+        (vec_concat:VSX_D
+          (match_operand:<VS_scalar> 1 "memory_operand" "wY,Z")
+          (match_operand:<VS_scalar> 2 "zero_constant" "j,j")))]
+  "TARGET_P9_VECTOR"
+  "@
+   lxsd %0,%1
+   lxsdx %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "vsx_ld_highpart_zero_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+        (vec_concat:VSX_D
+          (match_operand:<VS_scalar> 1 "zero_constant" "j")
+          (match_operand:<VS_scalar> 2 "memory_operand" "Z")))]
+  "TARGET_POWER10"
+  "lxvrdx %x0,%y2"
+  [(set_attr "type" "vecload")])
+
 (define_insn "vsx_ld_elemrev_v1ti"
   [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
         (vec_select:V1TI
@@ -3063,6 +3083,26 @@ (define_insn "vsx_concat_<mode>"
 }
   [(set_attr "type" "vecperm,vecmove")])

+(define_insn_and_split "vsx_concat_mem_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+        (vec_concat:VSX_D
+          (match_operand:<VS_scalar> 1 "memory_operand" "wY,Z")
+          (match_operand:<VS_scalar> 2 "memory_operand" "Z,Z")))]
+  "TARGET_POWER10 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp1 = gen_reg_rtx (<MODE>mode);
+  rtx tmp2 = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_vsx_ld_highpart_zero_<mode> (tmp1, CONST0_RTX (<VS_scalar>mode),
+                                              operands[1]));
+  emit_insn (gen_vsx_ld_lowpart_zero_<mode> (tmp2, operands[2],
+                                             CONST0_RTX (<VS_scalar>mode)));
+  emit_insn (gen_ior<mode>3 (operands[0], tmp1, tmp2));
+  DONE;
+})
+
 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.
 (define_insn "*vsx_concat_<mode>_1"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c b/gcc/testsuite/gcc.target/powerpc/pr103568.c
new file mode 100644
index 000..b2a06fb2162
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+vector double test (double *a, double *b)
+{
+  return (vector double) {*a, *b};
+}
+
+vector long long test1 (long long *a, long long *b)
+{
+  return (vector long long) {*a, *b};
+}
+
+/* { dg-final { scan-assembler-times {\mlxsd} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
Re: [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]
Hi Kewen,

On 2024/5/29 13:26, Kewen.Lin wrote:
> I can understand re-using "unordered" and "eq" will save some efforts than
> doing with unspecs, but they are actually RTL codes instead of bits on the
> specific hardware CR, a downside is that people who isn't aware of this
> design point can have some misunderstanding when reading/checking the code
> or dumping, from this perspective unspecs (with reasonable name) can be
> more meaningful. Normally adopting RTL code is better since they have the
> chance to be considered (optimized) in generic pass/code, but it isn't the
> case here as we just use the code itself but not be with the same semantic
> (meaning). Looking forward to others' opinions on this, if we want to adopt
> "unordered" and "eq" like what this patch does, I think we should at least
> emphasize such points in rs6000-modes.def.

Thanks so much for your comments.

IMHO, the core question is whether we can redefine "unordered" or "eq" for a certain CC mode on a specific target. If we can't, or it's unsafe, we have to use unspecs. In this case, I just want to define the code "unordered" on CCBCD as testing whether bit 3 is set in this CR field. Actually, rs6000 already uses the "lt" code to test whether bit 0 is set for vector compare instructions. The following expand is an example.

(define_expand "vector_ae__p"
  [(parallel
    [(set (reg:CC CR6_REGNO)
	  (unspec:CC [(ne:CC (match_operand:VI 1 "vlogical_operand")
			     (match_operand:VI 2 "vlogical_operand"))]
		     UNSPEC_PREDICATE))
     (set (match_dup 3) (ne:VI (match_dup 1) (match_dup 2)))])
   (set (match_operand:SI 0 "register_operand" "=r")
	(lt:SI (reg:CC CR6_REGNO) (const_int 0)))
   (set (match_dup 0) (xor:SI (match_dup 0) (const_int 1)))]

I think "lt" on a CC mode doesn't mean it compares whether the CC value is less than an integer; it just tests whether the "lt" bit (bit 0) is set in this CC. Looking forward to your and Segher's further invaluable comments.

Thanks
Gui Haochen
[PATCH-1v3] Value Range: Add range op for builtin isinf
Hi,

The builtin isinf is not folded at the front end if the corresponding optab exists. This causes range evaluation to fail on targets which have optab_isinf. For instance, range-sincos.c will fail on targets which have optab_isinf, as it calls builtin_isinf.

This patch fixes the problem by adding a range op for builtin isinf.

Compared with the previous version, the main change is to set the range to 1 if the operand is an infinite number and to 0 otherwise.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652219.html

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at the front end if the corresponding optab exists, so the range op for isinf is needed for value range analysis. This patch adds a range op for builtin isinf.

gcc/
	* gimple-range-op.cc (class cfn_isinf): New.
	(op_cfn_isinf): New variable.
	(gimple_range_op_handler::maybe_builtin_call): Handle
	CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
	* gcc.dg/tree-ssa/range-isinf.c: New test.
patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 55dfbb23ce2..4e60a42eaac 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1175,6 +1175,63 @@ private: bool m_is_pos; } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true); +// Implement range operator for CFN_BUILT_IN_ISINF +class cfn_isinf : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange , tree type, const frange , + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isinf ()) + { + wide_int one = wi::one (TYPE_PRECISION (type)); + r.set (type, one, one); + return true; + } + +if (op1.known_isnan () + || (!real_isinf (_bound ()) + && !real_isinf (_bound ( + { + r.set_zero (type); + return true; + } + +r.set_varying (type); +return true; + } + virtual bool op1_range (frange , tree type, const irange , + const frange &, relation_trio) const override + { +if (lhs.undefined_p ()) + return false; + +if (lhs.zero_p ()) + { + nan_state nan (true); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +if (!range_includes_zero_p (lhs)) + { + // The range is [-INF,-INF][+INF,+INF], but it can't be represented. 
+ // Set range to [-INF,+INF] + r.set_varying (type); + r.clear_nan (); + return true; + } + +r.set_varying (type); +return true; + } +} op_cfn_isinf; // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator @@ -1268,6 +1325,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = _cfn_signbit; break; +CASE_FLT_FN (BUILT_IN_ISINF): + m_op1 = gimple_call_arg (call, 0); + m_operator = _cfn_isinf; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c new file mode 100644 index 000..468f1bcf5c7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c @@ -0,0 +1,44 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +#include +void link_error(); + +void +test1 (double x) +{ + if (x > __DBL_MAX__ && !__builtin_isinf (x)) +link_error (); + if (x < -__DBL_MAX__ && !__builtin_isinf (x)) +link_error (); +} + +void +test2 (float x) +{ + if (x > __FLT_MAX__ && !__builtin_isinf (x)) +link_error (); + if (x < -__FLT_MAX__ && !__builtin_isinf (x)) +link_error (); +} + +void +test3 (double x) +{ + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__) +link_error (); + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__) +link_error (); +} + +void +test4 (float x) +{ + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__) +link_error (); + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */ +
[PATCH-3v2] Value Range: Add range op for builtin isnormal
Hi,

This patch adds the range op for builtin isnormal. It also adds two helper functions to frange: one detecting the range of normal floating-point numbers, and one detecting the range of subnormal numbers or zero.

Compared to the previous version, the main change is to set the range to 1 if the operand is a normal number and to 0 otherwise.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isnormal

The former patch adds an optab for builtin isnormal, so builtin isnormal might not be folded at the front end and a range op for isnormal is needed for value range analysis. This patch adds it.

gcc/
	* gimple-range-op.cc (class cfn_isnormal): New.
	(op_cfn_isnormal): New variable.
	(gimple_range_op_handler::maybe_builtin_call): Handle
	CFN_BUILT_IN_ISNORMAL.
	* value-range.h (class frange): Declare known_isnormal and
	known_isdenormal_or_zero.
	(frange::known_isnormal): Define.
	(frange::known_isdenormal_or_zero): Define.

gcc/testsuite/
	* gcc.dg/tree-ssa/range-isnormal.c: New test.
patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 5ec5c828fa4..6787f532f11 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1289,6 +1289,61 @@ public: } } op_cfn_isfinite; +//Implement range operator for CFN_BUILT_IN_ISNORMAL +class cfn_isnormal : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange , tree type, const frange , + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isnormal ()) + { + wide_int one = wi::one (TYPE_PRECISION (type)); + r.set (type, one, one); + return true; + } + +if (op1.known_isnan () + || op1.known_isinf () + || op1.known_isdenormal_or_zero ()) + { + r.set_zero (type); + return true; + } + +r.set_varying (type); +return true; + } + virtual bool op1_range (frange , tree type, const irange , + const frange &, relation_trio) const override + { +if (lhs.undefined_p ()) + return false; + +if (lhs.zero_p ()) + { + r.set_varying (type); + return true; + } + +if (!range_includes_zero_p (lhs)) + { + nan_state nan (false); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +r.set_varying (type); +return true; + } +} op_cfn_isnormal; + // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator { @@ -1391,6 +1446,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = _cfn_isfinite; break; +case CFN_BUILT_IN_ISNORMAL: + m_op1 = gimple_call_arg (call, 0); + m_operator = _cfn_isnormal; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c new file mode 100644 index 000..c4df4d839b0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 
-fdump-tree-evrp" } */ + +#include +void link_error(); + +void test1 (double x) +{ + if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x)) +link_error (); + + if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x)) +link_error (); +} + +void test2 (float x) +{ + if (x < __FLT_MAX__ && x > __FLT_MIN__ && !__builtin_isnormal (x)) +link_error (); + + if (x < -__FLT_MIN__ && x > - __FLT_MAX__ && !__builtin_isnormal (x)) +link_error (); +} + +void test3 (double x) +{ + if (__builtin_isnormal (x) && __builtin_isinf (x)) +link_error (); +} + +void test4 (float x) +{ + if (__builtin_isnormal (x) && __builtin_isinf (x)) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */ diff --git a/gcc/value-range.h b/gcc/value-range.h index 37ce91dc52d..1443d1906e5 100644 --- a/gcc/value-range.h +++ b/gcc/value-range.h @@ -588,6 +588,8 @@ public: bool maybe_isinf () const; bool signbit_p (bool ) const; bool nan_signbit_p (bool ) const; + bool known_isnormal () const; + bool known_isdenormal_or_zero () const; protected: virtual bool contains_p (tree cst) const override; @@ -1650,6 +1652,33 @@ frange::known_isfinite () const return (!maybe_isnan () && !real_isinf (_min) && !real_isinf (_max)); } +// Return TRUE if range is known to be normal. + +inline bool +frange::known_isnormal () const +{ + if (!known_isfinite ()) +return false; + + machine_mode mode =
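The truncated hunk above defines known_isnormal on top of known_isfinite plus the mode's smallest normalized value. The pointwise classification those range helpers approximate can be written in plain C (doubles only; the real code queries the mode's minimum normal via the real-format machinery, which is assumed here to be DBL_MIN):

```c
#include <math.h>
#include <float.h>

/* Pointwise analogue of frange::known_isnormal: finite, nonzero, and
   at least the smallest normalized magnitude.  */
static int is_normal_point (double x)
{
  return isfinite (x) && x != 0.0 && fabs (x) >= DBL_MIN;
}

/* Pointwise analogue of frange::known_isdenormal_or_zero: finite with
   magnitude strictly below the smallest normalized value.  */
static int is_denormal_or_zero_point (double x)
{
  return isfinite (x) && fabs (x) < DBL_MIN;
}
```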
[PATCH-2v4] Value Range: Add range op for builtin isfinite
Hi, This patch adds the range op for builtin isfinite. Compared to previous version, the main change is to set the range to 1 if it's finite number otherwise to 0. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog Value Range: Add range op for builtin isfinite The former patch adds optab for builtin isfinite. Thus builtin isfinite might not be folded at front end. So the range op for isfinite is needed for value range analysis. This patch adds range op for builtin isfinite. gcc/ * gimple-range-op.cc (class cfn_isfinite): New. (op_cfn_finite): New variables. (gimple_range_op_handler::maybe_builtin_call): Handle CFN_BUILT_IN_ISFINITE. gcc/testsuite/ * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test. patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 4e60a42eaac..5ec5c828fa4 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1233,6 +1233,62 @@ public: } } op_cfn_isinf; +//Implement range operator for CFN_BUILT_IN_ISFINITE +class cfn_isfinite : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange , tree type, const frange , + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isfinite ()) + { + wide_int one = wi::one (TYPE_PRECISION (type)); + r.set (type, one, one); + return true; + } + +if (op1.known_isnan () + || op1.known_isinf ()) + { + r.set_zero (type); + return true; + } + +r.set_varying (type); +return true; + } + virtual bool op1_range (frange , tree type, const irange , + const frange &, relation_trio) const override + { +if (lhs.undefined_p ()) + return false; + +if (lhs.zero_p ()) + { + // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented. 
+ // Set range to varying + r.set_varying (type); + return true; + } + +if (!range_includes_zero_p (lhs)) + { + nan_state nan (false); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +r.set_varying (type); +return true; + } +} op_cfn_isfinite; + // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator { @@ -1330,6 +1386,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = _cfn_isinf; break; +case CFN_BUILT_IN_ISFINITE: + m_op1 = gimple_call_arg (call, 0); + m_operator = _cfn_isfinite; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c new file mode 100644 index 000..f5dce0a0486 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +#include +void link_error(); + +void test1 (double x) +{ + if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x)) +link_error (); +} + +void test2 (float x) +{ + if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x)) +link_error (); +} + +void test3 (double x) +{ + if (__builtin_isfinite (x) && __builtin_isinf (x)) +link_error (); +} + +void test4 (float x) +{ + if (__builtin_isfinite (x) && __builtin_isinf (x)) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
[PATCHv5] Optab: add isnormal_optab for __builtin_isnormal
Hi, This patch adds an optab for __builtin_isnormal. The normal check can be implemented on rs6000 by a single instruction. It needs an optab to be expanded to the certain sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to previous version, the main change is to specify return value of the optab should be either 0 or 1. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog optab: Add isnormal_optab for isnormal builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab for isnormal builtin. * optabs.def (isnormal_optab): New. * doc/md.texi (isnormal): Document. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 53e9d210541..89ba56abf17 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl) builtin_optab = isfinite_optab; break; case BUILT_IN_ISNORMAL: + builtin_optab = isnormal_optab; + break; CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 3eb4216141e..4fd7da095fe 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point number and 0 otherwise. @var{m} is a scalar floating point mode. Operand 0 has mode @code{SImode}, and operand 1 has mode @var{m}. +@cindex @code{isnormal@var{m}2} instruction pattern +@item @samp{isnormal@var{m}2} +Return 1 if operand 1 is a normal floating point number and 0 +otherwise. @var{m} is a scalar floating point mode. Operand 0 +has mode @code{SImode}, and operand 1 has mode @var{m}. 
+ @end table @end ifset diff --git a/gcc/optabs.def b/gcc/optabs.def index dcd77315c2a..3c401fc0b4c 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") OPTAB_D (isfinite_optab, "isfinite$a2") +OPTAB_D (isnormal_optab, "isnormal$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
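The contract change in this revision, that the result must be exactly 0 or 1 rather than merely zero/nonzero, can be sanity-checked at the source level with the builtin itself (illustrative only: this exercises GCC's existing folding of the builtin, not the new rs6000 optab expansion; the helper name is invented):

```c
/* Returns 1 iff __builtin_isnormal observes the documented 0/1 result
   on representative inputs: normal, zero, subnormal, infinity, NaN.  */
static int isnormal_contract_holds (void)
{
  return __builtin_isnormal (1.0) == 1
	 && __builtin_isnormal (0.0) == 0
	 && __builtin_isnormal (__DBL_MIN__ / 2) == 0  /* subnormal */
	 && __builtin_isnormal (__builtin_inf ()) == 0
	 && __builtin_isnormal (__builtin_nan ("")) == 0;
}
```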
[PATCHv5] Optab: add isfinite_optab for __builtin_isfinite
Hi, This patch adds an optab for __builtin_isfinite. The finite check can be implemented on rs6000 by a single instruction. It needs an optab to be expanded to the certain sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to previous version, the main change is to specify return value of the optab should be either 0 or 1. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652864.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog optab: Add isfinite_optab for isfinite builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab for isfinite builtin. * optabs.def (isfinite_optab): New. * doc/md.texi (isfinite): Document. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index f8d94c4b435..53e9d210541 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2459,8 +2459,10 @@ interclass_mathfn_icode (tree arg, tree fndecl) errno_set = true; builtin_optab = ilogb_optab; break; CASE_FLT_FN (BUILT_IN_ISINF): builtin_optab = isinf_optab; break; -case BUILT_IN_ISNORMAL: case BUILT_IN_ISFINITE: + builtin_optab = isfinite_optab; + break; +case BUILT_IN_ISNORMAL: CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 5730bda80dc..3eb4216141e 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered with operand 2. This pattern is not allowed to @code{FAIL}. +@cindex @code{isfinite@var{m}2} instruction pattern +@item @samp{isfinite@var{m}2} +Return 1 if operand 1 is a finite floating point number and 0 +otherwise. @var{m} is a scalar floating point mode. Operand 0 +has mode @code{SImode}, and operand 1 has mode @var{m}. 
+ @end table @end ifset diff --git a/gcc/optabs.def b/gcc/optabs.def index ad14f9328b9..dcd77315c2a 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3") OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") +OPTAB_D (isfinite_optab, "isfinite$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
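The same 0/1 result contract applies to the isfinite optab. A source-level sanity check of the semantics the pattern must implement (illustrative only: this exercises the builtin's existing folding, not the new target expansion; the helper name is invented):

```c
/* Returns 1 iff __builtin_isfinite observes the documented 0/1 result
   on representative inputs: normal, largest finite, infinities, NaN.  */
static int isfinite_contract_holds (void)
{
  return __builtin_isfinite (1.0) == 1
	 && __builtin_isfinite (__DBL_MAX__) == 1
	 && __builtin_isfinite (__builtin_inf ()) == 0
	 && __builtin_isfinite (-__builtin_inf ()) == 0
	 && __builtin_isfinite (__builtin_nan ("")) == 0;
}
```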
[PATCHv4] Optab: add isnormal_optab for __builtin_isnormal
Hi, This patch adds an optab for __builtin_isnormal. The normal check can be implemented on rs6000 by a single instruction. It needs an optab to be expanded to the certain sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to previous version, the main change is to specify acceptable input and output modes for the optab. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652814.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog optab: Add isnormal_optab for isnormal builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab for isnormal builtin. * optabs.def (isnormal_optab): New. * doc/md.texi (isnormal): Document. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index b8432f84020..ccd57fce522 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl) case BUILT_IN_ISFINITE: builtin_optab = isfinite_optab; break; case BUILT_IN_ISNORMAL: + builtin_optab = isnormal_optab; break; CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 7be0c75baf9..491cd09c620 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -8563,6 +8563,12 @@ Set operand 0 to nonzero if operand 1 is a finite floating point number and to 0 otherwise. Input mode should be a scalar floating point mode and output mode should be @code{SImode}. +@cindex @code{isnormal@var{m}2} instruction pattern +@item @samp{isnormal@var{m}2} +Set operand 0 to nonzero if operand 1 is a normal floating point +number and to 0 otherwise. Input mode should be a scalar floating +point mode and return mode should be @code{SImode}. 
+ @end table @end ifset diff --git a/gcc/optabs.def b/gcc/optabs.def index dcd77315c2a..3c401fc0b4c 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") OPTAB_D (isfinite_optab, "isfinite$a2") +OPTAB_D (isnormal_optab, "isnormal$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
[PATCHv4] Optab: add isfinite_optab for __builtin_isfinite
Hi, This patch adds an optab for __builtin_isfinite. The finite check can be implemented on rs6000 by a single instruction. It needs an optab to be expanded to the certain sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to previous version, the main change is to specify acceptable input and output modes for the optab. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652813.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog optab: Add isfinite_optab for isfinite builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab for isfinite builtin. * optabs.def (isfinite_optab): New. * doc/md.texi (isfinite): Document. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index f8d94c4b435..b8432f84020 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl) errno_set = true; builtin_optab = ilogb_optab; break; CASE_FLT_FN (BUILT_IN_ISINF): builtin_optab = isinf_optab; break; -case BUILT_IN_ISNORMAL: case BUILT_IN_ISFINITE: + builtin_optab = isfinite_optab; break; +case BUILT_IN_ISNORMAL: CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 5730bda80dc..7be0c75baf9 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered with operand 2. This pattern is not allowed to @code{FAIL}. +@cindex @code{isfinite@var{m}2} instruction pattern +@item @samp{isfinite@var{m}2} +Set operand 0 to nonzero if operand 1 is a finite floating point +number and to 0 otherwise. Input mode should be a scalar floating +point mode and output mode should be @code{SImode}. 
+ @end table @end ifset diff --git a/gcc/optabs.def b/gcc/optabs.def index ad14f9328b9..dcd77315c2a 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3") OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") +OPTAB_D (isfinite_optab, "isfinite$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
[PATCHv3] Optab: add isnormal_optab for __builtin_isnormal
Hi, This patch adds an optab for __builtin_isnormal. The normal check can be implemented on rs6000 by a single instruction. It needs an optab to be expanded to the certain sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to previous version, the main change is to specify acceptable modes for the optab. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652172.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog optab: Add isnormal_optab for isnormal builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab for isnormal builtin. * optabs.def (isnormal_optab): New. * doc/md.texi (isnormal): Document. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index b8432f84020..ccd57fce522 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl) case BUILT_IN_ISFINITE: builtin_optab = isfinite_optab; break; case BUILT_IN_ISNORMAL: + builtin_optab = isnormal_optab; break; CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index bc67324872f..7de9c2b5b70 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -8566,6 +8566,15 @@ and to 0 otherwise. If this pattern @code{FAIL}, a call to the library function @code{isfinite} is used. +@cindex @code{isnormal@var{m}2} instruction pattern +@item @samp{isnormal@var{m}2} +Set operand 0 to nonzero if operand 1 is a normal @code{SFmode}, +@code{DFmode}, or @code{TFmode} floating point number and to 0 +otherwise. + +If this pattern @code{FAIL}, a call to the library function +@code{isnormal} is used. 
+ @end table @end ifset diff --git a/gcc/optabs.def b/gcc/optabs.def index dcd77315c2a..3c401fc0b4c 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") OPTAB_D (isfinite_optab, "isfinite$a2") +OPTAB_D (isnormal_optab, "isnormal$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
[PATCHv3] Optab: add isfinite_optab for __builtin_isfinite
Hi, This patch adds an optab for __builtin_isfinite. The finite check can be implemented on rs6000 by a single instruction. It needs an optab to be expanded to the certain sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to previous version, the main change is to specify acceptable modes for the optab. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog optab: Add isfinite_optab for isfinite builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab for isfinite builtin. * optabs.def (isfinite_optab): New. * doc/md.texi (isfinite): Document. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index f8d94c4b435..b8432f84020 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl) errno_set = true; builtin_optab = ilogb_optab; break; CASE_FLT_FN (BUILT_IN_ISINF): builtin_optab = isinf_optab; break; -case BUILT_IN_ISNORMAL: case BUILT_IN_ISFINITE: + builtin_optab = isfinite_optab; break; +case BUILT_IN_ISNORMAL: CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 5730bda80dc..67407fad37d 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered with operand 2. This pattern is not allowed to @code{FAIL}. +@cindex @code{isfinite@var{m}2} instruction pattern +@item @samp{isfinite@var{m}2} +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode}, +@code{DFmode}, or @code{TFmode} floating point number and to 0 +otherwise. + +If this pattern @code{FAIL}, a call to the library function +@code{isfinite} is used. 
+ @end table @end ifset diff --git a/gcc/optabs.def b/gcc/optabs.def index ad14f9328b9..dcd77315c2a 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3") OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") +OPTAB_D (isfinite_optab, "isfinite$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
Re: [PATCHv2] Optab: add isfinite_optab for __builtin_isfinite
Hi Kewen,

Thanks for your comments.

On 2024/5/27 11:18, Kewen.Lin wrote:
> Does this require "This pattern is not allowed to FAIL."?
>
> I guess yes? Since if it's decided to go with this pattern
> expanding, there is no fall back?

The builtin is folded inline if the optab doesn't exist on the target. Otherwise, it is expanded via target-specific insns. If that expansion fails, the library function is called; it can't fall back to inline folding once expansion has been chosen. I am not sure whether the pattern should be marked "allowed to FAIL" or not. Could anyone advise me?

Thanks
Gui Haochen
Ping^2 [Patch, rs6000] Enable overlap memory store for block memory clear
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646478.html Thanks Gui Haochen 在 2024/5/8 9:55, HAO CHEN GUI 写道: > Hi, > As now it's stage 1, gently ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646478.html > > Thanks > Gui Haochen > > 在 2024/2/26 10:25, HAO CHEN GUI 写道: >> Hi, >> This patch enables overlap memory store for block memory clear which >> saves the number of store instructions. The expander calls >> widest_fixed_size_mode_for_block_clear to get the mode for looped block >> clear and calls widest_fixed_size_mode_for_block_clear to get the mode >> for last overlapped clear. >> >> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no >> regressions. Is it OK for the trunk or next stage 1? >> >> Thanks >> Gui Haochen >> >> >> ChangeLog >> rs6000: Enable overlap memory store for block memory clear >> >> gcc/ >> * config/rs6000/rs6000-string.cc >> (widest_fixed_size_mode_for_block_clear): New. >> (smallest_fixed_size_mode_for_block_clear): New. >> (expand_block_clear): Call widest_fixed_size_mode_for_block_clear to >> get the mode for looped memory stores and call >> smallest_fixed_size_mode_for_block_clear to get the mode for the last >> overlapped memory store. >> >> gcc/testsuite >> * gcc.target/powerpc/block-clear-1.c: New. >> >> >> patch.diff >> diff --git a/gcc/config/rs6000/rs6000-string.cc >> b/gcc/config/rs6000/rs6000-string.cc >> index 133e5382af2..c2a6095a586 100644 >> --- a/gcc/config/rs6000/rs6000-string.cc >> +++ b/gcc/config/rs6000/rs6000-string.cc >> @@ -38,6 +38,49 @@ >> #include "profile-count.h" >> #include "predict.h" >> >> +/* Return the widest mode which mode size is less than or equal to the >> + size. 
*/ >> +static fixed_size_mode >> +widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int >> align, >> +bool unaligned_vsx_ok) >> +{ >> + machine_mode mode; >> + >> + if (TARGET_ALTIVEC >> + && size >= 16 >> + && (align >= 128 >> + || unaligned_vsx_ok)) >> +mode = V4SImode; >> + else if (size >= 8 >> + && TARGET_POWERPC64 >> + && (align >= 64 >> + || !STRICT_ALIGNMENT)) >> +mode = DImode; >> + else if (size >= 4 >> + && (align >= 32 >> + || !STRICT_ALIGNMENT)) >> +mode = SImode; >> + else if (size >= 2 >> + && (align >= 16 >> + || !STRICT_ALIGNMENT)) >> +mode = HImode; >> + else >> +mode = QImode; >> + >> + return as_a (mode); >> +} >> + >> +/* Return the smallest mode which mode size is smaller than or eqaul to >> + the size. */ >> +static fixed_size_mode >> +smallest_fixed_size_mode_for_block_clear (unsigned int size) >> +{ >> + if (size > UNITS_PER_WORD) >> +return as_a (V4SImode); >> + >> + return smallest_int_mode_for_size (size * BITS_PER_UNIT); >> +} >> + >> /* Expand a block clear operation, and return 1 if successful. Return 0 >> if we should let the compiler generate normal code. 
>> >> @@ -55,7 +98,6 @@ expand_block_clear (rtx operands[]) >>HOST_WIDE_INT align; >>HOST_WIDE_INT bytes; >>int offset; >> - int clear_bytes; >>int clear_step; >> >>/* If this is not a fixed size move, just call memcpy */ >> @@ -89,62 +131,36 @@ expand_block_clear (rtx operands[]) >> >>bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX); >> >> - for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes) >> + auto mode = widest_fixed_size_mode_for_block_clear (bytes, align, >> + unaligned_vsx_ok); >> + offset = 0; >> + rtx dest; >> + >> + do >> { >> - machine_mode mode = BLKmode; >> - rtx dest; >> + unsigned int size = GET_MODE_SIZE (mode); >> >> - if (TARGET_ALTIVEC >> - && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok))) >> + while (bytes >= size) >> { >> - clear_bytes = 16; >> - mode = V4SImode; >> -} >> - else if (bytes >= 8 && TARGET_POWERPC64
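The store-planning idea in the patch above can be illustrated with a small stand-alone sketch (alignment and unaligned-VSX checks omitted; the 16/8/4/2/1 widths mirror V4SImode/DImode/SImode/HImode/QImode, and the final store is simply slid back so it ends at the last byte, overlapping bytes that are already zero):

```c
/* Widest store width usable for BYTES remaining (alignment ignored),
   mirroring widest_fixed_size_mode_for_block_clear.  */
static int widest_width (int bytes)
{
  if (bytes >= 16) return 16;   /* V4SImode */
  if (bytes >= 8)  return 8;    /* DImode  */
  if (bytes >= 4)  return 4;    /* SImode  */
  if (bytes >= 2)  return 2;    /* HImode  */
  return 1;                     /* QImode  */
}

/* Smallest power-of-two width covering BYTES, capped at 16, mirroring
   smallest_fixed_size_mode_for_block_clear.  */
static int smallest_covering_width (int bytes)
{
  int w = 1;
  while (w < bytes && w < 16)
    w <<= 1;
  return w;
}

/* Plan the stores clearing LEN bytes: wide stores first, then one tail
   store slid back to end at LEN.  Returns the store count; the plan is
   written to OFFS[]/SIZES[].  */
static int plan_block_clear (int len, int offs[], int sizes[])
{
  int n = 0, off = 0;
  int w = widest_width (len);
  while (len - off >= w)
    {
      offs[n] = off;
      sizes[n++] = w;
      off += w;
    }
  if (off < len)
    {
      int tw = smallest_covering_width (len - off);
      offs[n] = len - tw;       /* overlap instead of narrower stores */
      sizes[n++] = tw;
    }
  return n;
}
```

For example, clearing 31 bytes takes one 16-byte store at offset 0 plus one 16-byte store at offset 15; without the overlap it would take five stores of descending width (16+8+4+2+1).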
Ping [PATCH-1v2] Value Range: Add range op for builtin isinf
Hi, Gently ping the series of patches which add range ops. [PATCH-1v2] Value Range: Add range op for builtin isinf https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652219.html [PATCH-2v3] Value Range: Add range op for builtin isfinite https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html [PATCH-3] Value Range: Add range op for builtin isnormal https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html Thanks Gui Haochen On 2024/5/21 10:52, HAO CHEN GUI wrote: > Hi, > The builtin isinf is not folded at the front end if the corresponding optab > exists. This causes range evaluation to fail on targets which have > optab_isinf. For instance, range-sincos.c will fail on targets which > have optab_isinf as it calls builtin_isinf. > > This patch fixes the problem by adding a range op for builtin isinf. > > Compared with the previous version, the main change is to set varying if > nothing is known about the range. > https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is it OK for the trunk? > > Thanks > Gui Haochen > > > ChangeLog > Value Range: Add range op for builtin isinf > > The builtin isinf is not folded at the front end if the corresponding optab > exists. So the range op for isinf is needed for value range analysis. > This patch adds the range op for builtin isinf. > > gcc/ > * gimple-range-op.cc (class cfn_isinf): New. > (op_cfn_isinf): New variable. > (gimple_range_op_handler::maybe_builtin_call): Handle > CASE_FLT_FN (BUILT_IN_ISINF). > > gcc/testsuite/ > * gcc.dg/tree-ssa/range-isinf.c: New test.
> > patch.diff > diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc > index 55dfbb23ce2..eb1b0aff77c 100644 > --- a/gcc/gimple-range-op.cc > +++ b/gcc/gimple-range-op.cc > @@ -1175,6 +1175,62 @@ private: >bool m_is_pos; > } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true); > > +// Implement range operator for CFN_BUILT_IN_ISINF > +class cfn_isinf : public range_operator > +{ > +public: > + using range_operator::fold_range; > + using range_operator::op1_range; > + virtual bool fold_range (irange , tree type, const frange , > +const irange &, relation_trio) const override > + { > +if (op1.undefined_p ()) > + return false; > + > +if (op1.known_isinf ()) > + { > + r.set_nonzero (type); > + return true; > + } > + > +if (op1.known_isnan () > + || (!real_isinf (_bound ()) > + && !real_isinf (_bound ( > + { > + r.set_zero (type); > + return true; > + } > + > +r.set_varying (type); > +return true; > + } > + virtual bool op1_range (frange , tree type, const irange , > + const frange &, relation_trio) const override > + { > +if (lhs.undefined_p ()) > + return false; > + > +if (lhs.zero_p ()) > + { > + nan_state nan (true); > + r.set (type, real_min_representable (type), > +real_max_representable (type), nan); > + return true; > + } > + > +if (!range_includes_zero_p (lhs)) > + { > + // The range is [-INF,-INF][+INF,+INF], but it can't be represented. 
> + // Set range to [-INF,+INF] > + r.set_varying (type); > + r.clear_nan (); > + return true; > + } > + > +r.set_varying (type); > +return true; > + } > +} op_cfn_isinf; > > // Implement range operator for CFN_BUILT_IN_ > class cfn_parity : public range_operator > @@ -1268,6 +1324,11 @@ gimple_range_op_handler::maybe_builtin_call () >m_operator = _cfn_signbit; >break; > > +CASE_FLT_FN (BUILT_IN_ISINF): > + m_op1 = gimple_call_arg (call, 0); > + m_operator = _cfn_isinf; > + break; > + > CASE_CFN_COPYSIGN_ALL: >m_op1 = gimple_call_arg (call, 0); >m_op2 = gimple_call_arg (call, 1); > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c > b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c > new file mode 100644 > index 000..468f1bcf5c7 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c > @@ -0,0 +1,44 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-evrp" } */ > + > +#include > +void link_error(); > + > +void > +test1 (double x) > +{ > + if (x > __DBL_MAX__ && !__builtin_isinf (x)) > +link_error (); > + if (x < -__DBL_MAX__ && !__builtin_isinf (x)) > +link_error (); > +} > + > +void > +test2
Ping [PATCHv2] Optab: add isnormal_optab for __builtin_isnormal
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652172.html Thanks Gui Haochen On 2024/5/20 16:15, HAO CHEN GUI wrote: > Hi, > This patch adds an optab for __builtin_isnormal. The normal check can be > implemented on rs6000 by a single instruction. It needs an optab to be > expanded to a certain sequence of instructions. > > The subsequent patches will implement the expand on rs6000. > > Compared to the previous version, the main change is to document isnormal > in md.texi. > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is this OK for trunk? > > Thanks > Gui Haochen > > ChangeLog > optab: Add isnormal_optab for isnormal builtin > > gcc/ > * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab > for isnormal builtin. > * optabs.def (isnormal_optab): New. > * doc/md.texi (isnormal): Document. > > > patch.diff > diff --git a/gcc/builtins.cc b/gcc/builtins.cc > index b8432f84020..ccd57fce522 100644 > --- a/gcc/builtins.cc > +++ b/gcc/builtins.cc > @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl) > case BUILT_IN_ISFINITE: >builtin_optab = isfinite_optab; break; > case BUILT_IN_ISNORMAL: > + builtin_optab = isnormal_optab; break; > CASE_FLT_FN (BUILT_IN_FINITE): > case BUILT_IN_FINITED32: > case BUILT_IN_FINITED64: > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 8ed70b3feea..b81b9dec18a 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -8562,6 +8562,11 @@ This pattern is not allowed to @code{FAIL}. > Set operand 0 to nonzero if operand 1 is a finite floating-point > number and to 0 otherwise. > > +@cindex @code{isnormal@var{m}2} instruction pattern > +@item @samp{isnormal@var{m}2} > +Set operand 0 to nonzero if operand 1 is a normal floating-point > +number and to 0 otherwise.
> + > @end table > > @end ifset > diff --git a/gcc/optabs.def b/gcc/optabs.def > index dcd77315c2a..3c401fc0b4c 100644 > --- a/gcc/optabs.def > +++ b/gcc/optabs.def > @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3") > OPTAB_D (ilogb_optab, "ilogb$a2") > OPTAB_D (isinf_optab, "isinf$a2") > OPTAB_D (isfinite_optab, "isfinite$a2") > +OPTAB_D (isnormal_optab, "isnormal$a2") > OPTAB_D (issignaling_optab, "issignaling$a2") > OPTAB_D (ldexp_optab, "ldexp$a3") > OPTAB_D (log10_optab, "log10$a2")
Ping [PATCHv2] Optab: add isfinite_optab for __builtin_isfinite
Hi, Gently ping it. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html Thanks Gui Haochen On 2024/5/20 16:15, HAO CHEN GUI wrote: > Hi, > This patch adds an optab for __builtin_isfinite. The finite check can be > implemented on rs6000 by a single instruction. It needs an optab to be > expanded to a certain sequence of instructions. > > The subsequent patches will implement the expand on rs6000. > > Compared to the previous version, the main change is to document isfinite > in md.texi. > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is this OK for trunk? > > Thanks > Gui Haochen > > ChangeLog > optab: Add isfinite_optab for isfinite builtin > > gcc/ > * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab > for isfinite builtin. > * optabs.def (isfinite_optab): New. > * doc/md.texi (isfinite): Document. > > > patch.diff > diff --git a/gcc/builtins.cc b/gcc/builtins.cc > index f8d94c4b435..b8432f84020 100644 > --- a/gcc/builtins.cc > +++ b/gcc/builtins.cc > @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl) >errno_set = true; builtin_optab = ilogb_optab; break; > CASE_FLT_FN (BUILT_IN_ISINF): >builtin_optab = isinf_optab; break; > -case BUILT_IN_ISNORMAL: > case BUILT_IN_ISFINITE: > + builtin_optab = isfinite_optab; break; > +case BUILT_IN_ISNORMAL: > CASE_FLT_FN (BUILT_IN_FINITE): > case BUILT_IN_FINITED32: > case BUILT_IN_FINITED64: > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 5730bda80dc..8ed70b3feea 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -8557,6 +8557,11 @@ operand 2, greater than operand 2 or is unordered with > operand 2. > > This pattern is not allowed to @code{FAIL}. > > +@cindex @code{isfinite@var{m}2} instruction pattern > +@item @samp{isfinite@var{m}2} > +Set operand 0 to nonzero if operand 1 is a finite floating-point > +number and to 0 otherwise.
> + > @end table > > @end ifset > diff --git a/gcc/optabs.def b/gcc/optabs.def > index ad14f9328b9..dcd77315c2a 100644 > --- a/gcc/optabs.def > +++ b/gcc/optabs.def > @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3") > OPTAB_D (hypot_optab, "hypot$a3") > OPTAB_D (ilogb_optab, "ilogb$a2") > OPTAB_D (isinf_optab, "isinf$a2") > +OPTAB_D (isfinite_optab, "isfinite$a2") > OPTAB_D (issignaling_optab, "issignaling$a2") > OPTAB_D (ldexp_optab, "ldexp$a3") > OPTAB_D (log10_optab, "log10$a2")
Ping^2 [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]
Hi, Gently ping them. Thanks Gui Haochen On 2024/5/13 9:56, HAO CHEN GUI wrote: > Hi, > Gently ping the series of patches. > [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, > PR114732] > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650217.html > [PATCH-2, rs6000] Add a new type of CC mode - CCLTEQ > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650218.html > [PATCH-3, rs6000] Set CC mode of vector string isolate insns to CCEQ > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650219.html > [PATCH-4, rs6000] Optimize single cc bit reverse implementation > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650220.html > [PATCH-5, rs6000] Replace explicit CC bit reverse with common format > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650766.html > [PATCH-6, rs6000] Split setcc to two insns after reload > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650856.html > > Thanks > Gui Haochen > > On 2024/4/30 15:18, HAO CHEN GUI wrote: >> Hi, >> It's the first patch of a series of patches optimizing CC modes on >> rs6000. >> >> bcd insns set all four bits of a CR field, but they have different single >> bit reverse behavior from CCFP's. The fourth bit of bcd CR fields is used >> to indicate overflow or an invalid number. It is not a bit for an unordered test. >> So the "le" test should be reversed to "gt", not "ungt". The "ge" test >> should be reversed to "lt", not "unlt". That's the root cause of PR100736 >> and PR114732. >> >> This patch fixes the issue by adding a new type of CC mode - CCBCD for >> all bcd insns. Here a new setcc_rev pattern is added for ccbcd. It will >> be merged into a uniform pattern for all CC modes in a subsequent >> patch. >> >> The rtl code "unordered" is still used for testing overflow or >> an invalid number. IMHO, the "unordered" on a CC mode can be considered as >> testing whether the fourth bit of a CR field is set. The "eq" on a CC mode >> can be considered as testing whether the third bit is set.
Thus we avoid >> creating lots of unspecs for the CR bit testing. >> >> Bootstrapped and tested on powerpc64-linux BE and LE with no >> regressions. Is it OK for the trunk? >> >> Thanks >> Gui Haochen >> >> >> ChangeLog >> rs6000: Add a new type of CC mode - CCBCD for bcd insns >> >> gcc/ >> PR target/100736 >> PR target/114732 >> * config/rs6000/altivec.md (bcd_): Replace CCFP >> with CCBCD. >> (*bcd_test_): Likewise. >> (*bcd_test2_): Likewise. >> (bcd__): Likewise. >> (*bcdinvalid_): Likewise. >> (bcdinvalid_): Likewise. >> (bcdshift_v16qi): Likewise. >> (bcdmul10_v16qi): Likewise. >> (bcddiv10_v16qi): Likewise. >> (peephole for bcd_add/sub): Likewise. >> * config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD >> and its supported comparison codes. >> * config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD. >> * config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD >> assertion. >> * config/rs6000/rs6000.md (CC_any): Add CCBCD. >> (ccbcd_rev): New code iterator. >> (*_cc): New insn and split pattern for CCBCD reverse >> compare. >> >> gcc/testsuite/ >> PR target/100736 >> PR target/114732 >> * gcc.target/powerpc/pr100736.c: New. >> * gcc.target/powerpc/pr114732.c: New. >> >> patch.diff >> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md >> index bb20441c096..9fa8cf89f61 100644 >> --- a/gcc/config/rs6000/altivec.md >> +++ b/gcc/config/rs6000/altivec.md >> @@ -4443,7 +4443,7 @@ (define_insn "bcd_" >>(match_operand:VBCD 2 "register_operand" "v") >>(match_operand:QI 3 "const_0_to_1_operand" "n")] >> UNSPEC_BCD_ADD_SUB)) >> - (clobber (reg:CCFP CR6_REGNO))] >> + (clobber (reg:CCBCD CR6_REGNO))] >>"TARGET_P8_VECTOR" >>"bcd. %0,%1,%2,%3" >>[(set_attr "type" "vecsimple")]) >> @@ -4454,8 +4454,8 @@ (define_insn "bcd_" >> ;; probably should be one that can go in the VMX (Altivec) registers, so we >> ;; can't use DDmode or DFmode. 
>> (define_insn "*bcd_test_" >> - [(set (reg:CCFP CR6_REGNO) >> -(compare:CCFP >> + [(set (reg:CCBCD CR6_REGNO) &g
[PATCH-3v3, rs6000] Implement optab_isnormal for SFDF and IEEE128
Hi, This patch implements optab_isnormal for SFDF and IEEE128 using the test data class instructions. Compared with the previous version, the main change is to narrow down the predicate for the float operand according to the reviewer's advice. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652130.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isnormal for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isnormal<mode>2 for SFDF): New expand. (isnormal<mode>2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-7.c: New test. * gcc.target/powerpc/pr97786-8.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 95214d732f0..d4d98543912 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5353,6 +5353,28 @@ (define_expand "isfinite<mode>2" DONE; }) +(define_expand "isnormal<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x7f))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + +(define_expand "isnormal<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c new file mode 100644 index 000..2df472e35d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* {
dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isnormal (x); +} + +int test2 (float x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c new file mode 100644 index 000..00478dbf3ef --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
[PATCH-2v3, rs6000] Implement optab_isfinite for SFDF and IEEE128
Hi, This patch implements optab_isfinite for SFDF and IEEE128 using the test data class instructions. Compared with the previous version, the main change is to narrow down the predicate for the float operand according to the reviewer's advice. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652129.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isfinite for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isfinite<mode>2 for SFDF): New expand. (isfinite<mode>2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-4.c: New test. * gcc.target/powerpc/pr97786-5.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 08cce11da60..95214d732f0 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5331,6 +5331,28 @@ (define_expand "isinf<mode>2" DONE; }) +(define_expand "isfinite<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x70))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + +(define_expand "isfinite<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + rtx tmp = gen_reg_rtx (SImode); + emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x70))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c new file mode 100644 index 000..01faa962bd5 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* {
dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isfinite (x); +} + +int test2 (float x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c new file mode 100644 index 000..0e106b9f23a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
[PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi, This patch implements optab_isinf for SFDF and IEEE128 using the test data class instructions. Compared with the previous version, the main change is to narrow down the predicate for the float operand according to the reviewer's advice. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652128.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isinf for SFDF and IEEE128 gcc/ PR target/97786 * config/rs6000/vsx.md (isinf<mode>2 for SFDF): New expand. (isinf<mode>2 for IEEE128): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-1.c: New test. * gcc.target/powerpc/pr97786-2.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index f135fa079bd..08cce11da60 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5313,6 +5313,24 @@ (define_expand "xststdcp" operands[4] = CONST0_RTX (SImode); }) +(define_expand "isinf<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30))); + DONE; +}) + +(define_expand "isinf<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "vsx_register_operand"))] + "TARGET_HARD_FLOAT && TARGET_P9_VECTOR" +{ + emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (0x30))); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c new file mode 100644 index 000..c1c4f64ee8b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ + +int test1 (double x) +{ + return __builtin_isinf (x); +} + +int test2
(float x) +{ + return __builtin_isinf (x); +} + +int test3 (float x) +{ + return __builtin_isinff (x); +} + +/* { dg-final { scan-assembler-not {\mfcmp} } } */ +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c new file mode 100644 index 000..ed305e8572e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ppc_float128_hw } */ +/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isinf (x); +} + +int test2 (long double x) +{ + return __builtin_isinfl (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
Re: [PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi Peter, Thanks for your comments. On 2024/5/23 5:58, Peter Bergner wrote: > Is there a reason not to use the vsx_register_operand predicate for op1 > which matches the predicate for the operand of the xststdcp pattern > we're passing op1 to? No, I will fix them. Thanks Gui Haochen
[PATCH-3] Value Range: Add range op for builtin isnormal
Hi, This patch adds the range op for builtin isnormal. It also adds two helper functions in frange to detect a range of normal floating-point numbers and a range of subnormal or zero values. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog Value Range: Add range op for builtin isnormal The former patch adds an optab for builtin isnormal. Thus builtin isnormal might not be folded at the front end. So the range op for isnormal is needed for value range analysis. This patch adds the range op for builtin isnormal. gcc/ * gimple-range-op.cc (class cfn_isnormal): New. (op_cfn_isnormal): New variable. (gimple_range_op_handler::maybe_builtin_call): Handle CFN_BUILT_IN_ISNORMAL. * value-range.h (class frange): Declare known_isnormal and known_isdenormal_or_zero. (frange::known_isnormal): Define. (frange::known_isdenormal_or_zero): Define. gcc/testsuite/ * gcc.dg/tree-ssa/range-isnormal.c: New test. patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index d69900d1f56..4c3f9c98282 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1281,6 +1281,60 @@ public: } } op_cfn_isfinite; +// Implement range operator for CFN_BUILT_IN_ISNORMAL +class cfn_isnormal : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange &r, tree type, const frange &op1, + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isnormal ()) + { + r.set_nonzero (type); + return true; + } + +if (op1.known_isnan () + || op1.known_isinf () + || op1.known_isdenormal_or_zero ()) + { + r.set_zero (type); + return true; + } + +r.set_varying (type); +return true; + } + virtual bool op1_range (frange &r, tree type, const irange &lhs, + const frange &, relation_trio) const override + { +if (lhs.undefined_p ()) + return false; + +if (lhs.zero_p ()) + { + r.set_varying (type); + return true; + }
+ +if (!range_includes_zero_p (lhs)) + { + nan_state nan (false); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +r.set_varying (type); +return true; + } +} op_cfn_isnormal; + // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator { @@ -1383,6 +1437,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = _cfn_isfinite; break; +case CFN_BUILT_IN_ISNORMAL: + m_op1 = gimple_call_arg (call, 0); + m_operator = _cfn_isnormal; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c new file mode 100644 index 000..c4df4d839b0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +#include +void link_error(); + +void test1 (double x) +{ + if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x)) +link_error (); + + if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x)) +link_error (); +} + +void test2 (float x) +{ + if (x < __FLT_MAX__ && x > __FLT_MIN__ && !__builtin_isnormal (x)) +link_error (); + + if (x < -__FLT_MIN__ && x > - __FLT_MAX__ && !__builtin_isnormal (x)) +link_error (); +} + +void test3 (double x) +{ + if (__builtin_isnormal (x) && __builtin_isinf (x)) +link_error (); +} + +void test4 (float x) +{ + if (__builtin_isnormal (x) && __builtin_isinf (x)) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */ diff --git a/gcc/value-range.h b/gcc/value-range.h index 37ce91dc52d..1443d1906e5 100644 --- a/gcc/value-range.h +++ b/gcc/value-range.h @@ -588,6 +588,8 @@ public: bool maybe_isinf () const; bool signbit_p (bool ) const; bool nan_signbit_p (bool ) const; + bool known_isnormal () const; + bool known_isdenormal_or_zero () const; protected: virtual bool 
contains_p (tree cst) const override; @@ -1650,6 +1652,33 @@ frange::known_isfinite () const return (!maybe_isnan () && !real_isinf (_min) && !real_isinf (_max)); } +// Return TRUE if range is known to be normal. + +inline bool +frange::known_isnormal () const +{ + if (!known_isfinite ()) +return false; + + machine_mode mode = TYPE_MODE (type ()); + return (!real_isdenormal (_min, mode) && !real_isdenormal (_max, mode) + && !real_iszero (_min) && !real_iszero (_max) + && (!real_isneg (_min) || real_isneg (_max))); +} + +// Return TRUE if
[PATCH-2v3] Value Range: Add range op for builtin isfinite
Hi, This patch adds the range op for builtin isfinite. Compared to previous version, the main change is to set varying if nothing is known about the range. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog Value Range: Add range op for builtin isfinite The former patch adds optab for builtin isfinite. Thus builtin isfinite might not be folded at front end. So the range op for isfinite is needed for value range analysis. This patch adds range op for builtin isfinite. gcc/ * gimple-range-op.cc (class cfn_isfinite): New. (op_cfn_finite): New variables. (gimple_range_op_handler::maybe_builtin_call): Handle CFN_BUILT_IN_ISFINITE. gcc/testsuite/ * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test. patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 922ee7bf0f7..49b6d7abde1 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1229,6 +1229,61 @@ public: } } op_cfn_isinf; +//Implement range operator for CFN_BUILT_IN_ISFINITE +class cfn_isfinite : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange , tree type, const frange , + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isfinite ()) + { + r.set_nonzero (type); + return true; + } + +if (op1.known_isnan () + || op1.known_isinf ()) + { + r.set_zero (type); + return true; + } + +r.set_varying (type); +return true; + } + virtual bool op1_range (frange , tree type, const irange , + const frange &, relation_trio) const override + { +if (lhs.undefined_p ()) + return false; + +if (lhs.zero_p ()) + { + // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented. 
+ // Set range to varying + r.set_varying (type); + return true; + } + +if (!range_includes_zero_p (lhs)) + { + nan_state nan (false); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +r.set_varying (type); +return true; + } +} op_cfn_isfinite; + // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator { @@ -1326,6 +1381,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = _cfn_isinf; break; +case CFN_BUILT_IN_ISFINITE: + m_op1 = gimple_call_arg (call, 0); + m_operator = _cfn_isfinite; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c new file mode 100644 index 000..f5dce0a0486 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +#include +void link_error(); + +void test1 (double x) +{ + if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x)) +link_error (); +} + +void test2 (float x) +{ + if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x)) +link_error (); +} + +void test3 (double x) +{ + if (__builtin_isfinite (x) && __builtin_isinf (x)) +link_error (); +} + +void test4 (float x) +{ + if (__builtin_isfinite (x) && __builtin_isinf (x)) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
[PATCH-1v2] Value Range: Add range op for builtin isinf
Hi, The builtin isinf is not folded at the front end if the corresponding optab exists. This causes range evaluation to fail on targets which have optab_isinf. For instance, range-sincos.c will fail on targets which have optab_isinf as it calls builtin_isinf. This patch fixes the problem by adding a range op for builtin isinf. Compared with the previous version, the main change is to set varying if nothing is known about the range. https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog Value Range: Add range op for builtin isinf The builtin isinf is not folded at the front end if the corresponding optab exists. So the range op for isinf is needed for value range analysis. This patch adds the range op for builtin isinf. gcc/ * gimple-range-op.cc (class cfn_isinf): New. (op_cfn_isinf): New variable. (gimple_range_op_handler::maybe_builtin_call): Handle CASE_FLT_FN (BUILT_IN_ISINF). gcc/testsuite/ * gcc.dg/tree-ssa/range-isinf.c: New test.
patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..eb1b0aff77c 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1175,6 +1175,62 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+			   const irange &, relation_trio) const override
+  {
+    if (op1.undefined_p ())
+      return false;
+
+    if (op1.known_isinf ())
+      {
+	r.set_nonzero (type);
+	return true;
+      }
+
+    if (op1.known_isnan ()
+	|| (!real_isinf (&op1.lower_bound ())
+	    && !real_isinf (&op1.upper_bound ())))
+      {
+	r.set_zero (type);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+			  const frange &, relation_trio) const override
+  {
+    if (lhs.undefined_p ())
+      return false;
+
+    if (lhs.zero_p ())
+      {
+	nan_state nan (true);
+	r.set (type, real_min_representable (type),
+	       real_max_representable (type), nan);
+	return true;
+      }
+
+    if (!range_includes_zero_p (lhs))
+      {
+	// The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+	// Set range to [-INF,+INF]
+	r.set_varying (type);
+	r.clear_nan ();
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+} op_cfn_isinf;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator

@@ -1268,6 +1324,11 @@ gimple_range_op_handler::maybe_builtin_call ()
       m_operator = &op_cfn_signbit;
       break;

+    CASE_FLT_FN (BUILT_IN_ISINF):
+      m_op1 = gimple_call_arg (call, 0);
+      m_operator = &op_cfn_isinf;
+      break;
+
     CASE_CFN_COPYSIGN_ALL:
       m_op1 = gimple_call_arg (call, 0);
       m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+    link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+    link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+    link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+    link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+    link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+    link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+    link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+    link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
[PATCHv2] Optab: add isnormal_optab for __builtin_isnormal
Hi,

This patch adds an optab for __builtin_isnormal.  The normal check can be
implemented on rs6000 by a single instruction.  It needs an optab so it can
be expanded to that sequence of instructions.  The subsequent patches
implement the expand on rs6000.

Compared to the previous version, the main change is to document isnormal
in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
	* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
	for isnormal builtin.
	* optabs.def (isnormal_optab): New.
	* doc/md.texi (isnormal): Document.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b8432f84020..ccd57fce522 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
     case BUILT_IN_ISFINITE:
       builtin_optab = isfinite_optab;
       break;
     case BUILT_IN_ISNORMAL:
+      builtin_optab = isnormal_optab;
       break;
     CASE_FLT_FN (BUILT_IN_FINITE):
     case BUILT_IN_FINITED32:
     case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8ed70b3feea..b81b9dec18a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8562,6 +8562,11 @@ This pattern is not allowed to @code{FAIL}.
 Set operand 0 to nonzero if operand 1 is a finite floating-point
 number and to 0 otherwise.

+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Set operand 0 to nonzero if operand 1 is a normal floating-point
+number and to 0 otherwise.
+
 @end table
 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")
[PATCHv2] Optab: add isfinite_optab for __builtin_isfinite
Hi,

This patch adds an optab for __builtin_isfinite.  The finite check can be
implemented on rs6000 by a single instruction.  It needs an optab so it can
be expanded to that sequence of instructions.  The subsequent patches
implement the expand on rs6000.

Compared to the previous version, the main change is to document isfinite
in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
	* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
	for isfinite builtin.
	* optabs.def (isfinite_optab): New.
	* doc/md.texi (isfinite): Document.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..b8432f84020 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
       errno_set = true;
       builtin_optab = ilogb_optab;
       break;
     CASE_FLT_FN (BUILT_IN_ISINF):
       builtin_optab = isinf_optab;
       break;
-    case BUILT_IN_ISNORMAL:
     case BUILT_IN_ISFINITE:
+      builtin_optab = isfinite_optab;
+      break;
+    case BUILT_IN_ISNORMAL:
     CASE_FLT_FN (BUILT_IN_FINITE):
     case BUILT_IN_FINITED32:
     case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..8ed70b3feea 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,11 @@
 operand 2, greater than operand 2 or is unordered with operand 2.

 This pattern is not allowed to @code{FAIL}.

+@cindex @code{isfinite@var{m}2} instruction pattern
+@item @samp{isfinite@var{m}2}
+Set operand 0 to nonzero if operand 1 is a finite floating-point
+number and to 0 otherwise.
+
 @end table
 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@
 OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")
Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite
Hi Andrew,

On 2024/5/19 3:42, Andrew Pinski wrote:
> This is missing adding documentation for the new optab.
> It should be documented in md.texi under `Standard Pattern Names For
> Generation` section.

Thanks for the reminder.  I will add documentation for all the patches.

Thanks
Gui Haochen
[PATCH-3v2, rs6000] Implement optab_isnormal for SFDF and IEEE128
Hi,

This patch implements optab_isnormal for SFDF and IEEE128 by the test data
class instructions.

Compared with the previous version, the main changes are not to test
whether a pseudo can be created in expand, and to modify the dg-options and
dg-finals of the test cases according to the reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649368.html

Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isnormal for SFDF and IEEE128

gcc/
	PR target/97786
	* config/rs6000/vsx.md (isnormal<mode>2 for SFDF): New expand.
	(isnormal<mode>2 for IEEE128): New expand.

gcc/testsuite/
	PR target/97786
	* gcc.target/powerpc/pr97786-7.c: New test.
	* gcc.target/powerpc/pr97786-8.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index ab17178e0a8..cae30dc431e 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5353,6 +5353,28 @@ (define_expand "isfinite<mode>2"
   DONE;
 })

+(define_expand "isnormal<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isnormal<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
new file mode 100644
index 000..2df472e35d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isnormal (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
new file mode 100644
index 000..0416970b89b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
[PATCH-2v2, rs6000] Implement optab_isfinite for SFDF and IEEE128
Hi,

This patch implements optab_isfinite for SFDF and IEEE128 by the test data
class instructions.

Compared with the previous version, the main changes are not to test
whether a pseudo can be created in expand, and to modify the dg-options and
dg-finals of the test cases according to the reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html

Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isfinite for SFDF and IEEE128

gcc/
	PR target/97786
	* config/rs6000/vsx.md (isfinite<mode>2 for SFDF): New expand.
	(isfinite<mode>2 for IEEE128): New expand.

gcc/testsuite/
	PR target/97786
	* gcc.target/powerpc/pr97786-4.c: New test.
	* gcc.target/powerpc/pr97786-5.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f0cc02f7e7b..cbb538d6d86 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5333,6 +5333,28 @@ (define_expand "isinf<mode>2"
   DONE;
 })

+(define_expand "isfinite<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isfinite<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
new file mode 100644
index 000..01faa962bd5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isfinite (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
new file mode 100644
index 000..5fc98084274
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
[PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128
Hi,

This patch implements optab_isinf for SFDF and IEEE128 by the test data
class instructions.

Compared with the previous version, the main change is to modify the
dg-options and dg-finals of the test cases according to the reviewer's
advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html

Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFDF and IEEE128

gcc/
	PR target/97786
	* config/rs6000/vsx.md (isinf<mode>2 for SFDF): New expand.
	(isinf<mode>2 for IEEE128): New expand.

gcc/testsuite/
	PR target/97786
	* gcc.target/powerpc/pr97786-1.c: New test.
	* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..fa20fb4df91 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
   operands[4] = CONST0_RTX (SImode);
 })

+(define_expand "isinf<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
+(define_expand "isinf<mode>2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
new file mode 100644
index 000..c1c4f64ee8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isinf (x);
+}
+
+int test3 (float x)
+{
+  return __builtin_isinff (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
new file mode 100644
index 000..21d90868268
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]
Hi Segher,

Thanks for your review comments.  I will modify it and resend.  Just one
question on the insn condition.

On 2024/5/17 1:25, Segher Boessenkool wrote:
>> +(define_expand "isnormal<mode>2"
>> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
>> +   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
>> +  "TARGET_HARD_FLOAT
>> +   && TARGET_P9_VECTOR"
> Please put the condition on just one line if it is as simple and short
> as this.
>
> Why is TARGET_P9_VECTOR the correct condition?

This expand calls gen_xststdcp, which is a Power9 vector instruction and
relies on "TARGET_P9_VECTOR".  That is why I set the condition.
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
Hi Andrew,

Thanks so much for your explanation.  I got it.  I will address the issue.

Thanks
Gui Haochen

On 2024/5/15 2:45, Andrew MacLeod wrote:
>
> On 5/9/24 04:47, HAO CHEN GUI wrote:
>> Hi Mikael,
>>
>> Thanks for your comments.
>>
>> On 2024/5/9 16:03, Mikael Morin wrote:
>>> I think the canonical API behaviour sets R to varying and returns true
>>> instead of just returning false if nothing is known about the range.
>>>
>>> I'm not sure whether it makes any difference; Aldy can probably tell.  But
>>> if the type is bool, varying is [0,1] which is better than unknown range.
>> Should the varying be set by the caller when fold_range returns false?
>> Just like the following code in value-query.cc.
>>
>>   if (!op.fold_range (r, type, r0, r1))
>>     r.set_varying (type);
>>
> This would be dangerous in the general case.  fold_range may have returned
> false because 'type' is an unsupported range type.  Generally this is why we
> prefer range-ops to return TRUE and VARYING rather than FALSE for unknown
> values.  When FALSE is returned, we should stop working with ranges because
> something is amok.
>
> Andrew
>
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
Hi Jakub,

Thanks for your review comments.

On 2024/5/14 23:57, Jakub Jelinek wrote:
> BUILT_IN_ISFINITE is just one of many BUILT_IN_IS... builtins,
> would be nice to handle the others as well.
>
> E.g. isnormal/isnan/isinf, fpclassify etc.
>
Yes, I already sent the patches which add range ops for
isnormal/isnan/isinf for review.  I will modify them according to the
review comments and submit them again.

> Note, the man page says for e.g. isnormal that it returns nonzero or zero,
> but in reality I think we implement it always inline and can check if
> it always returns [0,1].
>
> Some others like isinf return [-1,1] though I think and fpclassify
> returns union of all the passed int values.

The gcc inline code always returns 0 or 1 for isnormal/isnan/isinf.  But I
wonder if all targets' expands can promise that.  The rs6000 has an
instruction for isnormal/isnan/isinf, so we made the patch avoid the inline
code and expand these builtins ourselves.  Though the rs6000 instruction
returns 0 or 1 for them, I am not sure whether other targets do the same.

Thanks
Gui Haochen
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
Hi Mikael,

Thanks for your comments.

On 2024/5/9 16:03, Mikael Morin wrote:
> I think the canonical API behaviour sets R to varying and returns true
> instead of just returning false if nothing is known about the range.
>
> I'm not sure whether it makes any difference; Aldy can probably tell.  But if
> the type is bool, varying is [0,1] which is better than unknown range.

Should the varying be set by the caller when fold_range returns false?
Just like the following code in value-query.cc.

  if (!op.fold_range (r, type, r0, r1))
    r.set_varying (type);

Thanks
Gui Haochen
Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost
Hi,

On 2024/5/10 20:50, Richard Biener wrote:
> IMO give we're dispatching to the rtx_cost hook eventually it needs
> documenting there or alternatively catching zero and adjusting its
> result there.  Of course cost == 0 ? 1 : cost is wrong as it makes
> zero vs. one the same cost - using cost + 1 when from rtx_cost
> might be more correct, at least preserving relative costs.

I tested a draft patch which sets "cost > 0 ? cost + 1 : 1;".  Some
regression cases are found on x86.  The main problems are:

Comparisons against COSTS_N_INSNS (1) no longer work with the patch.  As
all costs are incremented by 1, the following compare returns true when the
cost is 5, but false originally.

  if (cost > COSTS_N_INSNS (1))

Another problem is that the cost comes from set_src_cost, which doesn't
take the destination into consideration.  For example, the cost of a store
"[`x']=r109:SI" is set to 1, as it only measures the cost of the SET_SRC.
That seems unreasonable.

IMHO, a cost less than COSTS_N_INSNS (1) is meaningful in rtx_cost
calculation but unreasonable for a whole insn.  Should the minimum cost of
an insn be set to COSTS_N_INSNS (1)?

Thanks
Gui Haochen
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
Hi Aldy,

Thanks for your review comments.

On 2024/5/13 19:18, Aldy Hernandez wrote:
> On Thu, May 9, 2024 at 10:05 AM Mikael Morin wrote:
>>
>> Hello,
>>
>> On 07/05/2024 04:37, HAO CHEN GUI wrote:
>>> Hi,
>>>   The former patch adds isfinite optab for __builtin_isfinite.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
>>>
>>>   Thus the builtin might not be folded at front end.  The range op for
>>> isfinite is needed for value range analysis.  This patch adds them.
>>>
>>>   Compared to last version, this version fixes a typo.
>>>
>>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>>> regressions.  Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> Value Range: Add range op for builtin isfinite
>>>
>>> The former patch adds optab for builtin isfinite.  Thus builtin isfinite
>>> might not be folded at front end.  So the range op for isfinite is needed
>>> for value range analysis.  This patch adds range op for builtin isfinite.
>>>
>>> gcc/
>>>	* gimple-range-op.cc (class cfn_isfinite): New.
>>>	(op_cfn_finite): New variables.
>>>	(gimple_range_op_handler::maybe_builtin_call): Handle
>>>	CFN_BUILT_IN_ISFINITE.
>>>
>>> gcc/testsuite/
>>>	* gcc.dg/tree-ssa/range-isfinite.c: New test.
>>>
>>> patch.diff
>>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
>>> index 9de130b4022..99c511728d3 100644
>>> --- a/gcc/gimple-range-op.cc
>>> +++ b/gcc/gimple-range-op.cc
>>> @@ -1192,6 +1192,56 @@ public:
>>>   }
>>> } op_cfn_isinf;
>>>
>>> +// Implement range operator for CFN_BUILT_IN_ISFINITE
>>> +class cfn_isfinite : public range_operator
>>> +{
>>> +public:
>>> +  using range_operator::fold_range;
>>> +  using range_operator::op1_range;
>>> +  virtual bool fold_range (irange &r, tree type, const frange &op1,
>>> +			   const irange &, relation_trio) const override
>>> +  {
>>> +    if (op1.undefined_p ())
>>> +      return false;
>>> +
>>> +    if (op1.known_isfinite ())
>>> +      {
>>> +	r.set_nonzero (type);
>>> +	return true;
>>> +      }
>>> +
>>> +    if (op1.known_isnan ()
>>> +	|| op1.known_isinf ())
>>> +      {
>>> +	r.set_zero (type);
>>> +	return true;
>>> +      }
>>> +
>>> +    return false;
>> I think the canonical API behaviour sets R to varying and returns true
>> instead of just returning false if nothing is known about the range.
>
> Correct.  If we know it's varying, we just set varying and return
> true.  Returning false is usually reserved for "I have no idea".
> However, every caller of fold_range() should know to ignore a return
> of false, so you should be safe.

So it's better to set varying here and return true?

>
>> I'm not sure whether it makes any difference; Aldy can probably tell.
>> But if the type is bool, varying is [0,1] which is better than unknown
>> range.
>
> Also, I see you're setting zero/nonzero.  Is the return type known to
> be boolean, because if so, we usually prefer to one of:

The return type is int.  For __builtin_isfinite, the result is nonzero
when the float is a finite number, 0 otherwise.

>
> r = range_true ()
> r = range_false ()
> r = range_true_and_false ();
>
> It doesn't matter either way, but it's probably best to use these as
> they force boolean_type_node automatically.
>
> I don't have a problem with this patch, but I would prefer the
> floating point savvy people to review this, as there are no members of
> the ranger team that are floating point experts :).
>
> Also, I see you mention in your original post that this patch was
> needed as a follow-up to this one:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
>
> I don't see the above patch in the source tree currently:

Sorry, I may not have expressed it clearly.  I sent a series of patches
for review.  Some patches depend on others.  The patch I mentioned is also
under review.  Here is the list of the series of patches.  Some of them
are generic, and others are rs6000 specific.

[PATCH] Value Range: Add range op for builtin isinf
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html
[patch, rs6000] Implement optab_isinf for SFmo
Ping [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]
Hi,

Gently ping the series of patches.

[PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650217.html
[PATCH-2, rs6000] Add a new type of CC mode - CCLTEQ
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650218.html
[PATCH-3, rs6000] Set CC mode of vector string isolate insns to CCEQ
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650219.html
[PATCH-4, rs6000] Optimize single cc bit reverse implementation
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650220.html
[PATCH-5, rs6000] Replace explicit CC bit reverse with common format
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650766.html
[PATCH-6, rs6000] Split setcc to two insns after reload
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650856.html

Thanks
Gui Haochen

On 2024/4/30 15:18, HAO CHEN GUI wrote:
> Hi,
>   It's the first patch of a series of patches optimizing CC modes on
> rs6000.
>
> bcd insns set all four bits of a CR field.  But they have a different
> single bit reverse behavior than CCFP's.  The fourth bit of bcd cr fields
> is used to indicate overflow or an invalid number.  It's not a bit for an
> unordered test.  So the "le" test should be reversed to "gt", not "ungt".
> The "ge" test should be reversed to "lt", not "unlt".  That's the root
> cause of PR100736 and PR114732.
>
> This patch fixes the issue by adding a new type of CC mode - CCBCD for
> all bcd insns.  Here a new setcc_rev pattern is added for ccbcd.  It will
> be merged into a uniform pattern which is for all CC modes in a
> sequential patch.
>
> The rtl code "unordered" is still used for testing overflow or an
> invalid number.  IMHO, the "unordered" on a CC mode can be considered as
> testing whether the fourth bit of a CR field is set or not.  The "eq" on
> a CC mode can be considered as testing whether the third bit is set or
> not.  Thus we avoid creating lots of unspecs for the CR bit testing.
>
> Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions.  Is it OK for the trunk?
>
> Thanks
> Gui Haochen
>
>
> ChangeLog
> rs6000: Add a new type of CC mode - CCBCD for bcd insns
>
> gcc/
>	PR target/100736
>	PR target/114732
>	* config/rs6000/altivec.md (bcd_): Replace CCFP
>	with CCBCD.
>	(*bcd_test_): Likewise.
>	(*bcd_test2_): Likewise.
>	(bcd__): Likewise.
>	(*bcdinvalid_): Likewise.
>	(bcdinvalid_): Likewise.
>	(bcdshift_v16qi): Likewise.
>	(bcdmul10_v16qi): Likewise.
>	(bcddiv10_v16qi): Likewise.
>	(peephole for bcd_add/sub): Likewise.
>	* config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD
>	and its supported comparison codes.
>	* config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD.
>	* config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD
>	assertion.
>	* config/rs6000/rs6000.md (CC_any): Add CCBCD.
>	(ccbcd_rev): New code iterator.
>	(*_cc): New insn and split pattern for CCBCD reverse
>	compare.
>
> gcc/testsuite/
>	PR target/100736
>	PR target/114732
>	* gcc.target/powerpc/pr100736.c: New.
>	* gcc.target/powerpc/pr114732.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index bb20441c096..9fa8cf89f61 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -4443,7 +4443,7 @@ (define_insn "bcd_"
>	   (match_operand:VBCD 2 "register_operand" "v")
>	   (match_operand:QI 3 "const_0_to_1_operand" "n")]
>	  UNSPEC_BCD_ADD_SUB))
> -  (clobber (reg:CCFP CR6_REGNO))]
> +  (clobber (reg:CCBCD CR6_REGNO))]
>   "TARGET_P8_VECTOR"
>   "bcd. %0,%1,%2,%3"
>   [(set_attr "type" "vecsimple")])
> @@ -4454,8 +4454,8 @@ (define_insn "bcd_"
> ;; probably should be one that can go in the VMX (Altivec) registers, so we
> ;; can't use DDmode or DFmode.
> (define_insn "*bcd_test_"
> -  [(set (reg:CCFP CR6_REGNO)
> -	(compare:CCFP
> +  [(set (reg:CCBCD CR6_REGNO)
> +	(compare:CCBCD
>	   (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")
>			 (match_operand:VBCD 2 "register_operand" "v")
>			 (match_operand:QI 3 "const_0_to_1_operand" "i")]
> @@ -4472,8 +4472,8 @@ (define_insn "*bcd_test2_"
>	   (match_operand:VBCD 2 "register_operand" "v")
[PATCH] rtlanal: Correct cost regularization in pattern_cost
Hi,

The cost returned from set_src_cost might be zero.  Zero for pattern_cost
means unknown cost.  So the regularization converts the zero to
COSTS_N_INSNS (1).

  // pattern_cost
  cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
  return cost > 0 ? cost : COSTS_N_INSNS (1);

But if set_src_cost returns a value less than COSTS_N_INSNS (1), it's
untouched and just returned by pattern_cost.  Thus "zero" from
set_src_cost ends up costing more than "one" from set_src_cost.  For
instance, i386 returns cost "one" for a zero_extend op.

  // ix86_rtx_costs
  case ZERO_EXTEND:
    /* The zero extensions is often completely free on x86_64,
       so make it as cheap as possible.  */
    if (TARGET_64BIT && mode == DImode
	&& GET_MODE (XEXP (x, 0)) == SImode)
      *total = 1;

This patch fixes the problem by converting all costs which are less than
COSTS_N_INSNS (1) to COSTS_N_INSNS (1).

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rtlanal: Correct cost regularization in pattern_cost

For pattern_cost (insn_cost), the smallest known cost is COSTS_N_INSNS (1)
and zero means the cost is unknown.  The method calls set_src_cost, which
might return 0 or a value less than COSTS_N_INSNS (1).  For these cases,
pattern_cost should always return COSTS_N_INSNS (1).  The current
regularization is wrong: a value less than COSTS_N_INSNS (1) but larger
than 0 will be returned unchanged.  This patch corrects it.

gcc/
	* rtlanal.cc (pattern_cost): Return COSTS_N_INSNS (1) when the
	cost is less than COSTS_N_INSNS (1).

patch.diff
diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 4158a531bdd..f7b3d7d72ce 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -5762,7 +5762,7 @@ pattern_cost (rtx pat, bool speed)
     return 0;

   cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
-  return cost > 0 ? cost : COSTS_N_INSNS (1);
+  return cost > COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);
 }

 /* Calculate the cost of a single instruction.  A return value of zero
Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost
Hi Richard, Thanks for your comments. On 2024/5/10 15:16, Richard Biener wrote: > But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no > longer meaningful. So shouldn't it instead be > > return cost > 0 ? cost : 1; Yes, it's better. > > ? Alternatively returning fractions of COSTS_N_INSNS (1) from set_src_cost > is invalid and thus the target is at fault (I do think that making zero the > unknown value is quite bad since that makes it impossible to have zero > as cost represented). > > It seems the check is to avoid pattern_cost returning zero (unknown), so the > comment holds for pattern_cost the same (it returns an 'int' so the better > exceptional value would have been -1, avoiding the compare). But sometimes an insn cost is added to a total cost. If the unknown cost is -1, the total cost might be distorted. > > Richard. Thanks Gui Haochen
[PATCHv2] rs6000: Enable overlapped by-pieces operations
Hi, This patch enables overlapped by-piece operations. On rs6000, default move/set/clear ratio is 2. So the overlap is only enabled with compare by-pieces. Compared to previous version, the change is to remove power8 requirement from test case. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651045.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Enable overlapped by-pieces operations This patch enables overlapped by-piece operations by defining TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear ratio is 2. So the overlap is only enabled with compare by-pieces. gcc/ * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define. gcc/testsuite/ * gcc.target/powerpc/block-cmp-9.c: New. patch.diff diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 117999613d8..e713a1e1d57 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -1776,6 +1776,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] = #undef TARGET_CONST_ANCHOR #define TARGET_CONST_ANCHOR 0x8000 +#undef TARGET_OVERLAP_OP_BY_PIECES_P +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true + /* Processor table. */ diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c new file mode 100644 index 000..f16429c2ffb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */ + +/* Test if by-piece overlap compare is enabled and following case is + implemented by two overlap word loads and compares. */ + +int foo (const char* s1, const char* s2) +{ + return __builtin_memcmp (s1, s2, 7) == 0; +}
[PATCH-1v2] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
Hi, This patch replaces rtx_cost with insn_cost in forward propagation. In the PR, one constant vector should be propagated and replace a pseudo in a store insn if we know it's a duplicated constant vector. It reduces the insn cost but not the rtx cost. In this case, the cost is determined by the destination operand (memory or pseudo). Unfortunately, rtx cost can't help. The test case is added in the second target specific patch. https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643995.html Compared to the previous version, the main change is not to do the substitution if either the new or the old insn cost is zero. Zero means the cost is unknown. Previous version https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643994.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk?
ChangeLog fwprop: Replace set_src_cost with insn_cost in try_fwprop_subst_pattern gcc/ * fwprop.cc (try_fwprop_subst_pattern): Replace set_src_cost with insn_cost.
patch.diff diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc index cb6fd6700ca..184a22678b7 100644 --- a/gcc/fwprop.cc +++ b/gcc/fwprop.cc @@ -470,21 +470,19 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, redo_changes (0); } - /* ??? In theory, it should be better to use insn costs rather than - set_src_costs here. That would involve replacing this code with - change_is_worthwhile. */ bool ok = recog (attempt, use_change); if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()) -if (rtx use_set = single_set (use_rtl)) +if (single_set (use_rtl)) { bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl)); + auto new_cost = insn_cost (use_rtl, speed); temporarily_undo_changes (0); - auto old_cost = set_src_cost (SET_SRC (use_set), - GET_MODE (SET_DEST (use_set)), speed); + /* Invalidate recog data. */ + INSN_CODE (use_rtl) = -1; + auto old_cost = insn_cost (use_rtl, speed); redo_changes (0); - auto new_cost = set_src_cost (SET_SRC (use_set), - GET_MODE (SET_DEST (use_set)), speed); - if (new_cost > old_cost + if (new_cost == 0 || old_cost == 0 + || new_cost > old_cost || (new_cost == old_cost && !prop.likely_profitable_p ())) { if (dump_file)
Re: [PATCH] rs6000: Enable overlapped by-pieces operations
Hi Kewen, On 2024/5/9 13:44, Kewen.Lin wrote: > Why does it need power8 forced here? I thought it over; it's not needed. For the sub-targets where the library call is used, l[hb]z won't be generated either. Thanks Gui Haochen
Re: [PATCH] rs6000: Enable overlapped by-pieces operations
Hi Kewen, Thanks for your comments. On 2024/5/9 13:44, Kewen.Lin wrote: > Hi, > > on 2024/5/8 14:47, HAO CHEN GUI wrote: >> Hi, >> This patch enables overlapped by-piece operations. On rs6000, default >> move/set/clear ratio is 2. So the overlap is only enabled with compare >> by-pieces. > > Thanks for enabling this, did you evaluate if it can help some benchmark? Tested it with SPEC2017. No obvious performance impact. I think memory compare might not be hot enough. Tested it with my micro benchmark. 5-10% performance gain when the compare length is 7. > >> >> Bootstrapped and tested on powerpc64-linux BE and LE with no >> regressions. Is it OK for the trunk? >> >> Thanks >> Gui Haochen >> >> ChangeLog >> rs6000: Enable overlapped by-pieces operations >> >> This patch enables overlapped by-piece operations by defining >> TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear >> ratio is 2. So the overlap is only enabled with compare by-pieces. >> >> gcc/ >> * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define. >> >> gcc/testsuite/ >> * gcc.target/powerpc/block-cmp-9.c: New. >> >> >> patch.diff >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index 6b9a40fcc66..2b5f5cf1d86 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const >> rs6000_attribute_table[] = >> #undef TARGET_CONST_ANCHOR >> #define TARGET_CONST_ANCHOR 0x8000 >> >> +#undef TARGET_OVERLAP_OP_BY_PIECES_P >> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true >> + >> >> >> /* Processor table.
*/ >> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c >> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c >> new file mode 100644 >> index 000..b5f51affbb7 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c >> @@ -0,0 +1,11 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > > Why does it need power8 forced here? I just want to exclude P7 LE, as targetm.slow_unaligned_access returns false for it and the cmpmemsi expander won't be invoked. > > BR, > Kewen > >> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */ >> + >> +/* Test if by-piece overlap compare is enabled and following case is >> + implemented by two overlap word loads and compares. */ >> + >> +int foo (const char* s1, const char* s2) >> +{ >> + return __builtin_memcmp (s1, s2, 7) == 0; >> +} > Thanks Gui Haochen
[PATCH] rs6000: Enable overlapped by-pieces operations
Hi, This patch enables overlapped by-piece operations. On rs6000, default move/set/clear ratio is 2. So the overlap is only enabled with compare by-pieces. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Enable overlapped by-pieces operations This patch enables overlapped by-piece operations by defining TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear ratio is 2. So the overlap is only enabled with compare by-pieces. gcc/ * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define. gcc/testsuite/ * gcc.target/powerpc/block-cmp-9.c: New. patch.diff diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 6b9a40fcc66..2b5f5cf1d86 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] = #undef TARGET_CONST_ANCHOR #define TARGET_CONST_ANCHOR 0x8000 +#undef TARGET_OVERLAP_OP_BY_PIECES_P +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true + /* Processor table. */ diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c new file mode 100644 index 000..b5f51affbb7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */ + +/* Test if by-piece overlap compare is enabled and following case is + implemented by two overlap word loads and compares. */ + +int foo (const char* s1, const char* s2) +{ + return __builtin_memcmp (s1, s2, 7) == 0; +}
Ping^3 [PATCH, rs6000] Split TImode for logical operations in expand pass [PR100694]
Hi, As now it's stage-1, gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html Gui Haochen Thanks On 2023/4/24 13:35, HAO CHEN GUI wrote: > Hi, > Gently ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html > > Thanks > Gui Haochen > > On 2023/2/20 10:10, HAO CHEN GUI wrote: >> Hi, >> Gently ping this: >> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html >> >> Gui Haochen >> Thanks >> >> On 2023/2/8 13:08, HAO CHEN GUI wrote: >>> Hi, >>> The logical operations for TImode are split after the reload pass right now. >>> Some >>> potential optimizations are missed as the split is too late. This patch removes >>> TImode from the "AND", "IOR", "XOR" and "NOT" expanders so that these logical >>> operations can be split at the expand pass. The new test case illustrates the >>> optimization. >>> >>> Two test cases of pr92398 are merged into one as all sub-targets generate >>> the same sequence of instructions with the patch. >>> >>> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. >>> >>> Thanks >>> Gui Haochen >>> >>> >>> ChangeLog >>> 2023-02-08 Haochen Gui >>> >>> gcc/ >>> PR target/100694 >>> * config/rs6000/rs6000.md (BOOL_128_V): New mode iterator for 128-bit >>> vector types. >>> (and<mode>3): Replace BOOL_128 with BOOL_128_V. >>> (ior<mode>3): Likewise. >>> (xor<mode>3): Likewise. >>> (one_cmpl<mode>2 expander): New expander with BOOL_128_V. >>> (one_cmpl<mode>2 insn_and_split): Rename to ... >>> (*one_cmpl<mode>2): ... this. >>> >>> gcc/testsuite/ >>> PR target/100694 >>> * gcc.target/powerpc/pr100694.c: New. >>> * gcc.target/powerpc/pr92398.c: New. >>> * gcc.target/powerpc/pr92398.h: Remove. >>> * gcc.target/powerpc/pr92398.p9-.c: Remove. >>> * gcc.target/powerpc/pr92398.p9+.c: Remove.
>>> >>> >>> patch.diff >>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md >>> index 4bd1dfd3da9..455b7329643 100644 >>> --- a/gcc/config/rs6000/rs6000.md >>> +++ b/gcc/config/rs6000/rs6000.md >>> @@ -743,6 +743,15 @@ (define_mode_iterator BOOL_128 [TI >>> (V2DF "TARGET_ALTIVEC") >>> (V1TI "TARGET_ALTIVEC")]) >>> >>> +;; Mode iterator for logical operations on 128-bit vector types >>> +(define_mode_iterator BOOL_128_V [(V16QI "TARGET_ALTIVEC") >>> +(V8HI "TARGET_ALTIVEC") >>> +(V4SI "TARGET_ALTIVEC") >>> +(V4SF "TARGET_ALTIVEC") >>> +(V2DI "TARGET_ALTIVEC") >>> +(V2DF "TARGET_ALTIVEC") >>> +(V1TI "TARGET_ALTIVEC")]) >>> + >>> ;; For the GPRs we use 3 constraints for register outputs, two that are the >>> ;; same as the output register, and a third where the output register is an >>> ;; early clobber, so we don't have to deal with register overlaps. For the >>> @@ -7135,23 +7144,23 @@ (define_expand "subti3" >>> ;; 128-bit logical operations expanders >>> >>> (define_expand "and3" >>> - [(set (match_operand:BOOL_128 0 "vlogical_operand") >>> - (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand") >>> - (match_operand:BOOL_128 2 "vlogical_operand")))] >>> + [(set (match_operand:BOOL_128_V 0 "vlogical_operand") >>> + (and:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand") >>> + (match_operand:BOOL_128_V 2 "vlogical_operand")))] >>>"" >>>"") >>> >>> (define_expand "ior3" >>> - [(set (match_operand:BOOL_128 0 "vlogical_operand") >>> -(ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand") >>> - (match_operand:BOOL_128 2 "vlogical_operand")))] >>> + [(set (match_operand:BOOL_128_V 0 "vlogical_operand") >>> + (ior:BOOL_128_V (match
Ping [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits
Hi, Gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html Thanks Gui Haochen On 2024/3/18 17:10, HAO CHEN GUI wrote: > Hi, > Gently ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html > > Thanks > Gui Haochen > > On 2024/3/11 13:41, HAO CHEN GUI wrote: >> Hi, >> This patch tries to fix the problem when a canonical form doesn't benefit >> on a specific target. The const operand of AND is ANDed with the nonzero >> bits of the other operand in the combine pass. It's a canonical form, but it brings no >> benefit for targets which have rotate and mask insns. As the mask is >> truncated, it can't match the insn conditions which it originally matched. >> For example, the following insn condition checks the sum of two AND masks. >> When one of the masks is truncated, the condition breaks. >> >> (define_insn "*rotlsi3_insert_5" >> [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") >> (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") >> (match_operand:SI 2 "const_int_operand" "n,n")) >> (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0") >> (match_operand:SI 4 "const_int_operand" "n,n"] >> "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode) >>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0 >>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0" >> ... >> >> This patch tries to fix the problem by comparing the rtx cost. If the other >> operand (varop) is not changed and the rtx cost with the new mask is not less than >> the original one, the mask is restored to the original one. >> >> I'm not sure if the comparison of rtx cost here is proper. The outer code is >> unknown, so I assume it is "SET". Also the rtx cost might not be accurate. >> From my understanding, the canonical forms should always benefit as they can't >> be undone in the combine pass. Do we have a perfect solution for this kind of >> issue? Looking forward to your advice. >> >> Another similar issue for canonical forms: is the widened mode for >> lshiftrt always good? >> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html >> >> Thanks >> Gui Haochen >> >> ChangeLog >> Combine: Don't truncate const operand of AND if it's no benefits >> >> In the combine pass, the canonical form is to turn off all bits in the constant >> that are known to already be zero for AND. >> >> /* Turn off all bits in the constant that are known to already be zero. >> Thus, if the AND isn't needed at all, we will have CONSTOP == >> NONZERO_BITS >> which is tested below. */ >> >> constop &= nonzero; >> >> But it doesn't benefit when the target has rotate and mask insert insns. >> The AND mask is truncated and loses its information. Thus it can't match >> the insn conditions. For example, the following insn condition checks >> the sum of two AND masks. >> >> (define_insn "*rotlsi3_insert_5" >> [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") >> (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") >> (match_operand:SI 2 "const_int_operand" "n,n")) >> (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0") >> (match_operand:SI 4 "const_int_operand" "n,n"] >> "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode) >>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0 >>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0" >> ... >> >> This patch restores the const operand of AND if the other operand is >> not optimized and the truncated const operand doesn't save the rtx cost. >> >> gcc/ >> * combine.cc (simplify_and_const_int_1): Restore the const operand >> of AND if varop is not optimized and the rtx cost of the new const >> operand is not reduced. >> >> gcc/testsuite/ >> * gcc.target/powerpc/rlwimi-0.c: Reduce total number of insns and >> adjust the number of rotate and mask insns. >> * gcc.target/powerpc/rlwimi-1.c: Likewise. >> * gcc.target/powerpc/rlwimi-2.c: Likewise. >> >> patch.diff >> diff --git a/gcc/combine.cc b/gcc/combine.cc >> index a4479f8d836..16ff
Re: [Patch, rs6000] Enable overlap memory store for block memory clear
Hi, As now it's stage 1, gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646478.html Thanks Gui Haochen On 2024/2/26 10:25, HAO CHEN GUI wrote: > Hi, > This patch enables overlap memory store for block memory clear, which > reduces the number of store instructions. The expander calls > widest_fixed_size_mode_for_block_clear to get the mode for looped block > clear and calls smallest_fixed_size_mode_for_block_clear to get the mode > for the last overlapped clear. > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is it OK for the trunk or next stage 1? > > Thanks > Gui Haochen > > > ChangeLog > rs6000: Enable overlap memory store for block memory clear > > gcc/ > * config/rs6000/rs6000-string.cc > (widest_fixed_size_mode_for_block_clear): New. > (smallest_fixed_size_mode_for_block_clear): New. > (expand_block_clear): Call widest_fixed_size_mode_for_block_clear to > get the mode for looped memory stores and call > smallest_fixed_size_mode_for_block_clear to get the mode for the last > overlapped memory store. > > gcc/testsuite > * gcc.target/powerpc/block-clear-1.c: New. > > > patch.diff > diff --git a/gcc/config/rs6000/rs6000-string.cc > b/gcc/config/rs6000/rs6000-string.cc > index 133e5382af2..c2a6095a586 100644 > --- a/gcc/config/rs6000/rs6000-string.cc > +++ b/gcc/config/rs6000/rs6000-string.cc > @@ -38,6 +38,49 @@ > #include "profile-count.h" > #include "predict.h" > > +/* Return the widest mode whose size is less than or equal to the given > + size.
*/ > +static fixed_size_mode > +widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int > align, > + bool unaligned_vsx_ok) > +{ > + machine_mode mode; > + > + if (TARGET_ALTIVEC > + && size >= 16 > + && (align >= 128 > + || unaligned_vsx_ok)) > +mode = V4SImode; > + else if (size >= 8 > +&& TARGET_POWERPC64 > +&& (align >= 64 > +|| !STRICT_ALIGNMENT)) > +mode = DImode; > + else if (size >= 4 > +&& (align >= 32 > +|| !STRICT_ALIGNMENT)) > +mode = SImode; > + else if (size >= 2 > +&& (align >= 16 > +|| !STRICT_ALIGNMENT)) > +mode = HImode; > + else > +mode = QImode; > + > + return as_a <fixed_size_mode> (mode); > +} > + > +/* Return the smallest mode whose size is smaller than or equal to > + the size. */ > +static fixed_size_mode > +smallest_fixed_size_mode_for_block_clear (unsigned int size) > +{ > + if (size > UNITS_PER_WORD) > +return as_a <fixed_size_mode> (V4SImode); > + > + return smallest_int_mode_for_size (size * BITS_PER_UNIT); > +} > + > /* Expand a block clear operation, and return 1 if successful. Return 0 > if we should let the compiler generate normal code.
> > @@ -55,7 +98,6 @@ expand_block_clear (rtx operands[]) >HOST_WIDE_INT align; >HOST_WIDE_INT bytes; >int offset; > - int clear_bytes; >int clear_step; > >/* If this is not a fixed size move, just call memcpy */ > @@ -89,62 +131,36 @@ expand_block_clear (rtx operands[]) > >bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX); > > - for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes) > + auto mode = widest_fixed_size_mode_for_block_clear (bytes, align, > + unaligned_vsx_ok); > + offset = 0; > + rtx dest; > + > + do > { > - machine_mode mode = BLKmode; > - rtx dest; > + unsigned int size = GET_MODE_SIZE (mode); > > - if (TARGET_ALTIVEC > - && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok))) > + while (bytes >= size) > { > - clear_bytes = 16; > - mode = V4SImode; > - } > - else if (bytes >= 8 && TARGET_POWERPC64 > -&& (align >= 64 || !STRICT_ALIGNMENT)) > - { > - clear_bytes = 8; > - mode = DImode; > - if (offset == 0 && align < 64) > - { > - rtx addr; > + dest = adjust_address (orig_dest, mode, offset); > + emit_move_insn (dest, CONST0_RTX (mode)); > > - /* If the address form is reg+offset with offset not a > - multiple of four, reload into reg indirect form here > - rather than waiting for reload. This way we get one > - reload, not one per store. */ > -
[PATCHv2] Value range: Add range op for __builtin_isfinite
Hi, The former patch adds an isfinite optab for __builtin_isfinite. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html Thus the builtin might not be folded at the front end. The range op for isfinite is needed for value range analysis. This patch adds it. Compared to the last version, this version fixes a typo. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen
ChangeLog Value Range: Add range op for builtin isfinite The former patch adds an optab for builtin isfinite. Thus builtin isfinite might not be folded at the front end. So the range op for isfinite is needed for value range analysis. This patch adds the range op for builtin isfinite. gcc/ * gimple-range-op.cc (class cfn_isfinite): New. (op_cfn_finite): New variables. (gimple_range_op_handler::maybe_builtin_call): Handle CFN_BUILT_IN_ISFINITE. gcc/testsuite/ * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test.
patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 9de130b4022..99c511728d3 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1192,6 +1192,56 @@ public: } } op_cfn_isinf; +// Implement range operator for CFN_BUILT_IN_ISFINITE +class cfn_isfinite : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange &r, tree type, const frange &op1, + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isfinite ()) + { + r.set_nonzero (type); + return true; + } + +if (op1.known_isnan () + || op1.known_isinf ()) + { + r.set_zero (type); + return true; + } + +return false; + } + virtual bool op1_range (frange &r, tree type, const irange &lhs, + const frange &, relation_trio) const override + { +if (lhs.zero_p ()) + { + // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented. + // Set range to varying + r.set_varying (type); + return true; + } + +if (!range_includes_zero_p (lhs)) + { + nan_state nan (false); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +return false; + } +} op_cfn_isfinite; + // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator { @@ -1288,6 +1338,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = &op_cfn_isinf; break; +case CFN_BUILT_IN_ISFINITE: + m_op1 = gimple_call_arg (call, 0); + m_operator = &op_cfn_isfinite; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c new file mode 100644 index 000..f5dce0a0486 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +#include +void link_error(); + +void test1 (double x) +{ + if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x)) +link_error (); +} + +void test2 (float x) +{ + if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x)) +link_error (); +} + +void test3 (double x) +{ + if (__builtin_isfinite (x) && __builtin_isinf (x)) +link_error (); +} + +void test4 (float x) +{ + if (__builtin_isfinite (x) && __builtin_isinf (x)) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
[PATCH-6, rs6000] Split setcc to two insns after reload
Hi, It's the sixth patch of a series of patches optimizing CC modes on rs6000. This patch splits setcc to two separate insns after reload so that other insns can be inserted between them. It should increase the parallelism. The rotate_cr pattern still needs the info of the number of cr fields as the pass pro_and_epilogue might change the cr register. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Split setcc to two insns after reload This patch splits setcc to two separate insns after reload so that other insns can be inserted between them. gcc/ * config/rs6000/rs6000.md (c_enum unpsec): Add UNSPEC_MFCR and UNSPEC_ROTATE_CR. (*move_from_cr): New. (insn set_cc): Remove. (*rotate_cr): New. (insn_and_split set_cc): New. patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index ccf392b6409..0ad08e3111e 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -159,6 +159,8 @@ (define_c_enum "unspec" UNSPEC_XXSPLTIW_CONST UNSPEC_FMAX UNSPEC_FMIN + UNSPEC_MFCR + UNSPEC_ROTATE_CR ]) ;; @@ -12744,26 +12746,51 @@ (define_insn_and_split "*cmp_internal2" } }) -;; Now we have the scc insns. We can do some combinations because of the -;; way the machine works. -;; -;; Note that this is probably faster if we can put an insn between the -;; mfcr and rlinm, but this is tricky. Let's leave it for now. In most -;; cases the insns below which don't use an intermediate CR field will -;; be used instead. 
-(define_insn "set_cc" + +(define_insn "*move_from_cr" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") - (match_operator:GPR 1 "scc_comparison_operator" - [(match_operand 2 "cc_reg_operand" "y") -(const_int 0)]))] + (unspec:GPR [(match_operand 1 "cc_reg_operand" "y")] + UNSPEC_MFCR))] "" - "mfcr %0%Q2\;rlwinm %0,%0,%J1,1" + "mfcr %0%Q1" [(set (attr "type") (cond [(match_test "TARGET_MFCRF") (const_string "mfcrf") ] - (const_string "mfcr"))) - (set_attr "length" "8")]) + (const_string "mfcr")))]) + +;; Split the insn after reload so that other insns can be inserted +;; between mfcr and rlinm. +(define_insn_and_split "set_cc" + [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") + (match_operator:GPR 1 "scc_comparison_operator" + [(match_operand 2 "cc_reg_operand" "y") +(const_int 0)]))] + "!TARGET_POWER10 + || (GET_MODE (operands[2]) != CCmode + && GET_MODE (operands[2]) != CCUNSmode)" + "#" + "&& reload_completed" + [(set (match_dup 0) + (unspec:GPR [(match_dup 2)] + UNSPEC_MFCR)) + (set (match_dup 0) + (unspec:GPR [(match_dup 0) +(match_dup 1)] + UNSPEC_ROTATE_CR))] + "" + [(set_attr "length" "8")]) + +(define_insn "*rotate_cr" + [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") + (unspec:GPR [(match_operand:GPR 3 "gpc_reg_operand" "r") +(match_operator:GPR 1 "scc_comparison_operator" + [(match_operand 2 "cc_reg_operand" "y") +(const_int 0)])] + UNSPEC_ROTATE_CR))] + "" + "rlwinm %0,%3,%J1,1" +) (define_insn_and_split "*set_rev" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
[PATCH-5, rs6000] Replace explicit CC bit reverse with common format
Hi, It's the fifth patch of a series of patches optimizing CC modes on rs6000. There are some explicit CR6 bit reverse (mfcr/xor) expanders in vector.md. As the fourth patch optimized the CC bit reverse implementation, this patch changes the explicit format to the common format (testing if the bit is not set). With the common format, it can match different implementations on different sub-targets. On Power10, it should be setbcr. On Power9, it's isel. On Power8 and below, it's mfcr/xor. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen
ChangeLog rs6000: Replace explicit CC bit reverse with common format This patch replaces explicit CC bit reverse (mfcr/xor) with the common format so that it can match setbcr on Power 10, isel on Power 9 and mfcr/xor on other sub-targets. gcc/ * config/rs6000/vector.md (vector_ae_<mode>_p): Replace explicit CC bit reverse with common format. (vector_ae_v2di_p): Likewise. (vector_ae_v1ti_p): Likewise. (vector_ae_<mode>_p): Likewise. (cr6_test_for_zero): Likewise. (cr6_test_for_lt): Likewise. gcc/testsuite/ * gcc.target/powerpc/vsu/vec-any-eq-10.c: Replace rlwinm with isel. * gcc.target/powerpc/vsu/vec-any-eq-14.c: Replace rlwinm with isel. * gcc.target/powerpc/vsu/vec-any-eq-7.c: Replace rlwinm with isel. * gcc.target/powerpc/vsu/vec-any-eq-8.c: Replace rlwinm with isel. * gcc.target/powerpc/vsu/vec-any-eq-9.c: Replace rlwinm with isel.
patch.diff diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index f86c1f2990e..b1bbf9bac2d 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -942,11 +942,8 @@ (define_expand "vector_ae__p" (ne:VI (match_dup 1) (match_dup 2)))]) (set (match_operand:SI 0 "register_operand" "=r") - (lt:SI (reg:CCLTEQ CR6_REGNO) - (const_int 0))) - (set (match_dup 0) - (xor:SI (match_dup 0) - (const_int 1)))] + (ge:SI (reg:CCLTEQ CR6_REGNO) + (const_int 0)))] "TARGET_P9_VECTOR" { operands[3] = gen_reg_rtx (mode); @@ -1027,11 +1024,8 @@ (define_expand "vector_ae_v2di_p" (eq:V2DI (match_dup 1) (match_dup 2)))]) (set (match_operand:SI 0 "register_operand" "=r") - (eq:SI (reg:CCLTEQ CR6_REGNO) - (const_int 0))) - (set (match_dup 0) - (xor:SI (match_dup 0) - (const_int 1)))] + (ne:SI (reg:CCLTEQ CR6_REGNO) + (const_int 0)))] "TARGET_P9_VECTOR" { operands[3] = gen_reg_rtx (V2DImode); @@ -1048,11 +1042,8 @@ (define_expand "vector_ae_v1ti_p" (eq:V1TI (match_dup 1) (match_dup 2)))]) (set (match_operand:SI 0 "register_operand" "=r") - (eq:SI (reg:CCLTEQ CR6_REGNO) - (const_int 0))) - (set (match_dup 0) - (xor:SI (match_dup 0) - (const_int 1)))] + (ne:SI (reg:CCLTEQ CR6_REGNO) + (const_int 0)))] "TARGET_POWER10" { operands[3] = gen_reg_rtx (V1TImode); @@ -1095,11 +1086,8 @@ (define_expand "vector_ae__p" (eq:VEC_F (match_dup 1) (match_dup 2)))]) (set (match_operand:SI 0 "register_operand" "=r") - (eq:SI (reg:CCLTEQ CR6_REGNO) - (const_int 0))) - (set (match_dup 0) - (xor:SI (match_dup 0) - (const_int 1)))] + (ne:SI (reg:CCLTEQ CR6_REGNO) + (const_int 0)))] "TARGET_P9_VECTOR" { operands[3] = gen_reg_rtx (mode); @@ -1172,11 +1160,8 @@ (define_expand "cr6_test_for_zero" ;; integer constant first argument equals one (aka __CR6_EQ_REV in altivec.h). 
(define_expand "cr6_test_for_zero_reverse" [(set (match_operand:SI 0 "register_operand" "=r") - (eq:SI (reg:CCLTEQ CR6_REGNO) - (const_int 0))) - (set (match_dup 0) - (xor:SI (match_dup 0) - (const_int 1)))] + (ne:SI (reg:CCLTEQ CR6_REGNO) + (const_int 0)))] "TARGET_ALTIVEC || TARGET_VSX" "") @@ -1198,11 +1183,8 @@ (define_expand "cr6_test_for_lt" ;; (aka __CR6_LT_REV in altivec.h). (define_expand "cr6_test_for_lt_reverse" [(set (match_operand:SI 0 "register_operand" "=r") - (lt:SI (reg:CCLTEQ CR6_REGNO) - (const_int 0))) - (set (match_dup 0) - (xor:SI (match_dup 0) - (const_int 1)))] + (ge:SI (reg:CCLTEQ CR6_REGNO) + (const_int 0)))] "TARGET_ALTIVEC || TARGET_VSX" "") diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eq-10.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eq-10.c index 30dfc83a97b..9743a496fb5 100644 --- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eq-10.c +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eq-10.c @@ -15,4 +15,4 @@ test_any_equal (vector unsigned long long *arg1_p, } /* { dg-final { scan-assembler "vcmpequd." } } */ -/* { dg-final { scan-assembler "rlwinm
[PATCH-4, rs6000] Optimize single cc bit reverse implementation
Hi, This is the fourth patch in a series optimizing CC modes on rs6000. A single CC bit reverse can be implemented by setbcr on Power10, isel on Power9, or mfcr on Power8 and below. Originally isel and setbcr were not supported for CCFP, as bcd insns use CCFP and their bit reverse is not the same as the normal CCFP one. Previous patches added new CC modes according to the usage of the CC bits, so a single CC bit reverse can now be supported on all CC modes with a uniform pattern. This patch removes unordered and ordered from the code list of CCFP with finite_math_only set; these two are not needed as bcd insns use a separate CC mode now. reverse_condition is replaced with rs6000_reverse_condition, as all CC modes can be reversed. A new isel version of the single CC bit reverse pattern is added. The fp and bcd CC reverse patterns are removed and a uniform single CC bit reverse pattern, the mfcr version, is added. The new test cases illustrate the different implementations of the single CC bit reverse test. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Optimize single cc bit reverse implementation This patch implements single cc bit reverse by mfcr (on Power8 and below) or isel (on Power9) or setbcr (on Power10) with all CC modes. gcc/ * config/rs6000/predicates.md (branch_comparison_operator): Remove unordered and ordered from CCFP with finite_math_only. (scc_rev_comparison_operator): Add unle and unge. * config/rs6000/rs6000.md (CCANY): Add CCFP, CCBCD, CCLTEQ and CCEQ. (*isel_reversed__): Replace reverse_condition with rs6000_reverse_condition. (*set_rev): New insn_and_split pattern for single cc bit reverse P9 version. (fp_rev, ccbcd_rev): Remove. (*_cc): Remove the pattern for CCFP and CCBCD. Merge them to... (*set_rev): ...this, the new insn_and_split pattern for single cc bit reverse P8 and below version. gcc/testsuite/ * gcc.target/powerpc/cc_rev.h: New. * gcc.target/powerpc/cc_rev_1.c: New.
* gcc.target/powerpc/cc_rev_2.c: New. * gcc.target/powerpc/cc_rev_3.c: New. patch.diff diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 322e7639fd4..ddb46799bff 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1348,7 +1348,7 @@ (define_predicate "branch_comparison_operator" (match_test "GET_MODE_CLASS (GET_MODE (XEXP (op, 0))) == MODE_CC") (if_then_else (match_test "GET_MODE (XEXP (op, 0)) == CCFPmode") (if_then_else (match_test "flag_finite_math_only") - (match_code "lt,le,gt,ge,eq,ne,unordered,ordered") + (match_code "lt,le,gt,ge,eq,ne") (match_code "lt,gt,eq,unordered,unge,unle,ne,ordered")) (if_then_else (match_test "GET_MODE (XEXP (op, 0)) == CCBCDmode") (match_code "lt,le,gt,ge,eq,ne,unordered,ordered") @@ -1397,7 +1397,7 @@ (define_predicate "scc_comparison_operator" ;; an SCC insn. (define_predicate "scc_rev_comparison_operator" (and (match_operand 0 "branch_comparison_operator") - (match_code "ne,le,ge,leu,geu,ordered"))) + (match_code "ne,le,ge,leu,geu,ordered,unle,unge"))) ;; Return 1 if OP is a comparison operator suitable for floating point ;; vector/scalar comparisons that generate a -1/0 mask. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 2c6255395d1..ccf392b6409 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -5509,7 +5509,7 @@ (define_expand "movcc" ;; leave out the mode in operand 4 and use one pattern, but reload can ;; change the mode underneath our feet and then gets confused trying ;; to reload the value. 
-(define_mode_iterator CCANY [CC CCUNS]) +(define_mode_iterator CCANY [CC CCUNS CCFP CCBCD CCLTEQ CCEQ]) (define_insn "isel__" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r") (if_then_else:GPR @@ -5536,7 +5536,8 @@ (define_insn "*isel_reversed__" (match_operand:GPR 3 "reg_or_zero_operand" "O,b")))] "TARGET_ISEL" { - PUT_CODE (operands[1], reverse_condition (GET_CODE (operands[1]))); + PUT_CODE (operands[1], rs6000_reverse_condition (mode, + GET_CODE (operands[1]))); return "isel %0,%3,%2,%j1"; } [(set_attr "type" "isel")]) @@ -12764,6 +12765,27 @@ (define_insn "set_cc" (const_string "mfcr"))) (set_attr "length" "8")]) +(define_insn_and_split "*set_rev" + [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") + (match_operator:GPR 1 "scc_rev_comparison_operator" + [(match_operand:CCANY 2 "cc_reg_operand" "y") +(const_int 0)]))] + "TARGET_ISEL + && !TARGET_POWER10" + "#" + "&& 1" + [(set (match_dup 2) + (const_int 1)) + (set (match_dup 0) + (if_then_else:GPR +
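The three strategies named above (mfcr extract-and-invert on Power8, isel conditional select on Power9, setbcr on Power10) all compute the same value. A rough C model of the three, where the bit extraction and numbering are purely illustrative stand-ins for the CR field access:

```c
#include <assert.h>

/* Rough model of the three strategies for materializing the reverse of
   one condition bit (bit extraction here is illustrative, not the real
   CR encoding).  */
static unsigned cr_bit(unsigned cr, int bit) { return (cr >> bit) & 1u; }

static unsigned rev_mfcr(unsigned cr, int bit)   /* P8: extract, then invert */
{ return cr_bit(cr, bit) ^ 1u; }

static unsigned rev_isel(unsigned cr, int bit)   /* P9: select 0 or 1 */
{ return cr_bit(cr, bit) ? 0u : 1u; }

static unsigned rev_setbcr(unsigned cr, int bit) /* P10: set iff bit clear */
{ return cr_bit(cr, bit) == 0; }
```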
[PATCH-3, rs6000] Set CC mode of vector string isolate insns to CCEQ
Hi, It's the third patch of a series of patches optimizing CC modes on rs6000. This patch sets CC mode of vector string isolate insns to CCEQ instead of CCFP as these insns only set/check CR bit 2. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Set CC mode of vector string isolate insns to CCEQ gcc/ * config/rs6000/altivec.md (vstrir_p_direct_): Replace CCFP with CCEQ. (vstril_p_direct_): Likewise. patch.diff diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index bd79a3f9e84..a883a814a82 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -932,9 +932,9 @@ (define_insn "vstrir_p_direct_" (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand" "v")] UNSPEC_VSTRIR)) - (set (reg:CC CR6_REGNO) - (unspec:CC [(match_dup 1)] - UNSPEC_VSTRIR))] + (set (reg:CCEQ CR6_REGNO) + (unspec:CCEQ [(match_dup 1)] +UNSPEC_VSTRIR))] "TARGET_POWER10" "vstrir. %0,%1" [(set_attr "type" "vecsimple")]) @@ -984,9 +984,9 @@ (define_insn "vstril_p_direct_" (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand" "v")] UNSPEC_VSTRIL)) - (set (reg:CC CR6_REGNO) - (unspec:CC [(match_dup 1)] - UNSPEC_VSTRIR))] + (set (reg:CCEQ CR6_REGNO) + (unspec:CCEQ [(match_dup 1)] +UNSPEC_VSTRIR))] "TARGET_POWER10" "vstril. %0,%1" [(set_attr "type" "vecsimple")])
[PATCH-2, rs6000] Add a new type of CC mode - CCLTEQ
Hi, This is the second patch in a series optimizing CC modes on rs6000. It adds a new type of CC mode - CCLTEQ - used for cases that set only CR bits 0 and 2; bits 1 and 3 are unused. The vector compare and test data class instructions are such cases. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Add a new type of CC mode - CCLTEQ The new mode is used for cases which only check CR bits 0 and 2. gcc/ * config/rs6000/altivec.md (altivec_vcmpequ_p): Replace CCFP with CCLTEQ. (altivec_vcmpequt_p): Likewise. (*altivec_vcmpgts_p): Likewise. (*altivec_vcmpgtst_p): Likewise. (*altivec_vcmpgtu_p): Likewise. (*altivec_vcmpgtut_p): Likewise. (*altivec_vcmpeqfp_p): Likewise. (*altivec_vcmpgtfp_p): Likewise. (*altivec_vcmpgefp_p): Likewise. (altivec_vcmpbfp_p): Likewise. * config/rs6000/predicates.md (branch_comparison_operator): Add CCLTEQ and its supported comparison codes. * config/rs6000/rs6000-modes.def (CC_MODE): Add CCLTEQ. * config/rs6000/rs6000.cc (validate_condition_mode): Add assertion for CCLTEQ. * config/rs6000/rs6000.md (CC_any): Add CCLTEQ. * config/rs6000/vector.md (vector_eq__p): Replace CCFP with CCLTEQ. (vector_eq_v1ti_p): Likewise. (vector_ne__p): Likewise. (vector_ae__p): Likewise. (vector_nez__p): Likewise. (vector_ne_v2di_p): Likewise. (vector_ne_v1ti_p): Likewise. (vector_ae_v2di_p): Likewise. (vector_ae_v1ti_p): Likewise. (vector_ne__p): Likewise. (vector_ae__p): Likewise. (vector_gt__p): Likewise. (vector_gt_v1ti_p): Likewise. (vector_ge__p): Likewise. (vector_gtu__p): Likewise. (cr6_test_for_zero): Likewise. (cr6_test_for_zero_reverse): Likewise. (cr6_test_for_lt): Likewise. (cr6_test_for_lt_reverse): Likewise. * config/rs6000/vsx.md (*vsx_eq__p): Likewise. (*vsx_gt__p): Likewise. (*vsx_ge__p): Likewise. (xststdcqp_): Likewise. (xststdcp): Likewise. (xststdcnegqp_): Likewise. (xststdcnegp): Likewise. (*xststdcqp_): Likewise.
(*xststdcp): Likewise. (*vsx_ne__p): Likewise. (*vector_nez__p): Likewise. (vcmpnezb_p): Likewise. patch.diff diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 9fa8cf89f61..bd79a3f9e84 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -2650,10 +2650,10 @@ (define_expand "cbranchv16qi4" ;; Compare vectors producing a vector result and a predicate, setting CR6 to ;; indicate a combined status (define_insn "altivec_vcmpequ_p" - [(set (reg:CC CR6_REGNO) - (unspec:CC [(eq:CC (match_operand:VI2 1 "register_operand" "v") - (match_operand:VI2 2 "register_operand" "v"))] - UNSPEC_PREDICATE)) + [(set (reg:CCLTEQ CR6_REGNO) + (unspec:CCLTEQ [(eq:CC (match_operand:VI2 1 "register_operand" "v") + (match_operand:VI2 2 "register_operand" "v"))] + UNSPEC_PREDICATE)) (set (match_operand:VI2 0 "register_operand" "=v") (eq:VI2 (match_dup 1) (match_dup 2)))] @@ -2662,10 +2662,11 @@ (define_insn "altivec_vcmpequ_p" [(set_attr "type" "veccmpfx")]) (define_insn "altivec_vcmpequt_p" - [(set (reg:CC CR6_REGNO) - (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v") - (match_operand:V1TI 2 "altivec_register_operand" "v"))] - UNSPEC_PREDICATE)) + [(set (reg:CCLTEQ CR6_REGNO) + (unspec:CCLTEQ + [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v") + (match_operand:V1TI 2 "altivec_register_operand" "v"))] + UNSPEC_PREDICATE)) (set (match_operand:V1TI 0 "altivec_register_operand" "=v") (eq:V1TI (match_dup 1) (match_dup 2)))] @@ -2686,10 +2687,10 @@ (define_expand "altivec_vcmpne_" }) (define_insn "*altivec_vcmpgts_p" - [(set (reg:CC CR6_REGNO) - (unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v") - (match_operand:VI2 2 "register_operand" "v"))] - UNSPEC_PREDICATE)) + [(set (reg:CCLTEQ CR6_REGNO) + (unspec:CCLTEQ [(gt:CC (match_operand:VI2 1 "register_operand" "v") + (match_operand:VI2 2 "register_operand" "v"))] + UNSPEC_PREDICATE)) (set (match_operand:VI2 0 "register_operand" "=v") (gt:VI2 
(match_dup 1) (match_dup 2)))] @@ -2698,10 +2699,10 @@ (define_insn "*altivec_vcmpgts_p" [(set_attr "type" "veccmpfx")]) (define_insn "*altivec_vcmpgtst_p" - [(set (reg:CC CR6_REGNO) - (unspec:CC [(gt:CC
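The CCLTEQ usage above can be pictured on one 4-bit CR field. For the AltiVec compare "dot" forms, the LT-position bit means "relation true for all elements" and the EQ-position bit means "relation true for no element"; GT and SO are left undefined, which is exactly what CCLTEQ models. A small C sketch (the mask values are for this sketch only, not the hardware encoding):

```c
#include <assert.h>

/* One 4-bit CR field, using the usual PowerPC bit names; the numeric
   masks are illustrative.  For vcmpequ.-style predicates, LT = "true
   for all elements", EQ = "true for no element".  */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

static int all_true_p(unsigned cr6) { return (cr6 & CR_LT) != 0; }
static int any_true_p(unsigned cr6) { return (cr6 & CR_EQ) == 0; }
```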
[PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]
Hi, This is the first patch in a series optimizing CC modes on rs6000. bcd insns set all four bits of a CR field, but they have different single-bit reverse behavior from CCFP's. The fourth bit of bcd CR fields indicates overflow or an invalid number; it is not a bit for an unordered test. So the "le" test should be reversed to "gt", not "ungt", and the "ge" test should be reversed to "lt", not "unlt". That's the root cause of PR100736 and PR114732. This patch fixes the issue by adding a new type of CC mode - CCBCD - for all bcd insns. Here a new setcc_rev pattern is added for CCBCD; it will be merged into a uniform pattern covering all CC modes in a subsequent patch. The rtl code "unordered" is still used for testing overflow or an invalid number. IMHO, "unordered" on a CC mode can be read as testing whether the fourth bit of a CR field is set, and "eq" as testing whether the third bit is set. Thus we avoid creating lots of unspecs for the CR bit testing. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Add a new type of CC mode - CCBCD for bcd insns gcc/ PR target/100736 PR target/114732 * config/rs6000/altivec.md (bcd_): Replace CCFP with CCBCD. (*bcd_test_): Likewise. (*bcd_test2_): Likewise. (bcd__): Likewise. (*bcdinvalid_): Likewise. (bcdinvalid_): Likewise. (bcdshift_v16qi): Likewise. (bcdmul10_v16qi): Likewise. (bcddiv10_v16qi): Likewise. (peephole for bcd_add/sub): Likewise. * config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD and its supported comparison codes. * config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD. * config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD assertion. * config/rs6000/rs6000.md (CC_any): Add CCBCD. (ccbcd_rev): New code iterator. (*_cc): New insn and split pattern for CCBCD reverse compare. gcc/testsuite/ PR target/100736 PR target/114732 * gcc.target/powerpc/pr100736.c: New.
* gcc.target/powerpc/pr114732.c: New. patch.diff diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index bb20441c096..9fa8cf89f61 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -4443,7 +4443,7 @@ (define_insn "bcd_" (match_operand:VBCD 2 "register_operand" "v") (match_operand:QI 3 "const_0_to_1_operand" "n")] UNSPEC_BCD_ADD_SUB)) - (clobber (reg:CCFP CR6_REGNO))] + (clobber (reg:CCBCD CR6_REGNO))] "TARGET_P8_VECTOR" "bcd. %0,%1,%2,%3" [(set_attr "type" "vecsimple")]) @@ -4454,8 +4454,8 @@ (define_insn "bcd_" ;; probably should be one that can go in the VMX (Altivec) registers, so we ;; can't use DDmode or DFmode. (define_insn "*bcd_test_" - [(set (reg:CCFP CR6_REGNO) - (compare:CCFP + [(set (reg:CCBCD CR6_REGNO) + (compare:CCBCD (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v") (match_operand:VBCD 2 "register_operand" "v") (match_operand:QI 3 "const_0_to_1_operand" "i")] @@ -4472,8 +4472,8 @@ (define_insn "*bcd_test2_" (match_operand:VBCD 2 "register_operand" "v") (match_operand:QI 3 "const_0_to_1_operand" "i")] UNSPEC_BCD_ADD_SUB)) - (set (reg:CCFP CR6_REGNO) - (compare:CCFP + (set (reg:CCBCD CR6_REGNO) + (compare:CCBCD (unspec:V2DF [(match_dup 1) (match_dup 2) (match_dup 3)] @@ -4566,8 +4566,8 @@ (define_insn "vclrrb" [(set_attr "type" "vecsimple")]) (define_expand "bcd__" - [(parallel [(set (reg:CCFP CR6_REGNO) - (compare:CCFP + [(parallel [(set (reg:CCBCD CR6_REGNO) + (compare:CCBCD (unspec:V2DF [(match_operand:VBCD 1 "register_operand") (match_operand:VBCD 2 "register_operand") (match_operand:QI 3 "const_0_to_1_operand")] @@ -4575,7 +4575,7 @@ (define_expand "bcd__" (match_dup 4))) (clobber (match_scratch:VBCD 5))]) (set (match_operand:SI 0 "register_operand") - (BCD_TEST:SI (reg:CCFP CR6_REGNO) + (BCD_TEST:SI (reg:CCBCD CR6_REGNO) (const_int 0)))] "TARGET_P8_VECTOR" { @@ -4583,8 +4583,8 @@ (define_expand "bcd__" }) (define_insn "*bcdinvalid_" - [(set (reg:CCFP CR6_REGNO) - (compare:CCFP + [(set 
(reg:CCBCD CR6_REGNO) + (compare:CCBCD (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")] UNSPEC_BCDSUB) (match_operand:V2DF 2 "zero_constant" "j"))) @@ -4594,14 +4594,14 @@ (define_insn "*bcdinvalid_" [(set_attr
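The reversal argument above can be sketched with booleans. Assume (as the cover text describes) that LT/GT/EQ reflect the possibly truncated result and are mutually exclusive, while the fourth bit independently flags overflow/invalid rather than "unordered". Then the complement of "le" is plain "gt", whereas the CCFP-style "ungt" would wrongly fire whenever the overflow bit is set. A hedged C model (the bit semantics here follow the description above, not a hardware reference):

```c
#include <assert.h>

/* Model of a bcd CR6 field: lt/gt/eq are mutually exclusive sign bits
   of the result; ov is an independent overflow/invalid flag, not an
   "unordered" bit.  */
struct bcd_cr { int lt, gt, eq, ov; };

static int test_le(struct bcd_cr c)   { return c.lt || c.eq; }
static int test_gt(struct bcd_cr c)   { return c.gt; }
static int test_ungt(struct bcd_cr c) { return c.gt || c.ov; } /* CCFP-style */
```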
Re: [PATCH] Value range: Add range op for __builtin_isfinite
Yes, it's my typo. Thanks. Gui Haochen 在 2024/4/23 17:10, rep.dot@gmail.com 写道: > On 12 April 2024 07:30:10 CEST, HAO CHEN GUI wrote: > > >> >> >> patch.diff >> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >> index 9de130b4022..99c511728d3 100644 >> --- a/gcc/gimple-range-op.cc >> +++ b/gcc/gimple-range-op.cc >> @@ -1192,6 +1192,56 @@ public: >> } >> } op_cfn_isinf; >> >> +//Implement range operator for CFN_BUILT_IN_ISFINITE >> +class cnf_isfinite : public range_operator >> +{ > > > s/cnf/cfn/g > I guess. > thanks
[PATCH, rs6000] Use bcdsub. instead of bcdadd. for bcd invalid number checking
Hi, This patch replaces bcdadd. with bcdsub. for bcd invalid-number checking. bcdadd. on two identical numbers might overflow, which also sets the overflow/invalid bit, so we can't distinguish invalid from overflow. bcdsub. doesn't have this problem, as subtracting a number from itself never causes overflow. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Use bcdsub. instead of bcdadd. for bcd invalid number checking bcdadd. might cause overflow, which also sets the overflow/invalid bit. bcdsub. doesn't have this issue when subtracting two identical bcd numbers. gcc/ * config/rs6000/altivec.md (*bcdinvalid_): Replace bcdadd with bcdsub. (bcdinvalid_): Likewise. gcc/testsuite/ * gcc.target/powerpc/bcd-4.c: Adjust the number of bcdadd and bcdsub. patch.diff diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 4d4c94ff0a0..bb20441c096 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -4586,18 +4586,18 @@ (define_insn "*bcdinvalid_" [(set (reg:CCFP CR6_REGNO) (compare:CCFP (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")] - UNSPEC_BCDADD) + UNSPEC_BCDSUB) (match_operand:V2DF 2 "zero_constant" "j"))) (clobber (match_scratch:VBCD 0 "=v"))] "TARGET_P8_VECTOR" - "bcdadd. %0,%1,%1,0" + "bcdsub.
%0,%1,%1,0" [(set_attr "type" "vecsimple")]) (define_expand "bcdinvalid_" [(parallel [(set (reg:CCFP CR6_REGNO) (compare:CCFP (unspec:V2DF [(match_operand:VBCD 1 "register_operand")] -UNSPEC_BCDADD) +UNSPEC_BCDSUB) (match_dup 2))) (clobber (match_scratch:VBCD 3))]) (set (match_operand:SI 0 "register_operand") diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-4.c b/gcc/testsuite/gcc.target/powerpc/bcd-4.c index 2c7041c4d32..6d2c59ef792 100644 --- a/gcc/testsuite/gcc.target/powerpc/bcd-4.c +++ b/gcc/testsuite/gcc.target/powerpc/bcd-4.c @@ -2,8 +2,8 @@ /* { dg-require-effective-target int128 } */ /* { dg-require-effective-target p9vector_hw } */ /* { dg-options "-mdejagnu-cpu=power9 -O2 -save-temps" } */ -/* { dg-final { scan-assembler-times {\mbcdadd\M} 7 } } */ -/* { dg-final { scan-assembler-times {\mbcdsub\M} 18 } } */ +/* { dg-final { scan-assembler-times {\mbcdadd\M} 5 } } */ +/* { dg-final { scan-assembler-times {\mbcdsub\M} 20 } } */ /* { dg-final { scan-assembler-times {\mbcds\M} 2 } } */ /* { dg-final { scan-assembler-times {\mdenbcdq\M} 1 } } */
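The overflow argument above is easy to see with plain integers standing in for BCD values of a fixed digit capacity: x + x can exceed the capacity (so the overflow/invalid flag becomes ambiguous), while x - x is always 0 and can never overflow, leaving the flag to mean "invalid" only. A minimal sketch:

```c
#include <assert.h>

/* Why the invalid-number check uses bcdsub. rather than bcdadd.:
   adding a value to itself can exceed an n-digit field, while
   subtracting it from itself yields 0 and never overflows.
   Plain unsigned integers stand in for BCD values here.  */
static int add_overflows(unsigned x, unsigned ndigits)
{
  unsigned limit = 1;
  for (unsigned i = 0; i < ndigits; i++)
    limit *= 10;               /* capacity of an ndigits-digit field */
  return x + x >= limit;       /* x + x may exceed the capacity */
}

static int sub_overflows(unsigned x)
{
  return (x - x) != 0;         /* x - x is always 0: never overflows */
}
```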
[PATCH, rs6000] Fix test case bcd4.c
Hi, This patch fixes a missing return statement in maxbcd of bcd-4.c. Without the return statement, maxbcd returns an invalid bcd number and makes the test ineffective. The patch also enables the test to run on Power9 and on big endian, as all bcd instructions are supported from Power9 on. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000: Fix bcd test case gcc/testsuite/ * gcc.target/powerpc/bcd-4.c: Enable the case to be tested on Power9. Enable the case to be run on big endian. Fix function maxbcd and other misc. problems. patch.diff diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-4.c b/gcc/testsuite/gcc.target/powerpc/bcd-4.c index 2c8554dfe82..8c0bac2720f 100644 --- a/gcc/testsuite/gcc.target/powerpc/bcd-4.c +++ b/gcc/testsuite/gcc.target/powerpc/bcd-4.c @@ -1,7 +1,7 @@ /* { dg-do run } */ /* { dg-require-effective-target int128 } */ -/* { dg-require-effective-target power10_hw } */ -/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */ +/* { dg-require-effective-target p9vector_hw } */ +/* { dg-options "-mdejagnu-cpu=power9 -O2 -save-temps" } */ /* { dg-final { scan-assembler-times {\mbcdadd\M} 7 } } */ /* { dg-final { scan-assembler-times {\mbcdsub\M} 18 } } */ /* { dg-final { scan-assembler-times {\mbcds\M} 2 } } */ @@ -44,10 +44,20 @@ vector unsigned char maxbcd(unsigned int sign) vector unsigned char result; int i; +#ifdef _BIG_ENDIAN + for (i = 0; i < 15; i++) +#else for (i = 15; i > 0; i--) +#endif result[i] = 0x99; - result[0] = sign << 4 | 0x9; +#ifdef _BIG_ENDIAN + result[15] = 0x90 | sign; +#else + result[0] = 0x90 | sign; +#endif + + return result; } vector unsigned char num2bcd(long int a, int encoding) @@ -70,9 +80,17 @@ vector unsigned char num2bcd(long int a, int encoding) hi = a % 10; // 1st digit a = a / 10; +#ifdef _BIG_ENDIAN + result[15] = hi << 4| sign; +#else result[0] = hi << 4| sign; +#endif +#ifdef _BIG_ENDIAN + for (i = 14; i >= 0; i--) +#else for (i = 1; i <
16; i++) +#endif { low = a % 10; a = a / 10; @@ -117,7 +135,11 @@ int main () } /* result should be positive */ +#ifdef _BIG_ENDIAN + if ((result[15] & 0xF) != BCD_POS0) +#else if ((result[0] & 0xF) != BCD_POS0) +#endif #if DEBUG printf("ERROR: __builtin_bcdadd sign of result is %d. Does not match " "expected_result = %d\n", @@ -150,7 +172,11 @@ int main () } /* Result should be positive, alternate encoding. */ +#ifdef _BIG_ENDIAN + if ((result[15] & 0xF) != BCD_POS1) +#else if ((result[0] & 0xF) != BCD_POS1) +#endif #if DEBUG printf("ERROR: __builtin_bcdadd sign of result is %d. Does not " "match expected_result = %d\n", @@ -183,7 +209,11 @@ int main () } /* result should be negative */ +#ifdef _BIG_ENDIAN + if ((result[15] & 0xF) != BCD_NEG) +#else if ((result[0] & 0xF) != BCD_NEG) +#endif #if DEBUG printf("ERROR: __builtin_bcdadd sign, neg of result is %d. Does not " "match expected_result = %d\n", @@ -217,7 +247,11 @@ int main () } /* result should be positive, alt encoding */ +#ifdef _BIG_ENDIAN + if ((result[15] & 0xF) != BCD_NEG) +#else if ((result[0] & 0xF) != BCD_NEG) +#endif #if DEBUG printf("ERROR: __builtin_bcdadd sign, of result is %d. Does not match " "expected_result = %d\n", @@ -250,7 +284,11 @@ int main () } /* result should be positive */ +#ifdef _BIG_ENDIAN + if ((result[15] & 0xF) != BCD_POS1) +#else if ((result[0] & 0xF) != BCD_POS1) +#endif #if DEBUG printf("ERROR: __builtin_bcdsub sign, result is %d. Does not match " "expected_result = %d\n", @@ -283,7 +321,7 @@ int main () abort(); #endif - a = maxbcd(BCD_NEG); + a = maxbcd(BCD_POS0); b = maxbcd(BCD_NEG); if (__builtin_bcdsub_ofl (a, b, 0) == 0) @@ -462,8 +500,12 @@ int main () } /* result should be positive */ +#ifdef _BIG_ENDIAN + if ((result[15] & 0xF) != BCD_POS0) +#else if ((result[0] & 0xF) != BCD_POS0) -#if 0 +#endif +#if DEBUG printf("ERROR: __builtin_bcdmul10 sign, result is %d. 
Does not match " "expected_result = %d\n", result[0] & 0xF, BCD_POS1); @@ -492,7 +534,11 @@ int main () } /* result should be positive */ +#ifdef _BIG_ENDIAN + if ((result[15] & 0xF) != BCD_POS0) +#else if ((result[0] & 0xF) != BCD_POS0) +#endif #if DEBUG printf("ERROR: __builtin_bcddiv10 sign, result is %d. Does not match " "expected_result = %d\n",
[PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]
Hi, This patch implements optab_isnormal for SF/DF/TFmode using the rs6000 test data class instructions. It relies on the former patch which adds isnormal_optab. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isnormal for SFmode, DFmode and TFmode gcc/ PR target/97786 * config/rs6000/vsx.md (isnormal2): New expand for SFmode and DFmode. (isnormal2): New expand for TFmode. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-7.c: New test. * gcc.target/powerpc/pr97786-8.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index a6c72ae33b0..d1c9ef5447c 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5357,6 +5357,30 @@ (define_expand "isfinite2" DONE; }) +(define_expand "isnormal2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT + && TARGET_P9_VECTOR" +{ + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0]; + emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x7f))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + +(define_expand "isnormal2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT + && TARGET_P9_VECTOR" +{ + rtx tmp = can_create_pseudo_p () ?
gen_reg_rtx (SImode) : operands[0]; + emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c new file mode 100644 index 000..a0d848497b9 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */ + +int test1 (double x) +{ + return __builtin_isnormal (x); +} + +int test2 (float x) +{ + return __builtin_isnormal (x); +} + +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */ +/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c new file mode 100644 index 000..d591073d281 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-require-effective-target ppc_float128_sw } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isnormal (x); +} + + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
[PATCH] Optab: add isnormal_optab for __builtin_isnormal
Hi, This patch adds an optab for __builtin_isnormal. The normal check can be implemented on rs6000 by a single instruction, so the builtin needs an optab that can be expanded to the corresponding instruction sequence. The subsequent patches implement the expander on rs6000. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for next stage-1? Thanks Gui Haochen ChangeLog optab: Add isnormal_optab for isnormal builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab for isnormal builtin. * optabs.def (isnormal_optab): New. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 3174f52ebe8..defb39de95f 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl) case BUILT_IN_ISFINITE: builtin_optab = isfinite_optab; break; case BUILT_IN_ISNORMAL: + builtin_optab = isnormal_optab; break; CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/optabs.def b/gcc/optabs.def index dcd77315c2a..3c401fc0b4c 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") OPTAB_D (isfinite_optab, "isfinite$a2") +OPTAB_D (isnormal_optab, "isnormal$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
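The semantics any isnormal expander must preserve are those of C99 isnormal: nonzero only for finite, non-zero, non-subnormal values. A minimal open-coded model in portable C, so the expected results are explicit (this is a reference model, not the rs6000 expansion):

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Reference model of C99 isnormal: true iff x is neither NaN, nor an
   infinity, nor zero, nor subnormal.  */
static int isnormal_model(double x)
{
  double ax = x < 0 ? -x : x;
  return x == x            /* not NaN (NaN != NaN)       */
         && ax <= DBL_MAX  /* not an infinity            */
         && ax >= DBL_MIN; /* not zero and not subnormal */
}
```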
[PATCH-3] Builtin: Fold builtin_isfinite on IBM long double to builtin_isfinite on double [PR97786]
Hi, This patch folds builtin_isfinite on IBM long double to builtin_isfinite on the double type. The former patch https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html implemented the DFmode isfinite_optab. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1? Thanks Gui Haochen ChangeLog Builtin: Fold builtin_isfinite on IBM long double to builtin_isfinite on double For IBM long double, INF and NAN are encoded in the high-order double value only. So builtin_isfinite on IBM long double can be folded to builtin_isfinite on the double type. As the former patch implemented the DFmode isfinite_optab, this patch converts builtin_isfinite on IBM long double to builtin_isfinite on double type if the DFmode isfinite_optab exists. gcc/ PR target/97786 * builtins.cc (fold_builtin_interclass_mathfn): Fold IBM long double isfinite call to double isfinite call when DFmode isfinite_optab exists.
patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 5262aa01660..3174f52ebe8 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -9605,6 +9605,12 @@ fold_builtin_interclass_mathfn (location_t loc, tree fndecl, tree arg) type = double_type_node; mode = DFmode; arg = fold_build1_loc (loc, NOP_EXPR, type, arg); + tree const isfinite_fn = builtin_decl_explicit (BUILT_IN_ISFINITE); + if (interclass_mathfn_icode (arg, isfinite_fn) != CODE_FOR_nothing) + { + result = build_call_expr (isfinite_fn, 1, arg); + return result; + } } get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false); real_from_string (, buf); diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-6.c b/gcc/testsuite/gcc.target/powerpc/pr97786-6.c new file mode 100644 index 000..c86c765651d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-6.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-require-effective-target ppc_float128_sw } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */ +/* { dg-final { scan-assembler {\mxststdcdp\M} } } */
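The folding above rests on the double-double representation: an IBM long double is a pair of doubles whose value is hi + lo, with |lo| bounded by half an ulp of hi, so Inf/NaN can only appear in the high part. A sketch with an explicit pair (this models the format for illustration; it is not the actual long double ABI type):

```c
#include <assert.h>
#include <math.h>

/* Model of IBM "double-double": value = hi + lo, |lo| <= ulp(hi)/2.
   Because the low part is tiny relative to the high part, finiteness
   of the whole value is decided by the high double alone.  */
struct dbl_dbl { double hi, lo; };

static int dd_isfinite(struct dbl_dbl x)
{
  return isfinite(x.hi);  /* the low part cannot introduce inf/nan */
}
```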
[PATCH-2, rs6000] Implement optab_isfinite for SFmode, DFmode and TFmode [PR97786]
Hi, This patch implements optab_isfinite for SF/DF/TFmode using the rs6000 test data class instructions. It relies on the former patch which adds isfinite_optab. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isfinite for SFmode, DFmode and TFmode gcc/ PR target/97786 * config/rs6000/vsx.md (isfinite2): New expand for SFmode and DFmode. (isfinite2): New expand for TFmode. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-4.c: New test. * gcc.target/powerpc/pr97786-5.c: New test. patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index f0cc02f7e7b..a6c72ae33b0 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5333,6 +5333,31 @@ (define_expand "isinf2" DONE; }) +(define_expand "isfinite2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT + && TARGET_P9_VECTOR" +{ + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0]; + emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x70))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + +(define_expand "isfinite2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT + && TARGET_P9_VECTOR" +{ + rtx tmp = can_create_pseudo_p () ?
gen_reg_rtx (SImode) : operands[0]; + emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x70))); + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); + DONE; +}) + + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_" [(set (match_dup 2) diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c new file mode 100644 index 000..55b5ff507b4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */ + +int test1 (double x) +{ + return __builtin_isfinite (x); +} + +int test2 (float x) +{ + return __builtin_isfinite (x); +} + +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */ +/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c new file mode 100644 index 000..5b5a89681fc --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-require-effective-target ppc_float128_sw } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isfinite (x); +} + + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */
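The expanders above compute a test-data-class result and xor it with 1. The semantics they must implement are those of C99 isfinite: nonzero iff the value is neither NaN nor an infinity. A portable reference model (treat the reading of the 0x70 mask as selecting the NaN/+inf/-inf classes as an assumption inferred from the expansion, not a stated fact):

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Reference model of C99 isfinite: true iff x is neither NaN nor an
   infinity, i.e. |x| <= DBL_MAX and x compares equal to itself.  */
static int isfinite_model(double x)
{
  double ax = x < 0 ? -x : x;
  return x == x && ax <= DBL_MAX;
}
```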
[PATCH] Value range: Add range op for __builtin_isfinite
Hi, The former patch adds the isfinite optab for __builtin_isfinite. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html Thus the builtin might not be folded at the front end. The range op for isfinite is needed for value range analysis. This patch adds it. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen

ChangeLog Value Range: Add range op for builtin isfinite The former patch adds an optab for builtin isfinite. Thus builtin isfinite might not be folded at the front end, so the range op for isfinite is needed for value range analysis. This patch adds the range op for builtin isfinite. gcc/ * gimple-range-op.cc (class cfn_isfinite): New. (op_cfn_isfinite): New variable. (gimple_range_op_handler::maybe_builtin_call): Handle CFN_BUILT_IN_ISFINITE. gcc/testsuite/ * gcc.dg/tree-ssa/range-isfinite.c: New test.

patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 9de130b4022..99c511728d3 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1192,6 +1192,56 @@ public: } } op_cfn_isinf; +// Implement range operator for CFN_BUILT_IN_ISFINITE +class cfn_isfinite : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange &r, tree type, const frange &op1, + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isfinite ()) + { + r.set_nonzero (type); + return true; + } + +if (op1.known_isnan () + || op1.known_isinf ()) + { + r.set_zero (type); + return true; + } + +return false; + } + virtual bool op1_range (frange &r, tree type, const irange &lhs, + const frange &, relation_trio) const override + { +if (lhs.zero_p ()) + { + // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented. + // Set range to varying. + r.set_varying (type); + return true; + } + +if (!range_includes_zero_p (&lhs)) + { + nan_state nan (false); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +return false; + } +} op_cfn_isfinite; + // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator { @@ -1288,6 +1338,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = &op_cfn_isinf; break; +case CFN_BUILT_IN_ISFINITE: + m_op1 = gimple_call_arg (call, 0); + m_operator = &op_cfn_isfinite; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c new file mode 100644 index 000..f5dce0a0486 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +#include <math.h> +void link_error(); + +void test1 (double x) +{ + if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x)) +link_error (); +} + +void test2 (float x) +{ + if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x)) +link_error (); +} + +void test3 (double x) +{ + if (__builtin_isfinite (x) && __builtin_isinf (x)) +link_error (); +} + +void test4 (float x) +{ + if (__builtin_isfinite (x) && __builtin_isinf (x)) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
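The two directions the range operator encodes can be checked numerically. A small sketch (the helper name is hypothetical): the fold_range direction says a non-NaN value inside [-DBL_MAX, DBL_MAX] is finite while NaN and the infinities are not, and the op1_range direction says a finite value always lies in that interval with no NaN.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Predicate capturing both directions of the cfn_isfinite range op:
   it agrees with isfinite for every double.  */
static int in_finite_range (double x)
{
  return !isnan (x) && x >= -DBL_MAX && x <= DBL_MAX;
}
```

This is exactly the equivalence the evrp testcase exercises: `x < __DBL_MAX__ && x > -__DBL_MAX__` forces the finite answer, and a proven-finite value excludes the infinities.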
[PATCH] Optab: add isfinite_optab for __builtin_isfinite
Hi, This patch adds an optab for __builtin_isfinite. The finite check can be implemented on rs6000 by a single instruction. It needs an optab to be expanded to the certain sequence of instructions. The subsequent patches will implement the expand on rs6000. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for next stage-1? Thanks Gui Haochen ChangeLog optab: Add isfinite_optab for isfinite builtin gcc/ * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab for isfinite builtin. * optabs.def (isfinite_optab): New. patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index d2786f207b8..5262aa01660 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl) errno_set = true; builtin_optab = ilogb_optab; break; CASE_FLT_FN (BUILT_IN_ISINF): builtin_optab = isinf_optab; break; -case BUILT_IN_ISNORMAL: case BUILT_IN_ISFINITE: + builtin_optab = isfinite_optab; break; +case BUILT_IN_ISNORMAL: CASE_FLT_FN (BUILT_IN_FINITE): case BUILT_IN_FINITED32: case BUILT_IN_FINITED64: diff --git a/gcc/optabs.def b/gcc/optabs.def index ad14f9328b9..dcd77315c2a 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3") OPTAB_D (hypot_optab, "hypot$a3") OPTAB_D (ilogb_optab, "ilogb$a2") OPTAB_D (isinf_optab, "isinf$a2") +OPTAB_D (isfinite_optab, "isfinite$a2") OPTAB_D (issignaling_optab, "issignaling$a2") OPTAB_D (ldexp_optab, "ldexp$a3") OPTAB_D (log10_optab, "log10$a2")
[Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double [PR97786]
Hi, This patch folds builtin_isinf on IBM long double to builtin_isinf on double type. The former patch https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html implemented the DFmode isinf_optab. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1? Thanks Gui Haochen

ChangeLog Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double For IBM long double, Inf is encoded in the high-order double value only. So builtin_isinf on IBM long double can be folded to builtin_isinf on double type. As the former patch implemented the DFmode isinf_optab, this patch converts builtin_isinf on IBM long double to builtin_isinf on double type if the DFmode isinf_optab exists. gcc/ PR target/97786 * builtins.cc (fold_builtin_interclass_mathfn): Fold IBM long double isinf call to double isinf call when DFmode isinf_optab exists. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-3.c: New test.

patch.diff diff --git a/gcc/builtins.cc b/gcc/builtins.cc index eda8bea9c4b..d2786f207b8 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -9574,6 +9574,12 @@ fold_builtin_interclass_mathfn (location_t loc, tree fndecl, tree arg) type = double_type_node; mode = DFmode; arg = fold_build1_loc (loc, NOP_EXPR, type, arg); + tree const isinf_fn = builtin_decl_explicit (BUILT_IN_ISINF); + if (interclass_mathfn_icode (arg, isinf_fn) != CODE_FOR_nothing) + { + result = build_call_expr (isinf_fn, 1, arg); + return result; + } } get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false); real_from_string (&r, buf); diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-3.c b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c new file mode 100644 index 000..1c816921e1a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-require-effective-target ppc_float128_sw } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2
-mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isinf (x); +} + +int test2 (long double x) +{ + return __builtin_isinfl (x); +} + +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */ +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 2 } } */
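A sketch of why the fold is valid. IBM long double is modeled here as a pair of doubles whose value is hi + lo, with Inf (and NaN) encoded in the high-order double only, as the ChangeLog states; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Minimal model of IBM double-double: the value is hi + lo, and the
   special values Inf/NaN live entirely in the high part.  So isinf
   on the pair reduces to isinf on the high double, which is what the
   fold_builtin_interclass_mathfn change emits.  */
struct ibm_ldouble { double hi, lo; };

static int ibm_isinf (struct ibm_ldouble x)
{
  return isinf (x.hi) != 0;  /* the folded __builtin_isinf ((double) x) */
}
```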
[patch, rs6000] Implement optab_isinf for SFmode, DFmode and TFmode [PR97786]
Hi, This patch implements optab_isinf for SF/DF/TFmode by the rs6000 test data class instructions. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1? Thanks Gui Haochen

ChangeLog rs6000: Implement optab_isinf for SFmode, DFmode and TFmode gcc/ PR target/97786 * config/rs6000/vsx.md (isinf<mode>2): New expand for SFmode and DFmode. (isinf<mode>2): New expand for TFmode. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-1.c: New test. * gcc.target/powerpc/pr97786-2.c: New test.

patch.diff diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index f135fa079bd..f0cc02f7e7b 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -5313,6 +5313,26 @@ (define_expand "xststdc<sd>p" operands[4] = CONST0_RTX (SImode); }) +(define_expand "isinf<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:SFDF 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT + && TARGET_P9_VECTOR" +{ + emit_insn (gen_xststdc<sd>p (operands[0], operands[1], GEN_INT (0x30))); + DONE; +}) + +(define_expand "isinf<mode>2" + [(use (match_operand:SI 0 "gpc_reg_operand")) + (use (match_operand:IEEE128 1 "gpc_reg_operand"))] + "TARGET_HARD_FLOAT + && TARGET_P9_VECTOR" +{ + emit_insn (gen_xststdcqp_<mode> (operands[0], operands[1], GEN_INT (0x30))); + DONE; +}) + ;; The VSX Scalar Test Negative Quad-Precision (define_expand "xststdcnegqp_<mode>" [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c new file mode 100644 index 000..1b1e6d642de --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */ + +int test1 (double x) +{ + return __builtin_isinf (x); +} + +int test2 (float x) +{ + return __builtin_isinf (x); +} + +int test3 (float x) +{ + return __builtin_isinff (x); +} + +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */ +/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c new file mode 100644 index 000..de7f2d67c4b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target lp64 } } */ +/* { dg-require-effective-target ppc_float128_sw } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble -Wno-psabi" } */ + +int test1 (long double x) +{ + return __builtin_isinf (x); +} + +int test2 (long double x) +{ + return __builtin_isinfl (x); +} + +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */ +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */
[PATCH] Value Range: Add range op for builtin isinf
Hi, The builtin isinf is not folded at the front end if the corresponding optab exists. This causes range evaluation to fail on targets which have optab_isinf. For instance, range-sincos.c will fail on targets which have optab_isinf, as it calls builtin_isinf. This patch fixes the problem by adding a range op for builtin isinf. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen

ChangeLog Value Range: Add range op for builtin isinf The builtin isinf is not folded at the front end if the corresponding optab exists. So the range op for isinf is needed for value range analysis. This patch adds the range op for builtin isinf. gcc/ * gimple-range-op.cc (class cfn_isinf): New. (op_cfn_isinf): New variable. (gimple_range_op_handler::maybe_builtin_call): Handle CASE_FLT_FN (BUILT_IN_ISINF). gcc/testsuite/ * gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index a98f7db62a7..9de130b4022 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1140,6 +1140,57 @@ private: bool m_is_pos; } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true); +// Implement range operator for CFN_BUILT_IN_ISINF +class cfn_isinf : public range_operator +{ +public: + using range_operator::fold_range; + using range_operator::op1_range; + virtual bool fold_range (irange &r, tree type, const frange &op1, + const irange &, relation_trio) const override + { +if (op1.undefined_p ()) + return false; + +if (op1.known_isinf ()) + { + r.set_nonzero (type); + return true; + } + +if (op1.known_isnan () + || (!real_isinf (&op1.lower_bound ()) + && !real_isinf (&op1.upper_bound ()))) + { + r.set_zero (type); + return true; + } + +return false; + } + virtual bool op1_range (frange &r, tree type, const irange &lhs, + const frange &, relation_trio) const override + { +if (lhs.zero_p ()) + { + nan_state nan (true); + r.set (type, real_min_representable (type), + real_max_representable (type), nan); + return true; + } + +if (!range_includes_zero_p (&lhs)) + { + // The range is [-INF,-INF][+INF,+INF], but it can't be represented. + // Set range to [-INF,+INF]. + r.set_varying (type); + r.clear_nan (); + return true; + } + +return false; + } +} op_cfn_isinf; // Implement range operator for CFN_BUILT_IN_ class cfn_parity : public range_operator @@ -1232,6 +1283,11 @@ gimple_range_op_handler::maybe_builtin_call () m_operator = &op_cfn_signbit; break; +CASE_FLT_FN (BUILT_IN_ISINF): + m_op1 = gimple_call_arg (call, 0); + m_operator = &op_cfn_isinf; + break; + CASE_CFN_COPYSIGN_ALL: m_op1 = gimple_call_arg (call, 0); m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c new file mode 100644 index 000..468f1bcf5c7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c @@ -0,0 +1,44 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +#include <math.h> +void link_error(); + +void +test1 (double x) +{ + if (x > __DBL_MAX__ && !__builtin_isinf (x)) +link_error (); + if (x < -__DBL_MAX__ && !__builtin_isinf (x)) +link_error (); +} + +void
test2 (float x) +{ + if (x > __FLT_MAX__ && !__builtin_isinf (x)) +link_error (); + if (x < -__FLT_MAX__ && !__builtin_isinf (x)) +link_error (); +} + +void +test3 (double x) +{ + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__) +link_error (); + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__) +link_error (); +} + +void +test4 (float x) +{ + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__) +link_error (); + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__) +link_error (); +} + +/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
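The facts cfn_isinf encodes can likewise be checked numerically; a sketch with a hypothetical helper name: fold_range says a non-NaN value outside [-DBL_MAX, DBL_MAX] must be infinite, and op1_range says an infinite value is always outside that interval.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Predicate capturing both directions of the cfn_isinf range op:
   for every double it agrees with isinf.  */
static int outside_finite_range (double x)
{
  return !isnan (x) && (x > DBL_MAX || x < -DBL_MAX);
}
```

This is what the evrp testcase relies on: `x > __DBL_MAX__` forces the nonzero isinf answer, and excluding Inf and NaN forces x back into the finite interval.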
Re: [PATCH, RFC] combine: Don't truncate const operand of AND if it has no benefit
Hi, Gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html Thanks Gui Haochen 在 2024/3/11 13:41, HAO CHEN GUI 写道: > Hi, > This patch tries to fix the problem when a canonical form doesn't benefit > on a specific target. The const operand of AND is and with the nonzero > bits of another operand in combine pass. It's a canonical form, but it's no > benefits for the target which has rotate and mask insns. As the mask is > truncated, it can't match the insn conditions which it originally matches. > For example, the following insn condition checks the sum of two AND masks. > When one of the mask is truncated, the condition breaks. > > (define_insn "*rotlsi3_insert_5" > [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") > (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") > (match_operand:SI 2 "const_int_operand" "n,n")) > (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0") > (match_operand:SI 4 "const_int_operand" "n,n"] > "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode) >&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0 >&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0" > ... > > This patch tries to fix the problem by comparing the rtx cost. If another > operand (varop) is not changed and rtx cost with new mask is not less than > the original one, the mask is restored to original one. > > I'm not sure if comparison of rtx cost here is proper. The outer code is > unknown and I suppose it as "SET". Also the rtx cost might not be accurate. > From my understanding, the canonical forms should always benefit as it can't > be undo in combine pass. Do we have a perfect solution for this kind of > issues? Looking forward for your advice. > > Another similar issues for canonical forms. Whether the widen mode for > lshiftrt is always good? 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html > > Thanks > Gui Haochen > > ChangeLog > Combine: Don't truncate const operand of AND if it's no benefits > > In combine pass, the canonical form is to turn off all bits in the constant > that are know to already be zero for AND. > > /* Turn off all bits in the constant that are known to already be zero. > Thus, if the AND isn't needed at all, we will have CONSTOP == > NONZERO_BITS > which is tested below. */ > > constop &= nonzero; > > But it doesn't benefit when the target has rotate and mask insert insns. > The AND mask is truncated and lost its information. Thus it can't match > the insn conditions. For example, the following insn condition checks > the sum of two AND masks. > > (define_insn "*rotlsi3_insert_5" > [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") > (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") > (match_operand:SI 2 "const_int_operand" "n,n")) > (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0") > (match_operand:SI 4 "const_int_operand" "n,n"] > "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode) >&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0 >&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0" > ... > > This patch restores the const operand of AND if the another operand is > not optimized and the truncated const operand doesn't save the rtx cost. > > gcc/ > * combine.cc (simplify_and_const_int_1): Restore the const operand > of AND if varop is not optimized and the rtx cost of the new const > operand is not reduced. > > gcc/testsuite/ > * gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and > adjust the number of rotate and mask insns. > * gcc.target/powerpc/rlwimi-1.c: Likewise. > * gcc.target/powerpc/rlwimi-2.c: Likewise. 
> > patch.diff > diff --git a/gcc/combine.cc b/gcc/combine.cc > index a4479f8d836..16ff09ea854 100644 > --- a/gcc/combine.cc > +++ b/gcc/combine.cc > @@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx > varop, >if (constop == nonzero) > return varop; > > - if (varop == orig_varop && constop == orig_constop) > -return NULL_RTX; > + if (varop == orig_varop) > +{ > + if (constop == orig_constop) > + return NULL_RTX; > + else > + { > + rtx tmp = simplify_
[PATCH, RFC] combine: Don't truncate const operand of AND if it has no benefit
Hi, This patch tries to fix the problem when a canonical form doesn't benefit a specific target. In the combine pass, the const operand of AND is ANDed with the nonzero bits of the other operand. It's a canonical form, but it has no benefit for targets which have rotate and mask insert insns. As the mask is truncated, it can't match the insn conditions which it originally matched. For example, the following insn condition checks the sum of two AND masks. When one of the masks is truncated, the condition breaks.

(define_insn "*rotlsi3_insert_5" [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") (match_operand:SI 2 "const_int_operand" "n,n")) (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0") (match_operand:SI 4 "const_int_operand" "n,n"))))] "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode) && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0 && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0" ...

This patch tries to fix the problem by comparing the rtx cost. If the other operand (varop) is not changed and the rtx cost with the new mask is not less than the original one, the mask is restored to the original one. I'm not sure if the comparison of rtx cost here is proper. The outer code is unknown and I assume it is "SET". Also the rtx cost might not be accurate. From my understanding, a canonical form should always be beneficial, as it can't be undone in the combine pass. Do we have a better solution for this kind of issue? Looking forward to your advice. Another similar issue concerns canonical forms: is widening the mode for lshiftrt always good? https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html Thanks Gui Haochen

ChangeLog Combine: Don't truncate const operand of AND if it has no benefit In the combine pass, the canonical form is to turn off all bits in the constant that are known to already be zero for AND.

/* Turn off all bits in the constant that are known to already be zero. Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS which is tested below. */ constop &= nonzero;

But it doesn't benefit when the target has rotate and mask insert insns. The AND mask is truncated and loses its information. Thus it can't match the insn conditions. For example, the following insn condition checks the sum of two AND masks.

(define_insn "*rotlsi3_insert_5" [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") (match_operand:SI 2 "const_int_operand" "n,n")) (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0") (match_operand:SI 4 "const_int_operand" "n,n"))))] "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode) && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0 && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0" ...

This patch restores the const operand of AND if the other operand is not optimized and the truncated const operand doesn't save rtx cost. gcc/ * combine.cc (simplify_and_const_int_1): Restore the const operand of AND if varop is not optimized and the rtx cost of the new const operand is not reduced. gcc/testsuite/ * gcc.target/powerpc/rlwimi-0.c: Reduce total number of insns and adjust the number of rotate and mask insns. * gcc.target/powerpc/rlwimi-1.c: Likewise. * gcc.target/powerpc/rlwimi-2.c: Likewise.
patch.diff diff --git a/gcc/combine.cc b/gcc/combine.cc index a4479f8d836..16ff09ea854 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop, if (constop == nonzero) return varop; - if (varop == orig_varop && constop == orig_constop) -return NULL_RTX; + if (varop == orig_varop) +{ + if (constop == orig_constop) + return NULL_RTX; + else + { + rtx tmp = simplify_gen_binary (AND, mode, varop, +gen_int_mode (constop, mode)); + rtx orig = simplify_gen_binary (AND, mode, varop, + gen_int_mode (orig_constop, mode)); + if (set_src_cost (tmp, mode, optimize_this_for_speed_p) + < set_src_cost (orig, mode, optimize_this_for_speed_p)) + return tmp; + else + return NULL_RTX; + } +} /* Otherwise, return an AND. */ return simplify_gen_binary (AND, mode, varop, gen_int_mode (constop, mode)); diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c index 961be199901..d9dd4419f1d 100644 --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c @@ -2,15 +2,15 @@ /* { dg-options "-O2" } */ /* { dg-final { scan-assembler-times
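A tiny model of the canonicalization under discussion, to make the failure mode concrete (helper name is hypothetical): combine turns off AND-mask bits that are already known to be zero in the other operand (`constop &= nonzero`). The result of the AND is unchanged, but the constant itself is, which is what breaks insn conditions such as *rotlsi3_insert_5's check on the sum of the two masks.

```c
#include <assert.h>
#include <stdint.h>

/* combine's canonical AND constant: drop bits known to be zero in
   the other operand.  Semantically a no-op for the AND's result,
   but the literal constant changes.  */
static uint32_t canonical_mask (uint32_t constop, uint32_t nonzero)
{
  return constop & nonzero;
}
```

With a varop whose nonzero bits are 0xff, a mask of 0xffff is truncated to 0xff; a condition like `UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0`, which inspects the constants rather than the values, then no longer holds.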
[PATCHv2, rs6000] Add subreg patterns for SImode rotate and mask insert
Hi, This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In the combine pass, a SImode (subreg from DImode) lshiftrt is converted to a DImode lshiftrt with an outer AND. It matches a DImode rotate and mask insert on rs6000.

Trying 2 -> 7: 2: r122:DI=r129:DI REG_DEAD r129:DI 7: r125:SI=r122:DI#0 0>>0x1f REG_DEAD r122:DI Failed to match this instruction: (set (subreg:DI (reg:SI 125 [ x ]) 0) (zero_extract:DI (reg:DI 129) (const_int 32 [0x20]) (const_int 1 [0x1]))) Successfully matched this instruction: (set (subreg:DI (reg:SI 125 [ x ]) 0) (and:DI (lshiftrt:DI (reg:DI 129) (const_int 31 [0x1f])) (const_int 4294967295 [0xffffffff])))

This conversion blocks the further combination which combines to a SImode rotate and mask insert insn.

Trying 9, 7 -> 10: 9: r127:SI=r130:DI#0&0xfffffffffffffffe REG_DEAD r130:DI 7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff REG_DEAD r129:DI 10: r124:SI=r127:SI|r125:SI REG_DEAD r125:SI REG_DEAD r127:SI Failed to match this instruction: (set (reg:SI 124) (ior:SI (and:SI (subreg:SI (reg:DI 130) 0) (const_int -2 [0xfffffffffffffffe])) (subreg:SI (zero_extract:DI (reg:DI 129) (const_int 32 [0x20]) (const_int 1 [0x1])) 0))) Failed to match this instruction: (set (reg:SI 124) (ior:SI (and:SI (subreg:SI (reg:DI 130) 0) (const_int -2 [0xfffffffffffffffe])) (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129) (const_int 31 [0x1f])) (const_int 4294967295 [0xffffffff])) 0)))

The root cause of the issue is whether it's necessary to widen the mode for lshiftrt when the target already has shiftrt for the narrow mode and its cost is not high. My former patch tried to fix the problem but is not accepted yet. https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html As it's stage 4 now, I drafted this patch to fix the regression by adding subreg patterns for the SImode rotate and mask insert. It actually does the reverse and narrows the mode for lshiftrt so that it can match the SImode rotate and mask insert. The case "rlwimi-2.c" is fixed, restoring the corresponding number of insns to the original ones.
Compared with the last version, the main change is to remove changes for a testcase which was already fixed by another patch. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen

ChangeLog rs6000: Add subreg patterns for SImode rotate and mask insert In the combine pass, a SImode (subreg from DImode) lshiftrt is converted to a DImode lshiftrt with an AND. The new pattern matches a rotate and mask insert on rs6000, which blocks the pattern from being further combined to a SImode rotate and mask insert pattern. This patch fixes the problem by adding two subreg patterns for the SImode rotate and mask insert patterns. gcc/ PR target/93738 * config/rs6000/rs6000.md (*rotlsi3_insert_subreg): New. (*rotlsi3_insert_4_subreg): New. gcc/testsuite/ PR target/93738 * gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64-bit and 32-bit rotate instructions.

patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index bc8bc6ab060..996d0740faf 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert" ; difference between rlwimi and rldimi. We also might want dot forms, ; but not for rlwimi on POWER4 and similar processors.
+; Subreg pattern of insn "*rotlsi3_insert" +(define_insn_and_split "*rotlsi3_insert_subreg" + [(set (match_operand:SI 0 "gpc_reg_operand" "=r") + (ior:SI (and:SI +(match_operator:SI 8 "lowpart_subreg_operator" + [(and:DI (match_operator:DI 4 "rotate_mask_operator" + [(match_operand:DI 1 "gpc_reg_operand" "r") +(match_operand:SI 2 "const_int_operand" "n")]) + (match_operand:DI 3 "const_int_operand" "n"))]) +(match_operand:SI 5 "const_int_operand" "n")) + (and:SI (match_operand:SI 6 "gpc_reg_operand" "0") + (match_operand:SI 7 "const_int_operand" "n"))))] + "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode) + && GET_CODE (operands[4]) == LSHIFTRT + && INTVAL (operands[3]) == 0xffffffff + && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0" + "#" + "&& 1" + [(set (match_dup 0) + (ior:SI (and:SI (lshiftrt:SI (match_dup 9) +(match_dup 2)) + (match_dup 5)) + (and:SI (match_dup 6) + (match_dup 7))))] +{ + int offset = BYTES_BIG_ENDIAN ? 4 : 0; + operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset); +} + [(set_attr "type" "insert")]) + (define_insn "*rotl<mode>3_insert_2" [(set
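A source-level sketch of the shape the combine log above is chasing (a hypothetical function, not the rlwimi-2.c testcase itself): insert bit 31 of the low half of a 64-bit value into bit 0 of another word. Once the subreg pattern lets combine stay in SImode, this is a single rlwimi-style rotate-and-mask insert.

```c
#include <assert.h>

/* r124 = (r130#0 & ~1) | (r129#0 >> 31): the SImode combination that
   the DImode widening used to block.  */
static unsigned int insert_high_bit (unsigned long long r129,
                                     unsigned long long r130)
{
  return ((unsigned int) r130 & ~1u) | ((unsigned int) r129 >> 31);
}
```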
[PATCHv2] fwprop: Avoid volatile defines to be propagated
Hi, This patch tries to fix a potential problem raised by the patch for PR111267. With that patch, a volatile asm operand may be propagated into a single-set insn. The volatile asm operand might then be executed multiple times if the define insn isn't eliminated after propagation. Currently the set_src_cost comparison might reject such a propagation, but it has the chance to be accepted after replacing set_src_cost with insn cost. Actually, I found the problem while testing my patch which replaces set_src_cost with insn_cost in the fwprop pass. Compared to the last version, the volatile_insn_p check is replaced with volatile_refs_p in order to also check volatile memory references. https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646482.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen

ChangeLog fwprop: Avoid volatile defines to be propagated The patch for PR111267 (commit id 86de9b66480b710202a2898cf513db105d8c432f) introduces an exception for propagation into a single-set insn: a propagation which might not be profitable (as checked by profitable_p) is still allowed into a single-set insn. It has a potential problem that a volatile operand might be propagated into a single-set insn. If the define insn is not eliminated after propagation, the volatile operand will be executed multiple times. This patch fixes the problem by skipping a volatile set source rtx in propagation. gcc/ * fwprop.cc (forward_propagate_into): Return false for a volatile set source rtx. gcc/testsuite/ * gcc.target/powerpc/fwprop-1.c: New.

patch.diff diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc index 7872609b336..cb6fd6700ca 100644 --- a/gcc/fwprop.cc +++ b/gcc/fwprop.cc @@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false) rtx dest = SET_DEST (def_set); rtx src = SET_SRC (def_set); + if (volatile_refs_p (src)) +return false; /* Allow propagations into a loop only for reg-to-reg copies, since replacing one register by another shouldn't increase the cost. diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c new file mode 100644 index 000..07b207f980c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */ +/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */ + +/* Verify that volatile asm operands are not propagated.  */ +long long foo () +{ + long long res; + __asm__ __volatile__( +"" + : "=r" (res) + : + : "memory"); + return res; +}
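A minimal model of why the propagation is unsafe when the defining insn cannot be deleted afterwards (the names below are hypothetical; a side-effecting read stands in for the volatile asm block): once the define is duplicated into the use sites, its side effect runs once per site instead of once overall.

```c
#include <assert.h>

/* Stand-in for a volatile asm/volatile load: every execution has an
   observable side effect.  */
static int ticks;

static int volatile_read (void)
{
  return ++ticks;  /* side effect per execution */
}

/* "Propagated" form: the read now appears at both use sites, so the
   two uses observe two executions rather than one shared result --
   the behavior change the volatile_refs_p check prevents.  */
static int propagated_sum (void)
{
  return volatile_read () + volatile_read ();
}
```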
Re: [PATCH] fwprop: Avoid volatile defines to be propagated
Hi Jeff, 在 2024/3/4 11:37, Jeff Law 写道: > Can the same thing happen with a volatile memory load? I don't think that > will be caught by the volatile_insn_p check. Yes, I think so. If the define rtx contains volatile memory references, it may hit the same problem. We may use volatile_refs_p instead of volatile_insn_p? Thanks Gui Haochen
Re: [PATCH] fwprop: Avoid volatile defines to be propagated
Hi Jeff, Thanks for your comments. 在 2024/3/4 6:02, Jeff Law 写道: > Why specifically are you worried here? Propagation of a volatile shouldn't > in and of itself cause a problem. We're not changing the number of volatile > accesses or anything like that -- we're just moving them around a bit. If the volatile asm operand is in a parallel set, it can't be eliminated after the propagation. So the define insn and use insn will execute the volatile asm block twice. That's the problem. Here is a real case from sanitizer_linux.cpp. The insn 62 has a volatile asm operands and it is propagated into insn 60. After propagation both insn 60 and 62 has the volatile asm operand. Thus asm block will be executed for twice. It causes sanitizer behaves abnormally in my test. propagating insn 62 into insn 60, replacing: (set (reg/v:DI 119 [ res ]) (reg:DI 133 [ res ])) successfully matched this instruction: (set (reg/v:DI 119 [ res ]) (asm_operands/v:DI ("mr 28, %5 mr 27, %8 mr 3, %7 mr 5, %9 mr 6, %10 mr 7, %11 li 0, %3 sc cmpdi cr1, 3, 0 crandc cr1*4+eq, cr1*4+eq, cr0*4+so bne- cr1, 1f li29, 0 stdu 29, -8(1) stdu 1, -%12(1) std 2, %13(1) mr12, 28 mtctr 12 mr3, 27 bctrl ld2, %13(1) li 0, %4 sc 1: mr %0, 3 ") ("=r") 0 [ (reg:SI 134) (const_int 22 [0x16]) (const_int 120 [0x78]) (const_int 1 [0x1]) (reg/v:DI 3 3 [ __fn ]) (reg/v:DI 4 4 [ __cstack ]) (reg/v:SI 5 5 [ __flags ]) (reg/v:DI 6 6 [ __arg ]) (reg/v:DI 7 7 [ __ptidptr ]) (reg/v:DI 8 8 [ __newtls ]) (reg/v:DI 9 9 [ __ctidptr ]) (const_int 32 [0x20]) (const_int 24 [0x18]) [ (asm_input:SI ("0") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:SI ("i") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:DI 
("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:SI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:DI ("r") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) ] [] /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)) rescanning insn with uid = 60. updating insn 60 in-place (insn 62 61 60 6 (parallel [ (set (reg:DI 133 [ res ]) (asm_operands/v:DI ("mr 28, %5 mr 27, %8 mr 3, %7 mr 5, %9 mr 6, %10 mr 7, %11 li 0, %3 sc cmpdi cr1, 3, 0 crandc cr1*4+eq, cr1*4+eq, cr0*4+so bne- cr1, 1f li29, 0 stdu 29, -8(1) stdu 1, -%12(1) std 2, %13(1) mr12, 28 mtctr 12 mr3, 27 bctrl ld2, %13(1) li 0, %4 sc 1: mr %0, 3 ") ("=r") 0 [ (reg:SI 134) (const_int 22 [0x16]) (const_int 120 [0x78]) (const_int 1 [0x1]) (reg/v:DI 3 3 [ __fn ]) (reg/v:DI 4 4 [ __cstack ]) (reg/v:SI 5 5 [ __flags ]) (reg/v:DI 6 6 [ __arg ]) (reg/v:DI 7 7 [ __ptidptr ]) (reg/v:DI 8 8 [ __newtls ]) (reg/v:DI 9 9 [ __ctidptr ]) (const_int 32 [0x20]) (const_int 24 [0x18]) ] [ (asm_input:SI ("0") /home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591) (asm_input:SI ("i")
[PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert
Hi, This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a DImode lshiftrt with an outer AND. It matches a DImode rotate and mask insert on rs6000.

Trying 2 -> 7: 2: r122:DI=r129:DI REG_DEAD r129:DI 7: r125:SI=r122:DI#0 0>>0x1f REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0) (zero_extract:DI (reg:DI 129) (const_int 32 [0x20]) (const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0) (and:DI (lshiftrt:DI (reg:DI 129) (const_int 31 [0x1f])) (const_int 4294967295 [0x])))

This conversion blocks the further combination into an SImode rotate and mask insert insn.

Trying 9, 7 -> 10: 9: r127:SI=r130:DI#0&0xfffe REG_DEAD r130:DI 7: r125:SI#0=r129:DI 0>>0x1f&0x REG_DEAD r129:DI 10: r124:SI=r127:SI|r125:SI REG_DEAD r125:SI REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124) (ior:SI (and:SI (subreg:SI (reg:DI 130) 0) (const_int -2 [0xfffe])) (subreg:SI (zero_extract:DI (reg:DI 129) (const_int 32 [0x20]) (const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124) (ior:SI (and:SI (subreg:SI (reg:DI 130) 0) (const_int -2 [0xfffe])) (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129) (const_int 31 [0x1f])) (const_int 4294967295 [0x])) 0)))

The root cause of the issue is whether it's necessary to widen the mode for lshiftrt when the target already has the narrow-mode lshiftrt and its cost is not high. My former patch tried to fix the problem but hasn't been accepted yet. https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html As it's stage 4 now, I drafted this patch to fix the regression by adding subreg patterns for the SImode rotate and mask insert. It actually does the reverse and narrows the mode for lshiftrt so that it matches the SImode rotate and mask insert. The case "rlwimi-2.c" is fixed and the number of insns is restored to the original.
The case "rlwinm-0.c" is also changed: 9 "rlwinm" are replaced with 9 "rldicl" as the combine sequence is changed. It's not a regression as the total number of insns isn't changed.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a DImode lshiftrt with an AND. The new pattern matches rotate and mask insert on rs6000, which blocks it from being further combined into an SImode rotate and mask insert pattern. This patch fixes the problem by adding two subreg patterns for the SImode rotate and mask insert patterns.

gcc/
	PR target/93738
	* config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
	(*rotlsi3_insert_8): New.

gcc/testsuite/
	PR target/93738
	* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64-bit and
	32-bit rotate instructions.
	* gcc.target/powerpc/rlwinm-0.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..b0b40f91e3e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
 ; difference between rlwimi and rldimi. We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.
+; Subreg pattern of insn "*rotlsi3_insert" +(define_insn_and_split "*rotlsi3_insert_9" + [(set (match_operand:SI 0 "gpc_reg_operand" "=r") + (ior:SI (and:SI +(match_operator:SI 8 "lowpart_subreg_operator" + [(and:DI (match_operator:DI 4 "rotate_mask_operator" + [(match_operand:DI 1 "gpc_reg_operand" "r") +(match_operand:SI 2 "const_int_operand" "n")]) + (match_operand:DI 3 "const_int_operand" "n"))]) +(match_operand:SI 5 "const_int_operand" "n")) + (and:SI (match_operand:SI 6 "gpc_reg_operand" "0") + (match_operand:SI 7 "const_int_operand" "n"] + "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode) + && GET_CODE (operands[4]) == LSHIFTRT + && INTVAL (operands[3]) == 0x + && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0" + "#" + "&& 1" + [(set (match_dup 0) + (ior:SI (and:SI (lshiftrt:SI (match_dup 9) +(match_dup 2)) + (match_dup 5)) + (and:SI (match_dup 6) + (match_dup 7] +{ + int offset = BYTES_BIG_ENDIAN ? 4 : 0; + operands[9] = gen_rtx_SUBREG (SImode,
[PATCH] fwprop: Avoid volatile defines to be propagated
Hi, This patch fixes a potential problem raised by the patch for PR111267: with that patch, a volatile asm operand can be propagated into a single-set insn, which is wrong behavior and therefore risky. Currently the set_src_cost comparison rejects such a propagation, but the propagation might be taken after set_src_cost is replaced with insn cost. I actually found the problem while testing my patch that replaces set_src_cost with insn cost in fwprop.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
fwprop: Avoid volatile defines to be propagated

The patch for PR111267 (commit 86de9b66480b710202a2898cf513db105d8c432f) introduces an exception for propagation into a single-set insn: a propagation which might not be profitable (as checked by profitable_p) is still allowed into a single-set insn. This has a potential problem: a volatile asm operand may be propagated into a single-set insn, even though volatile asm operands are originally rejected in profitable_p. This patch fixes the problem by skipping volatile set sources when finding the define set.

gcc/
	* fwprop.cc (forward_propagate_into): Return false for volatile set
	source.

gcc/testsuite/
	* gcc.target/powerpc/fwprop-1.c: New.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..89dce88b43d 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false)
   rtx dest = SET_DEST (def_set);
   rtx src = SET_SRC (def_set);
+  if (volatile_insn_p (src))
+    return false;
   /* Allow propagations into a loop only for reg-to-reg copies, since
      replacing one register by another shouldn't increase the cost.
diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
new file mode 100644
index 000..07b207f980c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
+/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
+
+/* Verify that volatile asm operands aren't propagated.  */
+long long foo ()
+{
+  long long res;
+  __asm__ __volatile__(
+    ""
+    : "=r" (res)
+    :
+    : "memory");
+  return res;
+}
[Patch, rs6000] Enable overlap memory store for block memory clear
Hi, This patch enables overlapping memory stores for block memory clear, which reduces the number of store instructions. The expander calls widest_fixed_size_mode_for_block_clear to get the mode for the looped block clear and calls smallest_fixed_size_mode_for_block_clear to get the mode for the last overlapped clear.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk or the next stage 1?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable overlap memory store for block memory clear

gcc/
	* config/rs6000/rs6000-string.cc
	(widest_fixed_size_mode_for_block_clear): New.
	(smallest_fixed_size_mode_for_block_clear): New.
	(expand_block_clear): Call widest_fixed_size_mode_for_block_clear
	to get the mode for looped memory stores and call
	smallest_fixed_size_mode_for_block_clear to get the mode for the
	last overlapped memory store.

gcc/testsuite
	* gcc.target/powerpc/block-clear-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 133e5382af2..c2a6095a586 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -38,6 +38,49 @@
 #include "profile-count.h"
 #include "predict.h"
+
+/* Return the widest mode whose size is less than or equal to the given
+   size.  */
+static fixed_size_mode
+widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int align,
+					bool unaligned_vsx_ok)
+{
+  machine_mode mode;
+
+  if (TARGET_ALTIVEC
+      && size >= 16
+      && (align >= 128
+	  || unaligned_vsx_ok))
+    mode = V4SImode;
+  else if (size >= 8
+	   && TARGET_POWERPC64
+	   && (align >= 64
+	       || !STRICT_ALIGNMENT))
+    mode = DImode;
+  else if (size >= 4
+	   && (align >= 32
+	       || !STRICT_ALIGNMENT))
+    mode = SImode;
+  else if (size >= 2
+	   && (align >= 16
+	       || !STRICT_ALIGNMENT))
+    mode = HImode;
+  else
+    mode = QImode;
+
+  return as_a <fixed_size_mode> (mode);
+}
+
+/* Return the smallest mode whose size is greater than or equal to
+   the given size.
*/
+static fixed_size_mode
+smallest_fixed_size_mode_for_block_clear (unsigned int size)
+{
+  if (size > UNITS_PER_WORD)
+    return as_a <fixed_size_mode> (V4SImode);
+
+  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
+}
+
 /* Expand a block clear operation, and return 1 if successful.  Return 0
    if we should let the compiler generate normal code.
@@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
   HOST_WIDE_INT align;
   HOST_WIDE_INT bytes;
   int offset;
-  int clear_bytes;
   int clear_step;
   /* If this is not a fixed size move, just call memcpy */
@@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])
   bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
-  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
+  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
+						      unaligned_vsx_ok);
+  offset = 0;
+  rtx dest;
+
+  do
     {
-      machine_mode mode = BLKmode;
-      rtx dest;
+      unsigned int size = GET_MODE_SIZE (mode);
-      if (TARGET_ALTIVEC
-	  && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
+      while (bytes >= size)
	{
-	  clear_bytes = 16;
-	  mode = V4SImode;
-	}
-      else if (bytes >= 8 && TARGET_POWERPC64
-	       && (align >= 64 || !STRICT_ALIGNMENT))
-	{
-	  clear_bytes = 8;
-	  mode = DImode;
-	  if (offset == 0 && align < 64)
-	    {
-	      rtx addr;
+	  dest = adjust_address (orig_dest, mode, offset);
+	  emit_move_insn (dest, CONST0_RTX (mode));
-	      /* If the address form is reg+offset with offset not a
-		 multiple of four, reload into reg indirect form here
-		 rather than waiting for reload.  This way we get one
-		 reload, not one per store.
*/ - addr = XEXP (orig_dest, 0); - if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM) - && CONST_INT_P (XEXP (addr, 1)) - && (INTVAL (XEXP (addr, 1)) & 3) != 0) - { - addr = copy_addr_to_reg (addr); - orig_dest = replace_equiv_address (orig_dest, addr); - } - } - } - else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT)) - { /* move 4 bytes */ - clear_bytes = 4; - mode = SImode; - } - else if (bytes >= 2 && (align >= 16 || !STRICT_ALIGNMENT)) - { /* move 2 bytes */ - clear_bytes = 2; - mode = HImode; - } - else /* move 1 byte at a time */ - { - clear_bytes = 1; - mode = QImode; + offset += size; + bytes -= size; } - dest =
[Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]
Hi, This patch creates an insn_and_split pattern which helps fwprop replace the source pseudo of a store insn with the duplicated constant vector. Thus the store can be implemented by a single stxvd2x, which eliminates the unnecessary byte swap insn on P8 LE. The test case shows the optimization. The patch depends on the first generic patch, which uses insn cost in fwprop.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
	PR target/113325
	* config/rs6000/predicates.md (duplicate_easy_altivec_constant): New.
	* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.

gcc/testsuite/
	PR target/113325
	* gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index ef7d3f214c4..8ab6db630b7 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -759,6 +759,14 @@ (define_predicate "easy_vector_constant"
   return false;
 })
+
+;; Return 1 if it's a duplicated easy_altivec_constant.
+(define_predicate "duplicate_easy_altivec_constant"
+  (and (match_code "const_vector")
+       (match_test "easy_altivec_constant (op, mode)"))
+{
+  return const_vec_duplicate_p (op);
+})
+
 ;; Same as easy_vector_constant but only for EASY_VECTOR_15_ADD_SELF.
(define_predicate "easy_vector_constant_add_self"
  (and (match_code "const_vector")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 26fa32829af..98e4be26f64 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3362,6 +3362,29 @@ (define_insn "*vsx_stxvd2x4_le_<mode>"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])
+
+(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+	(match_operand:VSX_W 1 "duplicate_easy_altivec_constant" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (<MODE>mode)
+   && !TARGET_P9_VECTOR"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+	(match_dup 1))
+   (set (match_dup 0)
+	(vec_select:VSX_W
+	  (match_dup 2)
+	  (parallel [(const_int 2) (const_int 3)
+		     (const_int 0) (const_int 1)])))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+				       : operands[1];
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
	(vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..dff68ac0a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}
[PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]
Hi, This patch replaces rtx_cost with insn_cost in forward propagation. In the PR, one constant vector should be propagated and replace a pseudo in a store insn if we know it's a duplicated constant vector. It reduces the insn cost but not rtx cost. In this case, the kind of destination operand (memory or pseudo) decides the cost and rtx cost can't reflect it. The test case is added in the second target specific patch. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1? Thanks Gui Haochen ChangeLog fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern gcc/ PR target/113325 * fwprop.cc (try_fwprop_subst_pattern): Replace rtx_cost with insn_cost. patch.diff diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc index 0707a234726..b05b2538edc 100644 --- a/gcc/fwprop.cc +++ b/gcc/fwprop.cc @@ -467,20 +467,17 @@ try_fwprop_subst_pattern (obstack_watermark , insn_change _change, redo_changes (0); } - /* ??? In theory, it should be better to use insn costs rather than - set_src_costs here. That would involve replacing this code with - change_is_worthwhile. */ bool ok = recog (attempt, use_change); if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()) -if (rtx use_set = single_set (use_rtl)) +if (single_set (use_rtl)) { bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl)); + auto new_cost = insn_cost (use_rtl, speed); temporarily_undo_changes (0); - auto old_cost = set_src_cost (SET_SRC (use_set), - GET_MODE (SET_DEST (use_set)), speed); + /* Invalidate recog data. */ + INSN_CODE (use_rtl) = -1; + auto old_cost = insn_cost (use_rtl, speed); redo_changes (0); - auto new_cost = set_src_cost (SET_SRC (use_set), - GET_MODE (SET_DEST (use_set)), speed); if (new_cost > old_cost) { if (dump_file)
[PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs
Hi, This patch adds const0 move checking for CLEAR_BY_PIECES. The original vec_duplicate check handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it can still clear by pieces if it supports a const0 move in that mode. The test cases will be added in a subsequent target-specific patch.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it can still clear by pieces if it supports a const0 move. This patch adds the checking.

gcc/
	* expr.cc (by_pieces_mode_supported_p): Add const0 move checking
	for CLEAR_BY_PIECES.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 34f5ff90a9f..cd960349a53 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1006,14 +1006,21 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
     return false;
-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
       && VECTOR_MODE_P (mode)
       && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
     return false;
+  if (op == CLEAR_BY_PIECES
+      && VECTOR_MODE_P (mode)
+      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing
+      && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+    return false;
+
   if (op == COMPARE_BY_PIECES
       && !can_compare_p (EQ, mode, ccp_jump))
     return false;
Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions
Hi Kewen,

On 2024/1/15 14:16, Kewen.Lin wrote:
> Considering it's stage 4 now and the impact of this patch, let's defer
> this to next stage 1, if possible could you organize the above changes
> into patches:
>
> 1) Refactor expand_compare_loop by splitting into two functions without
>    any functional changes.
> 2) Remove some useless codes like 2, 4, 5.
> 3) Some more enhancements like 1, 3, 6.
>
> ? It would be helpful for the review. Thanks!

Thanks for your review comments. I will re-organize it at the new stage 1.
[PATCH, rs6000] Enable block compare expand on P9 with m32 and mpowerpc64
Hi, On P9, "setb" is used to set the result of a block compare, so it works with m32 and mpowerpc64. On P8, the carry bit is used, so it can't work with m32 and mpowerpc64. This patch enables the block compare expand for m32 and mpowerpc64 on P9.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable block compare expand on P9 with m32 and mpowerpc64

gcc/
	* config/rs6000/rs6000-string.cc (expand_block_compare): Enable
	P9 with m32 and mpowerpc64.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-1.c: Exclude m32 and mpowerpc64.
	* gcc.target/powerpc/block-cmp-4.c: Likewise.
	* gcc.target/powerpc/block-cmp-8.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 018b87f2501..346708071b5 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1677,11 +1677,12 @@ expand_block_compare (rtx operands[])
   /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
   gcc_assert (TARGET_POPCNTD);
-  /* This case is complicated to handle because the subtract
-     with carry instructions do not generate the 64-bit
-     carry and so we must emit code to calculate it ourselves.
-     We choose not to implement this yet.  */
-  if (TARGET_32BIT && TARGET_POWERPC64)
+  /* For P8, this case is complicated to handle because the subtract
+     with carry instructions do not generate the 64-bit carry and so
+     we must emit code to calculate it ourselves.  We skip it on P8
+     but setb works well on P9.  */
+  if (TARGET_32BIT && TARGET_POWERPC64
+      && !TARGET_P9_MISC)
     return false;
   /* Allow this param to shut off all expansion.
*/ diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c index bcf0cb2ab4f..cd076cf1dce 100644 --- a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-vsx" } */ +/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */ /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } } */ /* Test that it still can do expand for memcmpsi instead of calling library diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c index c86febae68a..9373b53a3a4 100644 --- a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c @@ -1,5 +1,6 @@ /* { dg-do compile { target be } } */ /* { dg-options "-O2 -mdejagnu-cpu=power7" } */ +/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */ /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } } */ /* Test that it does expand for memcmpsi instead of calling library on diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c new file mode 100644 index 000..b470f873973 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c @@ -0,0 +1,8 @@ +/* { dg-do run { target ilp32 } } */ +/* { dg-options "-O2 -m32 -mpowerpc64" } */ +/* { dg-require-effective-target has_arch_ppc64 } */ +/* { dg-timeout-factor 2 } */ + +/* Verify memcmp on m32 mpowerpc64 */ + +#include "../../gcc.dg/memcmp-1.c"
Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
Hi Richard, Thanks so much for your comments.

>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000-string.cc
>> b/gcc/config/rs6000/rs6000-string.cc
>> index 7f777666ba9..4c9b2cbeefc 100644
>> --- a/gcc/config/rs6000/rs6000-string.cc
>> +++ b/gcc/config/rs6000/rs6000-string.cc
>> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>> }
>>
>>    dest = adjust_address (orig_dest, mode, offset);
>> -
>> +  /* Set the alignment of dest to the size of mode in order to
>> +     avoid unnecessary byte swaps on LE.  */
>> +  set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
>
> but the alignment is now wrong which might cause ripple-down
> wrong-code effects, no?
>
> It's probably bad to hide the byte-swapping in the move patterns (I'm
> just guessing you do that)

Here I just change the alignment of "dest", which is used temporarily for the move. The orig_dest is untouched and keeps its original alignment, so the subsequent insns which use orig_dest are not affected. I am not sure whether it causes ripple-down effects. Do you mean the dest might be reused later? But I think the alignment would differ even if the mode and offset are the same. Looking forward to your advice.

Thanks
Gui Haochen
[Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]
Hi, This patch eliminates unnecessary byte swaps for block clear on P8 LE. For a block clear, all the bytes are set to zero, so the byte order doesn't matter. Thus the alignment of the destination can be set to the store mode size instead of 1 byte in order to eliminate unnecessary byte swap instructions on P8 LE. The test case shows the problem.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE

gcc/
	PR target/113325
	* config/rs6000/rs6000-string.cc (expand_block_clear): Set the
	alignment of destination to the size of mode.

gcc/testsuite/
	PR target/113325
	* gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 7f777666ba9..4c9b2cbeefc 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
	}
       dest = adjust_address (orig_dest, mode, offset);
-
+      /* Set the alignment of dest to the size of mode in order to
+	 avoid unnecessary byte swaps on LE.  */
+      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
       emit_move_insn (dest, CONST0_RTX (mode));
     }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..4a3cae019c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}
[PATCH, rs6000] Refactor expand_compare_loop and split it to two functions
Hi, This patch refactors the function expand_compare_loop and splits it into two functions: one for fixed length and another for variable length. These two functions share some low-level common helper functions.

Besides the above changes, the patch also does:
1. Don't generate the load and compare loop when max_bytes is less than the loop bytes.
2. Remove do_load_mask_compare as it's not needed. All sub-targets entering the function should support efficient overlapping load and compare.
3. Implement a variable-length overlapping load and compare for the case where the remaining bytes are less than the loop bytes in a variable-length compare. The 4k boundary test and the one-byte load and compare loop are removed as they're no longer needed.
4. Remove the code for "bytes > max_bytes" with fixed length, as the case is already excluded by pre-checking.
5. Remove the run-time code for "bytes > max_bytes" with variable length, as it should jump to the library call at the beginning.
6. Enhance do_overlap_load_compare to avoid an overlapping load and compare when the remaining bytes can be loaded and compared by a smaller unit.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Refactor expand_compare_loop and split it to two functions

The original expand_compare_loop has complicated logic as it's designed for both fixed and variable length. This patch splits it into two functions and makes these two functions share common helper functions. Also, the 4K boundary test and the corresponding one-byte load and compare are replaced by a variable-length overlapping load and compare. do_load_mask_compare is removed, as all sub-targets entering the function have efficient overlapping load and compare, so a mask load is not needed.

gcc/
	* config/rs6000/rs6000-string.cc (do_isel): Remove.
	(do_load_mask_compare): Remove.
	(do_reg_compare): New.
	(do_load_and_compare): New.
	(do_overlap_load_compare): Do the load and compare with a smaller
	unit instead of an overlapping load and compare when the remaining
	bytes can be done by one instruction.
	(expand_compare_loop): Remove.
	(get_max_inline_loop_bytes): New.
	(do_load_compare_rest_of_loop): New.
	(generate_6432_conversion): Set it to a static function and move
	ahead of gen_diff_handle.
	(gen_diff_handle): New.
	(gen_load_compare_loop): New.
	(gen_library_call): New.
	(expand_compare_with_fixed_length): New.
	(expand_compare_with_variable_length): New.
	(expand_block_compare): Call expand_compare_with_variable_length
	to expand block compare for variable length.  Call
	expand_compare_with_fixed_length to expand block compare loop for
	fixed length.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-5.c: New.
	* gcc.target/powerpc/block-cmp-6.c: New.
	* gcc.target/powerpc/block-cmp-7.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index f707bb2727e..018b87f2501 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -404,21 +404,6 @@ do_ifelse (machine_mode cmpmode, rtx_code comparison,
   LABEL_NUSES (true_label) += 1;
 }
-/* Emit an isel of the proper mode for DEST.
-
-   DEST is the isel destination register.
-   SRC1 is the isel source if CR is true.
-   SRC2 is the isel source if CR is false.
-   CR is the condition for the isel.  */
-static void
-do_isel (rtx dest, rtx cmp, rtx src_t, rtx src_f, rtx cr)
-{
-  if (GET_MODE (dest) == DImode)
-    emit_insn (gen_isel_cc_di (dest, cmp, src_t, src_f, cr));
-  else
-    emit_insn (gen_isel_cc_si (dest, cmp, src_t, src_f, cr));
-}
-
 /* Emit a subtract of the proper mode for DEST.
    DEST is the destination register for the subtract.
@@ -499,65 +484,61 @@ do_rotl3 (rtx dest, rtx src1, rtx src2)
     emit_insn (gen_rotlsi3 (dest, src1, src2));
 }
-/* Generate rtl for a load, shift, and compare of less than a full word.
-
-   LOAD_MODE is the machine mode for the loads.
-   DIFF is the reg for the difference.
- CMP_REM is the reg containing the remaining bytes to compare. - DCOND is the CCUNS reg for the compare if we are doing P9 code with setb. - SRC1_ADDR is the first source address. - SRC2_ADDR is the second source address. - ORIG_SRC1 is the original first source block's address rtx. - ORIG_SRC2 is the original second source block's address rtx. */ +/* Do the compare for two registers. */ static void -do_load_mask_compare (const machine_mode load_mode, rtx diff, rtx cmp_rem, rtx dcond, - rtx src1_addr, rtx src2_addr, rtx orig_src1, rtx orig_src2) +do_reg_compare (bool use_vec, rtx vec_result, rtx diff, rtx *dcond, rtx d1, + rtx d2) { - HOST_WIDE_INT load_mode_size = GET_MODE_SIZE (load_mode); - rtx shift_amount = gen_reg_rtx (word_mode); - rtx d1 = gen_reg_rtx
[Patchv3, rs6000] Clean up pre-checkings of expand_block_compare
Hi, This patch cleans up the pre-checks of expand_block_compare. It does:
1. Assert that only P7 and above can enter this function, as that's already guarded by the expand.
2. Remove the P7 processor test, as only P7 and above can enter this function and P7 LE is excluded by targetm.slow_unaligned_access. On P7 BE, the performance of the expand is better than that of the library when the length is long.

Compared to the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640833.html the main change is to split the optimization for size into a separate patch and add a testcase for P7 BE.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checkings of expand_block_compare

Remove the P7 CPU test, as only P7 and above can enter this function and P7 LE is excluded by the check of targetm.slow_unaligned_access on word_mode. Also, performance tests show the expand of block compare is better than the library on P7 BE when the length is from 16 bytes to 64 bytes.

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	that only P7 and above can enter this function.  Remove the P7 CPU
	test and let P7 BE do the expand.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-4.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 5149273b80e..09db57255fa 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1947,15 +1947,12 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.
*/ + gcc_assert (TARGET_POPCNTD); + if (optimize_insn_for_size_p ()) return false; - rtx target = operands[0]; - rtx orig_src1 = operands[1]; - rtx orig_src2 = operands[2]; - rtx bytes_rtx = operands[3]; - rtx align_rtx = operands[4]; - /* This case is complicated to handle because the subtract with carry instructions do not generate the 64-bit carry and so we must emit code to calculate it ourselves. @@ -1963,23 +1960,19 @@ expand_block_compare (rtx operands[]) if (TARGET_32BIT && TARGET_POWERPC64) return false; - bool isP7 = (rs6000_tune == PROCESSOR_POWER7); - /* Allow this param to shut off all expansion. */ if (rs6000_block_compare_inline_limit == 0) return false; - /* targetm.slow_unaligned_access -- don't do unaligned stuff. - However slow_unaligned_access returns true on P7 even though the - performance of this code is good there. */ - if (!isP7 - && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1)) - || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2 -return false; + rtx target = operands[0]; + rtx orig_src1 = operands[1]; + rtx orig_src2 = operands[2]; + rtx bytes_rtx = operands[3]; + rtx align_rtx = operands[4]; - /* Unaligned l*brx traps on P7 so don't do this. However this should - not affect much because LE isn't really supported on P7 anyway. */ - if (isP7 && !BYTES_BIG_ENDIAN) + /* targetm.slow_unaligned_access -- don't do unaligned stuff. */ + if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1)) + || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))) return false; /* If this is not a fixed size compare, try generating loop code and @@ -2027,14 +2020,6 @@ expand_block_compare (rtx operands[]) if (!IN_RANGE (bytes, 1, max_bytes)) return expand_compare_loop (operands); - /* The code generated for p7 and older is not faster than glibc - memcmp if alignment is small and length is not short, so bail - out to avoid those conditions. 
*/ - if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT) - && ((base_align == 1 && bytes > 16) - || (base_align == 2 && bytes > 32))) -return false; - rtx final_label = NULL; if (use_vec) diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c new file mode 100644 index 000..c86febae68a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target be } } */ +/* { dg-options "-O2 -mdejagnu-cpu=power7" } */ +/* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } } */ + +/* Test that the compare is expanded inline via cmpmemsi instead of + calling the library on P7 BE when the length is less than 32 bytes. */ + +int foo (const char* s1, const char* s2) +{ + return __builtin_memcmp (s1, s2, 31); +}
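The inline expansion this patch enables on P7 BE relies on the overlapping-read trick described in select_block_compare_mode: when unaligned access is fast, the final load is moved back so it ends exactly at the end of the block instead of reading past it. Below is a minimal sketch of that idea in plain C, restricted to an equality check and using memcpy for the unaligned loads; it is an illustration of the technique, not the code GCC generates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compare N bytes (4 < N <= 8) for equality with exactly two 4-byte
   loads.  The second load is moved back so it ends at the end of the
   block, overlapping the first load when N < 8 -- the trick that is
   only profitable when unaligned loads are cheap.  */
static int equal_overlap (const char *a, const char *b, size_t n)
{
  uint32_t a0, b0, a1, b1;
  memcpy (&a0, a, 4);              /* bytes 0..3 */
  memcpy (&b0, b, 4);
  memcpy (&a1, a + n - 4, 4);      /* bytes n-4..n-1, may overlap above */
  memcpy (&b1, b + n - 4, 4);
  return a0 == b0 && a1 == b1;
}
```

The same shape scales up to word-mode (8-byte) loads on 64-bit targets, which is why the patches key the decision on slow_unaligned_access for word_mode.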
[Patch, rs6000] Call library for block memory compare when optimizing for size
Hi,

This patch calls the library function for block memory compare when optimizing for size.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Call library for block memory compare when optimizing for size

gcc/
	* config/rs6000/rs6000-string.cc (expand_block_compare): Return
	false when optimizing for size.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc index 05dc41622f4..5149273b80e 100644 --- a/gcc/config/rs6000/rs6000-string.cc +++ b/gcc/config/rs6000/rs6000-string.cc @@ -1947,6 +1947,9 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align, bool expand_block_compare (rtx operands[]) { + if (optimize_insn_for_size_p ()) +return false; + rtx target = operands[0]; rtx orig_src1 = operands[1]; rtx orig_src2 = operands[2]; diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c new file mode 100644 index 000..c7e853ad593 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ +/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } } */ + +int foo (const char* s1, const char* s2) +{ + return __builtin_memcmp (s1, s2, 4); +}
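At -Os the expansion is skipped and a call to memcmp is emitted instead. As a reminder of the contract that libcall satisfies (and that any inline expansion must match): bytes are compared as unsigned chars, and the sign of the result comes from the first mismatching pair. The function below is a sketch of those semantics, not glibc's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Reference semantics of memcmp: compare N bytes as unsigned chars,
   return the difference at the first mismatch, or 0 if all equal.  */
static int ref_memcmp (const void *s1, const void *s2, size_t n)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] - b[i];
  return 0;
}
```

The unsigned-char rule matters on powerpc: a signed byte compare would order 0xff before 0x01, which would disagree with the library.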
[Patchv3, rs6000] Correct the definition of the fixed-point efficient-unaligned macro
Hi,

The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with a call to slow_unaligned_access.

Compared with the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640832.html
the main change is to pass the alignment measured in bits to slow_unaligned_access.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct the definition of the fixed-point efficient-unaligned macro

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to guard platforms that are efficient at fixed-point unaligned loads/stores.  It is originally defined by TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled from P8 on and can be disabled by the -mno-vsx option, so the definition is wrong.  This patch corrects the problem and calls slow_unaligned_access to judge whether fixed-point unaligned loads/stores are efficient.

gcc/
	* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
	Remove.
	* config/rs6000/rs6000-string.cc (select_block_compare_mode):
	Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
	targetm.slow_unaligned_access.
	(expand_block_compare_gpr): Likewise.
	(expand_block_compare): Likewise.
	(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-1.c: New.
	* gcc.target/powerpc/block-cmp-2.c: New. 
patch.diff diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc index 44a946cd453..05dc41622f4 100644 --- a/gcc/config/rs6000/rs6000-string.cc +++ b/gcc/config/rs6000/rs6000-string.cc @@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset, else if (bytes == GET_MODE_SIZE (QImode)) return QImode; else if (bytes < GET_MODE_SIZE (SImode) - && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + && !targetm.slow_unaligned_access (SImode, align * BITS_PER_UNIT) && offset >= GET_MODE_SIZE (SImode) - bytes) /* This matches the case were we have SImode and 3 bytes and offset >= 1 and permits us to move back one and overlap @@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset, unwanted bytes off of the input. */ return SImode; else if (word_mode_ok && bytes < UNITS_PER_WORD - && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + && !targetm.slow_unaligned_access (word_mode, align * BITS_PER_UNIT) && offset >= UNITS_PER_WORD-bytes) /* Similarly, if we can use DImode it will get matched here and can do an overlapping read that ends at the end of the block. */ @@ -1749,7 +1749,8 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align, load_mode_size = GET_MODE_SIZE (load_mode); if (bytes >= load_mode_size) cmp_bytes = load_mode_size; - else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED) + else if (!targetm.slow_unaligned_access (load_mode, + align * BITS_PER_UNIT)) { /* Move this load back so it doesn't go past the end. P8/P9 can do this efficiently. */ @@ -2026,7 +2027,7 @@ expand_block_compare (rtx operands[]) /* The code generated for p7 and older is not faster than glibc memcmp if alignment is small and length is not short, so bail out to avoid those conditions. 
*/ - if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT) && ((base_align == 1 && bytes > 16) || (base_align == 2 && bytes > 32))) return false; @@ -2168,7 +2169,8 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT bytes_to_compare, load_mode_size = GET_MODE_SIZE (load_mode); if (bytes_to_compare >= load_mode_size) cmp_bytes = load_mode_size; - else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED) + else if (!targetm.slow_unaligned_access (load_mode, + align * BITS_PER_UNIT)) { /* Move this load back so it doesn't go past the end. P8/P9 can do this efficiently. */ diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 326c45221e9..3971a56c588 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -483,10 +483,6 @@ extern int rs6000_vector_align[]; #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT) -/* This wants to be set for p8 and newer. On p7, overlapping unaligned - loads are slow. */ -#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX - /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present in power7, so conditionalize them on p8 features. TImode syncs need quad memory support. */ diff --git
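The v3 change above hinges on units: the slow_unaligned_access target hook takes the known alignment in bits, while base_align and align in rs6000-string.cc are byte counts, hence the new `align * BITS_PER_UNIT` at every call site. A small hypothetical model of why the scaling matters follows; mock_slow_unaligned_access is a stand-in for the hook, not the real rs6000 implementation (it mimics a strict-alignment target that is slow whenever the known alignment is smaller than the access).

```c
#include <assert.h>

#define BITS_PER_UNIT 8

/* Stand-in hook: alignment and mode size both measured in BITS.  */
static int mock_slow_unaligned_access (int mode_bits, unsigned align_bits)
{
  return align_bits < (unsigned) mode_bits;
}

/* Buggy call site: passes a byte count where bits are expected, so
   8-byte (64-bit) aligned memory looks like it is only 8-bit aligned.  */
static int slow_wrong (int mode_bits, unsigned align_bytes)
{
  return mock_slow_unaligned_access (mode_bits, align_bytes);
}

/* Fixed call site, as in the v3 patch: scale bytes to bits first.  */
static int slow_right (int mode_bits, unsigned align_bytes)
{
  return mock_slow_unaligned_access (mode_bits,
				     align_bytes * BITS_PER_UNIT);
}
```

With a 64-bit word mode and 8-byte-aligned operands, the buggy variant wrongly reports the access as slow and blocks the expansion, while the fixed one allows it.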
[Patchv2, rs6000] Clean up the pre-checks of expand_block_compare
Hi,

This patch cleans up the pre-checks of expand_block_compare.  It:
1. Asserts that only P7 and above can enter this function, as that is already guaranteed by the expander.
2. Returns false when optimizing for size.
3. Removes the P7 processor test, as only P7 and above can enter this function and P7 LE is excluded by targetm.slow_unaligned_access.  On P7 BE, the expanded code performs better than the library call when the length is long.

Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640082.html
the main changes are to add some comments and to move the variable definitions closer to their uses.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checks of expand_block_compare

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	that only P7 and above can enter this function.  Return false (call
	the library) when optimizing for size.  Remove the P7 CPU test, as
	only P7 and above can enter this function and P7 LE is excluded by
	the check of targetm.slow_unaligned_access on word_mode.
	Performance testing also shows that the expanded block compare is
	better than the library call on P7 BE for lengths from 16 bytes to
	64 bytes.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc index cb9eeef05d8..49670cef4d7 100644 --- a/gcc/config/rs6000/rs6000-string.cc +++ b/gcc/config/rs6000/rs6000-string.cc @@ -1946,36 +1946,32 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align, bool expand_block_compare (rtx operands[]) { - rtx target = operands[0]; - rtx orig_src1 = operands[1]; - rtx orig_src2 = operands[2]; - rtx bytes_rtx = operands[3]; - rtx align_rtx = operands[4]; + /* TARGET_POPCNTD is already guarded at expand cmpmemsi. 
*/ + gcc_assert (TARGET_POPCNTD); - /* This case is complicated to handle because the subtract - with carry instructions do not generate the 64-bit - carry and so we must emit code to calculate it ourselves. - We choose not to implement this yet. */ - if (TARGET_32BIT && TARGET_POWERPC64) + if (optimize_insn_for_size_p ()) return false; - bool isP7 = (rs6000_tune == PROCESSOR_POWER7); - /* Allow this param to shut off all expansion. */ if (rs6000_block_compare_inline_limit == 0) return false; - /* targetm.slow_unaligned_access -- don't do unaligned stuff. - However slow_unaligned_access returns true on P7 even though the - performance of this code is good there. */ - if (!isP7 - && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1)) - || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2 + /* This case is complicated to handle because the subtract + with carry instructions do not generate the 64-bit + carry and so we must emit code to calculate it ourselves. + We choose not to implement this yet. */ + if (TARGET_32BIT && TARGET_POWERPC64) return false; - /* Unaligned l*brx traps on P7 so don't do this. However this should - not affect much because LE isn't really supported on P7 anyway. */ - if (isP7 && !BYTES_BIG_ENDIAN) + rtx target = operands[0]; + rtx orig_src1 = operands[1]; + rtx orig_src2 = operands[2]; + rtx bytes_rtx = operands[3]; + rtx align_rtx = operands[4]; + + /* targetm.slow_unaligned_access -- don't do unaligned stuff. 
*/ +if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1)) + || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))) return false; /* If this is not a fixed size compare, try generating loop code and @@ -2023,14 +2019,6 @@ expand_block_compare (rtx operands[]) if (!IN_RANGE (bytes, 1, max_bytes)) return expand_compare_loop (operands); - /* The code generated for p7 and older is not faster than glibc - memcmp if alignment is small and length is not short, so bail - out to avoid those conditions. */ - if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx)) - && ((base_align == 1 && bytes > 16) - || (base_align == 2 && bytes > 32))) -return false; - rtx final_label = NULL; if (use_vec) diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c new file mode 100644 index 000..c7e853ad593 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ +/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } } */ + +int foo (const char* s1, const char* s2) +{ + return __builtin_memcmp (s1, s2, 4); +}
[Patchv2, rs6000] Correct the definition of the fixed-point efficient-unaligned macro
Hi,

The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with a call to slow_unaligned_access.

Compared with the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640076.html
the main change is to replace the macro with slow_unaligned_access.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct the definition of the fixed-point efficient-unaligned macro

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to guard platforms that are efficient at fixed-point unaligned loads/stores.  It is originally defined by TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled from P8 on and can be disabled by the -mno-vsx option, so the definition is wrong.  This patch corrects the problem and calls slow_unaligned_access to judge whether fixed-point unaligned loads/stores are efficient.

gcc/
	* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
	Remove.
	* config/rs6000/rs6000-string.cc (select_block_compare_mode):
	Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
	targetm.slow_unaligned_access.
	(expand_block_compare_gpr): Likewise.
	(expand_block_compare): Likewise.
	(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-1.c: New.
	* gcc.target/powerpc/block-cmp-2.c: New. 
patch.diff diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc index 44a946cd453..cb9eeef05d8 100644 --- a/gcc/config/rs6000/rs6000-string.cc +++ b/gcc/config/rs6000/rs6000-string.cc @@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset, else if (bytes == GET_MODE_SIZE (QImode)) return QImode; else if (bytes < GET_MODE_SIZE (SImode) - && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + && !targetm.slow_unaligned_access (SImode, align) && offset >= GET_MODE_SIZE (SImode) - bytes) /* This matches the case were we have SImode and 3 bytes and offset >= 1 and permits us to move back one and overlap @@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset, unwanted bytes off of the input. */ return SImode; else if (word_mode_ok && bytes < UNITS_PER_WORD - && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + && !targetm.slow_unaligned_access (word_mode, align) && offset >= UNITS_PER_WORD-bytes) /* Similarly, if we can use DImode it will get matched here and can do an overlapping read that ends at the end of the block. */ @@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align, load_mode_size = GET_MODE_SIZE (load_mode); if (bytes >= load_mode_size) cmp_bytes = load_mode_size; - else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED) + else if (!targetm.slow_unaligned_access (load_mode, align)) { /* Move this load back so it doesn't go past the end. P8/P9 can do this efficiently. */ @@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[]) /* The code generated for p7 and older is not faster than glibc memcmp if alignment is small and length is not short, so bail out to avoid those conditions. 
*/ - if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx)) && ((base_align == 1 && bytes > 16) || (base_align == 2 && bytes > 32))) return false; @@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT bytes_to_compare, load_mode_size = GET_MODE_SIZE (load_mode); if (bytes_to_compare >= load_mode_size) cmp_bytes = load_mode_size; - else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED) + else if (!targetm.slow_unaligned_access (load_mode, align)) { /* Move this load back so it doesn't go past the end. P8/P9 can do this efficiently. */ diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 326c45221e9..3971a56c588 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -483,10 +483,6 @@ extern int rs6000_vector_align[]; #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT) -/* This wants to be set for p8 and newer. On p7, overlapping unaligned - loads are slow. */ -#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX - /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present in power7, so conditionalize them on p8 features. TImode syncs need quad memory support. */ diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c new file mode 100644 index 000..bcf0cb2ab4f --- /dev/null +++
[Patch, rs6000] Clean up the pre-checks of expand_block_compare
Hi,

This patch cleans up the pre-checks of expand_block_compare.  It:
1. Asserts that only P7 and above can enter this function, as that is already guaranteed by the expander.
2. Returns false when optimizing for size.
3. Removes the P7 CPU test, as only P7 and above can enter this function and P7 LE is excluded by targetm.slow_unaligned_access.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checks of expand_block_compare

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	that only P7 and above can enter this function.  Return false when
	optimizing for size.  Remove the P7 CPU test, as only P7 and above
	can enter this function and P7 LE is excluded by the check of
	targetm.slow_unaligned_access on word_mode.

gcc/testsuite/
	* gcc.target/powerpc/memcmp_for_size.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc index d4030854b2a..dff69e90d0c 100644 --- a/gcc/config/rs6000/rs6000-string.cc +++ b/gcc/config/rs6000/rs6000-string.cc @@ -1946,6 +1946,15 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align, bool expand_block_compare (rtx operands[]) { + gcc_assert (TARGET_POPCNTD); + + if (optimize_insn_for_size_p ()) +return false; + + /* Allow this param to shut off all expansion. */ + if (rs6000_block_compare_inline_limit == 0) +return false; + rtx target = operands[0]; rtx orig_src1 = operands[1]; rtx orig_src2 = operands[2]; @@ -1959,23 +1968,9 @@ expand_block_compare (rtx operands[]) if (TARGET_32BIT && TARGET_POWERPC64) return false; - bool isP7 = (rs6000_tune == PROCESSOR_POWER7); - - /* Allow this param to shut off all expansion. */ - if (rs6000_block_compare_inline_limit == 0) -return false; - - /* targetm.slow_unaligned_access -- don't do unaligned stuff. - However slow_unaligned_access returns true on P7 even though the - performance of this code is good there. 
*/ - if (!isP7 - && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1)) - || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2 -return false; - - /* Unaligned l*brx traps on P7 so don't do this. However this should - not affect much because LE isn't really supported on P7 anyway. */ - if (isP7 && !BYTES_BIG_ENDIAN) + /* targetm.slow_unaligned_access -- don't do unaligned stuff. */ +if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1)) + || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))) return false; /* If this is not a fixed size compare, try generating loop code and @@ -2023,14 +2018,6 @@ expand_block_compare (rtx operands[]) if (!IN_RANGE (bytes, 1, max_bytes)) return expand_compare_loop (operands); - /* The code generated for p7 and older is not faster than glibc - memcmp if alignment is small and length is not short, so bail - out to avoid those conditions. */ - if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT - && ((base_align == 1 && bytes > 16) - || (base_align == 2 && bytes > 32))) -return false; - rtx final_label = NULL; if (use_vec) diff --git a/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c new file mode 100644 index 000..c7e853ad593 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ +/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } } */ + +int foo (const char* s1, const char* s2) +{ + return __builtin_memcmp (s1, s2, 4); +}
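The reordered pre-checks in this hunk (bail out when optimizing for size, honor the param that shuts off all expansion, and fall back to a loop or libcall for over-long fixed sizes) can be modeled as a single predicate. The names and the words-to-bytes limit calculation below are illustrative stand-ins, not the exact GCC logic.

```c
#include <assert.h>

/* Hypothetical model of the early-out decisions in
   expand_block_compare: return 1 if inline straight-line expansion
   should be attempted, 0 to fall back (libcall or loop).  */
static int want_inline_expansion (int optimize_size,
				  unsigned inline_limit_words,
				  unsigned bytes, unsigned word_bytes)
{
  if (optimize_size)
    return 0;			/* -Os: prefer the memcmp libcall.  */
  if (inline_limit_words == 0)
    return 0;			/* Param shuts off all expansion.  */
  unsigned max_bytes = inline_limit_words * word_bytes;
  return bytes >= 1 && bytes <= max_bytes;
}
```

Doing these cheap rejections first, before any operand or alignment inspection, is what lets the patch move the operand extraction down next to its first use.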
[Patch, rs6000] Correct the definition of the fixed-point efficient-unaligned macro
Hi,

The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and renames it to a more comprehensible name.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.  Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct the definition of the fixed-point efficient-unaligned macro

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to guard whether a platform is efficient at fixed-point unaligned loads/stores.  It is originally defined by TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled from P8 on and can be disabled by the -mno-vsx option, so the definition is wrong.  This patch corrects the problem and defines it as "!STRICT_ALIGNMENT", which is true on P7 BE and on P8 and above.

gcc/
	* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
	Rename to...
	(TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT): ...this, set it to
	!STRICT_ALIGNMENT.
	* config/rs6000/rs6000-string.cc (select_block_compare_mode):
	Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
	TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT.
	(expand_block_compare_gpr): Likewise.
	(expand_block_compare): Likewise.
	(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
	* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c: New.
	* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-2.c: New. 
patch.diff diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc index 44a946cd453..d4030854b2a 100644 --- a/gcc/config/rs6000/rs6000-string.cc +++ b/gcc/config/rs6000/rs6000-string.cc @@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset, else if (bytes == GET_MODE_SIZE (QImode)) return QImode; else if (bytes < GET_MODE_SIZE (SImode) - && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT && offset >= GET_MODE_SIZE (SImode) - bytes) /* This matches the case were we have SImode and 3 bytes and offset >= 1 and permits us to move back one and overlap @@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset, unwanted bytes off of the input. */ return SImode; else if (word_mode_ok && bytes < UNITS_PER_WORD - && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT && offset >= UNITS_PER_WORD-bytes) /* Similarly, if we can use DImode it will get matched here and can do an overlapping read that ends at the end of the block. */ @@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align, load_mode_size = GET_MODE_SIZE (load_mode); if (bytes >= load_mode_size) cmp_bytes = load_mode_size; - else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED) + else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT) { /* Move this load back so it doesn't go past the end. P8/P9 can do this efficiently. */ @@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[]) /* The code generated for p7 and older is not faster than glibc memcmp if alignment is small and length is not short, so bail out to avoid those conditions. 
*/ - if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED + if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT && ((base_align == 1 && bytes > 16) || (base_align == 2 && bytes > 32))) return false; @@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT bytes_to_compare, load_mode_size = GET_MODE_SIZE (load_mode); if (bytes_to_compare >= load_mode_size) cmp_bytes = load_mode_size; - else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED) + else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT) { /* Move this load back so it doesn't go past the end. P8/P9 can do this efficiently. */ diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 326c45221e9..2f3a82942c1 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -483,9 +483,9 @@ extern int rs6000_vector_align[]; #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT) -/* This wants to be set for p8 and newer. On p7, overlapping unaligned - loads are slow. */ -#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX +/* Like TARGET_EFFICIENT_UNALIGNED_VSX, indicates if unaligned fixed point + loads/stores are efficient. */ +#define TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT (!STRICT_ALIGNMENT) /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present in power7, so conditionalize them on p8 features. TImode syncs need quad diff --git a/gcc/testsuite/gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c