Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-11 Thread HAO CHEN GUI
Missing CC to Jeff Law. Sorry.

On 2024/6/12 10:41, HAO CHEN GUI wrote:
> Hi,
>   This patch replaces rtx_cost with insn_cost in forward propagation.
> In the PR, a constant vector should be propagated to replace a
> pseudo in a store insn if we know it's a duplicated constant vector.
> That reduces the insn cost but not the rtx cost. In this case, the cost
> is determined by the destination operand (memory or pseudo), and
> unfortunately the rtx cost can't reflect that.
> 
>   The test case is added in the second rs6000 specific patch.
> 
>   Compared to the previous version, the main changes are:
> 1. Invoke change_is_worthwhile to judge whether the cost is reduced and
> the replacement is worthwhile.
> 2. Invalidate the recog data before getting the insn cost for the new
> rtl, as insn_cost might call extract_constrain_insn_cached and
> extract_insn_cached to cache the recog data. The cached data is
> invalid for the new rtl and causes an ICE.
> 3. Check whether the insn cost of the new rtl is zero, which means the
> cost is unknown. The replacement should be rejected in this situation.
> 
> Previous version
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651233.html
> 
>   The patch causes a regression case on i386 as the pattern cost
> calculation has a bug. Please refer to the patch and discussion here:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> ChangeLog
> fwprop: invoke change_is_worthwhile to judge if a replacement is worthwhile
> 
> gcc/
>   * fwprop.cc (try_fwprop_subst_pattern): Invoke change_is_worthwhile
>   to judge if a replacement is worthwhile.
>   * rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Invalidate
>   recog data before getting the insn cost for the new rtl.  Check if
>   the insn cost of new rtl is unknown and fail the replacement.
> 
> patch.diff
> diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
> index de543923b92..975de0eec7f 100644
> --- a/gcc/fwprop.cc
> +++ b/gcc/fwprop.cc
> @@ -471,29 +471,19 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
>redo_changes (0);
>  }
> 
> -  /* ??? In theory, it should be better to use insn costs rather than
> - set_src_costs here.  That would involve replacing this code with
> - change_is_worthwhile.  */
>bool ok = recog (attempt, use_change);
> -  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
> -if (rtx use_set = single_set (use_rtl))
> -  {
> - bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
> - temporarily_undo_changes (0);
> - auto old_cost = set_src_cost (SET_SRC (use_set),
> -   GET_MODE (SET_DEST (use_set)), speed);
> - redo_changes (0);
> - auto new_cost = set_src_cost (SET_SRC (use_set),
> -   GET_MODE (SET_DEST (use_set)), speed);
> - if (new_cost > old_cost
> - || (new_cost == old_cost && !prop.likely_profitable_p ()))
> -   {
> - if (dump_file)
> -   fprintf (dump_file, "change not profitable"
> -" (cost %d -> cost %d)\n", old_cost, new_cost);
> - ok = false;
> -   }
> -  }
> +  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()
> +  && single_set (use_rtl))
> +{
> +  if (!change_is_worthwhile (use_change, false)
> +   || (!prop.likely_profitable_p ()
> +   && !change_is_worthwhile (use_change, true)))
> + {
> +   if (dump_file)
> + fprintf (dump_file, "change not profitable");
> +   ok = false;
> + }
> +}
> 
>if (!ok)
>  {
> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index 11639e81bb7..9bad6c2070c 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -185,7 +185,18 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *const> changes,
> * change->old_cost ());
>if (!change->is_deletion ())
>   {
> +   /* Invalidate recog data as insn_cost may call
> +  extract_insn_cached.  */
> +   INSN_CODE (change->rtl ()) = -1;
> change->new_cost = insn_cost (change->rtl (), for_speed);
> +   /* If the cost is unknown, replacement is not worthwhile.  */
> +   if (!change->new_cost)
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> +  "Reject replacement due to unknown insn cost.\n");
> +   return false;
> + }
> new_cost += change->new_cost;
> if (for_speed)
>   weighted_new_cost += (cfg_bb->count.to_sreal_scale (entry_count)


[Patch-2v2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-06-11 Thread HAO CHEN GUI
Hi,
  This patch creates an insn_and_split pattern which helps the duplicated
constant vector replace the source pseudo of the store insn in the fwprop pass.
Thus the store can be implemented by a single stxvd2x, which eliminates the
unnecessary byte-swap insn on P8 LE. The test case shows the optimization.

  The patch depends on the first generic patch which uses insn cost in fwprop.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654276.html

  Compared to the previous version, the main change is to remove the predicate
and put the check into the insn condition and a gcc assertion.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen


ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
PR target/113325
* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_): New.

gcc/testsuite/
PR target/113325
* gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..89eb32a0758 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3368,6 +3368,32 @@ (define_insn "*vsx_stxvd2x4_le_"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])

+(define_insn_and_split "vsx_stxvd2x4_le_const_"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+   (match_operand:VSX_W 1 "immediate_operand" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (mode)
+   && !TARGET_P9_VECTOR
+   && const_vec_duplicate_p (operands[1])"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+   (match_dup 1))
+   (set (match_dup 0)
+   (vec_select:VSX_W
+ (match_dup 2)
+ (parallel [(const_int 2) (const_int 3)
+(const_int 0) (const_int 1)])))]
+{
+  /* Here all the constants must be loaded without memory.  */
+  gcc_assert (easy_altivec_constant (operands[1], mode));
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+: operands[1];
+
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
 (vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c 
b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..3ca1fcbc9ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}


[PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-11 Thread HAO CHEN GUI
Hi,
  This patch replaces rtx_cost with insn_cost in forward propagation.
In the PR, a constant vector should be propagated to replace a
pseudo in a store insn if we know it's a duplicated constant vector.
That reduces the insn cost but not the rtx cost. In this case, the cost
is determined by the destination operand (memory or pseudo), and
unfortunately the rtx cost can't reflect that.

  The test case is added in the second rs6000 specific patch.
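
For reference, the kind of code involved (mirroring the test case added in
the second patch, shown here only as an illustration) is:

void *foo (void *s1)
{
  /* The source of this 32-byte store is a duplicated (easy) zero vector.
     Propagating the constant into the store reduces the insn cost (a
     cheaper store sequence can be used) but not the rtx cost of the
     SET_SRC, which is why insn_cost is needed here.  */
  return __builtin_memset (s1, 0, 32);
}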

  Compared to the previous version, the main changes are:
1. Invoke change_is_worthwhile to judge whether the cost is reduced and
the replacement is worthwhile.
2. Invalidate the recog data before getting the insn cost for the new
rtl, as insn_cost might call extract_constrain_insn_cached and
extract_insn_cached to cache the recog data. The cached data is
invalid for the new rtl and causes an ICE.
3. Check whether the insn cost of the new rtl is zero, which means the
cost is unknown. The replacement should be rejected in this situation.

Previous version
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651233.html

  The patch causes a regression case on i386 as the pattern cost
calculation has a bug. Please refer to the patch and discussion here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

ChangeLog
fwprop: invoke change_is_worthwhile to judge if a replacement is worthwhile

gcc/
* fwprop.cc (try_fwprop_subst_pattern): Invoke change_is_worthwhile
to judge if a replacement is worthwhile.
* rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Invalidate
recog data before getting the insn cost for the new rtl.  Check if
the insn cost of new rtl is unknown and fail the replacement.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index de543923b92..975de0eec7f 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -471,29 +471,19 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   redo_changes (0);
 }

-  /* ??? In theory, it should be better to use insn costs rather than
- set_src_costs here.  That would involve replacing this code with
- change_is_worthwhile.  */
   bool ok = recog (attempt, use_change);
-  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
-if (rtx use_set = single_set (use_rtl))
-  {
-   bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
-   temporarily_undo_changes (0);
-   auto old_cost = set_src_cost (SET_SRC (use_set),
- GET_MODE (SET_DEST (use_set)), speed);
-   redo_changes (0);
-   auto new_cost = set_src_cost (SET_SRC (use_set),
- GET_MODE (SET_DEST (use_set)), speed);
-   if (new_cost > old_cost
-   || (new_cost == old_cost && !prop.likely_profitable_p ()))
- {
-   if (dump_file)
- fprintf (dump_file, "change not profitable"
-  " (cost %d -> cost %d)\n", old_cost, new_cost);
-   ok = false;
- }
-  }
+  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()
+  && single_set (use_rtl))
+{
+  if (!change_is_worthwhile (use_change, false)
+ || (!prop.likely_profitable_p ()
+ && !change_is_worthwhile (use_change, true)))
+   {
+ if (dump_file)
+   fprintf (dump_file, "change not profitable");
+ ok = false;
+   }
+}

   if (!ok)
 {
diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 11639e81bb7..9bad6c2070c 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -185,7 +185,18 @@ rtl_ssa::changes_are_worthwhile (array_slice<insn_change *const> changes,
  * change->old_cost ());
   if (!change->is_deletion ())
{
+ /* Invalidate recog data as insn_cost may call
+extract_insn_cached.  */
+ INSN_CODE (change->rtl ()) = -1;
  change->new_cost = insn_cost (change->rtl (), for_speed);
+ /* If the cost is unknown, replacement is not worthwhile.  */
+ if (!change->new_cost)
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Reject replacement due to unknown insn cost.\n");
+ return false;
+   }
  new_cost += change->new_cost;
  if (for_speed)
weighted_new_cost += (cfg_bb->count.to_sreal_scale (entry_count)


Re: [Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-06-05 Thread HAO CHEN GUI
Hi Kewen,

On 2024/6/5 17:00, Kewen.Lin wrote:
> This predicate can be moved to its only use (the define_insn condition).
> The const_vector match_code check is redundant, as const_vec_duplicate_p
> already checks that.  I wonder if we really need easy_altivec_constant?
> Even if a vector constant doesn't meet easy_altivec_constant, if it
> matches the desired duplicated pattern, it doesn't need the swapping
> either, no?

Thanks for your comments.
I think we need easy_altivec_constant, as the constant will be directly
moved to a vector register after the split. It might fail if it's not an
easy altivec constant?

  [(set (match_dup 2)
(match_dup 1))
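
As a hypothetical illustration (not from the patch): both constants below are
duplicated vectors, but only the first is an easy AltiVec constant that can be
splatted directly into a register; the second would normally need a
constant-pool load, so the plain (set (match_dup 2) (match_dup 1)) emitted by
the split might not be recognized for it.

vector int easy_dup (void)
{
  return (vector int) {0, 0, 0, 0};             /* splat of 0: easy */
}

vector int hard_dup (void)
{
  return (vector int) {0x12345678, 0x12345678,
                       0x12345678, 0x12345678}; /* not splat-able */
}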

Thanks
Gui Haochen


Ping [Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-06-04 Thread HAO CHEN GUI
Hi,
  Gently ping the patch.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643995.html

Thanks
Gui Haochen


On 2024/1/26 9:17, HAO CHEN GUI wrote:
> Hi,
>   This patch creates an insn_and_split pattern which helps the duplicated
> constant vector replace the source pseudo of the store insn in the fwprop pass.
> Thus the store can be implemented by a single stxvd2x, which eliminates the
> unnecessary byte-swap insn on P8 LE. The test case shows the optimization.
> 
>   The patch depends on the first generic patch which uses insn cost in fwprop.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions.
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store
> 
> gcc/
>   PR target/113325
>   * config/rs6000/predicates.md (duplicate_easy_altivec_constant): New.
>   * config/rs6000/vsx.md (vsx_stxvd2x4_le_const_): New.
> 
> gcc/testsuite/
>   PR target/113325
>   * gcc.target/powerpc/pr113325.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index ef7d3f214c4..8ab6db630b7 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -759,6 +759,14 @@ (define_predicate "easy_vector_constant"
>return false;
>  })
> 
> +;; Return 1 if it's a duplicated easy_altivec_constant.
> +(define_predicate "duplicate_easy_altivec_constant"
> +  (and (match_code "const_vector")
> +   (match_test "easy_altivec_constant (op, mode)"))
> +{
> +  return const_vec_duplicate_p (op);
> +})
> +
>  ;; Same as easy_vector_constant but only for EASY_VECTOR_15_ADD_SELF.
>  (define_predicate "easy_vector_constant_add_self"
>(and (match_code "const_vector")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 26fa32829af..98e4be26f64 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3362,6 +3362,29 @@ (define_insn "*vsx_stxvd2x4_le_"
>"stxvd2x %x1,%y0"
>[(set_attr "type" "vecstore")])
> 
> +(define_insn_and_split "vsx_stxvd2x4_le_const_"
> +  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> + (match_operand:VSX_W 1 "duplicate_easy_altivec_constant" "W"))]
> +  "!BYTES_BIG_ENDIAN
> +   && VECTOR_MEM_VSX_P (mode)
> +   && !TARGET_P9_VECTOR"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 2)
> + (match_dup 1))
> +   (set (match_dup 0)
> + (vec_select:VSX_W
> +   (match_dup 2)
> +   (parallel [(const_int 2) (const_int 3)
> +  (const_int 0) (const_int 1)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
> +  : operands[1];
> +
> +}
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "8")])
> +
>  (define_insn "*vsx_stxvd2x8_le_V8HI"
>[(set (match_operand:V8HI 0 "memory_operand" "=Z")
>  (vec_select:V8HI
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c 
> b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 000..dff68ac0a51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> +  return __builtin_memset (s1, 0, 32);
> +}


Re: [PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-04 Thread HAO CHEN GUI
Hi Jeff,

On 2024/6/4 22:14, Jeff Law wrote:
> 
> 
> On 1/25/24 6:16 PM, HAO CHEN GUI wrote:
>> Hi,
>>    This patch replaces rtx_cost with insn_cost in forward propagation.
>> In the PR, one constant vector should be propagated to replace a
>> pseudo in a store insn if we know it's a duplicated constant vector.
>> It reduces the insn cost but not rtx cost. In this case, the kind of
>> destination operand (memory or pseudo) decides the cost and rtx cost
>> can't reflect it.
>>
>>    The test case is added in the second target specific patch.
>>
>>    Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is it OK for next stage 1?
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern
>>
>> gcc/
>> PR target/113325
>> * fwprop.cc (try_fwprop_subst_pattern): Replace rtx_cost with
>> insn_cost.
> Testcase?  I don't care if it's ppc specific.
> 
> I think we generally want to move from rtx_cost to insn_cost, so I think the 
> change itself is fine.  We just want to make sure a test covers the change in 
> some manner.
> 
> Also note this is a change to generic code and could likely trigger failures on 
> various targets that have assembler scanning tests.  So once you've got a 
> testcase and the full patch is ack'd we'll need to watch closely for 
> regressions reported on other targets.
> 
> 
> So ACK'd once you add a testcase.
> 
> Jeff
Thanks for your comments.

The test case is in this rs6000 patch. The patch is still under review.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643995.html

I have sent the second version of the patch. The main change is to detect the
zero cost returned by insn_cost as it means the cost is unknown.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651233.html

I have already tested the patch on other targets. I found some regressions
on x86 due to the wrong cost conversion from set_src_cost to pattern_cost. I
have sent another patch for that issue. Reviewers have different thoughts on
it, so it's pending now.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html


Ping [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-06-02 Thread HAO CHEN GUI
Hi,
  Gently ping the series of patches.
[PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html
[PATCH-2v3, rs6000] Implement optab_isfinite for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652594.html
[PATCH-3v3, rs6000] Implement optab_isnormal for SFDF and IEEE128
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652595.html

Thanks
Gui Haochen

On 2024/5/24 14:02, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab_isinf for SFDF and IEEE128 using the test
> data class instructions.
> 
>   Compared with the previous version, the main change is to narrow
> down the predicate for the float operand according to the reviewer's advice.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652128.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Implement optab_isinf for SFDF and IEEE128
> 
> gcc/
>   PR target/97786
>   * config/rs6000/vsx.md (isinf2 for SFDF): New expand.
>   (isinf2 for IEEE128): New expand.
> 
> gcc/testsuite/
>   PR target/97786
>   * gcc.target/powerpc/pr97786-1.c: New test.
>   * gcc.target/powerpc/pr97786-2.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..08cce11da60 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
>operands[4] = CONST0_RTX (SImode);
>  })
> 
> +(define_expand "isinf2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));
> +  DONE;
> +})
> +
> +(define_expand "isinf2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT 
> (0x30)));
> +  DONE;
> +})
> +
>  ;; The VSX Scalar Test Negative Quad-Precision
>  (define_expand "xststdcnegqp_"
>[(set (match_dup 2)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> new file mode 100644
> index 000..c1c4f64ee8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +
> +int test1 (double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test3 (float x)
> +{
> +  return __builtin_isinff (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> new file mode 100644
> index 000..ed305e8572e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } 
> */
> +
> +int test1 (long double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (long double x)
> +{
> +  return __builtin_isinfl (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */


Ping [PATCHv5] Optab: add isnormal_optab for __builtin_isnormal

2024-06-02 Thread HAO CHEN GUI
Hi,
  All issues were addressed. Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653001.html

Thanks
Gui Haochen


On 2024/5/29 14:36, HAO CHEN GUI wrote:
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can be
> implemented on rs6000 by a single instruction. An optab is needed so that it
> can be expanded to the proper sequence of instructions.
> 
>   The subsequent patches will implement the expand on rs6000.
> 
>   Compared to the previous version, the main change is to specify that the
> return value of the optab should be either 0 or 1.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
> 
> gcc/
>   * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
>   for isnormal builtin.
>   * optabs.def (isnormal_optab): New.
>   * doc/md.texi (isnormal): Document.
> 
> 
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 53e9d210541..89ba56abf17 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>builtin_optab = isfinite_optab;
>break;
>  case BUILT_IN_ISNORMAL:
> +  builtin_optab = isnormal_optab;
> +  break;
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 3eb4216141e..4fd7da095fe 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point 
> number and 0
>  otherwise.  @var{m} is a scalar floating point mode.  Operand 0
>  has mode @code{SImode}, and operand 1 has mode @var{m}.
> 
> +@cindex @code{isnormal@var{m}2} instruction pattern
> +@item @samp{isnormal@var{m}2}
> +Return 1 if operand 1 is a normal floating point number and 0
> +otherwise.  @var{m} is a scalar floating point mode.  Operand 0
> +has mode @code{SImode}, and operand 1 has mode @var{m}.
> +
>  @end table
> 
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
>  OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


Ping [PATCHv5] Optab: add isfinite_optab for __builtin_isfinite

2024-06-02 Thread HAO CHEN GUI
Hi,
  All issues were addressed. Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652991.html

Thanks
Gui Haochen

On 2024/5/29 14:36, HAO CHEN GUI wrote:
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. An optab is needed so that it
> can be expanded to the proper sequence of instructions.
> 
>   The subsequent patches will implement the expand on rs6000.
> 
>   Compared to the previous version, the main change is to specify that the
> return value of the optab should be either 0 or 1.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652864.html
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
> 
> gcc/
>   * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>   for isfinite builtin.
>   * optabs.def (isfinite_optab): New.
>   * doc/md.texi (isfinite): Document.
> 
> 
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..53e9d210541 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,10 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab;
> +  break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..3eb4216141e 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered with 
> operand 2.
> 
>  This pattern is not allowed to @code{FAIL}.
> 
> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Return 1 if operand 1 is a finite floating point number and 0
> +otherwise.  @var{m} is a scalar floating point mode.  Operand 0
> +has mode @code{SImode}, and operand 1 has mode @var{m}.
> +
>  @end table
> 
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


[PATCHv2, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]

2024-05-30 Thread HAO CHEN GUI
Hi,
  This patch optimizes vector construction with two vector doubleword loads.
It generates an optimal insn sequence as "xxlor" has lower latency than
"mtvsrdd" on Power10.

  Compared with the previous version, the main change is to use the "isa" attribute
to guard "lxsd" and "lxsdx".
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653103.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Optimize vector construction with two vector doubleword loads

When constructing a vector from two doublewords in memory, it originally
does
ld 10,0(3)
ld 9,0(4)
mtvsrdd 34,9,10

An optimal sequence on Power10 should be
lxsd 0,0(4)
lxvrdx 1,0,3
xxlor 34,1,32

This patch does this optimization by insn combine and split.

gcc/
PR target/103568
* config/rs6000/vsx.md (vsx_ld_lowpart_zero_): New insn
pattern.
(vsx_ld_highpart_zero_): New insn pattern.
(vsx_concat_mem_): New insn_and_split pattern.

gcc/testsuite/
PR target/103568
* gcc.target/powerpc/pr103568.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..f9a2a260e89 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1395,6 +1395,27 @@ (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])

+(define_insn "vsx_ld_lowpart_zero_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "memory_operand" "wY,Z")
+ (match_operand: 2 "zero_constant" "j,j")))]
+  ""
+  "@
+   lxsd %0,%1
+   lxsdx %x0,%y1"
+  [(set_attr "type" "vecload,vecload")
+   (set_attr "isa" "p9v,p7v")])
+
+(define_insn "vsx_ld_highpart_zero_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "zero_constant" "j")
+ (match_operand: 2 "memory_operand" "Z")))]
+  "TARGET_POWER10"
+  "lxvrdx %x0,%y2"
+  [(set_attr "type" "vecload")])
+
 (define_insn "vsx_ld_elemrev_v1ti"
   [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
 (vec_select:V1TI
@@ -3063,6 +3084,26 @@ (define_insn "vsx_concat_"
 }
   [(set_attr "type" "vecperm,vecmove")])

+(define_insn_and_split "vsx_concat_mem_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "memory_operand" "wY,Z")
+ (match_operand: 2 "memory_operand" "Z,Z")))]
+  "TARGET_POWER10 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  emit_insn (gen_vsx_ld_highpart_zero_ (tmp1, CONST0_RTX 
(mode),
+ operands[1]));
+  emit_insn (gen_vsx_ld_lowpart_zero_ (tmp2, operands[2],
+CONST0_RTX (mode)));
+  emit_insn (gen_ior3 (operands[0], tmp1, tmp2));
+  DONE;
+})
+
 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.
 (define_insn "*vsx_concat__1"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c 
b/gcc/testsuite/gcc.target/powerpc/pr103568.c
new file mode 100644
index 000..b2a06fb2162
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+vector double test (double *a, double *b)
+{
+  return (vector double) {*a, *b};
+}
+
+vector long long test1 (long long *a, long long *b)
+{
+  return (vector long long) {*a, *b};
+}
+
+/* { dg-final { scan-assembler-times {\mlxsd} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+


[PATCH, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]

2024-05-29 Thread HAO CHEN GUI
Hi,
  This patch optimizes vector construction with two vector doubleword loads.
It generates an optimal insn sequence as "xxlor" has lower latency than
"mtvsrdd" on Power10.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Optimize vector construction with two vector doubleword loads

When constructing a vector from two doublewords in memory, it originally
does
ld 10,0(3)
ld 9,0(4)
mtvsrdd 34,9,10

An optimal sequence on Power10 should be
lxsd 0,0(4)
lxvrdx 1,0,3
xxlor 34,1,32

This patch does this optimization by insn combine and split.

gcc/
PR target/103568
* config/rs6000/vsx.md (vsx_ld_lowpart_zero_): New insn
pattern.
(vsx_ld_highpart_zero_): New insn pattern.
(vsx_concat_mem_): New insn_and_split pattern.

gcc/testsuite/
PR target/103568
* gcc.target/powerpc/pr103568.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..3c98e3d4e13 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1395,6 +1395,26 @@ (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])

+(define_insn "vsx_ld_lowpart_zero_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "memory_operand" "wY,Z")
+ (match_operand: 2 "zero_constant" "j,j")))]
+  "TARGET_P9_VECTOR"
+  "@
+   lxsd %0,%1
+   lxsdx %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "vsx_ld_highpart_zero_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "zero_constant" "j")
+ (match_operand: 2 "memory_operand" "Z")))]
+  "TARGET_POWER10"
+  "lxvrdx %x0,%y2"
+  [(set_attr "type" "vecload")])
+
 (define_insn "vsx_ld_elemrev_v1ti"
   [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
 (vec_select:V1TI
@@ -3063,6 +3083,26 @@ (define_insn "vsx_concat_"
 }
   [(set_attr "type" "vecperm,vecmove")])

+(define_insn_and_split "vsx_concat_mem_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "memory_operand" "wY,Z")
+ (match_operand: 2 "memory_operand" "Z,Z")))]
+  "TARGET_POWER10 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  emit_insn (gen_vsx_ld_highpart_zero_ (tmp1, CONST0_RTX 
(mode),
+ operands[1]));
+  emit_insn (gen_vsx_ld_lowpart_zero_ (tmp2, operands[2],
+CONST0_RTX (mode)));
+  emit_insn (gen_ior3 (operands[0], tmp1, tmp2));
+  DONE;
+})
+
 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.
 (define_insn "*vsx_concat__1"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c 
b/gcc/testsuite/gcc.target/powerpc/pr103568.c
new file mode 100644
index 000..b2a06fb2162
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+vector double test (double *a, double *b)
+{
+  return (vector double) {*a, *b};
+}
+
+vector long long test1 (long long *a, long long *b)
+{
+  return (vector long long) {*a, *b};
+}
+
+/* { dg-final { scan-assembler-times {\mlxsd} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+


Re: [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-29 Thread HAO CHEN GUI
Hi Kewen,

On 2024/5/29 13:26, Kewen.Lin wrote:
> I can understand that re-using "unordered" and "eq" will save some effort
> compared with using unspecs, but they are actually RTL codes rather than
> bits on the specific hardware CR.  A downside is that people who aren't
> aware of this design point can misunderstand it when reading/checking the
> code or the dumps; from this perspective unspecs (with reasonable names)
> can be more meaningful.  Normally adopting an RTL code is better since it
> then has the chance to be considered (optimized) in generic passes/code,
> but that isn't the case here, as we just use the code itself without its
> usual semantics (meaning).  Looking forward to others' opinions on this;
> if we want to adopt "unordered" and "eq" like this patch does, I think we
> should at least emphasize such points in rs6000-modes.def.

Thanks so much for your comments. IMHO, the core question is whether we can
re-define "unordered" or "eq" for a certain CC mode on a specific target. If
we can't, or it's unsafe, we have to use the unspecs. In this case, I just
want to define the code "unordered" on CCBCD as testing whether bit 3 is set
in this CR field. Actually, rs6000 already uses the "lt" code to test whether
bit 0 is set for vector compare instructions. The following expand is an
example.

(define_expand "vector_ae__p"
  [(parallel
[(set (reg:CC CR6_REGNO)
  (unspec:CC [(ne:CC (match_operand:VI 1 "vlogical_operand")
 (match_operand:VI 2 "vlogical_operand"))]
   UNSPEC_PREDICATE))
 (set (match_dup 3)
  (ne:VI (match_dup 1)
 (match_dup 2)))])
   (set (match_operand:SI 0 "register_operand" "=r")
(lt:SI (reg:CC CR6_REGNO)
   (const_int 0)))
   (set (match_dup 0)
(xor:SI (match_dup 0)
(const_int 1)))]

I think the "lt" on CC doesn't mean it compares whether the CC value is less
than an integer. It just tests whether the "lt" bit (bit 0) is set on this CC.

  Looking forward to your and Segher's further invaluable comments.

Thanks
Gui Haochen


[PATCH-1v3] Value Range: Add range op for builtin isinf

2024-05-29 Thread HAO CHEN GUI
Hi,
  The builtin isinf is not folded at the front end if the corresponding optab
exists. This causes the range evaluation to fail on targets which have
optab_isinf. For instance, range-sincos.c will fail on such targets, as it
calls __builtin_isinf.
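
A sketch of the kind of check involved (an illustration only, not the actual
range-sincos.c test): once the result of sin is known to be in [-1, 1], the
isinf call must fold to 0, which requires a range op when the call is not
folded early.

extern void link_error (void);

double
use_sin (double x)
{
  double y = __builtin_sin (x);   /* result range is [-1, 1], possibly NaN */
  if (__builtin_isinf (y))        /* should fold away to 0 */
    link_error ();
  return y;
}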

  This patch fixes the problem by adding a range op for builtin isinf.

  Compared with the previous version, the main change is to set the range to
1 if it's an infinite number and to 0 otherwise.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652219.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at the front end if the corresponding optab
exists, so the range op for isinf is needed for value range analysis.
This patch adds a range op for builtin isinf.

gcc/
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
* gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..4e60a42eaac 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1175,6 +1175,63 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange , tree type, const frange ,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (_bound ())
+   && !real_isinf (_bound (
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange , tree type, const irange ,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
@@ -1268,6 +1325,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = _cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = _cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+


[PATCH-3v2] Value Range: Add range op for builtin isnormal

2024-05-29 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isnormal. It also adds two
helper functions in frange to detect a range of normal floating-point values
and a range of subnormal or zero values.

  Compared to the previous version, the main change is to set the range to
1 if it's a normal number and to 0 otherwise.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isnormal

The former patch adds an optab for builtin isnormal, so builtin isnormal
might not be folded at the front end.  Thus the range op for isnormal is
needed for value range analysis.  This patch adds a range op for builtin
isnormal.

gcc/
* gimple-range-op.cc (class cfn_isnormal): New.
(op_cfn_isnormal): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISNORMAL.
* value-range.h (class frange): Declare known_isnormal and
known_isdenormal_or_zero.
(frange::known_isnormal): Define.
(frange::known_isdenormal_or_zero): Define.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isnormal.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 5ec5c828fa4..6787f532f11 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1289,6 +1289,61 @@ public:
   }
 } op_cfn_isfinite;

+//Implement range operator for CFN_BUILT_IN_ISNORMAL
+class cfn_isnormal :  public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange , tree type, const frange ,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isnormal ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ()
+   || op1.known_isdenormal_or_zero ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange , tree type, const irange ,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isnormal;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1391,6 +1446,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = _cfn_isfinite;
   break;

+case CFN_BUILT_IN_ISNORMAL:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = _cfn_isnormal;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
new file mode 100644
index 000..c4df4d839b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > __FLT_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__FLT_MIN__ && x > - __FLT_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 37ce91dc52d..1443d1906e5 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -588,6 +588,8 @@ public:
   bool maybe_isinf () const;
   bool signbit_p (bool ) const;
   bool nan_signbit_p (bool ) const;
+  bool known_isnormal () const;
+  bool known_isdenormal_or_zero () const;

 protected:
   virtual bool contains_p (tree cst) const override;
@@ -1650,6 +1652,33 @@ frange::known_isfinite () const
   return (!maybe_isnan () && !real_isinf (_min) && !real_isinf (_max));
 }

+// Return TRUE if range is known to be normal.
+
+inline bool
+frange::known_isnormal () const
+{
+  if (!known_isfinite ())
+return false;
+
+  machine_mode mode = 

[PATCH-2v4] Value Range: Add range op for builtin isfinite

2024-05-29 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isfinite.

  Compared to the previous version, the main change is to set the range to
1 if it's a finite number and to 0 otherwise.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

The former patch adds an optab for builtin isfinite, so builtin isfinite
might not be folded at the front end.  Thus the range op for isfinite is
needed for value range analysis.  This patch adds a range op for builtin
isfinite.

gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_isfinite): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isfinite.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 4e60a42eaac..5ec5c828fa4 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1233,6 +1233,62 @@ public:
   }
 } op_cfn_isinf;

+//Implement range operator for CFN_BUILT_IN_ISFINITE
+class cfn_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange , tree type, const frange ,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isfinite ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange , tree type, const irange ,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
+   // Set range to varying
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1330,6 +1386,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = _cfn_isinf;
   break;

+case CFN_BUILT_IN_ISFINITE:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = _cfn_isfinite;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */


[PATCHv5] Optab: add isnormal_optab for __builtin_isnormal

2024-05-29 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isnormal. The normal check can be
implemented on rs6000 by a single instruction. An optab is needed so that it
can be expanded to the proper sequence of instructions.

  The subsequent patches will implement the expand on rs6000.
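
As a usage sketch (a hypothetical test, not part of this patch): with the
isnormal optab pattern provided by the target, each call below can expand
inline to a single classification instruction on rs6000.  Note that isnormal,
unlike isfinite, is false for zero and for subnormal values.

int check_normal_d (double x)
{
  return __builtin_isnormal (x);   /* 0 for 0.0, subnormals, Inf and NaN */
}

int check_normal_f (float x)
{
  return __builtin_isnormal (x);
}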

  Compared to the previous version, the main change is to specify that the
return value of the optab should be either 0 or 1.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.
* doc/md.texi (isnormal): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 53e9d210541..89ba56abf17 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   builtin_optab = isfinite_optab;
   break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab;
+  break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 3eb4216141e..4fd7da095fe 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point number 
and 0
 otherwise.  @var{m} is a scalar floating point mode.  Operand 0
 has mode @code{SImode}, and operand 1 has mode @var{m}.

+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Return 1 if operand 1 is a normal floating point number and 0
+otherwise.  @var{m} is a scalar floating point mode.  Operand 0
+has mode @code{SImode}, and operand 1 has mode @var{m}.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCHv5] Optab: add isfinite_optab for __builtin_isfinite

2024-05-29 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isfinite. The finite check can be
implemented on rs6000 by a single instruction. An optab is needed so that it
can be expanded to the proper sequence of instructions.

  The subsequent patches will implement the expand on rs6000.
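
As a usage sketch (a hypothetical test, not part of this patch): with the
isfinite optab pattern provided by the target, each of these calls can expand
to a single classification instruction on rs6000 instead of a compare
sequence or a library call.

int check_finite_f (float x)
{
  return __builtin_isfinite (x);   /* 1 for any finite value, including 0.0 */
}

int check_finite_d (double x)
{
  return __builtin_isfinite (x);
}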

  Compared to the previous version, the main change is to specify that the
return value of the optab should be either 0 or 1.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652864.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
for isfinite builtin.
* optabs.def (isfinite_optab): New.
* doc/md.texi (isfinite): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..53e9d210541 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,10 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   errno_set = true; builtin_optab = ilogb_optab; break;
 CASE_FLT_FN (BUILT_IN_ISINF):
   builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
 case BUILT_IN_ISFINITE:
+  builtin_optab = isfinite_optab;
+  break;
+case BUILT_IN_ISNORMAL:
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..3eb4216141e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered with 
operand 2.

 This pattern is not allowed to @code{FAIL}.

+@cindex @code{isfinite@var{m}2} instruction pattern
+@item @samp{isfinite@var{m}2}
+Return 1 if operand 1 is a finite floating point number and 0
+otherwise.  @var{m} is a scalar floating point mode.  Operand 0
+has mode @code{SImode}, and operand 1 has mode @var{m}.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCHv4] Optab: add isnormal_optab for __builtin_isnormal

2024-05-28 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isnormal. The normal check can be
implemented on rs6000 by a single instruction. An optab is needed so that it
can be expanded to the proper sequence of instructions.

  The subsequent patches will implement the expand on rs6000.

  Compared to the previous version, the main change is to specify the
acceptable input and output modes for the optab.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652814.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.
* doc/md.texi (isnormal): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b8432f84020..ccd57fce522 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 case BUILT_IN_ISFINITE:
   builtin_optab = isfinite_optab; break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab; break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 7be0c75baf9..491cd09c620 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8563,6 +8563,12 @@ Set operand 0 to nonzero if operand 1 is a finite 
floating point
 number and to 0 otherwise.  Input mode should be a scalar floating
 point mode and output mode should be @code{SImode}.

+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Set operand 0 to nonzero if operand 1 is a normal floating point
+number and to 0 otherwise.  Input mode should be a scalar floating
+point mode and return mode should be @code{SImode}.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCHv4] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isfinite. The finite check can be
implemented on rs6000 by a single instruction. An optab is needed so that it
can be expanded to the proper sequence of instructions.

  The subsequent patches will implement the expand on rs6000.

  Compared to the previous version, the main change is to specify the
acceptable input and output modes for the optab.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652813.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
for isfinite builtin.
* optabs.def (isfinite_optab): New.
* doc/md.texi (isfinite): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..b8432f84020 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   errno_set = true; builtin_optab = ilogb_optab; break;
 CASE_FLT_FN (BUILT_IN_ISINF):
   builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
 case BUILT_IN_ISFINITE:
+  builtin_optab = isfinite_optab; break;
+case BUILT_IN_ISNORMAL:
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..7be0c75baf9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered with 
operand 2.

 This pattern is not allowed to @code{FAIL}.

+@cindex @code{isfinite@var{m}2} instruction pattern
+@item @samp{isfinite@var{m}2}
+Set operand 0 to nonzero if operand 1 is a finite floating point
+number and to 0 otherwise.  Input mode should be a scalar floating
+point mode and output mode should be @code{SImode}.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCHv3] Optab: add isnormal_optab for __builtin_isnormal

2024-05-27 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isnormal. The normal check can be
implemented on rs6000 by a single instruction, so the builtin needs an optab
in order to expand to that specific instruction sequence.

  The subsequent patches will implement the expansion on rs6000.

  Compared to the previous version, the main change is to specify acceptable
modes for the optab.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652172.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.
* doc/md.texi (isnormal): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b8432f84020..ccd57fce522 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 case BUILT_IN_ISFINITE:
   builtin_optab = isfinite_optab; break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab; break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index bc67324872f..7de9c2b5b70 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8566,6 +8566,15 @@ and to 0 otherwise.
 If this pattern @code{FAIL}, a call to the library function
 @code{isfinite} is used.

+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Set operand 0 to nonzero if operand 1 is a normal @code{SFmode},
+@code{DFmode}, or @code{TFmode} floating point number and to 0
+otherwise.
+
+If this pattern @code{FAIL}, a call to the library function
+@code{isnormal} is used.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-27 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isfinite. The finite check can be
implemented on rs6000 by a single instruction, so the builtin needs an optab
in order to expand to that specific instruction sequence.

  The subsequent patches will implement the expansion on rs6000.

  Compared to the previous version, the main change is to specify acceptable
modes for the optab.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
for isfinite builtin.
* optabs.def (isfinite_optab): New.
* doc/md.texi (isfinite): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..b8432f84020 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   errno_set = true; builtin_optab = ilogb_optab; break;
 CASE_FLT_FN (BUILT_IN_ISINF):
   builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
 case BUILT_IN_ISFINITE:
+  builtin_optab = isfinite_optab; break;
+case BUILT_IN_ISNORMAL:
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..67407fad37d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered with 
operand 2.

 This pattern is not allowed to @code{FAIL}.

+@cindex @code{isfinite@var{m}2} instruction pattern
+@item @samp{isfinite@var{m}2}
+Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
+@code{DFmode}, or @code{TFmode} floating point number and to 0
+otherwise.
+
+If this pattern @code{FAIL}, a call to the library function
+@code{isfinite} is used.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


Re: [PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-27 Thread HAO CHEN GUI
Hi Kewen,
  Thanks for your comments.

在 2024/5/27 11:18, Kewen.Lin 写道:
> Does this require "This pattern is not allowed to FAIL."?
> 
> I guess yes?  Since if it's decided to go with this pattern
> expanding, there is no fall back?

  The builtin is folded inline if the optab doesn't exist on
the target. Otherwise, it is expanded by target-specific
insns. If the expansion FAILs, the library function is called;
there is no way back to inline folding at that point. I am not
sure whether the pattern should be marked "allowed to FAIL" or not.

  Could anyone advise me?

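A toy model of the expansion choices described above (an assumption about
the flow, not the actual builtins.cc control path):

enum strategy { INLINE_FOLD, TARGET_PATTERN, LIBCALL };

/* Hypothetical helper: once the optab route is taken, a FAILing pattern
   falls back to the library call, never back to inline folding.  */
static enum strategy
choose_isfinite_strategy (int target_has_optab, int pattern_fails)
{
  if (!target_has_optab)
    return INLINE_FOLD;      /* folded before expand */
  if (pattern_fails)
    return LIBCALL;          /* library isfinite is called */
  return TARGET_PATTERN;     /* target-specific insns */
}
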
Thanks
Gui Haochen


Ping^2 [Patch, rs6000] Enable overlap memory store for block memory clear

2024-05-26 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646478.html

Thanks
Gui Haochen

在 2024/5/8 9:55, HAO CHEN GUI 写道:
> Hi,
>   As now it's stage 1, gently ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646478.html
> 
> Thanks
> Gui Haochen
> 
> 在 2024/2/26 10:25, HAO CHEN GUI 写道:
>> Hi,
>>   This patch enables overlap memory store for block memory clear which
>> saves the number of store instructions. The expander calls
>> widest_fixed_size_mode_for_block_clear to get the mode for the looped block
>> clear and calls smallest_fixed_size_mode_for_block_clear to get the mode
>> for the last, overlapping clear.
>>
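A minimal C sketch of the overlapping-store idea described above
(illustrative only, not the rs6000 implementation; it assumes unaligned
8-byte stores are acceptable on the target):

#include <stddef.h>
#include <string.h>

static void
clear_with_overlap (unsigned char *p, size_t n)
{
  const unsigned long long zero = 0;

  if (n >= sizeof zero)
    {
      size_t off = 0;
      while (n - off >= sizeof zero)       /* looped widest stores */
        {
          memcpy (p + off, &zero, sizeof zero);
          off += sizeof zero;
        }
      if (off < n)                         /* one overlapping tail store */
        memcpy (p + n - sizeof zero, &zero, sizeof zero);
    }
  else
    for (size_t i = 0; i < n; i++)         /* tiny blocks: byte stores */
      p[i] = 0;
}
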
>> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is it OK for the trunk or next stage 1?
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> rs6000: Enable overlap memory store for block memory clear
>>
>> gcc/
>>  * config/rs6000/rs6000-string.cc
>>  (widest_fixed_size_mode_for_block_clear): New.
>>  (smallest_fixed_size_mode_for_block_clear): New.
>>  (expand_block_clear): Call widest_fixed_size_mode_for_block_clear to
>>  get the mode for looped memory stores and call
>>  smallest_fixed_size_mode_for_block_clear to get the mode for the last
>>  overlapped memory store.
>>
>> gcc/testsuite
>>  * gcc.target/powerpc/block-clear-1.c: New.
>>
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000-string.cc 
>> b/gcc/config/rs6000/rs6000-string.cc
>> index 133e5382af2..c2a6095a586 100644
>> --- a/gcc/config/rs6000/rs6000-string.cc
>> +++ b/gcc/config/rs6000/rs6000-string.cc
>> @@ -38,6 +38,49 @@
>>  #include "profile-count.h"
>>  #include "predict.h"
>>
>> +/* Return the widest mode whose size is less than or equal to the
>> +   given SIZE.  */
>> +static fixed_size_mode
>> +widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int 
>> align,
>> +bool unaligned_vsx_ok)
>> +{
>> +  machine_mode mode;
>> +
>> +  if (TARGET_ALTIVEC
>> +  && size >= 16
>> +  && (align >= 128
>> +  || unaligned_vsx_ok))
>> +mode = V4SImode;
>> +  else if (size >= 8
>> +   && TARGET_POWERPC64
>> +   && (align >= 64
>> +   || !STRICT_ALIGNMENT))
>> +mode = DImode;
>> +  else if (size >= 4
>> +   && (align >= 32
>> +   || !STRICT_ALIGNMENT))
>> +mode = SImode;
>> +  else if (size >= 2
>> +   && (align >= 16
>> +   || !STRICT_ALIGNMENT))
>> +mode = HImode;
>> +  else
>> +mode = QImode;
>> +
>> +  return as_a <fixed_size_mode> (mode);
>> +}
>> +
>> +/* Return the smallest mode whose size is greater than or equal to
>> +   the given SIZE.  */
>> +static fixed_size_mode
>> +smallest_fixed_size_mode_for_block_clear (unsigned int size)
>> +{
>> +  if (size > UNITS_PER_WORD)
>> +return as_a <fixed_size_mode> (V4SImode);
>> +
>> +  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
>> +}
>> +
>>  /* Expand a block clear operation, and return 1 if successful.  Return 0
>> if we should let the compiler generate normal code.
>>
>> @@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
>>HOST_WIDE_INT align;
>>HOST_WIDE_INT bytes;
>>int offset;
>> -  int clear_bytes;
>>int clear_step;
>>
>>/* If this is not a fixed size move, just call memcpy */
>> @@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])
>>
>>bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
>>
>> -  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
>> +  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
>> +  unaligned_vsx_ok);
>> +  offset = 0;
>> +  rtx dest;
>> +
>> +  do
>>  {
>> -  machine_mode mode = BLKmode;
>> -  rtx dest;
>> +  unsigned int size = GET_MODE_SIZE (mode);
>>
>> -  if (TARGET_ALTIVEC
>> -  && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
>> +  while (bytes >= size)
>>  {
>> -  clear_bytes = 16;
>> -  mode = V4SImode;
>> -}
>> -  else if (bytes >= 8 && TARGET_POWERPC64

Ping [PATCH-1v2] Value Range: Add range op for builtin isinf

2024-05-26 Thread HAO CHEN GUI
Hi,
  Gently ping the series of patches which add range op.

[PATCH-1v2] Value Range: Add range op for builtin isinf
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652219.html
[PATCH-2v3] Value Range: Add range op for builtin isfinite
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html
[PATCH-3] Value Range: Add range op for builtin isnormal
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html

Thanks
Gui Haochen

在 2024/5/21 10:52, HAO CHEN GUI 写道:
> Hi,
>   The builtin isinf is not folded at front end if the corresponding optab
> exists. It causes the range evaluation failed on the targets which has
> optab_isinf. For instance, range-sincos.c will fail on the targets which
> has optab_isinf as it calls builtin_isinf.
> 
>   This patch fixed the problem by adding range op for builtin isinf.
> 
>   Compared with previous version, the main change is to set varying if
> nothing is known about the range.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> Value Range: Add range op for builtin isinf
> 
> The builtin isinf is not folded at front end if the corresponding optab
> exists.  So the range op for isinf is needed for value range analysis.
> This patch adds range op for builtin isinf.
> 
> gcc/
>   * gimple-range-op.cc (class cfn_isinf): New.
>   (op_cfn_isinf): New variables.
>   (gimple_range_op_handler::maybe_builtin_call): Handle
>   CASE_FLT_FN (BUILT_IN_ISINF).
> 
> gcc/testsuite/
>   * gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c: New test.
> 
> patch.diff
> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> index 55dfbb23ce2..eb1b0aff77c 100644
> --- a/gcc/gimple-range-op.cc
> +++ b/gcc/gimple-range-op.cc
> @@ -1175,6 +1175,62 @@ private:
>bool m_is_pos;
>  } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);
> 
> +// Implement range operator for CFN_BUILT_IN_ISINF
> +class cfn_isinf : public range_operator
> +{
> +public:
> +  using range_operator::fold_range;
> +  using range_operator::op1_range;
> +  virtual bool fold_range (irange , tree type, const frange ,
> +const irange &, relation_trio) const override
> +  {
> +if (op1.undefined_p ())
> +  return false;
> +
> +if (op1.known_isinf ())
> +  {
> + r.set_nonzero (type);
> + return true;
> +  }
> +
> +if (op1.known_isnan ()
> + || (!real_isinf (_bound ())
> + && !real_isinf (_bound (
> +  {
> + r.set_zero (type);
> + return true;
> +  }
> +
> +r.set_varying (type);
> +return true;
> +  }
> +  virtual bool op1_range (frange , tree type, const irange ,
> +   const frange &, relation_trio) const override
> +  {
> +if (lhs.undefined_p ())
> +  return false;
> +
> +if (lhs.zero_p ())
> +  {
> + nan_state nan (true);
> + r.set (type, real_min_representable (type),
> +real_max_representable (type), nan);
> + return true;
> +  }
> +
> +if (!range_includes_zero_p (lhs))
> +  {
> + // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
> + // Set range to [-INF,+INF]
> + r.set_varying (type);
> + r.clear_nan ();
> + return true;
> +  }
> +
> +r.set_varying (type);
> +return true;
> +  }
> +} op_cfn_isinf;
> 
>  // Implement range operator for CFN_BUILT_IN_
>  class cfn_parity : public range_operator
> @@ -1268,6 +1324,11 @@ gimple_range_op_handler::maybe_builtin_call ()
>m_operator = _cfn_signbit;
>break;
> 
> +CASE_FLT_FN (BUILT_IN_ISINF):
> +  m_op1 = gimple_call_arg (call, 0);
> +  m_operator = _cfn_isinf;
> +  break;
> +
>  CASE_CFN_COPYSIGN_ALL:
>m_op1 = gimple_call_arg (call, 0);
>m_op2 = gimple_call_arg (call, 1);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
> new file mode 100644
> index 000..468f1bcf5c7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-evrp" } */
> +
> +#include 
> +void link_error();
> +
> +void
> +test1 (double x)
> +{
> +  if (x > __DBL_MAX__ && !__builtin_isinf (x))
> +link_error ();
> +  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
> +link_error ();
> +}
> +
> +void
> +test2

Ping [PATCHv2] Optab: add isnormal_optab for __builtin_isnormal

2024-05-26 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652172.html

Thanks
Gui Haochen

在 2024/5/20 16:15, HAO CHEN GUI 写道:
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to the certain sequence of instructions.
> 
>   The subsequent patches will implement the expand on rs6000.
> 
>   Compared to previous version, the main change is to document isnormal
> in md.texi.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
> 
> gcc/
>   * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
>   for isnormal builtin.
>   * optabs.def (isnormal_optab): New.
>   * doc/md.texi (isnormal): Document.
> 
> 
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index b8432f84020..ccd57fce522 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>  case BUILT_IN_ISFINITE:
>builtin_optab = isfinite_optab; break;
>  case BUILT_IN_ISNORMAL:
> +  builtin_optab = isnormal_optab; break;
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 8ed70b3feea..b81b9dec18a 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8562,6 +8562,11 @@ This pattern is not allowed to @code{FAIL}.
>  Set operand 0 to nonzero if operand 1 is a finite floating-point
>  number and to 0 otherwise.
> 
> +@cindex @code{isnormal@var{m}2} instruction pattern
> +@item @samp{isnormal@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a normal floating-point
> +number and to 0 otherwise.
> +
>  @end table
> 
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
>  OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


Ping [PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-26 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html

Thanks
Gui Haochen

在 2024/5/20 16:15, HAO CHEN GUI 写道:
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to the certain sequence of instructions.
> 
>   The subsequent patches will implement the expand on rs6000.
> 
>   Compared to previous version, the main change is to document isfinite
> in md.texi.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
> 
> gcc/
>   * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>   for isfinite builtin.
>   * optabs.def (isfinite_optab): New.
>   * doc/md.texi (isfinite): Document.
> 
> 
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..b8432f84020 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..8ed70b3feea 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8557,6 +8557,11 @@ operand 2, greater than operand 2 or is unordered with 
> operand 2.
> 
>  This pattern is not allowed to @code{FAIL}.
> 
> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a finite floating-point
> +number and to 0 otherwise.
> +
>  @end table
> 
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


Ping^2 [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-26 Thread HAO CHEN GUI
Hi,
  Gently ping them.

Thanks
Gui Haochen

在 2024/5/13 9:56, HAO CHEN GUI 写道:
> Hi,
>   Gently ping the series of patches.
> [PATCH-1, rs6000]Add a new type of CC mode - CCBCD for bcd insns [PR100736, 
> PR114732]
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650217.html
> [PATCH-2, rs6000] Add a new type of CC mode - CCLTEQ
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650218.html
> [PATCH-3, rs6000] Set CC mode of vector string isolate insns to CCEQ
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650219.html
> [PATCH-4, rs6000] Optimize single cc bit reverse implementation
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650220.html
> [PATCH-5, rs6000] Replace explicit CC bit reverse with common format
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650766.html
> [PATCH-6, rs6000] Split setcc to two insns after reload
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650856.html
> 
> Thanks
> Gui Haochen
> 
> 在 2024/4/30 15:18, HAO CHEN GUI 写道:
>> Hi,
>>   It's the first patch of a series of patches optimizing CC modes on
>> rs6000.
>>
>>   bcd insns set all four bits of a CR field, but they have different
>> single-bit reverse behavior from CCFP. The fourth bit of bcd CR fields is
>> used to indicate overflow or an invalid number; it is not an "unordered"
>> bit. So the "le" test should be reversed to "gt", not "ungt", and the "ge"
>> test should be reversed to "lt", not "unlt". That's the root cause of
>> PR100736 and PR114732.
>>
>>   This patch fixes the issue by adding a new type of CC mode - CCBCD - for
>> all bcd insns. Here a new setcc_rev pattern is added for ccbcd. It will
>> be merged into a uniform pattern covering all CC modes in a subsequent
>> patch.
>>
>>   The rtl code "unordered" is still used for testing overflow or an
>> invalid number. IMHO, "unordered" on a CC mode can be considered a test of
>> whether the fourth bit of a CR field is set, and "eq" on a CC mode a test
>> of whether the third bit is set. Thus we avoid creating lots of unspecs
>> for the CR bit testing.
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>> regressions. Is it OK for the trunk?
>>
>> Thanks
>> Gui Haochen
>>
>>
>> ChangeLog
>> rs6000: Add a new type of CC mode - CCBCD for bcd insns
>>
>> gcc/
>>  PR target/100736
>>  PR target/114732
>>  * config/rs6000/altivec.md (bcd_): Replace CCFP
>>  with CCBCD.
>>  (*bcd_test_): Likewise.
>>  (*bcd_test2_): Likewise.
>>  (bcd__): Likewise.
>>  (*bcdinvalid_): Likewise.
>>  (bcdinvalid_): Likewise.
>>  (bcdshift_v16qi): Likewise.
>>  (bcdmul10_v16qi): Likewise.
>>  (bcddiv10_v16qi): Likewise.
>>  (peephole for bcd_add/sub): Likewise.
>>  * config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD
>>  and its supported comparison codes.
>>  * config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD.
>>  * config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD
>>  assertion.
>>  * config/rs6000/rs6000.md (CC_any): Add CCBCD.
>>  (ccbcd_rev): New code iterator.
>>  (*_cc): New insn and split pattern for CCBCD reverse
>>  compare.
>>
>> gcc/testsuite/
>>  PR target/100736
>>  PR target/114732
>>  * gcc.target/powerpc/pr100736.c: New.
>>  * gcc.target/powerpc/pr114732.c: New.
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index bb20441c096..9fa8cf89f61 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -4443,7 +4443,7 @@ (define_insn "bcd_"
>>(match_operand:VBCD 2 "register_operand" "v")
>>(match_operand:QI 3 "const_0_to_1_operand" "n")]
>>   UNSPEC_BCD_ADD_SUB))
>> -   (clobber (reg:CCFP CR6_REGNO))]
>> +   (clobber (reg:CCBCD CR6_REGNO))]
>>"TARGET_P8_VECTOR"
>>"bcd. %0,%1,%2,%3"
>>[(set_attr "type" "vecsimple")])
>> @@ -4454,8 +4454,8 @@ (define_insn "bcd_"
>>  ;; probably should be one that can go in the VMX (Altivec) registers, so we
>>  ;; can't use DDmode or DFmode.
>>  (define_insn "*bcd_test_"
>> -  [(set (reg:CCFP CR6_REGNO)
>> -(compare:CCFP
>> +  [(set (reg:CCBCD CR6_REGNO)

[PATCH-3v3, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-05-24 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isnormal for SFDF and IEEE128 using the
test data class instructions.

  Compared with the previous version, the main change is to narrow
down the predicate for the float operand according to the reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652130.html

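A rough C model of what the expander below computes (an illustration of the
mask-plus-xor idea, not the VSX instruction; the class list implied by mask
0x7f is inferred from the masks used across this series):

#include <math.h>

static int
isnormal_like (double x)
{
  /* Test-data-class style check: 1 when X falls in one of the selected
     special classes (NaN, +/-Inf, +/-0, +/-denormal), ...  */
  int special = isnan (x) || isinf (x) || x == 0.0
                || fpclassify (x) == FP_SUBNORMAL;
  /* ... then invert, mirroring the gen_xorsi3 (..., const1_rtx) step.  */
  return special ^ 1;
}
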
  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isnormal for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isnormal2 for SFDF): New expand.
(isnormal2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-7.c: New test.
* gcc.target/powerpc/pr97786-8.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 95214d732f0..d4d98543912 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5353,6 +5353,28 @@ (define_expand "isfinite2"
   DONE;
 })

+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
new file mode 100644
index 000..2df472e35d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isnormal (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
new file mode 100644
index 000..00478dbf3ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH-2v3, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-05-24 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isfinite for SFDF and IEEE128 using the
test data class instructions.

  Compared with the previous version, the main change is to narrow
down the predicate for the float operand according to the reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652129.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isfinite for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isfinite2 for SFDF): New expand.
(isfinite2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-4.c: New test.
* gcc.target/powerpc/pr97786-5.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 08cce11da60..95214d732f0 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5331,6 +5331,28 @@ (define_expand "isinf2"
   DONE;
 })

+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
new file mode 100644
index 000..01faa962bd5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isfinite (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
new file mode 100644
index 000..0e106b9f23a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-24 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isinf for SFDF and IEEE128 using the
test data class instructions.

  Compared with the previous version, the main change is to narrow
down the predicate for the float operand according to the reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652128.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isinf2 for SFDF): New expand.
(isinf2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-1.c: New test.
* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..08cce11da60 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
   operands[4] = CONST0_RTX (SImode);
 })

+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
new file mode 100644
index 000..c1c4f64ee8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isinf (x);
+}
+
+int test3 (float x)
+{
+  return __builtin_isinff (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
new file mode 100644
index 000..ed305e8572e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */


Re: [PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-23 Thread HAO CHEN GUI
Hi Peter,
  Thanks for your comments.

在 2024/5/23 5:58, Peter Bergner 写道:
> Is there a reason not to use the vsx_register_operand predicate for op1
> which matches the predicate for the operand of the xststdcp pattern
> we're passing op1 to?

No, I will fix them.

Thanks
Gui Haochen


[PATCH-3] Value Range: Add range op for builtin isnormal

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for the isnormal builtin. It also adds two
helper functions to frange to detect a range of normal floating-point values
and a range of subnormal-or-zero values.

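A plain-C sketch of the question the two helpers answer for a closed
interval [lo, hi] (double only; NaN handling is omitted here, while the real
frange versions also check maybe_isnan and use the type's mode):

#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Every value in [lo, hi] is a normal number: both endpoints are finite and
   the interval lies at or beyond +/-DBL_MIN, so it never touches zero or
   the subnormal band.  */
static bool
range_known_isnormal (double lo, double hi)
{
  return isfinite (lo) && isfinite (hi)
         && (hi <= -DBL_MIN || lo >= DBL_MIN);
}

/* Every value in [lo, hi] is subnormal or zero: the interval sits strictly
   inside (-DBL_MIN, DBL_MIN).  */
static bool
range_known_isdenormal_or_zero (double lo, double hi)
{
  return lo > -DBL_MIN && hi < DBL_MIN;
}
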
  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isnormal

The former patch adds an optab for the isnormal builtin, so the builtin
might not be folded at the front end.  The range op for isnormal is therefore
needed for value range analysis.  This patch adds it.

gcc/
* gimple-range-op.cc (class cfn_isnormal): New.
(op_cfn_isnormal): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISNORMAL.
* value-range.h (class frange): Declare known_isnormal and
known_isdenormal_or_zero.
(frange::known_isnormal): Define.
(frange::known_isdenormal_or_zero): Define.

gcc/testsuite/
* gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index d69900d1f56..4c3f9c98282 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1281,6 +1281,60 @@ public:
   }
 } op_cfn_isfinite;

+// Implement range operator for CFN_BUILT_IN_ISNORMAL
+class cfn_isnormal :  public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange , tree type, const frange ,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isnormal ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ()
+   || op1.known_isdenormal_or_zero ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange , tree type, const irange ,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isnormal;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1383,6 +1437,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = _cfn_isfinite;
   break;

+case CFN_BUILT_IN_ISNORMAL:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = _cfn_isnormal;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
new file mode 100644
index 000..c4df4d839b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > __FLT_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__FLT_MIN__ && x > - __FLT_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 37ce91dc52d..1443d1906e5 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -588,6 +588,8 @@ public:
   bool maybe_isinf () const;
   bool signbit_p (bool ) const;
   bool nan_signbit_p (bool ) const;
+  bool known_isnormal () const;
+  bool known_isdenormal_or_zero () const;

 protected:
   virtual bool contains_p (tree cst) const override;
@@ -1650,6 +1652,33 @@ frange::known_isfinite () const
   return (!maybe_isnan () && !real_isinf (_min) && !real_isinf (_max));
 }

+// Return TRUE if range is known to be normal.
+
+inline bool
+frange::known_isnormal () const
+{
+  if (!known_isfinite ())
+return false;
+
+  machine_mode mode = TYPE_MODE (type ());
+  return (!real_isdenormal (_min, mode) && !real_isdenormal (_max, mode)
+ && !real_iszero (_min) && !real_iszero (_max)
+ && (!real_isneg (_min) || real_isneg (_max)));
+}
+
+// Return TRUE if 

[PATCH-2v3] Value Range: Add range op for builtin isfinite

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isfinite.

  Compared to the previous version, the main change is to set varying if
nothing is known about the range.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

The former patch adds an optab for the isfinite builtin, so the builtin
might not be folded at the front end.  The range op for isfinite is therefore
needed for value range analysis.  This patch adds it.

gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_isfinite): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
* gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 922ee7bf0f7..49b6d7abde1 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1229,6 +1229,61 @@ public:
   }
 } op_cfn_isinf;

+// Implement range operator for CFN_BUILT_IN_ISFINITE
+class cfn_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange , tree type, const frange ,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isfinite ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange , tree type, const irange ,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
+   // Set range to varying
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1326,6 +1381,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = _cfn_isinf;
   break;

+case CFN_BUILT_IN_ISFINITE:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = _cfn_isfinite;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */


[PATCH-1v2] Value Range: Add range op for builtin isinf

2024-05-20 Thread HAO CHEN GUI
Hi,
  The builtin isinf is not folded at the front end if the corresponding optab
exists. This causes range evaluation to fail on targets that have
optab_isinf. For instance, range-sincos.c will fail on such targets because
it calls builtin_isinf.

  This patch fixes the problem by adding a range op for builtin isinf.

  Compared with the previous version, the main change is to set varying if
nothing is known about the range.
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html

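A plain-C restatement of the fold_range logic in the patch below, for
readers who do not follow frange/irange details (a hypothetical helper that
takes the argument's bounds and NaN knowledge and returns what
__builtin_isinf can evaluate to):

#include <math.h>
#include <stdbool.h>

/* Return 1 if isinf() is known nonzero, 0 if known zero, -1 if varying.  */
static int
isinf_result_from_range (double lo, double hi, bool known_nan, bool maybe_nan)
{
  if (lo == hi && isinf (lo) && !maybe_nan)
    return 1;                   /* argument is exactly +Inf or -Inf */
  if (known_nan || (!isinf (lo) && !isinf (hi)))
    return 0;                   /* argument can never be infinite */
  return -1;                    /* nothing better than varying */
}
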
  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at the front end if the corresponding optab
exists.  So the range op for isinf is needed for value range analysis.
This patch adds a range op for builtin isinf.

gcc/
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
* gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..eb1b0aff77c 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1175,6 +1175,62 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange , tree type, const frange ,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (_bound ())
+   && !real_isinf (_bound (
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange , tree type, const irange ,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
@@ -1268,6 +1324,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = _cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = _cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+


[PATCHv2] Optab: add isnormal_optab for __builtin_isnormal

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isnormal. The normal check can be
implemented on rs6000 by a single instruction, so the builtin needs an optab
in order to expand to that specific instruction sequence.

  The subsequent patches will implement the expansion on rs6000.

  Compared to the previous version, the main change is to document isnormal
in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.
* doc/md.texi (isnormal): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b8432f84020..ccd57fce522 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 case BUILT_IN_ISFINITE:
   builtin_optab = isfinite_optab; break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab; break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8ed70b3feea..b81b9dec18a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8562,6 +8562,11 @@ This pattern is not allowed to @code{FAIL}.
 Set operand 0 to nonzero if operand 1 is a finite floating-point
 number and to 0 otherwise.

+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Set operand 0 to nonzero if operand 1 is a normal floating-point
+number and to 0 otherwise.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isfinite. The finite check can be
implemented on rs6000 by a single instruction, so the builtin needs an optab
in order to expand to that specific instruction sequence.

  The subsequent patches will implement the expansion on rs6000.

  Compared to the previous version, the main change is to document isfinite
in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
for isfinite builtin.
* optabs.def (isfinite_optab): New.
* doc/md.texi (isfinite): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..b8432f84020 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   errno_set = true; builtin_optab = ilogb_optab; break;
 CASE_FLT_FN (BUILT_IN_ISINF):
   builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
 case BUILT_IN_ISFINITE:
+  builtin_optab = isfinite_optab; break;
+case BUILT_IN_ISNORMAL:
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..8ed70b3feea 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,11 @@ operand 2, greater than operand 2 or is unordered with 
operand 2.

 This pattern is not allowed to @code{FAIL}.

+@cindex @code{isfinite@var{m}2} instruction pattern
+@item @samp{isfinite@var{m}2}
+Set operand 0 to nonzero if operand 1 is a finite floating-point
+number and to 0 otherwise.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-19 Thread HAO CHEN GUI
Hi Andrew,

在 2024/5/19 3:42, Andrew Pinski 写道:
> This is missing adding documentation for the new optab.
> It should be documented in md.texi under `Standard Pattern Names For
> Generation` section.

Thanks for your reminder. I will add ones for all patches.

Thanks
Gui Haochen


[PATCH-3v2, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isnormal for SFDF and IEEE128 using the
test data class instructions.

  Compared with the previous version, the main changes are to stop testing
whether a pseudo can be created in the expander, and to adjust the dg-options
and dg-final directives of the test cases according to the reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649368.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isnormal for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isnormal2 for SFDF): New expand.
(isnormal2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-7.c: New test.
* gcc.target/powerpc/pr97786-8.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index ab17178e0a8..cae30dc431e 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5353,6 +5353,28 @@ (define_expand "isfinite2"
   DONE;
 })

+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
new file mode 100644
index 000..2df472e35d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isnormal (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
new file mode 100644
index 000..0416970b89b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH-2v2, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isfinite for SFDF and IEEE128 using the
test data class instructions.

  Compared with the previous version, the main changes are to stop testing
whether a pseudo can be created in the expander, and to adjust the dg-options
and dg-final directives of the test cases according to the reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isfinite for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isfinite2 for SFDF): New expand.
(isfinite2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-4.c: New test.
* gcc.target/powerpc/pr97786-5.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f0cc02f7e7b..cbb538d6d86 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5333,6 +5333,28 @@ (define_expand "isinf2"
   DONE;
 })

+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
new file mode 100644
index 000..01faa962bd5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isfinite (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
new file mode 100644
index 000..5fc98084274
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isinf for SFDF and IEEE128 using the test
data class instructions.

  Compared with the previous version, the main change is to modify the
dg-options and dg-final directives of the test cases according to the
reviewer's advice.
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFDF and IEEE128

gcc/
PR target/97786
* config/rs6000/vsx.md (isinf<mode>2 for SFDF): New expand.
(isinf<mode>2 for IEEE128): New expand.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-1.c: New test.
* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..fa20fb4df91 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
   operands[4] = CONST0_RTX (SImode);
 })

+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
new file mode 100644
index 000..c1c4f64ee8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+
+int test1 (double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isinf (x);
+}
+
+int test3 (float x)
+{
+  return __builtin_isinff (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmp} } } */
+/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
new file mode 100644
index 000..21d90868268
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */


Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-16 Thread HAO CHEN GUI
Hi Segher,
  Thanks for your review comments. I will modify it and resend. Just
one question on the insn condition.

在 2024/5/17 1:25, Segher Boessenkool 写道:
>> +(define_expand "isnormal2"
>> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
>> +(use (match_operand:SFDF 1 "gpc_reg_operand"))]
>> +  "TARGET_HARD_FLOAT
>> +   && TARGET_P9_VECTOR"
> Please put the condition on just one line if it is as simple and short
> as this.
> 
> Why is TARGET_P9_VECTOR the correct condition?

This expander calls gen_xststdcp, which emits a P9 vector instruction and
relies on "TARGET_P9_VECTOR".  That is why I set the condition.


Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-15 Thread HAO CHEN GUI
Hi Andrew,
  Thanks so much for your explanation. I got it. I will address the issue.

Thanks
Gui Haochen

在 2024/5/15 2:45, Andrew MacLeod 写道:
> 
> On 5/9/24 04:47, HAO CHEN GUI wrote:
>> Hi Mikael,
>>
>>    Thanks for your comments.
>>
>> 在 2024/5/9 16:03, Mikael Morin 写道:
>>> I think the canonical API behaviour sets R to varying and returns true 
>>> instead of just returning false if nothing is known about the range.
>>>
>>> I'm not sure whether it makes any difference; Aldy can probably tell. But 
>>> if the type is bool, varying is [0,1] which is better than unknown range.
>> Should the varying be set by caller when fold_range returns false?
>> Just like following codes in value-query.cc.
>>
>>    if (!op.fold_range (r, type, r0, r1))
>>  r.set_varying (type);
>>
> This would be dangerous in the general case.  fold_range may have returned 
> false because 'type' is an unsupported range type. Generally this is why we 
> prefer range-ops to return TRUE and VARYING rather than FALSE for unknown 
> values.   When FALSE is returned, we should stop working with ranges because 
> something is amok.
> 
> Andrew
> 


Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Jakub,
  Thanks for your review comments.

在 2024/5/14 23:57, Jakub Jelinek 写道:
> BUILT_IN_ISFINITE is just one of many BUILT_IN_IS... builtins,
> would be nice to handle the others as well.
> 
> E.g. isnormal/isnan/isinf, fpclassify etc.
> 
Yes, I already sent the patches which add range op for isnormal/isnan/isinf
for review. I will modify them according to review comments and submit them
again.

> Note, the man page says for e.g. isnormal that it returns nonzero or zero,
> but in reality I think we implement it always inline and can check if
> it always returns [0,1].
> Some others like isinf return [-1,1] though I think and fpclassify
> returns union of all the passed int values.

The GCC inline code always returns 0 or 1 for isnormal/isnan/isinf, but I
wonder whether every target's expander can promise that.  rs6000 has an
instruction for isnormal/isnan/isinf, so we are making the patch skip the
inline code and expand them ourselves.  The rs6000 instruction returns 0 or 1
for them, but I am not sure other targets do the same.

Thanks
Gui Haochen



Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Mikael,

  Thanks for your comments.

在 2024/5/9 16:03, Mikael Morin 写道:
> I think the canonical API behaviour sets R to varying and returns true 
> instead of just returning false if nothing is known about the range.
> 
> I'm not sure whether it makes any difference; Aldy can probably tell. But if 
> the type is bool, varying is [0,1] which is better than unknown range.

Should varying be set by the caller when fold_range returns false,
as in the following code in value-query.cc?

  if (!op.fold_range (r, type, r0, r1))
r.set_varying (type);

Thanks
Gui Haochen


Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-14 Thread HAO CHEN GUI
Hi,

在 2024/5/10 20:50, Richard Biener 写道:
> IMO give we're dispatching to the rtx_cost hook eventually it needs
> documenting there or alternatively catching zero and adjusting its
> result there.  Of course cost == 0 ? 1 : cost is wrong as it makes
> zero vs. one the same cost - using cost + 1 when from rtx_cost
> might be more correct, at least preserving relative costs.

I tested a draft patch which sets "cost > 0 ? cost + 1 : 1;".  Some
regression cases were found on x86.  The main problems are:

Comparing the cost with COSTS_N_INSNS (1) no longer works with the patch.
Since 1 is added to every cost, the following compare returns true when the
cost becomes 5 but was false originally.
  if (cost > COSTS_N_INSNS (1))

Another problem is that the cost comes from set_src_cost, which doesn't take
the destination into consideration.  For example, the cost of a store
"[`x']=r109:SI" is set to 1 as it only measures the cost of the SET_SRC.
That seems unreasonable.

IMHO, a cost less than COSTS_N_INSNS (1) is meaningful within the rtx_cost
calculation but unreasonable for an insn.  Should the minimum cost of an
insn be set to COSTS_N_INSNS (1)?
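
To make the first problem concrete, here is a small illustration of my own
(COSTS_N_INSNS is the definition from gcc/rtl.h):

  #define COSTS_N_INSNS(N) ((N) * 4)   /* from gcc/rtl.h */

  /* Current regularization vs. the draft "cost + 1" variant.  */
  int cur_reg (int cost)   { return cost > 0 ? cost : COSTS_N_INSNS (1); }
  int draft_reg (int cost) { return cost > 0 ? cost + 1 : 1; }

  /* For a pattern whose set_src_cost is COSTS_N_INSNS (1) == 4:
     cur_reg (4) == 4, so "cost > COSTS_N_INSNS (1)" stays false;
     draft_reg (4) == 5, so the same test suddenly becomes true.  */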

Thanks
Gui Haochen


Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-13 Thread HAO CHEN GUI
Hi Aldy,
  Thanks for your review comments.

在 2024/5/13 19:18, Aldy Hernandez 写道:
> On Thu, May 9, 2024 at 10:05 AM Mikael Morin  wrote:
>>
>> Hello,
>>
>> Le 07/05/2024 à 04:37, HAO CHEN GUI a écrit :
>>> Hi,
>>>The former patch adds isfinite optab for __builtin_isfinite.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
>>>
>>>Thus the builtin might not be folded at front end. The range op for
>>> isfinite is needed for value range analysis. This patch adds them.
>>>
>>>Compared to last version, this version fixes a typo.
>>>
>>>Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> Value Range: Add range op for builtin isfinite
>>>
>>> The former patch adds optab for builtin isfinite. Thus builtin isfinite 
>>> might
>>> not be folded at front end.  So the range op for isfinite is needed for 
>>> value
>>> range analysis.  This patch adds range op for builtin isfinite.
>>>
>>> gcc/
>>>   * gimple-range-op.cc (class cfn_isfinite): New.
>>>   (op_cfn_finite): New variables.
>>>   (gimple_range_op_handler::maybe_builtin_call): Handle
>>>   CFN_BUILT_IN_ISFINITE.
>>>
>>> gcc/testsuite/
>>>   * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test.
>>>
>>> patch.diff
>>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
>>> index 9de130b4022..99c511728d3 100644
>>> --- a/gcc/gimple-range-op.cc
>>> +++ b/gcc/gimple-range-op.cc
>>> @@ -1192,6 +1192,56 @@ public:
>>> }
>>>   } op_cfn_isinf;
>>>
>>> +//Implement range operator for CFN_BUILT_IN_ISFINITE
>>> +class cfn_isfinite : public range_operator
>>> +{
>>> +public:
>>> +  using range_operator::fold_range;
>>> +  using range_operator::op1_range;
>>> +  virtual bool fold_range (irange , tree type, const frange ,
>>> +const irange &, relation_trio) const override
>>> +  {
>>> +if (op1.undefined_p ())
>>> +  return false;
>>> +
>>> +if (op1.known_isfinite ())
>>> +  {
>>> + r.set_nonzero (type);
>>> + return true;
>>> +  }
>>> +
>>> +if (op1.known_isnan ()
>>> + || op1.known_isinf ())
>>> +  {
>>> + r.set_zero (type);
>>> + return true;
>>> +  }
>>> +
>>> +return false;
>> I think the canonical API behaviour sets R to varying and returns true
>> instead of just returning false if nothing is known about the range.
> 
> Correct.  If we know it's varying, we just set varying and return
> true.  Returning false is usually reserved for "I have no idea".
> However, every caller of fold_range() should know to ignore a return
> of false, so you should be safe.

So it's better to set varying here and return true?
> 
>>
>> I'm not sure whether it makes any difference; Aldy can probably tell.
>> But if the type is bool, varying is [0,1] which is better than unknown
>> range.
> 
> Also, I see you're setting zero/nonzero.  Is the return type known to
> be boolean, because if so, we usually prefer to one of:
The return type is int. For __builtin_isfinite, the result is nonzero when
the float is a finite number, 0 otherwise.

> 
> r = range_true ()
> r = range_false ()
> r = range_true_and_false ();
> 
> It doesn't matter either way, but it's probably best to use these as
> they force boolean_type_node automatically.
> 
> I don't have a problem with this patch, but I would prefer the
> floating point savvy people to review this, as there are no members of
> the ranger team that are floating point experts :).
> 
> Also, I see you mention in your original post that this patch was
> needed as a follow-up to this one:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
> 
> I don't see the above patch in the source tree currently:
Sorry, I may not express it clear. I sent a series of patches for review.
Some patches depend on others. The patch I mentioned is a patch also
under review.

Here is the list of the series of patches. Some of them are generic, and
others are rs6000 specific.

[PATCH] Value Range: Add range op for builtin isinf
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html

[patch, rs6000] Implement optab_isinf for SFmo

Ping [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-12 Thread HAO CHEN GUI
Hi,
  Gently ping the series of patches.
[PATCH-1, rs6000]Add a new type of CC mode - CCBCD for bcd insns [PR100736, 
PR114732]
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650217.html
[PATCH-2, rs6000] Add a new type of CC mode - CCLTEQ
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650218.html
[PATCH-3, rs6000] Set CC mode of vector string isolate insns to CCEQ
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650219.html
[PATCH-4, rs6000] Optimize single cc bit reverse implementation
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650220.html
[PATCH-5, rs6000] Replace explicit CC bit reverse with common format
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650766.html
[PATCH-6, rs6000] Split setcc to two insns after reload
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650856.html

Thanks
Gui Haochen

在 2024/4/30 15:18, HAO CHEN GUI 写道:
> Hi,
>   It's the first patch of a series of patches optimizing CC modes on
> rs6000.
> 
>   bcd insns set all four bits of a CR field, but they have different single
> bit reverse behavior than CCFP's.  The fourth bit of bcd CR fields is used
> to indicate overflow or an invalid number; it is not a bit for the unordered
> test.  So the "le" test should be reversed to "gt", not "ungt", and the "ge"
> test should be reversed to "lt", not "unlt".  That's the root cause of
> PR100736 and PR114732.
> 
>   This patch fixes the issue by adding a new type of CC mode - CCBCD for
> all bcd insns. Here a new setcc_rev pattern is added for ccbcd. It will
> be merged to a uniform pattern which is for all CC modes in sequential
> patch.
> 
>   The rtl code "unordered" is still used for testing overflow or an
> invalid number.  IMHO, "unordered" on a CC mode can be considered as testing
> whether the fourth bit of a CR field is set, and "eq" on a CC mode as
> testing whether the third bit is set.  Thus we avoid creating lots of
> unspecs for the CR bit testing.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Add a new type of CC mode - CCBCD for bcd insns
> 
> gcc/
>   PR target/100736
>   PR target/114732
>   * config/rs6000/altivec.md (bcd_): Replace CCFP
>   with CCBCD.
>   (*bcd_test_): Likewise.
>   (*bcd_test2_): Likewise.
>   (bcd__): Likewise.
>   (*bcdinvalid_): Likewise.
>   (bcdinvalid_): Likewise.
>   (bcdshift_v16qi): Likewise.
>   (bcdmul10_v16qi): Likewise.
>   (bcddiv10_v16qi): Likewise.
>   (peephole for bcd_add/sub): Likewise.
>   * config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD
>   and its supported comparison codes.
>   * config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD.
>   * config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD
>   assertion.
>   * config/rs6000/rs6000.md (CC_any): Add CCBCD.
>   (ccbcd_rev): New code iterator.
>   (*_cc): New insn and split pattern for CCBCD reverse
>   compare.
> 
> gcc/testsuite/
>   PR target/100736
>   PR target/114732
>   * gcc.target/powerpc/pr100736.c: New.
>   * gcc.target/powerpc/pr114732.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index bb20441c096..9fa8cf89f61 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -4443,7 +4443,7 @@ (define_insn "bcd_"
> (match_operand:VBCD 2 "register_operand" "v")
> (match_operand:QI 3 "const_0_to_1_operand" "n")]
>UNSPEC_BCD_ADD_SUB))
> -   (clobber (reg:CCFP CR6_REGNO))]
> +   (clobber (reg:CCBCD CR6_REGNO))]
>"TARGET_P8_VECTOR"
>"bcd. %0,%1,%2,%3"
>[(set_attr "type" "vecsimple")])
> @@ -4454,8 +4454,8 @@ (define_insn "bcd_"
>  ;; probably should be one that can go in the VMX (Altivec) registers, so we
>  ;; can't use DDmode or DFmode.
>  (define_insn "*bcd_test_"
> -  [(set (reg:CCFP CR6_REGNO)
> - (compare:CCFP
> +  [(set (reg:CCBCD CR6_REGNO)
> + (compare:CCBCD
>(unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")
>  (match_operand:VBCD 2 "register_operand" "v")
>  (match_operand:QI 3 "const_0_to_1_operand" "i")]
> @@ -4472,8 +4472,8 @@ (define_insn "*bcd_test2_"
> (match_operand:VBCD 2 "register_operand" "v")
>  

[PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-12 Thread HAO CHEN GUI
Hi,
   The cost returned from set_src_cost might be zero, and zero for
pattern_cost means unknown cost.  So the regularization converts the zero
to COSTS_N_INSNS (1).

   // pattern_cost
   cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
   return cost > 0 ? cost : COSTS_N_INSNS (1);

   But if set_src_cost returns a value less than COSTS_N_INSNS (1), it's
left untouched and returned by pattern_cost as-is.  Thus a "zero" from
set_src_cost ends up costing more than a "one" from set_src_cost.

  For instance, i386 returns cost "one" for a zero_extend op.
//ix86_rtx_costs
case ZERO_EXTEND:
  /* The zero extensions is often completely free on x86_64, so make
 it as cheap as possible.  */
  if (TARGET_64BIT && mode == DImode
  && GET_MODE (XEXP (x, 0)) == SImode)
*total = 1;

  This patch fixes the problem by converting every cost that is less than
COSTS_N_INSNS (1) to COSTS_N_INSNS (1).
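
A worked example of my own, with COSTS_N_INSNS (1) == 4 as defined in
gcc/rtl.h:

  /* set_src_cost == 0 -> pattern_cost old: 4  new: 4  (unknown, clamped)
     set_src_cost == 1 -> pattern_cost old: 1  new: 4  (the i386 zero_extend)
     set_src_cost == 4 -> pattern_cost old: 4  new: 4
     set_src_cost == 8 -> pattern_cost old: 8  new: 8
     so a cost-1 zero_extend no longer ranks below a known 1-insn pattern.  */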

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rtlanal: Correct cost regularization in pattern_cost

For pattern_cost (insn_cost), the smallest known cost is
COSTS_N_INSNS (1) and zero means the cost is unknown.  The function calls
set_src_cost, which might return 0 or a value less than COSTS_N_INSNS (1).
For these cases, pattern_cost should always return COSTS_N_INSNS (1).
The current regularization is wrong: a value less than COSTS_N_INSNS (1)
but larger than 0 will be returned.  This patch corrects it.

gcc/
* rtlanal.cc (pattern_cost): Return COSTS_N_INSNS (1) when the cost
is less than COSTS_N_INSNS (1).

patch.diff
diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 4158a531bdd..f7b3d7d72ce 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -5762,7 +5762,7 @@ pattern_cost (rtx pat, bool speed)
 return 0;

   cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
-  return cost > 0 ? cost : COSTS_N_INSNS (1);
+  return cost > COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);
 }

 /* Calculate the cost of a single instruction.  A return value of zero


Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread HAO CHEN GUI
Hi Richard,
  Thanks for your comments.

在 2024/5/10 15:16, Richard Biener 写道:
> But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no
> longer meaningful.  So shouldn't it instead be
> 
>   return cost > 0 ? cost : 1;
Yes, it's better.

> 
> ?  Alternatively returning fractions of COSTS_N_INSNS (1) from set_src_cost
> is invalid and thus the target is at fault (I do think that making zero the
> unknown value is quite bad since that makes it impossible to have zero
> as cost represented).
> 
> It seems the check is to aovid pattern_cost return zero (unknown), so the
> comment holds to pattern_cost the same (it returns an 'int' so the better
> exceptional value would have been -1, avoiding the compare).
But sometimes insn costs are summed up.  If the unknown cost were -1, the
total cost might be distorted.

> 
> Richard.

Thanks
Gui Haochen


[PATCHv2] rs6000: Enable overlapped by-pieces operations

2024-05-10 Thread HAO CHEN GUI
Hi,
  This patch enables overlapped by-pieces operations.  On rs6000, the default
move/set/clear ratio is 2, so the overlap only takes effect for compare
by-pieces.

  Compared to the previous version, the change is to remove the power8
requirement from the test case.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651045.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable overlapped by-pieces operations

This patch enables overlapped by-pieces operations by defining
TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, the default move/set/clear
ratio is 2, so the overlap only takes effect for compare by-pieces.

gcc/
* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-9.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 117999613d8..e713a1e1d57 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1776,6 +1776,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
 #undef TARGET_CONST_ANCHOR
 #define TARGET_CONST_ANCHOR 0x8000

+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
+
 

 /* Processor table.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
new file mode 100644
index 000..f16429c2ffb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
+
+/* Test if by-piece overlap compare is enabled and following case is
+   implemented by two overlap word loads and compares.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 7) == 0;
+}
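
In portable C, the shape of the comparison this enables looks roughly like the
sketch below; it is only an illustration of the idea, not the actual rtl the
expander emits:

  #include <stdint.h>
  #include <string.h>

  /* 7-byte equality compare as two overlapping 4-byte loads:
     bytes 0-3 and bytes 3-6 (byte 3 is checked twice, which is harmless).  */
  static int memcmp7_eq (const char *s1, const char *s2)
  {
    uint32_t a0, b0, a1, b1;
    memcpy (&a0, s1, 4);
    memcpy (&b0, s2, 4);
    memcpy (&a1, s1 + 3, 4);
    memcpy (&b1, s2 + 3, 4);
    return a0 == b0 && a1 == b1;
  }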


[PATCH-1v2] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-05-09 Thread HAO CHEN GUI
Hi,
  This patch replaces rtx_cost with insn_cost in forward propagation.
In the PR, a constant vector should be propagated to replace a pseudo in a
store insn when we know it is a duplicated constant vector.  The replacement
reduces the insn cost but not the rtx cost.  In this case, the cost is
determined by the destination operand (memory or pseudo), so rtx cost can't
help.
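
As a rough C-level picture (my own hypothetical example, not the testcase from
the PR), think of storing a splatted constant vector:

  #include <altivec.h>

  /* fwprop wants to propagate the splatted constant directly into the
     store that uses it.  Whether that is profitable depends on the whole
     insn, including the memory destination, which is what insn_cost sees
     and set_src_cost does not.  */
  void store_splat (vector unsigned char *p)
  {
    *p = vec_splats ((unsigned char) 0x55);
  }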

  The test case is added in the second target specific patch.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643995.html

  Compared to the previous version, the main change is to skip the
substitution if either the new or the old insn cost is zero, since zero
means the cost is unknown.

 Previous version
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643994.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

ChangeLog
fwprop: Replace set_src_cost with insn_cost in try_fwprop_subst_pattern

gcc/
* fwprop.cc (try_fwprop_subst_pattern): Replace set_src_cost with
insn_cost.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index cb6fd6700ca..184a22678b7 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -470,21 +470,19 @@ try_fwprop_subst_pattern (obstack_watermark , 
insn_change _change,
   redo_changes (0);
 }

-  /* ??? In theory, it should be better to use insn costs rather than
- set_src_costs here.  That would involve replacing this code with
- change_is_worthwhile.  */
   bool ok = recog (attempt, use_change);
   if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
-if (rtx use_set = single_set (use_rtl))
+if (single_set (use_rtl))
   {
bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
+   auto new_cost = insn_cost (use_rtl, speed);
temporarily_undo_changes (0);
-   auto old_cost = set_src_cost (SET_SRC (use_set),
- GET_MODE (SET_DEST (use_set)), speed);
+   /* Invalidate recog data.  */
+   INSN_CODE (use_rtl) = -1;
+   auto old_cost = insn_cost (use_rtl, speed);
redo_changes (0);
-   auto new_cost = set_src_cost (SET_SRC (use_set),
- GET_MODE (SET_DEST (use_set)), speed);
-   if (new_cost > old_cost
+   if (new_cost == 0 || old_cost == 0
+   || new_cost > old_cost
|| (new_cost == old_cost && !prop.likely_profitable_p ()))
  {
if (dump_file)


Re: [PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-09 Thread HAO CHEN GUI
Hi Kewen,

在 2024/5/9 13:44, Kewen.Lin 写道:
> Why does it need power8 forced here?

I thought it over.  It is not needed.  For the sub-targets where a library
call is used, l[hb]z won't be generated either.

Thanks
Gui Haochen


Re: [PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-09 Thread HAO CHEN GUI
Hi Kewen,
  Thanks for your comments.

在 2024/5/9 13:44, Kewen.Lin 写道:
> Hi,
> 
> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>> Hi,
>>   This patch enables overlapped by-piece operations. On rs6000, default
>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>> by-pieces.
> 
> Thanks for enabling this, did you evaluate if it can help some benchmark?

I tested it with SPEC2017 and saw no obvious performance impact; I think
memory compares might not be hot enough there.

With my micro benchmark there is a 5-10% performance gain when the compare
length is 7.

> 
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>> regressions. Is it OK for the trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> rs6000: Enable overlapped by-pieces operations
>>
>> This patch enables overlapped by-piece operations by defining
>> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
>> ratio is 2.  So the overlap is only enabled with compare by-pieces.
>>
>> gcc/
>>  * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>
>> gcc/testsuite/
>>  * gcc.target/powerpc/block-cmp-9.c: New.
>>
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 6b9a40fcc66..2b5f5cf1d86 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const 
>> rs6000_attribute_table[] =
>>  #undef TARGET_CONST_ANCHOR
>>  #define TARGET_CONST_ANCHOR 0x8000
>>
>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>> +
>>  
>>
>>  /* Processor table.  */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c 
>> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>> new file mode 100644
>> index 000..b5f51affbb7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> 
> Why does it need power8 forced here?

I just want to exclude P7 LE, as targetm.slow_unaligned_access returns false
for it and the cmpmemsi expander won't be invoked.

> 
> BR,
> Kewen
> 
>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>> +
>> +/* Test if by-piece overlap compare is enabled and following case is
>> +   implemented by two overlap word loads and compares.  */
>> +
>> +int foo (const char* s1, const char* s2)
>> +{
>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>> +}
> 

Thanks
Gui Haochen


[PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-08 Thread HAO CHEN GUI
Hi,
  This patch enables overlapped by-pieces operations.  On rs6000, the default
move/set/clear ratio is 2, so the overlap only takes effect for compare
by-pieces.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable overlapped by-pieces operations

This patch enables overlapped by-pieces operations by defining
TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, the default move/set/clear
ratio is 2, so the overlap only takes effect for compare by-pieces.

gcc/
* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-9.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6b9a40fcc66..2b5f5cf1d86 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
 #undef TARGET_CONST_ANCHOR
 #define TARGET_CONST_ANCHOR 0x8000

+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
+
 

 /* Processor table.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
new file mode 100644
index 000..b5f51affbb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
+
+/* Test if by-piece overlap compare is enabled and following case is
+   implemented by two overlap word loads and compares.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 7) == 0;
+}


Ping^3 [PATCH, rs6000] Split TImode for logical operations in expand pass [PR100694]

2024-05-07 Thread HAO CHEN GUI
Hi,
  As now it's stage-1, gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html

Gui Haochen
Thanks

在 2023/4/24 13:35, HAO CHEN GUI 写道:
> Hi,
>   Gently ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html
> 
> Thanks
> Gui Haochen
> 
> 在 2023/2/20 10:10, HAO CHEN GUI 写道:
>> Hi,
>>   Gently ping this:
>>   https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html
>>
>> Gui Haochen
>> Thanks
>>
>> 在 2023/2/8 13:08, HAO CHEN GUI 写道:
>>> Hi,
>>>   The logical operations for TImode are split after the reload pass right
>>> now.  Some potential optimizations are missed as the split is too late.
>>> This patch removes TImode from the "AND", "IOR", "XOR" and "NOT" expanders
>>> so that these logical operations can be split in the expand pass.  The new
>>> test case illustrates the optimization.
>>>
>>>   Two test cases of pr92398 are merged into one as all sub-targets generates
>>> the same sequence of instructions with the patch.
>>>
>>>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>>
>>> ChangeLog
>>> 2023-02-08  Haochen Gui 
>>>
>>> gcc/
>>> PR target/100694
>>> * config/rs6000/rs6000.md (BOOL_128_V): New mode iterator for 128-bit
>>> vector types.
>>> (and3): Replace BOOL_128 with BOOL_128_V.
>>> (ior3): Likewise.
>>> (xor3): Likewise.
>>> (one_cmpl2 expander): New expander with BOOL_128_V.
>>> (one_cmpl2 insn_and_split): Rename to ...
>>> (*one_cmpl2): ... this.
>>>
>>> gcc/testsuite/
>>> PR target/100694
>>> * gcc.target/powerpc/pr100694.c: New.
>>> * gcc.target/powerpc/pr92398.c: New.
>>> * gcc.target/powerpc/pr92398.h: Remove.
>>> * gcc.target/powerpc/pr92398.p9-.c: Remove.
>>> * gcc.target/powerpc/pr92398.p9+.c: Remove.
>>>
>>>
>>> patch.diff
>>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>>> index 4bd1dfd3da9..455b7329643 100644
>>> --- a/gcc/config/rs6000/rs6000.md
>>> +++ b/gcc/config/rs6000/rs6000.md
>>> @@ -743,6 +743,15 @@ (define_mode_iterator BOOL_128 [TI
>>>  (V2DF  "TARGET_ALTIVEC")
>>>  (V1TI  "TARGET_ALTIVEC")])
>>>
>>> +;; Mode iterator for logical operations on 128-bit vector types
>>> +(define_mode_iterator BOOL_128_V   [(V16QI "TARGET_ALTIVEC")
>>> +(V8HI  "TARGET_ALTIVEC")
>>> +(V4SI  "TARGET_ALTIVEC")
>>> +(V4SF  "TARGET_ALTIVEC")
>>> +(V2DI  "TARGET_ALTIVEC")
>>> +(V2DF  "TARGET_ALTIVEC")
>>> +(V1TI  "TARGET_ALTIVEC")])
>>> +
>>>  ;; For the GPRs we use 3 constraints for register outputs, two that are the
>>>  ;; same as the output register, and a third where the output register is an
>>>  ;; early clobber, so we don't have to deal with register overlaps.  For the
>>> @@ -7135,23 +7144,23 @@ (define_expand "subti3"
>>>  ;; 128-bit logical operations expanders
>>>
>>>  (define_expand "and3"
>>> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
>>> -   (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>> - (match_operand:BOOL_128 2 "vlogical_operand")))]
>>> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
>>> +   (and:BOOL_128_V (match_operand:BOOL_128_V 1 "vlogical_operand")
>>> +   (match_operand:BOOL_128_V 2 "vlogical_operand")))]
>>>""
>>>"")
>>>
>>>  (define_expand "ior3"
>>> -  [(set (match_operand:BOOL_128 0 "vlogical_operand")
>>> -(ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>> - (match_operand:BOOL_128 2 "vlogical_operand")))]
>>> +  [(set (match_operand:BOOL_128_V 0 "vlogical_operand")
>>> +   (ior:BOOL_128_V (match

Ping [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-05-07 Thread HAO CHEN GUI
Hi,
  Gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html

Thanks
Gui Haochen

在 2024/3/18 17:10, HAO CHEN GUI 写道:
> Hi,
>   Gently ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html
> 
> Thanks
> Gui Haochen
> 
> 在 2024/3/11 13:41, HAO CHEN GUI 写道:
>> Hi,
>>   This patch tries to fix the problem when a canonical form doesn't benefit
>> a specific target.  The const operand of AND is ANDed with the nonzero bits
>> of the other operand in the combine pass.  It's a canonical form, but it has
>> no benefit for targets which have rotate-and-mask insns.  As the mask is
>> truncated, it can't match the insn conditions which it originally matched.
>> For example, the following insn condition checks the sum of two AND masks.
>> When one of the masks is truncated, the condition breaks.
>>
>> (define_insn "*rotlsi3_insert_5"
>>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>>  (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>>  (match_operand:SI 2 "const_int_operand" "n,n"))
>>  (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>>  (match_operand:SI 4 "const_int_operand" "n,n"]
>>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
>> ...
>>
>>   This patch tries to fix the problem by comparing the rtx cost.  If the
>> other operand (varop) is not changed and the rtx cost with the new mask is
>> not less than the original one, the mask is restored to the original one.
>>
>>   I'm not sure if the rtx cost comparison here is proper.  The outer code
>> is unknown, so I assume it is "SET".  Also, the rtx cost might not be
>> accurate.  From my understanding, the canonical forms should always be
>> beneficial, as they can't be undone in the combine pass.  Do we have a
>> perfect solution for this kind of issue?  Looking forward to your advice.
>>
>>   Another similar issue for canonical forms: is widening the mode for
>> lshiftrt always good?
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> Combine: Don't truncate const operand of AND if it's no benefits
>>
>> In the combine pass, the canonical form is to turn off all bits in the
>> constant that are known to already be zero for AND.
>>
>>   /* Turn off all bits in the constant that are known to already be zero.
>>  Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
>>  which is tested below.  */
>>
>>   constop &= nonzero;
>>
>> But it doesn't benefit when the target has rotate and mask insert insns.
>> The AND mask is truncated and loses its information.  Thus it can't match
>> the insn conditions.  For example, the following insn condition checks
>> the sum of two AND masks.
>>
>> (define_insn "*rotlsi3_insert_5"
>>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>>  (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>>  (match_operand:SI 2 "const_int_operand" "n,n"))
>>  (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>>  (match_operand:SI 4 "const_int_operand" "n,n"]
>>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
>> ...
>>
>> This patch restores the const operand of AND if the other operand is not
>> optimized and the truncated const operand doesn't save any rtx cost.
>>
>> gcc/
>>  * combine.cc (simplify_and_const_int_1): Restore the const operand
>>  of AND if varop is not optimized and the rtx cost of the new const
>>  operand is not reduced.
>>
>> gcc/testsuite/
>>  * gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and
>>  adjust the number of rotate and mask insns.
>>  * gcc.target/powerpc/rlwimi-1.c: Likewise.
>>  * gcc.target/powerpc/rlwimi-2.c: Likewise.
>>
>> patch.diff
>> diff --git a/gcc/combine.cc b/gcc/combine.cc
>> index a4479f8d836..16ff

Re: [Patch, rs6000] Enable overlap memory store for block memory clear

2024-05-07 Thread HAO CHEN GUI
Hi,
  As now it's stage 1, gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646478.html

Thanks
Gui Haochen

在 2024/2/26 10:25, HAO CHEN GUI 写道:
> Hi,
>   This patch enables overlapping memory stores for block memory clear, which
> reduces the number of store instructions.  The expander calls
> widest_fixed_size_mode_for_block_clear to get the mode for the looped block
> clear and calls smallest_fixed_size_mode_for_block_clear to get the mode
> for the last, overlapped clear.
> 
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk or next stage 1?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Enable overlap memory store for block memory clear
> 
> gcc/
>   * config/rs6000/rs6000-string.cc
>   (widest_fixed_size_mode_for_block_clear): New.
>   (smallest_fixed_size_mode_for_block_clear): New.
>   (expand_block_clear): Call widest_fixed_size_mode_for_block_clear to
>   get the mode for looped memory stores and call
>   smallest_fixed_size_mode_for_block_clear to get the mode for the last
>   overlapped memory store.
> 
> gcc/testsuite
>   * gcc.target/powerpc/block-clear-1.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc 
> b/gcc/config/rs6000/rs6000-string.cc
> index 133e5382af2..c2a6095a586 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -38,6 +38,49 @@
>  #include "profile-count.h"
>  #include "predict.h"
> 
> +/* Return the widest mode which mode size is less than or equal to the
> +   size.  */
> +static fixed_size_mode
> +widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int 
> align,
> + bool unaligned_vsx_ok)
> +{
> +  machine_mode mode;
> +
> +  if (TARGET_ALTIVEC
> +  && size >= 16
> +  && (align >= 128
> +   || unaligned_vsx_ok))
> +mode = V4SImode;
> +  else if (size >= 8
> +&& TARGET_POWERPC64
> +&& (align >= 64
> +|| !STRICT_ALIGNMENT))
> +mode = DImode;
> +  else if (size >= 4
> +&& (align >= 32
> +|| !STRICT_ALIGNMENT))
> +mode = SImode;
> +  else if (size >= 2
> +&& (align >= 16
> +|| !STRICT_ALIGNMENT))
> +mode = HImode;
> +  else
> +mode = QImode;
> +
> +  return as_a  (mode);
> +}
> +
> +/* Return the smallest mode whose size is smaller than or equal to the
> +   size.  */
> +static fixed_size_mode
> +smallest_fixed_size_mode_for_block_clear (unsigned int size)
> +{
> +  if (size > UNITS_PER_WORD)
> +return as_a  (V4SImode);
> +
> +  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
> +}
> +
>  /* Expand a block clear operation, and return 1 if successful.  Return 0
> if we should let the compiler generate normal code.
> 
> @@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
>HOST_WIDE_INT align;
>HOST_WIDE_INT bytes;
>int offset;
> -  int clear_bytes;
>int clear_step;
> 
>/* If this is not a fixed size move, just call memcpy */
> @@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])
> 
>bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
> 
> -  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
> +  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
> +   unaligned_vsx_ok);
> +  offset = 0;
> +  rtx dest;
> +
> +  do
>  {
> -  machine_mode mode = BLKmode;
> -  rtx dest;
> +  unsigned int size = GET_MODE_SIZE (mode);
> 
> -  if (TARGET_ALTIVEC
> -   && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
> +  while (bytes >= size)
>   {
> -   clear_bytes = 16;
> -   mode = V4SImode;
> - }
> -  else if (bytes >= 8 && TARGET_POWERPC64
> -&& (align >= 64 || !STRICT_ALIGNMENT))
> - {
> -   clear_bytes = 8;
> -   mode = DImode;
> -   if (offset == 0 && align < 64)
> - {
> -   rtx addr;
> +   dest = adjust_address (orig_dest, mode, offset);
> +   emit_move_insn (dest, CONST0_RTX (mode));
> 
> -   /* If the address form is reg+offset with offset not a
> -  multiple of four, reload into reg indirect form here
> -  rather than waiting for reload.  This way we get one
> -  reload, not one per store.  */
> -

[PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-06 Thread HAO CHEN GUI
Hi,
  The former patch adds isfinite optab for __builtin_isfinite.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

  Thus the builtin might not be folded in the front end, and a range op for
isfinite is needed for value range analysis.  This patch adds it.

  Compared to last version, this version fixes a typo.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

The former patch adds an optab for builtin isfinite.  Thus builtin isfinite
might not be folded in the front end, so a range op for isfinite is needed
for value range analysis.  This patch adds the range op for builtin isfinite.

gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_finite): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
* gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 9de130b4022..99c511728d3 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1192,6 +1192,56 @@ public:
   }
 } op_cfn_isinf;

+//Implement range operator for CFN_BUILT_IN_ISFINITE
+class cfn_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange , tree type, const frange ,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isfinite ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+return false;
+  }
+  virtual bool op1_range (frange , tree type, const irange ,
+ const frange &, relation_trio) const override
+  {
+if (lhs.zero_p ())
+  {
+   // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
+   // Set range to varying
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p ())
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+return false;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1288,6 +1338,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = _cfn_isinf;
   break;

+case CFN_BUILT_IN_ISFINITE:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = _cfn_isfinite;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */


[PATCH-6, rs6000] Split setcc to two insns after reload

2024-05-06 Thread HAO CHEN GUI
Hi,
  It's the sixth patch of a series of patches optimizing CC modes on
rs6000.

  This patch splits setcc into two separate insns after reload so that
other insns can be scheduled between them.  It should increase the
parallelism.

  The rotate_cr pattern still needs the info on the CR field number, as
the pro_and_epilogue pass might change the CR register.
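
As a hypothetical illustration (whether a given source goes through set_cc
depends on the sub-target and on other expansion paths), a floating-point
compare whose result is needed as an integer has to move a CR bit into a GPR;
after the split, the independent add below can in principle be scheduled
between the mfcr and the rlwinm:

  /* Sketch only: the (a < b) result may go through mfcr/rlwinm on
     sub-targets without setbc or isel, and "n + 1" is independent work
     that could be placed between those two instructions.  */
  int fp_less_plus (double a, double b, int n)
  {
    return (a < b) + (n + 1);
  }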

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Split setcc to two insns after reload

This patch splits setcc into two separate insns after reload so that other
insns can be scheduled between them.

gcc/
* config/rs6000/rs6000.md (c_enum unspec): Add UNSPEC_MFCR and
UNSPEC_ROTATE_CR.
(*move_from_cr): New.
(insn set_cc): Remove.
(*rotate_cr): New.
(insn_and_split set_cc): New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ccf392b6409..0ad08e3111e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -159,6 +159,8 @@ (define_c_enum "unspec"
UNSPEC_XXSPLTIW_CONST
UNSPEC_FMAX
UNSPEC_FMIN
+   UNSPEC_MFCR
+   UNSPEC_ROTATE_CR
   ])

 ;;
@@ -12744,26 +12746,51 @@ (define_insn_and_split "*cmp_internal2"
 }
 })
 
-;; Now we have the scc insns.  We can do some combinations because of the
-;; way the machine works.
-;;
-;; Note that this is probably faster if we can put an insn between the
-;; mfcr and rlinm, but this is tricky.  Let's leave it for now.  In most
-;; cases the insns below which don't use an intermediate CR field will
-;; be used instead.
-(define_insn "set_cc"
+
+(define_insn "*move_from_cr"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
-   (match_operator:GPR 1 "scc_comparison_operator"
-   [(match_operand 2 "cc_reg_operand" "y")
-(const_int 0)]))]
+   (unspec:GPR [(match_operand 1 "cc_reg_operand" "y")]
+   UNSPEC_MFCR))]
   ""
-  "mfcr %0%Q2\;rlwinm %0,%0,%J1,1"
+  "mfcr %0%Q1"
   [(set (attr "type")
  (cond [(match_test "TARGET_MFCRF")
(const_string "mfcrf")
   ]
-   (const_string "mfcr")))
-   (set_attr "length" "8")])
+   (const_string "mfcr")))])
+
+;; Split the insn after reload so that other insns can be inserted
+;; between mfcr and rlinm.
+(define_insn_and_split "set_cc"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+   (match_operator:GPR 1 "scc_comparison_operator"
+   [(match_operand 2 "cc_reg_operand" "y")
+(const_int 0)]))]
+  "!TARGET_POWER10
+   || (GET_MODE (operands[2]) != CCmode
+   && GET_MODE (operands[2]) != CCUNSmode)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+   (unspec:GPR [(match_dup 2)]
+   UNSPEC_MFCR))
+   (set (match_dup 0)
+   (unspec:GPR [(match_dup 0)
+(match_dup 1)]
+   UNSPEC_ROTATE_CR))]
+  ""
+  [(set_attr "length" "8")])
+
+(define_insn "*rotate_cr"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+   (unspec:GPR [(match_operand:GPR 3 "gpc_reg_operand" "r")
+(match_operator:GPR 1 "scc_comparison_operator"
+   [(match_operand 2 "cc_reg_operand" "y")
+(const_int 0)])]
+   UNSPEC_ROTATE_CR))]
+  ""
+  "rlwinm %0,%3,%J1,1"
+)

 (define_insn_and_split "*set_rev"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")


[PATCH-5, rs6000] Replace explicit CC bit reverse with common format

2024-05-06 Thread HAO CHEN GUI
Hi,
  It's the fifth patch of a series of patches optimizing CC modes on
rs6000.

  There are some explicit CR6 bit reverse (mfcr/xor) expansions in vector.md.
As the fourth patch optimized the CC bit reverse implementation, this patch
changes the explicit form to the common form (testing whether the bit is not
set).  With the common form, it can match different implementations on
different sub-targets: on Power10 it should be setbcr, on Power9 it's isel,
and on Power8 and below it's mfcr/xor.
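
The kind of source that exercises these expanders is the vec_any_* predicates;
a minimal example of my own (not taken from the adjusted testsuite files):

  #include <altivec.h>

  /* vec_any_eq needs the reverse of the CR6 "all elements compare false"
     bit, so with the common form it can become setbcr on Power10, isel on
     Power9, or mfcr/xor on older processors.  */
  int any_eq (vector unsigned long long a, vector unsigned long long b)
  {
    return vec_any_eq (a, b);
  }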

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Replace explicit CC bit reverse with common format

This patch replaces the explicit CC bit reverse (mfcr/xor) with the common
form so that it can match setbcr on Power10, isel on Power9 and mfcr/xor on
other sub-targets.

gcc/
* config/rs6000/vector.md (vector_ae__p): Replace explicit CC
bit reverse with common format.
(vector_ae_v2di_p): Likewise.
(vector_ae_v1ti_p): Likewise.
(vector_ae__p): Likewise.
(cr6_test_for_zero): Likewise.
(cr6_test_for_lt): Likewise.

gcc/testsuite/
* gcc.target/powerpc/vsu/vec-any-eq-10.c: Replace rlwinm with isel.
* gcc.target/powerpc/vsu/vec-any-eq-14.c: Replace rlwinm with isel.
* gcc.target/powerpc/vsu/vec-any-eq-7.c: Replace rlwinm with isel.
* gcc.target/powerpc/vsu/vec-any-eq-8.c: Replace rlwinm with isel.
* gcc.target/powerpc/vsu/vec-any-eq-9.c: Replace rlwinm with isel.

patch.diff
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index f86c1f2990e..b1bbf9bac2d 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -942,11 +942,8 @@ (define_expand "vector_ae__p"
  (ne:VI (match_dup 1)
 (match_dup 2)))])
(set (match_operand:SI 0 "register_operand" "=r")
-   (lt:SI (reg:CCLTEQ CR6_REGNO)
-  (const_int 0)))
-   (set (match_dup 0)
-   (xor:SI (match_dup 0)
-   (const_int 1)))]
+   (ge:SI (reg:CCLTEQ CR6_REGNO)
+  (const_int 0)))]
   "TARGET_P9_VECTOR"
 {
   operands[3] = gen_reg_rtx (mode);
@@ -1027,11 +1024,8 @@ (define_expand "vector_ae_v2di_p"
  (eq:V2DI (match_dup 1)
   (match_dup 2)))])
(set (match_operand:SI 0 "register_operand" "=r")
-   (eq:SI (reg:CCLTEQ CR6_REGNO)
-  (const_int 0)))
-   (set (match_dup 0)
-   (xor:SI (match_dup 0)
-   (const_int 1)))]
+   (ne:SI (reg:CCLTEQ CR6_REGNO)
+  (const_int 0)))]
   "TARGET_P9_VECTOR"
 {
   operands[3] = gen_reg_rtx (V2DImode);
@@ -1048,11 +1042,8 @@ (define_expand "vector_ae_v1ti_p"
  (eq:V1TI (match_dup 1)
   (match_dup 2)))])
(set (match_operand:SI 0 "register_operand" "=r")
-   (eq:SI (reg:CCLTEQ CR6_REGNO)
-  (const_int 0)))
-   (set (match_dup 0)
-   (xor:SI (match_dup 0)
-   (const_int 1)))]
+   (ne:SI (reg:CCLTEQ CR6_REGNO)
+  (const_int 0)))]
   "TARGET_POWER10"
 {
   operands[3] = gen_reg_rtx (V1TImode);
@@ -1095,11 +1086,8 @@ (define_expand "vector_ae__p"
  (eq:VEC_F (match_dup 1)
(match_dup 2)))])
(set (match_operand:SI 0 "register_operand" "=r")
-   (eq:SI (reg:CCLTEQ CR6_REGNO)
-  (const_int 0)))
-   (set (match_dup 0)
-   (xor:SI (match_dup 0)
-   (const_int 1)))]
+   (ne:SI (reg:CCLTEQ CR6_REGNO)
+  (const_int 0)))]
   "TARGET_P9_VECTOR"
 {
   operands[3] = gen_reg_rtx (mode);
@@ -1172,11 +1160,8 @@ (define_expand "cr6_test_for_zero"
 ;; integer constant first argument equals one (aka __CR6_EQ_REV in altivec.h).
 (define_expand "cr6_test_for_zero_reverse"
   [(set (match_operand:SI 0 "register_operand" "=r")
-   (eq:SI (reg:CCLTEQ CR6_REGNO)
-  (const_int 0)))
-   (set (match_dup 0)
-   (xor:SI (match_dup 0)
-   (const_int 1)))]
+   (ne:SI (reg:CCLTEQ CR6_REGNO)
+  (const_int 0)))]
   "TARGET_ALTIVEC || TARGET_VSX"
   "")

@@ -1198,11 +1183,8 @@ (define_expand "cr6_test_for_lt"
 ;; (aka __CR6_LT_REV in altivec.h).
 (define_expand "cr6_test_for_lt_reverse"
   [(set (match_operand:SI 0 "register_operand" "=r")
-   (lt:SI (reg:CCLTEQ CR6_REGNO)
-  (const_int 0)))
-   (set (match_dup 0)
-   (xor:SI (match_dup 0)
-   (const_int 1)))]
+   (ge:SI (reg:CCLTEQ CR6_REGNO)
+  (const_int 0)))]
   "TARGET_ALTIVEC || TARGET_VSX"
   "")

diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eq-10.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eq-10.c
index 30dfc83a97b..9743a496fb5 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eq-10.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eq-10.c
@@ -15,4 +15,4 @@ test_any_equal (vector unsigned long long *arg1_p,
 }

 /* { dg-final { scan-assembler "vcmpequd." } } */
-/* { dg-final { scan-assembler "rlwinm 

[PATCH-4, rs6000] Optimize single cc bit reverse implementation

2024-04-30 Thread HAO CHEN GUI
Hi,
  It's the fourth patch of a series of patches optimizing CC modes on
rs6000.

  The single CC bit reverse can be implemented with setbcr on Power10, isel
on Power9, or mfcr on Power8 and below.  Originally CCFP was not supported
for isel and setbcr, as bcd insns used CCFP and their bit reverse is not the
same as for the normal CCFP mode.  Previous patches add new CC modes
according to the usage of the CC bits, so now the single CC bit reverse can
be supported on all CC modes with a uniform pattern.

  This patch removes unordered and ordered from the code list of CCFP with
finite_math_only set.  These two are no longer needed, as bcd insns use a
separate CC mode now.  reverse_condition is replaced with
rs6000_reverse_condition, as all CC modes can be reversed.  A new isel
version of the single CC bit reverse pattern is added.  The fp and bcd CC
reverse patterns are removed and a uniform single CC bit reverse pattern,
the mfcr version, is added.

  The new test cases illustrate the different implementations of the single
CC bit reverse test.
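
Since the cc_rev*.c tests are not shown here, a rough idea of my own of code
with a single reversed CC bit is the "ordered" result of a floating-point
compare, which is the inverse of the unordered bit that the compare sets:

  /* Sketch: !isunordered needs the reverse of the CR "unordered" bit, so
     it can be setbcr on Power10, isel on Power9, or mfcr plus xor on
     older processors.  */
  int ordered (double a, double b)
  {
    return !__builtin_isunordered (a, b);
  }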

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Optimize single cc bit reverse implementation

This patch implements single cc bit reverse by mfcr (on Power8 and below)
or isel (on Power9) or setbcr (on Power10) with all CC modes.

gcc/
* config/rs6000/predicates.md (branch_comparison_operator): Remove
unordered and ordered from CCFP with finite_math_only.
(scc_comparison_operator): Add unle and unge.
* config/rs6000/rs6000.md (CCANY): Add CCFP, CCBCD, CCLTEQ and CCEQ.
(*isel_reversed__): Replace reverse_condition
with rs6000_reverse_condition.
(*set_rev): New insn_and_split pattern for
single cc bit reverse P9 version.
(fp_rev, ccbcd_rev): Remove.
(*_cc): Remove the pattern for CCFP and CCBCD.  Merge
them to...
(*set_rev): ...this, the new insn_and_split
pattern for single cc bit reverse P8 and below version.

gcc/testsuite/
* gcc.target/powerpc/cc_rev.h: New.
* gcc.target/powerpc/cc_rev_1.c: New.
* gcc.target/powerpc/cc_rev_2.c: New.
* gcc.target/powerpc/cc_rev_3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 322e7639fd4..ddb46799bff 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1348,7 +1348,7 @@ (define_predicate "branch_comparison_operator"
(match_test "GET_MODE_CLASS (GET_MODE (XEXP (op, 0))) == MODE_CC")
(if_then_else (match_test "GET_MODE (XEXP (op, 0)) == CCFPmode")
  (if_then_else (match_test "flag_finite_math_only")
-   (match_code "lt,le,gt,ge,eq,ne,unordered,ordered")
+   (match_code "lt,le,gt,ge,eq,ne")
(match_code "lt,gt,eq,unordered,unge,unle,ne,ordered"))
  (if_then_else (match_test "GET_MODE (XEXP (op, 0)) == CCBCDmode")
(match_code "lt,le,gt,ge,eq,ne,unordered,ordered")
@@ -1397,7 +1397,7 @@ (define_predicate "scc_comparison_operator"
 ;; an SCC insn.
 (define_predicate "scc_rev_comparison_operator"
   (and (match_operand 0 "branch_comparison_operator")
-   (match_code "ne,le,ge,leu,geu,ordered")))
+   (match_code "ne,le,ge,leu,geu,ordered,unle,unge")))

 ;; Return 1 if OP is a comparison operator suitable for floating point
 ;; vector/scalar comparisons that generate a -1/0 mask.
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2c6255395d1..ccf392b6409 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5509,7 +5509,7 @@ (define_expand "movcc"
 ;; leave out the mode in operand 4 and use one pattern, but reload can
 ;; change the mode underneath our feet and then gets confused trying
 ;; to reload the value.
-(define_mode_iterator CCANY [CC CCUNS])
+(define_mode_iterator CCANY [CC CCUNS CCFP CCBCD CCLTEQ CCEQ])
 (define_insn "isel__"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r")
(if_then_else:GPR
@@ -5536,7 +5536,8 @@ (define_insn "*isel_reversed__"
 (match_operand:GPR 3 "reg_or_zero_operand" "O,b")))]
   "TARGET_ISEL"
 {
-  PUT_CODE (operands[1], reverse_condition (GET_CODE (operands[1])));
+  PUT_CODE (operands[1], rs6000_reverse_condition (mode,
+  GET_CODE (operands[1])));
   return "isel %0,%3,%2,%j1";
 }
   [(set_attr "type" "isel")])
@@ -12764,6 +12765,27 @@ (define_insn "set_cc"
(const_string "mfcr")))
(set_attr "length" "8")])

+(define_insn_and_split "*set_rev"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+   (match_operator:GPR 1 "scc_rev_comparison_operator"
+   [(match_operand:CCANY 2 "cc_reg_operand" "y")
+(const_int 0)]))]
+  "TARGET_ISEL
+   && !TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+   (const_int 1))
+   (set (match_dup 0)
+   (if_then_else:GPR
+ 

[PATCH-3, rs6000] Set CC mode of vector string isolate insns to CCEQ

2024-04-30 Thread HAO CHEN GUI
Hi,
  It's the third patch of a series of patches optimizing CC modes on
rs6000.

  This patch sets the CC mode of the vector string isolate insns to CCEQ
instead of CCFP, as these insns only set/check CR bit 2.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Set CC mode of vector string isolate insns to CCEQ

gcc/
* config/rs6000/altivec.md (vstrir_p_direct_): Replace CCFP
with CCEQ.
(vstril_p_direct_): Likewise.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index bd79a3f9e84..a883a814a82 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -932,9 +932,9 @@ (define_insn "vstrir_p_direct_"
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))
-   (set (reg:CC CR6_REGNO)
-   (unspec:CC [(match_dup 1)]
-  UNSPEC_VSTRIR))]
+   (set (reg:CCEQ CR6_REGNO)
+   (unspec:CCEQ [(match_dup 1)]
+UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   "vstrir. %0,%1"
   [(set_attr "type" "vecsimple")])
@@ -984,9 +984,9 @@ (define_insn "vstril_p_direct_"
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))
-   (set (reg:CC CR6_REGNO)
-   (unspec:CC [(match_dup 1)]
-  UNSPEC_VSTRIR))]
+   (set (reg:CCEQ CR6_REGNO)
+   (unspec:CCEQ [(match_dup 1)]
+UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   "vstril. %0,%1"
   [(set_attr "type" "vecsimple")])



[PATCH-2, rs6000] Add a new type of CC mode - CCLTEQ

2024-04-30 Thread HAO CHEN GUI
Hi,
  It's the second patch of a series of patches optimizing CC modes on
rs6000.

  This patch adds a new type of CC mode - CCLTEQ - used for cases which only
set CR bits 0 and 2. Bits 1 and 3 are not used. The vector compare and test
data class instructions are such cases.
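  For instance (a sketch, not part of the patch), these two predicates only
look at CR6 bit 0 ("all elements true") or bit 2 ("all elements false"), so
CCLTEQ describes the CR field they read:

  #include <altivec.h>

  int all_eq (vector int a, vector int b)
  {
    return vec_all_eq (a, b);   /* tests CR6 bit 0 */
  }

  int any_eq (vector int a, vector int b)
  {
    return vec_any_eq (a, b);   /* tests the reverse of CR6 bit 2 */
  }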

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add a new type of CC mode - CCLTEQ

The new mode is used for cases which only check CR bits 0 and 2.

gcc/
* config/rs6000/altivec.md (altivec_vcmpequ_p): Replace
CCFP with CCLTEQ.
(altivec_vcmpequt_p): Likewise.
(*altivec_vcmpgts_p): Likewise.
(*altivec_vcmpgtst_p): Likewise.
(*altivec_vcmpgtu_p): Likewise.
(*altivec_vcmpgtut_p): Likewise.
(*altivec_vcmpeqfp_p): Likewise.
(*altivec_vcmpgtfp_p): Likewise.
(*altivec_vcmpgefp_p): Likewise.
(altivec_vcmpbfp_p): Likewise.
* config/rs6000/predicates.md (branch_comparison_operator): Add
CCLTEQ and its supported comparison codes.
* config/rs6000/rs6000-modes.def (CC_MODE): Add CCLTEQ.
* config/rs6000/rs6000.cc (validate_condition_mode): Add assertion
for CCLTEQ.
* config/rs6000/rs6000.md (CC_any): Add CCLTEQ.
* config/rs6000/vector.md (vector_eq__p): Replace CCFP with
CCLTEQ.
(vector_eq_v1ti_p): Likewise.
(vector_ne__p): Likewise.
(vector_ae__p): Likewise.
(vector_nez__p): Likewise.
(vector_ne_v2di_p): Likewise.
(vector_ne_v1ti_p): Likewise.
(vector_ae_v2di_p): Likewise.
(vector_ae_v1ti_p): Likewise.
(vector_ne__p): Likewise.
(vector_ae__p): Likewise.
(vector_gt__p): Likewise.
(vector_gt_v1ti_p): Likewise.
(vector_ge__p): Likewise.
(vector_gtu__p): Likewise.
(cr6_test_for_zero): Likewise.
(cr6_test_for_zero_reverse): Likewise.
(cr6_test_for_lt): Likewise.
(cr6_test_for_lt_reverse): Likewise.
* config/rs6000/vsx.md (*vsx_eq__p): Likewise.
(*vsx_gt__p): Likewise.
(*vsx_ge__p): Likewise.
(xststdcqp_): Likewise.
(xststdcp): Likewise.
(xststdcnegqp_): Likewise.
(xststdcnegp): Likewise.
(*xststdcqp_): Likewise.
(*xststdcp): Likewise.
(*vsx_ne__p): Likewise.
(*vector_nez__p): Likewise.
(vcmpnezb_p): Likewise.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 9fa8cf89f61..bd79a3f9e84 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2650,10 +2650,10 @@ (define_expand "cbranchv16qi4"
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ_p"
-  [(set (reg:CC CR6_REGNO)
-   (unspec:CC [(eq:CC (match_operand:VI2 1 "register_operand" "v")
-  (match_operand:VI2 2 "register_operand" "v"))]
-  UNSPEC_PREDICATE))
+  [(set (reg:CCLTEQ CR6_REGNO)
+   (unspec:CCLTEQ [(eq:CC (match_operand:VI2 1 "register_operand" "v")
+  (match_operand:VI2 2 "register_operand" "v"))]
+  UNSPEC_PREDICATE))
(set (match_operand:VI2 0 "register_operand" "=v")
(eq:VI2 (match_dup 1)
(match_dup 2)))]
@@ -2662,10 +2662,11 @@ (define_insn "altivec_vcmpequ_p"
   [(set_attr "type" "veccmpfx")])

 (define_insn "altivec_vcmpequt_p"
-  [(set (reg:CC CR6_REGNO)
-   (unspec:CC [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
-  (match_operand:V1TI 2 "altivec_register_operand" 
"v"))]
-  UNSPEC_PREDICATE))
+  [(set (reg:CCLTEQ CR6_REGNO)
+   (unspec:CCLTEQ
+ [(eq:CC (match_operand:V1TI 1 "altivec_register_operand" "v")
+ (match_operand:V1TI 2 "altivec_register_operand" "v"))]
+ UNSPEC_PREDICATE))
(set (match_operand:V1TI 0 "altivec_register_operand" "=v")
(eq:V1TI (match_dup 1)
 (match_dup 2)))]
@@ -2686,10 +2687,10 @@ (define_expand "altivec_vcmpne_"
   })

 (define_insn "*altivec_vcmpgts_p"
-  [(set (reg:CC CR6_REGNO)
-   (unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
-  (match_operand:VI2 2 "register_operand" "v"))]
-  UNSPEC_PREDICATE))
+  [(set (reg:CCLTEQ CR6_REGNO)
+   (unspec:CCLTEQ [(gt:CC (match_operand:VI2 1 "register_operand" "v")
+  (match_operand:VI2 2 "register_operand" "v"))]
+  UNSPEC_PREDICATE))
(set (match_operand:VI2 0 "register_operand" "=v")
(gt:VI2 (match_dup 1)
(match_dup 2)))]
@@ -2698,10 +2699,10 @@ (define_insn "*altivec_vcmpgts_p"
   [(set_attr "type" "veccmpfx")])

 (define_insn "*altivec_vcmpgtst_p"
-  [(set (reg:CC CR6_REGNO)
-   (unspec:CC [(gt:CC 

[PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-04-30 Thread HAO CHEN GUI
Hi,
  It's the first patch of a series of patches optimizing CC modes on
rs6000.

  bcd insns set all four bits of a CR field, but they have different single
bit reverse behavior from CCFP's. The fourth bit of a bcd CR field is used
to indicate overflow or an invalid number; it is not a bit for an unordered
test. So the "le" test should be reversed to "gt", not "ungt", and the "ge"
test should be reversed to "lt", not "unlt". That's the root cause of
PR100736 and PR114732.

  This patch fixes the issue by adding a new type of CC mode - CCBCD - for
all bcd insns. Here a new setcc_rev pattern is added for ccbcd. It will be
merged into a uniform pattern covering all CC modes in a subsequent patch.

  The rtl code "unordered" is still used for testing overflow or an invalid
number. IMHO, "unordered" on a CC mode can be considered as testing whether
the fourth bit of a CR field is set, and "eq" on a CC mode as testing whether
the third bit is set. Thus we avoid creating lots of unspecs for the CR bit
testing.
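  In other words, for a bcd CR field bit 0 is "lt", bit 1 is "gt", bit 2 is
"eq" and bit 3 is "overflow/invalid", whereas for CCFP bit 3 is "unordered".
So the reversals differ:

  reverse of "le" (bit 0 | bit 2):  CCBCD -> "gt" (bit 1),  CCFP -> "ungt" (bit 1 | bit 3)
  reverse of "ge" (bit 1 | bit 2):  CCBCD -> "lt" (bit 0),  CCFP -> "unlt" (bit 0 | bit 3)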

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Add a new type of CC mode - CCBCD for bcd insns

gcc/
PR target/100736
PR target/114732
* config/rs6000/altivec.md (bcd_): Replace CCFP
with CCBCD.
(*bcd_test_): Likewise.
(*bcd_test2_): Likewise.
(bcd__): Likewise.
(*bcdinvalid_): Likewise.
(bcdinvalid_): Likewise.
(bcdshift_v16qi): Likewise.
(bcdmul10_v16qi): Likewise.
(bcddiv10_v16qi): Likewise.
(peephole for bcd_add/sub): Likewise.
* config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD
and its supported comparison codes.
* config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD.
* config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD
assertion.
* config/rs6000/rs6000.md (CC_any): Add CCBCD.
(ccbcd_rev): New code iterator.
(*_cc): New insn and split pattern for CCBCD reverse
compare.

gcc/testsuite/
PR target/100736
PR target/114732
* gcc.target/powerpc/pr100736.c: New.
* gcc.target/powerpc/pr114732.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index bb20441c096..9fa8cf89f61 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -4443,7 +4443,7 @@ (define_insn "bcd_"
  (match_operand:VBCD 2 "register_operand" "v")
  (match_operand:QI 3 "const_0_to_1_operand" "n")]
 UNSPEC_BCD_ADD_SUB))
-   (clobber (reg:CCFP CR6_REGNO))]
+   (clobber (reg:CCBCD CR6_REGNO))]
   "TARGET_P8_VECTOR"
   "bcd. %0,%1,%2,%3"
   [(set_attr "type" "vecsimple")])
@@ -4454,8 +4454,8 @@ (define_insn "bcd_"
 ;; probably should be one that can go in the VMX (Altivec) registers, so we
 ;; can't use DDmode or DFmode.
 (define_insn "*bcd_test_"
-  [(set (reg:CCFP CR6_REGNO)
-   (compare:CCFP
+  [(set (reg:CCBCD CR6_REGNO)
+   (compare:CCBCD
 (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")
   (match_operand:VBCD 2 "register_operand" "v")
   (match_operand:QI 3 "const_0_to_1_operand" "i")]
@@ -4472,8 +4472,8 @@ (define_insn "*bcd_test2_"
  (match_operand:VBCD 2 "register_operand" "v")
  (match_operand:QI 3 "const_0_to_1_operand" "i")]
 UNSPEC_BCD_ADD_SUB))
-   (set (reg:CCFP CR6_REGNO)
-   (compare:CCFP
+   (set (reg:CCBCD CR6_REGNO)
+   (compare:CCBCD
 (unspec:V2DF [(match_dup 1)
   (match_dup 2)
   (match_dup 3)]
@@ -4566,8 +4566,8 @@ (define_insn "vclrrb"
[(set_attr "type" "vecsimple")])

 (define_expand "bcd__"
-  [(parallel [(set (reg:CCFP CR6_REGNO)
-  (compare:CCFP
+  [(parallel [(set (reg:CCBCD CR6_REGNO)
+  (compare:CCBCD
(unspec:V2DF [(match_operand:VBCD 1 "register_operand")
  (match_operand:VBCD 2 "register_operand")
  (match_operand:QI 3 "const_0_to_1_operand")]
@@ -4575,7 +4575,7 @@ (define_expand "bcd__"
(match_dup 4)))
  (clobber (match_scratch:VBCD 5))])
(set (match_operand:SI 0 "register_operand")
-   (BCD_TEST:SI (reg:CCFP CR6_REGNO)
+   (BCD_TEST:SI (reg:CCBCD CR6_REGNO)
 (const_int 0)))]
   "TARGET_P8_VECTOR"
 {
@@ -4583,8 +4583,8 @@ (define_expand "bcd__"
 })

 (define_insn "*bcdinvalid_"
-  [(set (reg:CCFP CR6_REGNO)
-   (compare:CCFP
+  [(set (reg:CCBCD CR6_REGNO)
+   (compare:CCBCD
 (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")]
  UNSPEC_BCDSUB)
 (match_operand:V2DF 2 "zero_constant" "j")))
@@ -4594,14 +4594,14 @@ (define_insn "*bcdinvalid_"
   [(set_attr 

Re: [PATCH] Value range: Add range op for __builtin_isfinite

2024-04-23 Thread HAO CHEN GUI
Yes, it's my typo.

Thanks.
Gui Haochen

On 2024/4/23 17:10, rep.dot@gmail.com wrote:
> On 12 April 2024 07:30:10 CEST, HAO CHEN GUI  wrote:
> 
> 
>>
>>
>> patch.diff
>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
>> index 9de130b4022..99c511728d3 100644
>> --- a/gcc/gimple-range-op.cc
>> +++ b/gcc/gimple-range-op.cc
>> @@ -1192,6 +1192,56 @@ public:
>>   }
>> } op_cfn_isinf;
>>
>> +//Implement range operator for CFN_BUILT_IN_ISFINITE
>> +class cnf_isfinite : public range_operator
>> +{
> 
> 
> s/cnf/cfn/g
> I guess.
> thanks


[PATCH, rs6000] Use bcdsub. instead of bcdadd. for bcd invalid number checking

2024-04-17 Thread HAO CHEN GUI
Hi,
  This patch replaces bcdadd. with bcdsub. for bcd invalid number checking.
bcdadd. on two identical numbers might overflow, which also sets the
overflow/invalid bit, so we can't distinguish an invalid number from an
overflow; for instance, adding the maximum valid bcd value to itself
overflows even though the operand is valid. bcdsub. doesn't have that
problem, as subtracting a number from itself never causes an overflow.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Use bcdsub. instead of bcdadd. for bcd invalid number checking

bcdadd. might cause an overflow which also sets the overflow/invalid bit.
bcdsub. doesn't have the issue when subtracting a bcd number from itself.

gcc/
* config/rs6000/altivec.md (*bcdinvalid_): Replace bcdadd
with bcdsub.
(bcdinvalid_): Likewise.

gcc/testsuite/
* gcc.target/powerpc/bcd-4.c: Adjust the number of bcdadd and
bcdsub.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 4d4c94ff0a0..bb20441c096 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -4586,18 +4586,18 @@ (define_insn "*bcdinvalid_"
   [(set (reg:CCFP CR6_REGNO)
(compare:CCFP
 (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")]
- UNSPEC_BCDADD)
+ UNSPEC_BCDSUB)
 (match_operand:V2DF 2 "zero_constant" "j")))
(clobber (match_scratch:VBCD 0 "=v"))]
   "TARGET_P8_VECTOR"
-  "bcdadd. %0,%1,%1,0"
+  "bcdsub. %0,%1,%1,0"
   [(set_attr "type" "vecsimple")])

 (define_expand "bcdinvalid_"
   [(parallel [(set (reg:CCFP CR6_REGNO)
   (compare:CCFP
(unspec:V2DF [(match_operand:VBCD 1 "register_operand")]
-UNSPEC_BCDADD)
+UNSPEC_BCDSUB)
(match_dup 2)))
  (clobber (match_scratch:VBCD 3))])
(set (match_operand:SI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-4.c 
b/gcc/testsuite/gcc.target/powerpc/bcd-4.c
index 2c7041c4d32..6d2c59ef792 100644
--- a/gcc/testsuite/gcc.target/powerpc/bcd-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/bcd-4.c
@@ -2,8 +2,8 @@
 /* { dg-require-effective-target int128 } */
 /* { dg-require-effective-target p9vector_hw } */
 /* { dg-options "-mdejagnu-cpu=power9 -O2 -save-temps" } */
-/* { dg-final { scan-assembler-times {\mbcdadd\M} 7 } } */
-/* { dg-final { scan-assembler-times {\mbcdsub\M} 18 } } */
+/* { dg-final { scan-assembler-times {\mbcdadd\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mbcdsub\M} 20 } } */
 /* { dg-final { scan-assembler-times {\mbcds\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mdenbcdq\M} 1 } } */



[PATCH, rs6000] Fix test case bcd4.c

2024-04-16 Thread HAO CHEN GUI
Hi,
  This patch fixes the missing return statement in maxbcd of bcd-4.c. Without
the return statement, the function returns an invalid bcd number and makes
the test ineffective. The patch also enables the test to run on Power9 and
big endian, as all bcd instructions are supported from Power9 on.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Fix bcd test case

gcc/testsuite/
* gcc.target/powerpc/bcd-4.c: Enable the case to be tested on Power9.
Enable the case to be run on big endian.  Fix function maxbcd and
other misc. problems.


patch.diff
diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-4.c 
b/gcc/testsuite/gcc.target/powerpc/bcd-4.c
index 2c8554dfe82..8c0bac2720f 100644
--- a/gcc/testsuite/gcc.target/powerpc/bcd-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/bcd-4.c
@@ -1,7 +1,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target int128 } */
-/* { dg-require-effective-target power10_hw } */
-/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2 -save-temps" } */
 /* { dg-final { scan-assembler-times {\mbcdadd\M} 7 } } */
 /* { dg-final { scan-assembler-times {\mbcdsub\M} 18 } } */
 /* { dg-final { scan-assembler-times {\mbcds\M} 2 } } */
@@ -44,10 +44,20 @@ vector unsigned char maxbcd(unsigned int sign)
   vector unsigned char result;
   int i;

+#ifdef _BIG_ENDIAN
+  for (i = 0; i < 15; i++)
+#else
   for (i = 15; i > 0; i--)
+#endif
 result[i] = 0x99;

-  result[0] = sign << 4 | 0x9;
+#ifdef _BIG_ENDIAN
+  result[15] = 0x90 | sign;
+#else
+  result[0] = 0x90 | sign;
+#endif
+
+  return result;
 }

 vector unsigned char num2bcd(long int a, int encoding)
@@ -70,9 +80,17 @@ vector unsigned char num2bcd(long int a, int encoding)

   hi = a % 10;   // 1st digit
   a = a / 10;
+#ifdef _BIG_ENDIAN
+  result[15] = hi << 4| sign;
+#else
   result[0] = hi << 4| sign;
+#endif

+#ifdef _BIG_ENDIAN
+  for (i = 14; i >= 0; i--)
+#else
   for (i = 1; i < 16; i++)
+#endif
 {
   low = a % 10;
   a = a / 10;
@@ -117,7 +135,11 @@ int main ()
 }

   /* result should be positive */
+#ifdef _BIG_ENDIAN
+  if ((result[15] & 0xF) != BCD_POS0)
+#else
   if ((result[0] & 0xF) != BCD_POS0)
+#endif
 #if DEBUG
   printf("ERROR: __builtin_bcdadd sign of result is %d.  Does not match "
 "expected_result = %d\n",
@@ -150,7 +172,11 @@ int main ()
 }

   /* Result should be positive, alternate encoding.  */
+#ifdef _BIG_ENDIAN
+  if ((result[15] & 0xF) != BCD_POS1)
+#else
   if ((result[0] & 0xF) != BCD_POS1)
+#endif
 #if DEBUG
 printf("ERROR: __builtin_bcdadd sign of result is %d.  Does not "
   "match expected_result = %d\n",
@@ -183,7 +209,11 @@ int main ()
 }

   /* result should be negative */
+#ifdef _BIG_ENDIAN
+  if ((result[15] & 0xF) != BCD_NEG)
+#else
   if ((result[0] & 0xF) != BCD_NEG)
+#endif
 #if DEBUG
 printf("ERROR: __builtin_bcdadd sign, neg of result is %d.  Does not "
   "match expected_result = %d\n",
@@ -217,7 +247,11 @@ int main ()
 }

   /* result should be positive, alt encoding */
+#ifdef _BIG_ENDIAN
+  if ((result[15] & 0xF) != BCD_NEG)
+#else
   if ((result[0] & 0xF) != BCD_NEG)
+#endif
 #if DEBUG
 printf("ERROR: __builtin_bcdadd sign, of result is %d.  Does not match "
   "expected_result = %d\n",
@@ -250,7 +284,11 @@ int main ()
 }

   /* result should be positive */
+#ifdef _BIG_ENDIAN
+  if ((result[15] & 0xF) != BCD_POS1)
+#else
   if ((result[0] & 0xF) != BCD_POS1)
+#endif
 #if DEBUG
 printf("ERROR: __builtin_bcdsub sign, result is %d.  Does not match "
   "expected_result = %d\n",
@@ -283,7 +321,7 @@ int main ()
 abort();
 #endif

-  a = maxbcd(BCD_NEG);
+  a = maxbcd(BCD_POS0);
   b = maxbcd(BCD_NEG);

   if (__builtin_bcdsub_ofl (a, b, 0) == 0)
@@ -462,8 +500,12 @@ int main ()
 }

   /* result should be positive */
+#ifdef _BIG_ENDIAN
+  if ((result[15] & 0xF) != BCD_POS0)
+#else
   if ((result[0] & 0xF) != BCD_POS0)
-#if 0
+#endif
+#if DEBUG
 printf("ERROR: __builtin_bcdmul10 sign, result is %d.  Does not match "
   "expected_result = %d\n",
   result[0] & 0xF, BCD_POS1);
@@ -492,7 +534,11 @@ int main ()
 }

   /* result should be positive */
+#ifdef _BIG_ENDIAN
+  if ((result[15] & 0xF) != BCD_POS0)
+#else
   if ((result[0] & 0xF) != BCD_POS0)
+#endif
 #if DEBUG
 printf("ERROR: __builtin_bcddiv10 sign, result is %d.  Does not match "
   "expected_result = %d\n",


[PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-04-12 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isnormal for SF/DF/TFmode using the rs6000
test data class instructions.

  This patch relies on the former patch which adds optab_isnormal.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen


ChangeLog
rs6000: Implement optab_isnormal for SFmode, DFmode and TFmode

gcc/
PR target/97786
* config/rs6000/vsx.md (isnormal2): New expand for SFmode and
DFmode.
(isnormal2): New expand for TFmode.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-7.c: New test.
* gcc.target/powerpc/pr97786-8.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index a6c72ae33b0..d1c9ef5447c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5357,6 +5357,30 @@ (define_expand "isfinite2"
   DONE;
 })

+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isnormal2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+

 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
new file mode 100644
index 000..a0d848497b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */
+
+int test1 (double x)
+{
+  return __builtin_isnormal (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isnormal (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
new file mode 100644
index 000..d591073d281
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble 
-Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isnormal (x);
+}
+
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH] Optab: add isnormal_optab for __builtin_isnormal

2024-04-12 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isnormal. The normal check can be
implemented on rs6000 by a single instruction, and an optab is needed so that
the builtin can be expanded to that instruction sequence.
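  For reference, without a target expander __builtin_isnormal is folded by
fold_builtin_interclass_mathfn into a compare sequence roughly like

  isnormal (x)  ->  isgreaterequal (fabs (x), DBL_MIN)
                    & islessequal (fabs (x), DBL_MAX)

while with isnormal_optab the backend can emit a single test data class
instruction instead.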

  The subsequent patches will implement the expand on rs6000.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for next stage-1?

Thanks
Gui Haochen
ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 3174f52ebe8..defb39de95f 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 case BUILT_IN_ISFINITE:
   builtin_optab = isfinite_optab; break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab; break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[PATCH-3] Builtin: Fold builtin_isfinite on IBM long double to builtin_isfinite on double [PR97786]

2024-04-12 Thread HAO CHEN GUI
Hi,
  This patch folds builtin_isfinite on IBM long double to builtin_isfinite on
double type. The former patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html
implemented the DFmode isfinite_optab.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
Builtin: Fold builtin_isfinite on IBM long double to builtin_isfinite on double

For IBM long double, INF and NAN are encoded in the high-order double value
only.  So builtin_isfinite on IBM long double can be folded to
builtin_isfinite on the double type.  As the former patch implemented the
DFmode isfinite_optab, this patch converts builtin_isfinite on IBM long
double to builtin_isfinite on the double type if the DFmode isfinite_optab
exists.
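In effect the fold is (a sketch of the result, not the literal tree)

  __builtin_isfinite (x)   /* x has IBM long double type */
    ->  __builtin_isfinite ((double) x)

since only the high-order double of the pair can carry Inf or NaN.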

gcc/
PR target/97786
* builtins.cc (fold_builtin_interclass_mathfn): Fold IBM long double
isfinite call to double isfinite call when DFmode isfinite_optab
exists.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-6.c: New test.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 5262aa01660..3174f52ebe8 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9605,6 +9605,12 @@ fold_builtin_interclass_mathfn (location_t loc, tree 
fndecl, tree arg)
type = double_type_node;
mode = DFmode;
arg = fold_build1_loc (loc, NOP_EXPR, type, arg);
+   tree const isfinite_fn = builtin_decl_explicit (BUILT_IN_ISFINITE);
+   if (interclass_mathfn_icode (arg, isfinite_fn) != CODE_FOR_nothing)
+ {
+   result = build_call_expr (isfinite_fn, 1, arg);
+   return result;
+ }
  }
get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false);
real_from_string (&r, buf);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-6.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-6.c
new file mode 100644
index 000..c86c765651d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble 
-Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler {\mxststdcdp\M} } } */


[PATCH-2, rs6000] Implement optab_isfinite for SFmode, DFmode and TFmode [PR97786]

2024-04-12 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isfinite for SF/DF/TFmode using the rs6000
test data class instructions.

  This patch relies on the former patch which adds optab_isfinite.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen


ChangeLog
rs6000: Implement optab_isfinite for SFmode, DFmode and TFmode

gcc/
PR target/97786
* config/rs6000/vsx.md (isfinite2): New expand for SFmode and
DFmode.
(isfinite2): New expand for TFmode.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-4.c: New test.
* gcc.target/powerpc/pr97786-5.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f0cc02f7e7b..a6c72ae33b0 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5333,6 +5333,31 @@ (define_expand "isinf2"
   DONE;
 })

+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_insn (gen_xststdcp (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+(define_expand "isfinite2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x70)));
+  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
+  DONE;
+})
+
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
new file mode 100644
index 000..55b5ff507b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */
+
+int test1 (double x)
+{
+  return __builtin_isfinite (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
new file mode 100644
index 000..5b5a89681fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble 
-Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler {\mxststdcqp\M} } } */


[PATCH] Value range: Add range op for __builtin_isfinite

2024-04-11 Thread HAO CHEN GUI
Hi,
  The former patch adds isfinite optab for __builtin_isfinite.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

  Thus the builtin might not be folded at the front end, so a range op for
isfinite is needed for value range analysis. This patch adds it.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

The former patch adds an optab for builtin isfinite, thus builtin isfinite
might not be folded at the front end.  So a range op for isfinite is needed
for value range analysis.  This patch adds the range op for builtin isfinite.

gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_finite): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isfinite.c: New test.


patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 9de130b4022..99c511728d3 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1192,6 +1192,56 @@ public:
   }
 } op_cfn_isinf;

+//Implement range operator for CFN_BUILT_IN_ISFINITE
+class cnf_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isfinite ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+return false;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.zero_p ())
+  {
+   // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
+   // Set range to varying
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (&lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+return false;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1288,6 +1338,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_isinf;
   break;

+case CFN_BUILT_IN_ISFINITE:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isfinite;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */


[PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-04-11 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isfinite. The finite check can be
implemented on rs6000 by a single instruction, and an optab is needed so that
the builtin can be expanded to that instruction sequence.
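  For reference, without a target expander __builtin_isfinite is folded by
fold_builtin_interclass_mathfn into roughly

  isfinite (x)  ->  islessequal (fabs (x), DBL_MAX)

while with isfinite_optab the backend can emit a single test data class
instruction instead.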

  The subsequent patches will implement the expand on rs6000.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for next stage-1?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
for isfinite builtin.
* optabs.def (isfinite_optab): New.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index d2786f207b8..5262aa01660 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   errno_set = true; builtin_optab = ilogb_optab; break;
 CASE_FLT_FN (BUILT_IN_ISINF):
   builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
 case BUILT_IN_ISFINITE:
+  builtin_optab = isfinite_optab; break;
+case BUILT_IN_ISNORMAL:
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")


[Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double [PR97786]

2024-03-27 Thread HAO CHEN GUI
Hi,
  This patch folds builtin_isinf on IBM long double to builtin_isinf on
double type. The former patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html
implemented the DFmode isinf_optab.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double

For IBM long double, Inf is encoded in the high-order double value only.
So builtin_isinf on IBM long double can be folded to builtin_isinf on the
double type.  As the former patch implemented the DFmode isinf_optab, this
patch converts builtin_isinf on IBM long double to builtin_isinf on the
double type if the DFmode isinf_optab exists.

gcc/
PR target/97786
* builtins.cc (fold_builtin_interclass_mathfn): Fold IBM long double
isinf call to double isinf call when DFmode isinf_optab exists.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-3.c: New test.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eda8bea9c4b..d2786f207b8 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9574,6 +9574,12 @@ fold_builtin_interclass_mathfn (location_t loc, tree 
fndecl, tree arg)
type = double_type_node;
mode = DFmode;
arg = fold_build1_loc (loc, NOP_EXPR, type, arg);
+   tree const isinf_fn = builtin_decl_explicit (BUILT_IN_ISINF);
+   if (interclass_mathfn_icode (arg, isinf_fn) != CODE_FOR_nothing)
+ {
+   result = build_call_expr (isinf_fn, 1, arg);
+   return result;
+ }
  }
get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false);
real_from_string (&r, buf);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
new file mode 100644
index 000..1c816921e1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble 
-Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 2 } } */


[patch, rs6000] Implement optab_isinf for SFmode, DFmode and TFmode [PR97786]

2024-03-24 Thread HAO CHEN GUI
Hi,
  This patch implements optab_isinf for SF/DF/TFmode using the rs6000 test
data class instructions.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen

ChangeLog
rs6000: Implement optab_isinf for SFmode, DFmode and TFmode

gcc/
PR target/97786
* config/rs6000/vsx.md (isinf2): New expand for SFmode and
DFmode.
(isinf2): New expand for TFmode.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-1.c: New test.
* gcc.target/powerpc/pr97786-2.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..f0cc02f7e7b 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5313,6 +5313,26 @@ (define_expand "xststdcp"
   operands[4] = CONST0_RTX (SImode);
 })

+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
+(define_expand "isinf2"
+  [(use (match_operand:SI 0 "gpc_reg_operand"))
+   (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+   && TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT (0x30)));
+  DONE;
+})
+
 ;; The VSX Scalar Test Negative Quad-Precision
 (define_expand "xststdcnegqp_"
   [(set (match_dup 2)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
new file mode 100644
index 000..1b1e6d642de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */
+
+int test1 (double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (float x)
+{
+  return __builtin_isinf (x);
+}
+
+int test3 (float x)
+{
+  return __builtin_isinff (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
new file mode 100644
index 000..de7f2d67c4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target ppc_float128_sw } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble 
-Wno-psabi" } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */


[PATCH] Value Range: Add range op for builtin isinf

2024-03-24 Thread HAO CHEN GUI
Hi,
  The builtin isinf is not folded at the front end if the corresponding
optab exists. This causes range evaluation to fail on targets which have
optab_isinf. For instance, range-sincos.c will fail on targets which have
optab_isinf, as it calls builtin_isinf.

  This patch fixes the problem by adding a range op for builtin isinf.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at the front end if the corresponding optab
exists.  So a range op for isinf is needed for value range analysis.  This
patch adds the range op for builtin isinf.

gcc/
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
* gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index a98f7db62a7..9de130b4022 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1140,6 +1140,57 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cnf_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (_bound ())
+   || (!real_isinf (&op1.lower_bound ())
+   && !real_isinf (&op1.upper_bound ())))
+   r.set_zero (type);
+   return true;
+  }
+
+return false;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (&lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+return false;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
@@ -1232,6 +1283,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+



Re: [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-18 Thread HAO CHEN GUI
Hi,
  Gently ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html

Thanks
Gui Haochen

On 2024/3/11 13:41, HAO CHEN GUI wrote:
> Hi,
>   This patch tries to fix the problem when a canonical form doesn't benefit
> on a specific target. The const operand of AND is and with the nonzero
> bits of another operand in combine pass. It's a canonical form, but it's no
> benefits for the target which has rotate and mask insns. As the mask is
> truncated, it can't match the insn conditions which it originally matches.
> For example, the following insn condition checks the sum of two AND masks.
> When one of the mask is truncated, the condition breaks.
> 
> (define_insn "*rotlsi3_insert_5"
>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>   (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>   (match_operand:SI 2 "const_int_operand" "n,n"))
>   (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>   (match_operand:SI 4 "const_int_operand" "n,n"]
>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
> ...
> 
>   This patch tries to fix the problem by comparing the rtx cost. If another
> operand (varop) is not changed and rtx cost with new mask is not less than
> the original one, the mask is restored to original one.
> 
>   I'm not sure if comparison of rtx cost here is proper. The outer code is
> unknown and I suppose it as "SET". Also the rtx cost might not be accurate.
> From my understanding, the canonical forms should always benefit as it can't
> be undo in combine pass. Do we have a perfect solution for this kind of
> issues? Looking forward for your advice.
> 
>   Another similar issues for canonical forms. Whether the widen mode for
> lshiftrt is always good?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> Combine: Don't truncate const operand of AND if it's no benefits
> 
> In combine pass, the canonical form is to turn off all bits in the constant
> that are know to already be zero for AND.
> 
>   /* Turn off all bits in the constant that are known to already be zero.
>  Thus, if the AND isn't needed at all, we will have CONSTOP == 
> NONZERO_BITS
>  which is tested below.  */
> 
>   constop &= nonzero;
> 
> But it doesn't benefit when the target has rotate and mask insert insns.
> The AND mask is truncated and lost its information.  Thus it can't match
> the insn conditions.  For example, the following insn condition checks
> the sum of two AND masks.
> 
> (define_insn "*rotlsi3_insert_5"
>   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>   (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
>   (match_operand:SI 2 "const_int_operand" "n,n"))
>   (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
>   (match_operand:SI 4 "const_int_operand" "n,n"]
>   "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
>&& UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
>&& UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
> ...
> 
> This patch restores the const operand of AND if the another operand is
> not optimized and the truncated const operand doesn't save the rtx cost.
> 
> gcc/
>   * combine.cc (simplify_and_const_int_1): Restore the const operand
>   of AND if varop is not optimized and the rtx cost of the new const
>   operand is not reduced.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and
>   adjust the number of rotate and mask insns.
>   * gcc.target/powerpc/rlwimi-1.c: Likewise.
>   * gcc.target/powerpc/rlwimi-2.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index a4479f8d836..16ff09ea854 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx 
> varop,
>if (constop == nonzero)
>  return varop;
> 
> -  if (varop == orig_varop && constop == orig_constop)
> -return NULL_RTX;
> +  if (varop == orig_varop)
> +{
> +  if (constop == orig_constop)
> + return NULL_RTX;
> +  else
> + {
> +   rtx tmp = simplify_

[PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-10 Thread HAO CHEN GUI
Hi,
  This patch tries to fix the problem that a canonical form doesn't benefit
a specific target. The const operand of an AND is ANDed with the nonzero
bits of the other operand in the combine pass. It's a canonical form, but it
brings no benefit for targets which have rotate and mask insns. As the mask
is truncated, it can't match the insn conditions which it originally matched.
For example, the following insn condition checks the sum of two AND masks.
When one of the masks is truncated, the condition breaks.

(define_insn "*rotlsi3_insert_5"
  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
(match_operand:SI 2 "const_int_operand" "n,n"))
(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
(match_operand:SI 4 "const_int_operand" "n,n"]
  "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
   && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
   && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
...
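For instance (numbers made up for illustration), operands[2] == 0xffff0000
and operands[4] == 0xffff satisfy the condition above.  If nonzero_bits
reports that the top four bits of operand 1 are already zero, combine
truncates operands[2] to 0x0fff0000, and 0x0fff0000 + 0xffff + 1 is no
longer 0, so the insn no longer matches.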

  This patch tries to fix the problem by comparing the rtx cost. If the other
operand (varop) is not changed and the rtx cost with the new mask is not less
than the original one, the mask is restored to the original one.

  I'm not sure if the comparison of rtx costs here is proper. The outer code
is unknown and I assume it to be "SET". Also the rtx cost might not be
accurate. From my understanding, a canonical form should always be beneficial
as it can't be undone in the combine pass. Do we have a better solution for
this kind of issue? Looking forward to your advice.

  Another similar issue with canonical forms: is widening the mode for
lshiftrt always good?
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

Thanks
Gui Haochen

ChangeLog
Combine: Don't truncate const operand of AND if it's no benefits

In the combine pass, the canonical form for AND is to turn off all bits in
the constant that are known to already be zero.

  /* Turn off all bits in the constant that are known to already be zero.
 Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
 which is tested below.  */

  constop &= nonzero;

But it doesn't benefit targets which have rotate and mask insert insns.
The AND mask is truncated and loses its information.  Thus it can't match
the insn conditions.  For example, the following insn condition checks
the sum of two AND masks.

(define_insn "*rotlsi3_insert_5"
  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
(match_operand:SI 2 "const_int_operand" "n,n"))
(and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0")
(match_operand:SI 4 "const_int_operand" "n,n"]
  "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode)
   && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0
   && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0"
...

This patch restores the const operand of the AND if the other operand is
not optimized and the truncated const operand doesn't reduce the rtx cost.

gcc/
* combine.cc (simplify_and_const_int_1): Restore the const operand
of AND if varop is not optimized and the rtx cost of the new const
operand is not reduced.

gcc/testsuite/
* gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and
adjust the number of rotate and mask insns.
* gcc.target/powerpc/rlwimi-1.c: Likewise.
* gcc.target/powerpc/rlwimi-2.c: Likewise.

patch.diff
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..16ff09ea854 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx 
varop,
   if (constop == nonzero)
 return varop;

-  if (varop == orig_varop && constop == orig_constop)
-return NULL_RTX;
+  if (varop == orig_varop)
+{
+  if (constop == orig_constop)
+   return NULL_RTX;
+  else
+   {
+ rtx tmp = simplify_gen_binary (AND, mode, varop,
+gen_int_mode (constop, mode));
+ rtx orig = simplify_gen_binary (AND, mode, varop,
+ gen_int_mode (orig_constop, mode));
+ if (set_src_cost (tmp, mode, optimize_this_for_speed_p)
+ < set_src_cost (orig, mode, optimize_this_for_speed_p))
+   return tmp;
+ else
+   return NULL_RTX;
+   }
+}

   /* Otherwise, return an AND.  */
   return simplify_gen_binary (AND, mode, varop, gen_int_mode (constop, mode));
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c 
b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
index 961be199901..d9dd4419f1d 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c
@@ -2,15 +2,15 @@
 /* { dg-options "-O2" } */

 /* { dg-final { scan-assembler-times 

[PATCHv2, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-03-08 Thread HAO CHEN GUI
Hi,
  This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In the
combine pass, an SImode (subreg from DImode) lshiftrt is converted to a
DImode lshiftrt with an outer AND. It matches a DImode rotate and mask insert
on rs6000.

Trying 2 -> 7:
2: r122:DI=r129:DI
  REG_DEAD r129:DI
7: r125:SI=r122:DI#0 0>>0x1f
  REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])))

This conversion blocks the further combination into an SImode rotate and
mask insert insn.

Trying 9, 7 -> 10:
9: r127:SI=r130:DI#0&0xfffe
  REG_DEAD r130:DI
7: r125:SI#0=r129:DI 0>>0x1f&0x
  REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
  REG_DEAD r125:SI
  REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffe]))
(subreg:SI (zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffe]))
(subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])) 0)))

  The root cause of the issue is whether it's necessary to widen the mode for
lshiftrt when the target already has a shiftrt for the narrow mode and its
cost is not high. My former patch tried to fix the problem but has not been
accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

  As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns for the SImode rotate and mask insert. It actually does the
reverse and narrows the mode of the lshiftrt so that it can match the SImode
rotate and mask insert.

  The case "rlwimi-2.c" is fixed and the corresponding number of insns is
restored to the original ones.

  Compared with the last version, the main change is to remove the changes
for a testcase which was already fixed in another patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In the combine pass, an SImode (subreg from DImode) lshiftrt is converted to
a DImode lshiftrt with an AND.  The new pattern matches a rotate and mask
insert on rs6000.  Thus it blocks the pattern from being further combined
into an SImode rotate and mask insert pattern.  This patch fixes the problem
by adding two subreg patterns for the SImode rotate and mask insert patterns.

gcc/
PR target/93738
* config/rs6000/rs6000.md (*rotlsi3_insert_subreg): New.
(*rotlsi3_insert_4_subreg): New.

gcc/testsuite/
PR target/93738
* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
rotate instructions.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..996d0740faf 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.

+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_subreg"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+   (ior:SI (and:SI
+(match_operator:SI 8 "lowpart_subreg_operator"
+ [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+   [(match_operand:DI 1 "gpc_reg_operand" "r")
+(match_operand:SI 2 "const_int_operand" "n")])
+  (match_operand:DI 3 "const_int_operand" "n"))])
+(match_operand:SI 5 "const_int_operand" "n"))
+   (and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+   (match_operand:SI 7 "const_int_operand" "n"]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0x
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+(match_dup 2))
+   (match_dup 5))
+   (and:SI (match_dup 6)
+   (match_dup 7]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
+}
+  [(set_attr "type" "insert")])
+
 (define_insn "*rotl3_insert_2"
   [(set 

[PATCHv2] fwprop: Avoid volatile defines to be propagated

2024-03-04 Thread HAO CHEN GUI
Hi,
  This patch tries to fix a potential problem raised by the patch for
PR111267. With that patch, a volatile asm operand may be propagated into a
single set insn. The volatile asm operand might then be executed multiple
times if the define insn isn't eliminated after propagation. Currently the
set_src_cost comparison might reject such a propagation, but it has a chance
to be taken after set_src_cost is replaced with insn_cost. I actually found
the problem while testing my patch that replaces set_src_cost with insn_cost
in the fwprop pass.

  Compared to the last version, the volatile_insn_p check is replaced with
volatile_refs_p in order to also catch volatile memory references.
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646482.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
fwprop: Avoid volatile defines to be propagated

The patch for PR111267 (commit 86de9b66480b710202a2898cf513db105d8c432f)
introduces an exception for propagation into a single set insn: a
propagation which might not be profitable (checked by profitable_p) is
still allowed into a single set insn.  This has a potential problem that a
volatile operand might be propagated into a single set insn.  If the define
insn is not eliminated after propagation, the volatile operand will be
executed multiple times.  This patch fixes the problem by skipping a
volatile set source rtx in propagation.

gcc/
* fwprop.cc (forward_propagate_into): Return false for volatile set
source rtx.

gcc/testsuite/
* gcc.target/powerpc/fwprop-1.c: New.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..cb6fd6700ca 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = 
false)

   rtx dest = SET_DEST (def_set);
   rtx src = SET_SRC (def_set);
+  if (volatile_refs_p (src))
+return false;

   /* Allow propagations into a loop only for reg-to-reg copies, since
  replacing one register by another shouldn't increase the cost.
diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c 
b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
new file mode 100644
index 000..07b207f980c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
+/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
+
+/* Verify that volatile asm operands are not propagated.  */
+long long foo ()
+{
+  long long res;
+  __asm__ __volatile__(
+""
+  : "=r" (res)
+  :
+  : "memory");
+  return res;
+}



Re: [PATCH] fwprop: Avoid volatile defines to be propagated

2024-03-04 Thread HAO CHEN GUI
Hi Jeff,

在 2024/3/4 11:37, Jeff Law 写道:
> Can the same thing happen with a volatile memory load?  I don't think that 
> will be caught by the volatile_insn_p check.

Yes, I think so. If the define rtx contains volatile memory references, it
may hit the same problem. Shall we use volatile_refs_p instead of
volatile_insn_p?

Thanks
Gui Haochen


Re: [PATCH] fwprop: Avoid volatile defines to be propagated

2024-03-03 Thread HAO CHEN GUI
Hi Jeff,
  Thanks for your comments.

在 2024/3/4 6:02, Jeff Law 写道:
> Why specifically are you worried here?  Propagation of a volatile shouldn't 
> in and of itself cause a problem.  We're not changing the number of volatile 
> accesses or anything like that -- we're just moving them around a bit.

If the volatile asm operand is in a parallel set, it can't be eliminated
after the propagation. So the define insn and use insn will execute the
volatile asm block twice. That's the problem.

Here is a real case from sanitizer_linux.cpp. Insn 62 has a volatile asm
operand and it is propagated into insn 60. After propagation both insn 60
and insn 62 have the volatile asm operand, so the asm block will be
executed twice. It causes the sanitizer to behave abnormally in my test.

propagating insn 62 into insn 60, replacing:
(set (reg/v:DI 119 [ res ])
(reg:DI 133 [ res ]))
successfully matched this instruction:
(set (reg/v:DI 119 [ res ])
(asm_operands/v:DI ("mr 28, %5
mr 27, %8
mr 3, %7
mr 5, %9
mr 6, %10
mr 7, %11
li 0, %3
sc
cmpdi  cr1, 3, 0
crandc cr1*4+eq, cr1*4+eq, cr0*4+so
bne-   cr1, 1f
li29, 0
stdu  29, -8(1)
stdu  1, -%12(1)
std   2, %13(1)
mr12, 28
mtctr 12
mr3, 27
bctrl
ld2, %13(1)
li 0, %4
sc
1:
mr %0, 3
") ("=r") 0 [
(reg:SI 134)
(const_int 22 [0x16])
(const_int 120 [0x78])
(const_int 1 [0x1])
(reg/v:DI 3 3 [ __fn ])
(reg/v:DI 4 4 [ __cstack ])
(reg/v:SI 5 5 [ __flags ])
(reg/v:DI 6 6 [ __arg ])
(reg/v:DI 7 7 [ __ptidptr ])
(reg/v:DI 8 8 [ __newtls ])
(reg/v:DI 9 9 [ __ctidptr ])
(const_int 32 [0x20])
(const_int 24 [0x18])
 [
(asm_input:SI ("0") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("i") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("i") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("i") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("r") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:DI ("r") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
]
 [] 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591))
rescanning insn with uid = 60.
updating insn 60 in-place

(insn 62 61 60 6 (parallel [
(set (reg:DI 133 [ res ])
(asm_operands/v:DI ("mr 28, %5
mr 27, %8
mr 3, %7
mr 5, %9
mr 6, %10
mr 7, %11
li 0, %3
sc
cmpdi  cr1, 3, 0
crandc cr1*4+eq, cr1*4+eq, cr0*4+so
bne-   cr1, 1f
li29, 0
stdu  29, -8(1)
stdu  1, -%12(1)
std   2, %13(1)
mr12, 28
mtctr 12
mr3, 27
bctrl
ld2, %13(1)
li 0, %4
sc
1:
mr %0, 3
") ("=r") 0 [
(reg:SI 134)
(const_int 22 [0x16])
(const_int 120 [0x78])
(const_int 1 [0x1])
(reg/v:DI 3 3 [ __fn ])
(reg/v:DI 4 4 [ __cstack ])
(reg/v:SI 5 5 [ __flags ])
(reg/v:DI 6 6 [ __arg ])
(reg/v:DI 7 7 [ __ptidptr ])
(reg/v:DI 8 8 [ __newtls ])
(reg/v:DI 9 9 [ __ctidptr ])
(const_int 32 [0x20])
(const_int 24 [0x18])
]
 [
(asm_input:SI ("0") 
/home/guihaoc/gcc/gcc-mainline-base/libsanitizer/sanitizer_common/sanitizer_linux.cpp:1591)
(asm_input:SI ("i") 

[PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-02-29 Thread HAO CHEN GUI
Hi,
  This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In
the combine pass, an SImode (subreg from DImode) lshiftrt is converted to a
DImode lshiftrt with an outer AND. It matches a DImode rotate and mask
insert on rs6000.

Trying 2 -> 7:
2: r122:DI=r129:DI
  REG_DEAD r129:DI
7: r125:SI=r122:DI#0 0>>0x1f
  REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0x])))

This conversion blocks the further combination into an SImode rotate and
mask insert insn.

Trying 9, 7 -> 10:
9: r127:SI=r130:DI#0&0xfffe
  REG_DEAD r130:DI
7: r125:SI#0=r129:DI 0>>0x1f&0x
  REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
  REG_DEAD r125:SI
  REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffe]))
(subreg:SI (zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffe]))
(subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0x])) 0)))

  The root cause of the issue is whether it's necessary to widen the mode
for lshiftrt when the target already has the narrow-mode lshiftrt and its
cost is not high. My former patch tried to fix the problem but has not been
accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

  As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns for the SImode rotate and mask insert. It actually does the
reverse and narrows the mode for lshiftrt so that it can match the SImode
rotate and mask insert.

  The case "rlwimi-2.c" is fixed and restore the corresponding number of
insns to original ones. The case "rlwinm-0.c" is also changed and 9 "rlwinm"
is replaced with 9 "rldicl" as the sequence of combine is changed. It's not
a regression as the total number of insns isn't changed.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In the combine pass, an SImode (subreg from DImode) lshiftrt is converted to
a DImode lshiftrt with an AND.  The new pattern matches a DImode rotate and
mask insert on rs6000.  Thus it blocks the pattern from being further
combined into an SImode rotate and mask insert pattern.  This patch fixes
the problem by adding two subreg patterns for the SImode rotate and mask
insert patterns.

gcc/
PR target/93738
* config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
(*rotlsi3_insert_8): New.

gcc/testsuite/
PR target/93738
* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
rotate instructions.
* gcc.target/powerpc/rlwinm-0.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..b0b40f91e3e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.

+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_9"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+   (ior:SI (and:SI
+(match_operator:SI 8 "lowpart_subreg_operator"
+ [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+   [(match_operand:DI 1 "gpc_reg_operand" "r")
+(match_operand:SI 2 "const_int_operand" "n")])
+  (match_operand:DI 3 "const_int_operand" "n"))])
+(match_operand:SI 5 "const_int_operand" "n"))
+   (and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+   (match_operand:SI 7 "const_int_operand" "n"]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0x
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+(match_dup 2))
+   (match_dup 5))
+   (and:SI (match_dup 6)
+   (match_dup 7]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[9] = gen_rtx_SUBREG (SImode, 

[PATCH] fwprop: Avoid volatile defines to be propagated

2024-02-25 Thread HAO CHEN GUI
Hi,
  This patch tries to fix a potential problem raised by the patch for
PR111267. With that patch, a volatile asm operand may be propagated into a
single set insn, which carries a risk of wrong behavior. Currently the
set_src_cost comparison can reject such a propagation, but the propagation
might be taken after set_src_cost is replaced with insn cost. I actually
found the problem while testing my patch that replaces set_src_cost with
insn_cost for fwprop.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
fwprop: Avoid volatile defines to be propagated

The patch for PR111267 (commit 86de9b66480b710202a2898cf513db105d8c432f)
introduces an exception for propagation into a single set insn: a
propagation which might not be profitable (checked by profitable_p) is
still allowed into a single set insn.  This has a potential problem that a
volatile asm operand may be propagated into a single set insn.  The
volatile asm operand was originally rejected in profitable_p.  This patch
fixes the problem by skipping a volatile set source when finding the define
set.

gcc/
* fwprop.cc (forward_propagate_into): Return false for volatile set
source.

gcc/testsuite/
* gcc.target/powerpc/fwprop-1.c: New.

patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 7872609b336..89dce88b43d 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -854,6 +854,8 @@ forward_propagate_into (use_info *use, bool reg_prop_only = 
false)

   rtx dest = SET_DEST (def_set);
   rtx src = SET_SRC (def_set);
+  if (volatile_insn_p (src))
+return false;

   /* Allow propagations into a loop only for reg-to-reg copies, since
  replacing one register by another shouldn't increase the cost.
diff --git a/gcc/testsuite/gcc.target/powerpc/fwprop-1.c 
b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
new file mode 100644
index 000..07b207f980c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fwprop-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-fwprop1-details" } */
+/* { dg-final { scan-rtl-dump-not "propagating insn" "fwprop1" } } */
+
+/* Verify that volatile asm operands are not propagated.  */
+long long foo ()
+{
+  long long res;
+  __asm__ __volatile__(
+""
+  : "=r" (res)
+  :
+  : "memory");
+  return res;
+}



[Patch, rs6000] Enable overlap memory store for block memory clear

2024-02-25 Thread HAO CHEN GUI
Hi,
  This patch enables overlapping memory stores for block memory clear,
which reduces the number of store instructions. The expander calls
widest_fixed_size_mode_for_block_clear to get the mode for the looped block
clear and calls smallest_fixed_size_mode_for_block_clear to get the mode
for the last, overlapping clear.
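
As a hedged illustration (assuming a P8-class target where unaligned VSX
stores are allowed), a 30-byte clear could then be done with one 16-byte
store from the loop plus one overlapping 16-byte store covering the tail,
instead of a 16/8/4/2-byte store sequence:

/* Illustrative only; the exact store sequence depends on the target
   flags and the known alignment.  */
void
clear30 (char *p)
{
  __builtin_memset (p, 0, 30);
}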

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk or next stage 1?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable overlap memory store for block memory clear

gcc/
* config/rs6000/rs6000-string.cc
(widest_fixed_size_mode_for_block_clear): New.
(smallest_fixed_size_mode_for_block_clear): New.
(expand_block_clear): Call widest_fixed_size_mode_for_block_clear to
get the mode for looped memory stores and call
smallest_fixed_size_mode_for_block_clear to get the mode for the last
overlapped memory store.

gcc/testsuite
* gcc.target/powerpc/block-clear-1.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 133e5382af2..c2a6095a586 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -38,6 +38,49 @@
 #include "profile-count.h"
 #include "predict.h"

+/* Return the widest mode whose size is less than or equal to the given
+   size.  */
+static fixed_size_mode
+widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int align,
+   bool unaligned_vsx_ok)
+{
+  machine_mode mode;
+
+  if (TARGET_ALTIVEC
+  && size >= 16
+  && (align >= 128
+ || unaligned_vsx_ok))
+mode = V4SImode;
+  else if (size >= 8
+  && TARGET_POWERPC64
+  && (align >= 64
+  || !STRICT_ALIGNMENT))
+mode = DImode;
+  else if (size >= 4
+  && (align >= 32
+  || !STRICT_ALIGNMENT))
+mode = SImode;
+  else if (size >= 2
+  && (align >= 16
+  || !STRICT_ALIGNMENT))
+mode = HImode;
+  else
+mode = QImode;
+
+  return as_a <fixed_size_mode> (mode);
+}
+
+/* Return the smallest mode whose size is greater than or equal to the
+   given size.  */
+static fixed_size_mode
+smallest_fixed_size_mode_for_block_clear (unsigned int size)
+{
+  if (size > UNITS_PER_WORD)
+    return as_a <fixed_size_mode> (V4SImode);
+
+  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
+}
+
 /* Expand a block clear operation, and return 1 if successful.  Return 0
if we should let the compiler generate normal code.

@@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
   HOST_WIDE_INT align;
   HOST_WIDE_INT bytes;
   int offset;
-  int clear_bytes;
   int clear_step;

   /* If this is not a fixed size move, just call memcpy */
@@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])

   bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);

-  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
+  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
+ unaligned_vsx_ok);
+  offset = 0;
+  rtx dest;
+
+  do
 {
-  machine_mode mode = BLKmode;
-  rtx dest;
+  unsigned int size = GET_MODE_SIZE (mode);

-  if (TARGET_ALTIVEC
- && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
+  while (bytes >= size)
{
- clear_bytes = 16;
- mode = V4SImode;
-   }
-  else if (bytes >= 8 && TARGET_POWERPC64
-  && (align >= 64 || !STRICT_ALIGNMENT))
-   {
- clear_bytes = 8;
- mode = DImode;
- if (offset == 0 && align < 64)
-   {
- rtx addr;
+ dest = adjust_address (orig_dest, mode, offset);
+ emit_move_insn (dest, CONST0_RTX (mode));

- /* If the address form is reg+offset with offset not a
-multiple of four, reload into reg indirect form here
-rather than waiting for reload.  This way we get one
-reload, not one per store.  */
- addr = XEXP (orig_dest, 0);
- if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
- && CONST_INT_P (XEXP (addr, 1))
- && (INTVAL (XEXP (addr, 1)) & 3) != 0)
-   {
- addr = copy_addr_to_reg (addr);
- orig_dest = replace_equiv_address (orig_dest, addr);
-   }
-   }
-   }
-  else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
-   {   /* move 4 bytes */
- clear_bytes = 4;
- mode = SImode;
-   }
-  else if (bytes >= 2 && (align >= 16 || !STRICT_ALIGNMENT))
-   {   /* move 2 bytes */
- clear_bytes = 2;
- mode = HImode;
-   }
-  else /* move 1 byte at a time */
-   {
- clear_bytes = 1;
- mode = QImode;
+ offset += size;
+ bytes -= size;
}

-  dest = 

[Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-01-25 Thread HAO CHEN GUI
Hi,
  This patch creates an insn_and_split pattern which helps the duplicated
constant vector replace the source pseudo of the store insn in the fwprop
pass. Thus the store can be implemented by a single stxvd2x, eliminating
the unnecessary byte swap insn on P8 LE. The test case shows the
optimization.

  The patch depends on the first generic patch which uses insn cost in fwprop.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen


ChangeLog
rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store

gcc/
PR target/113325
* config/rs6000/predicates.md (duplicate_easy_altivec_constant): New.
* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): New.

gcc/testsuite/
PR target/113325
* gcc.target/powerpc/pr113325.c: New.


patch.diff
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index ef7d3f214c4..8ab6db630b7 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -759,6 +759,14 @@ (define_predicate "easy_vector_constant"
   return false;
 })

+;; Return 1 if it's a duplicated easy_altivec_constant.
+(define_predicate "duplicate_easy_altivec_constant"
+  (and (match_code "const_vector")
+   (match_test "easy_altivec_constant (op, mode)"))
+{
+  return const_vec_duplicate_p (op);
+})
+
 ;; Same as easy_vector_constant but only for EASY_VECTOR_15_ADD_SELF.
 (define_predicate "easy_vector_constant_add_self"
   (and (match_code "const_vector")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 26fa32829af..98e4be26f64 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3362,6 +3362,29 @@ (define_insn "*vsx_stxvd2x4_le_<mode>"
   "stxvd2x %x1,%y0"
   [(set_attr "type" "vecstore")])

+(define_insn_and_split "vsx_stxvd2x4_le_const_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+   (match_operand:VSX_W 1 "duplicate_easy_altivec_constant" "W"))]
+  "!BYTES_BIG_ENDIAN
+   && VECTOR_MEM_VSX_P (mode)
+   && !TARGET_P9_VECTOR"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+   (match_dup 1))
+   (set (match_dup 0)
+   (vec_select:VSX_W
+ (match_dup 2)
+ (parallel [(const_int 2) (const_int 3)
+(const_int 0) (const_int 1)])))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+: operands[1];
+
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "*vsx_stxvd2x8_le_V8HI"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
 (vec_select:V8HI
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c 
b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..dff68ac0a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}


[PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-01-25 Thread HAO CHEN GUI
Hi,
  This patch replaces rtx_cost with insn_cost in forward propagation.
In the PR, a constant vector should be propagated to replace a
pseudo in a store insn if we know it's a duplicated constant vector.
It reduces the insn cost but not the rtx cost. In this case, the kind of
destination operand (memory or pseudo) decides the cost, and the rtx cost
can't reflect that.

  The test case is added in the second target specific patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for next stage 1?

Thanks
Gui Haochen


ChangeLog
fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern

gcc/
PR target/113325
* fwprop.cc (try_fwprop_subst_pattern): Replace rtx_cost with
insn_cost.


patch.diff
diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 0707a234726..b05b2538edc 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -467,20 +467,17 @@ try_fwprop_subst_pattern (obstack_watermark , 
insn_change _change,
   redo_changes (0);
 }

-  /* ??? In theory, it should be better to use insn costs rather than
- set_src_costs here.  That would involve replacing this code with
- change_is_worthwhile.  */
   bool ok = recog (attempt, use_change);
   if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
-if (rtx use_set = single_set (use_rtl))
+if (single_set (use_rtl))
   {
bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
+   auto new_cost = insn_cost (use_rtl, speed);
temporarily_undo_changes (0);
-   auto old_cost = set_src_cost (SET_SRC (use_set),
- GET_MODE (SET_DEST (use_set)), speed);
+   /* Invalidate recog data.  */
+   INSN_CODE (use_rtl) = -1;
+   auto old_cost = insn_cost (use_rtl, speed);
redo_changes (0);
-   auto new_cost = set_src_cost (SET_SRC (use_set),
- GET_MODE (SET_DEST (use_set)), speed);
if (new_cost > old_cost)
  {
if (dump_file)


[PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-01-15 Thread HAO CHEN GUI
Hi,
  This patch adds const0 move checking for CLEAR_BY_PIECES. The original
vec_duplicate check handles duplicates of non-constant inputs, but 0 is a
constant. So even if a platform doesn't support vec_duplicate, it could
still do clear by pieces if it supports a const0 move in that mode.
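
A hedged illustration of the effect (assuming a vector mode whose plain
move pattern accepts CONST0_RTX even though no vec_duplicate pattern is
provided for it):

/* Illustrative only: clearing this by pieces may now pick a vector mode
   as long as a zero-vector move is supported, even without vec_duplicate
   for that mode.  */
void
clear_by_pieces_example (char *p)
{
  __builtin_memset (p, 0, 32);
}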

  The test cases will be added in subsequent target specific patch.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but 0 is a
constant.  So even if a platform doesn't support vec_duplicate, it could
still do clear by pieces if it supports a const0 move.  This patch adds
that check.

gcc/
* expr.cc (by_pieces_mode_supported_p): Add const0 move checking
for CLEAR_BY_PIECES.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 34f5ff90a9f..cd960349a53 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1006,14 +1006,21 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
 return false;

-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
   && VECTOR_MODE_P (mode)
   && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
 return false;

+  if (op == CLEAR_BY_PIECES
+  && VECTOR_MODE_P (mode)
+  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing
+  && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+return false;
+
   if (op == COMPARE_BY_PIECES
   && !can_compare_p (EQ, mode, ccp_jump))
 return false;


Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-15 Thread HAO CHEN GUI
Hi Kewen,

在 2024/1/15 14:16, Kewen.Lin 写道:
> Considering it's stage 4 now and the impact of this patch, let's defer
> this to next stage 1, if possible could you organize the above changes
> into patches:
> 
> 1) Refactor expand_compare_loop by splitting into two functions without
>any functional changes.
> 2) Remove some useless codes like 2, 4, 5.
> 3) Some more enhancements like 1, 3, 6.
> 
> ?  It would be helpful for the review.  Thanks!

Thanks for your review comments. I will re-organize it at new stage 1.


[PATCH, rs6000] Enable block compare expand on P9 with m32 and mpowerpc64

2024-01-11 Thread HAO CHEN GUI
Hi,
  On P9 "setb" is used to set the result of block compare. So it works
with m32 and mpowerpc64. On P8, carry bit is used. So it can't work
with m32 and mpowerpc64. This patch enables block compare expand for
m32 and mpowerpc64 on P9.
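
A hedged example of what is enabled (the new block-cmp-8.c test below runs
the full memcmp-1.c checks; this is just the minimal shape):

/* Illustrative only: with -m32 -mpowerpc64 on P9 this can be expanded
   inline, since the compare result is reduced with setb rather than
   through the carry bit.  */
int
cmp16 (const char *a, const char *b)
{
  return __builtin_memcmp (a, b, 16);
}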

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable block compare expand on P9 with m32 and mpowerpc64

gcc/
* config/rs6000/rs6000-string.cc (expand_block_compare): Enable
P9 with m32 and mpowerpc64.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-1.c: Exclude m32 and mpowerpc64.
* gcc.target/powerpc/block-cmp-4.c: Likewise.
* gcc.target/powerpc/block-cmp-8.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 018b87f2501..346708071b5 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1677,11 +1677,12 @@ expand_block_compare (rtx operands[])
   /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
   gcc_assert (TARGET_POPCNTD);

-  /* This case is complicated to handle because the subtract
- with carry instructions do not generate the 64-bit
- carry and so we must emit code to calculate it ourselves.
- We choose not to implement this yet.  */
-  if (TARGET_32BIT && TARGET_POWERPC64)
+  /* For P8, this case is complicated to handle because the subtract
+ with carry instructions do not generate the 64-bit carry and so
+ we must emit code to calculate it ourselves.  We skip it on P8
+ but setb works well on P9.  */
+  if (TARGET_32BIT && TARGET_POWERPC64
+  && !TARGET_P9_MISC)
 return false;

   /* Allow this param to shut off all expansion.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
index bcf0cb2ab4f..cd076cf1dce 100644
--- a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-vsx" } */
+/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */
 /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */

 /* Test that it still can do expand for memcmpsi instead of calling library
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
index c86febae68a..9373b53a3a4 100644
--- a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target be } } */
 /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */
 /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */

 /* Test that it does expand for memcmpsi instead of calling library on
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c
new file mode 100644
index 000..b470f873973
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c
@@ -0,0 +1,8 @@
+/* { dg-do run { target ilp32 } } */
+/* { dg-options "-O2 -m32 -mpowerpc64" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+/* { dg-timeout-factor 2 } */
+
+/* Verify memcmp on m32 mpowerpc64 */
+
+#include "../../gcc.dg/memcmp-1.c"


Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-11 Thread HAO CHEN GUI
Hi Richard,
   Thanks so much for your comments.


>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000-string.cc 
>> b/gcc/config/rs6000/rs6000-string.cc
>> index 7f777666ba9..4c9b2cbeefc 100644
>> --- a/gcc/config/rs6000/rs6000-string.cc
>> +++ b/gcc/config/rs6000/rs6000-string.cc
>> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>> }
>>
>>dest = adjust_address (orig_dest, mode, offset);
>> -
>> +  /* Set the alignment of dest to the size of mode in order to
>> +avoid unnecessary byte swaps on LE.  */
>> +  set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
> 
> but the alignment is now wrong which might cause ripple-down
> wrong-code effects, no?
> 
> It's probably bad to hide the byte-swapping in the move patterns (I'm
> just guessing
> you do that)

Here I just change the alignment of "dest", which is only used temporarily
for the move. The orig_dest is untouched and keeps the original alignment,
so the subsequent insns which use orig_dest are not affected. I am not sure
whether it causes ripple-down effects. Do you mean the dest might be reused
later? But I think the alignment would be different even though the mode
and offset are the same.

Looking forward to your advice.

Thanks
Gui Haochen


[Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-11 Thread HAO CHEN GUI
Hi,
  This patch eliminates unnecessary byte swaps for block clear on P8
LE. For a block clear, all the bytes are set to zero, so the byte order
doesn't matter. The alignment of the destination can therefore be set to
the store mode size instead of 1 byte in order to eliminate unnecessary
byte swap instructions on P8 LE. The test case shows the problem.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE

gcc/
PR target/113325
* config/rs6000/rs6000-string.cc (expand_block_clear): Set the
alignment of destination to the size of mode.

gcc/testsuite/
PR target/113325
* gcc.target/powerpc/pr113325.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 7f777666ba9..4c9b2cbeefc 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
}

   dest = adjust_address (orig_dest, mode, offset);
-
+  /* Set the alignment of dest to the size of mode in order to
+avoid unnecessary byte swaps on LE.  */
+  set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
   emit_move_insn (dest, CONST0_RTX (mode));
 }

diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c 
b/gcc/testsuite/gcc.target/powerpc/pr113325.c
new file mode 100644
index 000..4a3cae019c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
+
+void* foo (void* s1)
+{
+  return __builtin_memset (s1, 0, 32);
+}


[PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-09 Thread HAO CHEN GUI
Hi,
  This patch refactors the function expand_compare_loop and splits it into
two functions, one for fixed length and another for variable length.
These two functions share some low-level common helper functions.

  Besides the above changes, the patch also does the following:
1. Don't generate the load and compare loop when max_bytes is less than
the loop bytes.
2. Remove do_load_mask_compare as it's not needed. All sub-targets
entering the function should support efficient overlapping load and
compare.
3. Implement a variable length overlapping load and compare for the
case where the remaining bytes are less than the loop bytes in a variable
length compare. The 4k boundary test and the one-byte load and compare
loop are removed as they're not needed now.
4. Remove the code for "bytes > max_bytes" with fixed length as the
case is already excluded by pre-checking.
5. Remove the run-time code for "bytes > max_bytes" with variable length
as it should jump to the library call at the beginning.
6. Enhance do_overlap_load_compare to avoid an overlapping load and compare
when the remaining bytes can be loaded and compared by a smaller unit (see
the example after this list).
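
A hedged example for point 6:

/* Illustrative only: with 8-byte loop loads, the last 4 bytes of a
   12-byte compare can be handled by a single SImode load and compare
   instead of an overlapping 8-byte load at offset 4.  */
int
cmp12 (const char *a, const char *b)
{
  return __builtin_memcmp (a, b, 12);
}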

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Refactor expand_compare_loop and split it to two functions

The original expand_compare_loop has complicated logic as it's designed
for both fixed and variable length.  This patch splits it into two
functions and makes these two functions share common helper functions.
Also the 4K boundary test and the corresponding one-byte load and compare
are replaced by a variable length overlapping load and compare.
do_load_mask_compare is removed as all sub-targets entering the function
have an efficient overlapping load and compare, so the mask load is not
needed.

gcc/
* config/rs6000/rs6000-string.cc (do_isel): Remove.
(do_load_mask_compare): Remove.
(do_reg_compare): New.
(do_load_and_compare): New.
(do_overlap_load_compare): Do load and compare with a small unit
other than overlapping load and compare when the remain bytes can
be done by one instruction.
(expand_compare_loop): Remove.
(get_max_inline_loop_bytes): New.
(do_load_compare_rest_of_loop): New.
(generate_6432_conversion): Set it to a static function and move
ahead of gen_diff_handle.
(gen_diff_handle): New.
(gen_load_compare_loop): New.
(gen_library_call): New.
(expand_compare_with_fixed_length): New.
(expand_compare_with_variable_length): New.
(expand_block_compare): Call expand_compare_with_variable_length
to expand block compare for variable length.  Call
expand_compare_with_fixed_length to expand block compare loop for
fixed length.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-5.c: New.
* gcc.target/powerpc/block-cmp-6.c: New.
* gcc.target/powerpc/block-cmp-7.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index f707bb2727e..018b87f2501 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -404,21 +404,6 @@ do_ifelse (machine_mode cmpmode, rtx_code comparison,
   LABEL_NUSES (true_label) += 1;
 }

-/* Emit an isel of the proper mode for DEST.
-
-   DEST is the isel destination register.
-   SRC1 is the isel source if CR is true.
-   SRC2 is the isel source if CR is false.
-   CR is the condition for the isel.  */
-static void
-do_isel (rtx dest, rtx cmp, rtx src_t, rtx src_f, rtx cr)
-{
-  if (GET_MODE (dest) == DImode)
-emit_insn (gen_isel_cc_di (dest, cmp, src_t, src_f, cr));
-  else
-emit_insn (gen_isel_cc_si (dest, cmp, src_t, src_f, cr));
-}
-
 /* Emit a subtract of the proper mode for DEST.

DEST is the destination register for the subtract.
@@ -499,65 +484,61 @@ do_rotl3 (rtx dest, rtx src1, rtx src2)
 emit_insn (gen_rotlsi3 (dest, src1, src2));
 }

-/* Generate rtl for a load, shift, and compare of less than a full word.
-
-   LOAD_MODE is the machine mode for the loads.
-   DIFF is the reg for the difference.
-   CMP_REM is the reg containing the remaining bytes to compare.
-   DCOND is the CCUNS reg for the compare if we are doing P9 code with setb.
-   SRC1_ADDR is the first source address.
-   SRC2_ADDR is the second source address.
-   ORIG_SRC1 is the original first source block's address rtx.
-   ORIG_SRC2 is the original second source block's address rtx.  */
+/* Do the compare for two registers.  */
 static void
-do_load_mask_compare (const machine_mode load_mode, rtx diff, rtx cmp_rem, rtx 
dcond,
- rtx src1_addr, rtx src2_addr, rtx orig_src1, rtx 
orig_src2)
+do_reg_compare (bool use_vec, rtx vec_result, rtx diff, rtx *dcond, rtx d1,
+   rtx d2)
 {
-  HOST_WIDE_INT load_mode_size = GET_MODE_SIZE (load_mode);
-  rtx shift_amount = gen_reg_rtx (word_mode);
-  rtx d1 = gen_reg_rtx 

[Patchv3, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-20 Thread HAO CHEN GUI
Hi,
  This patch cleans up the pre-checks of expand_block_compare. It does the
following:
1. Assert that only P7 and above can enter this function, as that is
already guarded by the expander.
2. Remove the P7 processor test, as only P7 and above can enter this
function and P7 LE is excluded by targetm.slow_unaligned_access. On P7 BE,
the performance of the expansion is better than that of the library when
the length is long.

  Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640833.html
the main change is to split the optimization for size into a separate patch
and add a testcase for P7 BE.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checkings of expand_block_compare

Remove the P7 CPU test, as only P7 and above can enter this function and
P7 LE is excluded by the check of targetm.slow_unaligned_access on
word_mode.  Also, performance testing shows the block compare expansion is
better than the library on P7 BE when the length is from 16 bytes to 64
bytes.

gcc/
* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
only P7 above can enter this function.  Remove P7 CPU test and let
P7 BE do the expand.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-4.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 5149273b80e..09db57255fa 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1947,15 +1947,12 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POPCNTD);
+
   if (optimize_insn_for_size_p ())
 return false;

-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
-
   /* This case is complicated to handle because the subtract
  with carry instructions do not generate the 64-bit
  carry and so we must emit code to calculate it ourselves.
@@ -1963,23 +1960,19 @@ expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
 return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion.  */
   if (rs6000_block_compare_inline_limit == 0)
 return false;

-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
- However slow_unaligned_access returns true on P7 even though the
- performance of this code is good there.  */
-  if (!isP7
-  && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
- || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2
-return false;
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];

-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
- not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+  || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
 return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2027,14 +2020,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
 return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
- memcmp if alignment is small and length is not short, so bail
- out to avoid those conditions.  */
-  if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT)
-  && ((base_align == 1 && bytes > 16)
- || (base_align == 2 && bytes > 32)))
-return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
new file mode 100644
index 000..c86febae68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target be } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+/* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */
+
+/* Test that it does expand memcmpsi instead of calling the library on
+   P7 BE when the length is less than 32 bytes.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 31);
+}


[Patch, rs6000] Call library for block memory compare when optimizing for size

2023-12-20 Thread HAO CHEN GUI
Hi,
  This patch calls the library function for block memory compare when
optimizing for size.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Call library for block memory compare when optimizing for size

gcc/
* config/rs6000/rs6000-string.cc (expand_block_compare): Return
false when optimizing for size.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-3.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 05dc41622f4..5149273b80e 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1947,6 +1947,9 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  if (optimize_insn_for_size_p ())
+return false;
+
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}



[Patchv3, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-20 Thread HAO CHEN GUI
Hi,
  The patch removes the incorrectly defined macro
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with calls to
slow_unaligned_access.

  Compared with the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640832.html
the main change is to pass the alignment measured in bits to
slow_unaligned_access.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

The macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in
rs6000-string.cc to guard the platforms which are efficient on fixed point
unaligned load/store.  It's originally defined as
TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled from P8 on and can be
disabled by the -mno-vsx option, so the definition is wrong (e.g.
-mcpu=power8 -mno-vsx would wrongly report slow fixed point unaligned
accesses).  This patch corrects the problem and calls slow_unaligned_access
to judge whether fixed point unaligned load/store is efficient or not.

gcc/
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
Remove.
* config/rs6000/rs6000-string.cc (select_block_compare_mode):
Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
targetm.slow_unaligned_access.
(expand_block_compare_gpr): Likewise.
(expand_block_compare): Likewise.
(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-1.c: New.
* gcc.target/powerpc/block-cmp-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..05dc41622f4 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
 return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && !targetm.slow_unaligned_access (SImode, align * BITS_PER_UNIT)
   && offset >= GET_MODE_SIZE (SImode) - bytes)
 /* This matches the case were we have SImode and 3 bytes
and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
unwanted bytes off of the input.  */
 return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && !targetm.slow_unaligned_access (word_mode, align * BITS_PER_UNIT)
   && offset >= UNITS_PER_WORD-bytes)
 /* Similarly, if we can use DImode it will get matched here and
can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,8 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (!targetm.slow_unaligned_access (load_mode,
+  align * BITS_PER_UNIT))
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
@@ -2026,7 +2027,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
  memcmp if alignment is small and length is not short, so bail
  out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT)
   && ((base_align == 1 && bytes > 16)
  || (base_align == 2 && bytes > 32)))
 return false;
@@ -2168,7 +2169,8 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT 
bytes_to_compare,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes_to_compare >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (!targetm.slow_unaligned_access (load_mode,
+  align * BITS_PER_UNIT))
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..3971a56c588 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,10 +483,6 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREG TARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
-
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
in power7, so conditionalize them on p8 features.  TImode syncs need quad
memory support.  */
diff --git 

[Patchv2, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-17 Thread HAO CHEN GUI
Hi,
  This patch cleans up the pre-checks of expand_block_compare. It does the
following:
1. Assert that only P7 and above can enter this function, as that is
already guarded by the expander.
2. Return false when optimizing for size.
3. Remove the P7 processor test, as only P7 and above can enter this
function and P7 LE is excluded by targetm.slow_unaligned_access. On P7 BE,
the performance of the expansion is better than that of the library when
the length is long.

  Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640082.html
the main change is to add some comments and move the variable definitions
closer to their uses.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checkings of expand_block_compare

gcc/
* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
only P7 and above can enter this function.  Return false (call the
library) when optimizing for size.  Remove the P7 CPU test as only P7
and above can enter this function and P7 LE is excluded by the check of
targetm.slow_unaligned_access on word_mode.  Also, performance testing
shows the block compare expansion with lengths from 16 bytes to 64 bytes
is better than the library on P7 BE.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-3.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index cb9eeef05d8..49670cef4d7 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,36 +1946,32 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POPCNTD);

-  /* This case is complicated to handle because the subtract
- with carry instructions do not generate the 64-bit
- carry and so we must emit code to calculate it ourselves.
- We choose not to implement this yet.  */
-  if (TARGET_32BIT && TARGET_POWERPC64)
+  if (optimize_insn_for_size_p ())
 return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion.  */
   if (rs6000_block_compare_inline_limit == 0)
 return false;

-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
- However slow_unaligned_access returns true on P7 even though the
- performance of this code is good there.  */
-  if (!isP7
-  && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
- || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2
+  /* This case is complicated to handle because the subtract
+ with carry instructions do not generate the 64-bit
+ carry and so we must emit code to calculate it ourselves.
+ We choose not to implement this yet.  */
+  if (TARGET_32BIT && TARGET_POWERPC64)
 return false;

-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
- not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
+
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+   || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
 return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2019,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
 return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
- memcmp if alignment is small and length is not short, so bail
- out to avoid those conditions.  */
-  if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx))
-  && ((base_align == 1 && bytes > 16)
- || (base_align == 2 && bytes > 32)))
-return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}


[Patchv2, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-17 Thread HAO CHEN GUI
Hi,
  The patch removes the incorrectly defined macro
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with calls to
slow_unaligned_access.

  Compared with last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640076.html
the main change is to replace the macro with slow_unaligned_access.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

The macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in
rs6000-string.cc to guard the platforms which are efficient on fixed point
unaligned load/store.  It's originally defined as
TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled from P8 on and can be
disabled by the -mno-vsx option, so the definition is wrong.  This patch
corrects the problem and calls slow_unaligned_access to judge whether
fixed point unaligned load/store is efficient or not.

gcc/
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
Remove.
* config/rs6000/rs6000-string.cc (select_block_compare_mode):
Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
targetm.slow_unaligned_access.
(expand_block_compare_gpr): Likewise.
(expand_block_compare): Likewise.
(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-1.c: New.
* gcc.target/powerpc/block-cmp-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..cb9eeef05d8 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
 return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && !targetm.slow_unaligned_access (SImode, align)
   && offset >= GET_MODE_SIZE (SImode) - bytes)
 /* This matches the case were we have SImode and 3 bytes
and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
unwanted bytes off of the input.  */
 return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && !targetm.slow_unaligned_access (word_mode, align)
   && offset >= UNITS_PER_WORD-bytes)
 /* Similarly, if we can use DImode it will get matched here and
can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (!targetm.slow_unaligned_access (load_mode, align))
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
@@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
  memcmp if alignment is small and length is not short, so bail
  out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx))
   && ((base_align == 1 && bytes > 16)
  || (base_align == 2 && bytes > 32)))
 return false;
@@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT 
bytes_to_compare,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes_to_compare >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (!targetm.slow_unaligned_access (load_mode, align))
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..3971a56c588 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,10 +483,6 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
-
 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
in power7, so conditionalize them on p8 features.  TImode syncs need quad
memory support.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
new file mode 100644
index 000..bcf0cb2ab4f
--- /dev/null
+++ 

[Patch, rs6000] Clean up pre-checking of expand_block_compare

2023-12-10 Thread HAO CHEN GUI
Hi,
  This patch cleans up the pre-checking of expand_block_compare.  It does
the following (a sketch of the resulting checks appears after the list):
1. Assert that only P7 and above can enter this function, as that is
already guarded by the expander.
2. Return false when optimizing for size.
3. Remove the P7 CPU test, as only P7 and above can enter this function
and P7 LE is already excluded by targetm.slow_unaligned_access.
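
A rough sketch of the resulting pre-checks, taken from the hunk below
(this shows only the shape of the top of the function, not a complete
listing):

bool
expand_block_compare (rtx operands[])
{
  /* Only P7 and above reach here; the expander already guarantees it.  */
  gcc_assert (TARGET_POPCNTD);

  /* The inline expansion only helps speed, not size.  */
  if (optimize_insn_for_size_p ())
    return false;

  /* Allow this param to shut off all expansion.  */
  if (rs6000_block_compare_inline_limit == 0)
    return false;

  /* ... the rest of the function is unchanged ... */
}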

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up pre-checking of expand_block_compare

gcc/
* config/rs6000/rs6000-string.cc (expand_block_compare): Assert
that only P7 and above can enter this function.  Return false when
optimizing for size.  Remove the P7 CPU test, as only P7 and above
can enter this function and P7 LE is excluded by the check of
targetm.slow_unaligned_access on word_mode.

gcc/testsuite/
* gcc.target/powerpc/memcmp_for_size.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index d4030854b2a..dff69e90d0c 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,6 +1946,15 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  gcc_assert (TARGET_POPCNTD);
+
+  if (optimize_insn_for_size_p ())
+return false;
+
+  /* Allow this param to shut off all expansion.  */
+  if (rs6000_block_compare_inline_limit == 0)
+return false;
+
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
@@ -1959,23 +1968,9 @@ expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
 return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
-  /* Allow this param to shut off all expansion.  */
-  if (rs6000_block_compare_inline_limit == 0)
-return false;
-
-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
- However slow_unaligned_access returns true on P7 even though the
- performance of this code is good there.  */
-  if (!isP7
-  && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
- || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
-return false;
-
-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
- not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+   || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
 return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2018,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
 return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
- memcmp if alignment is small and length is not short, so bail
- out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
-  && ((base_align == 1 && bytes > 16)
- || (base_align == 2 && bytes > 32)))
-return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c 
b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}


[Patch, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-10 Thread HAO CHEN GUI
Hi,
  The patch corrects the definition of
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and changes its name to a more
comprehensible one.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to
guard whether a platform handles fixed-point unaligned loads/stores
efficiently.  It is currently defined as TARGET_EFFICIENT_UNALIGNED_VSX,
which is only enabled from P8 onward and can be disabled by the -mno-vsx
option, so the definition is wrong.  This patch corrects the problem and
defines it as "!STRICT_ALIGNMENT", which is true on P7 BE and on P8 and
above.
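
Concretely, the effect on rs6000.h is roughly the following (the real hunk
is in the diff below; the old/new comments are mine):

/* Old: tied to VSX, so e.g. -mcpu=power8 -mno-vsx turned the macro off
   even though fixed-point unaligned accesses are still efficient there.  */
#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX

/* New: tied to alignment strictness; !STRICT_ALIGNMENT is true on P7 BE
   and on P8 and above, independent of -mno-vsx.  */
#define TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT (!STRICT_ALIGNMENT)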

gcc/
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
Rename to...
(TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT): ...this, set it to
!STRICT_ALIGNMENT.
* config/rs6000/rs6000-string.cc (select_block_compare_mode):
Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT.
(select_block_compare_mode): Likewise.
(expand_block_compare_gpr): Likewise.
(expand_block_compare): Likewise.
(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c: New.
* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..d4030854b2a 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
 return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && offset >= GET_MODE_SIZE (SImode) - bytes)
 /* This matches the case were we have SImode and 3 bytes
and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
unwanted bytes off of the input.  */
 return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && offset >= UNITS_PER_WORD-bytes)
 /* Similarly, if we can use DImode it will get matched here and
can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
@@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
  memcmp if alignment is small and length is not short, so bail
  out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && ((base_align == 1 && bytes > 16)
  || (base_align == 2 && bytes > 32)))
 return false;
@@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT 
bytes_to_compare,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes_to_compare >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..2f3a82942c1 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,9 +483,9 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
+/* Like TARGET_EFFICIENT_UNALIGNED_VSX, indicates if unaligned fixed point
+   loads/stores are efficient.  */
+#define TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT (!STRICT_ALIGNMENT)

 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
in power7, so conditionalize them on p8 features.  TImode syncs need quad
diff --git 
a/gcc/testsuite/gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c 
