RE: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-27 Thread Jirui Wu via Gcc-patches
Hi all,

I now use the type based on the specification of the intrinsic
instead of the type based on the formal argument.

I use signed int vector types because the outputs of the neon builtins
that I am lowering are always signed. In addition, fcode and stmt
do not carry information on whether the result is signed.

Because I am replacing the stmt with new_stmt,
a VIEW_CONVERT_EXPR cast is already inserted in the code where needed.
As a result, the resulting assembly is correct.
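At the source level the lowering has the following shape. The sketch below uses GCC's generic vector extensions as a portable stand-in for the aarch64 int32x4_t type and the vld1q_s32 builtin; the function names and the memcpy model of the un-lowered call are illustrative assumptions, not the real builtin expansion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for aarch64's int32x4_t.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Before lowering: the vld1 builtin is an opaque call the middle-end
   optimizers cannot look through (modelled here with memcpy).  */
static v4si
vld1q_s32_as_call (const int32_t *p)
{
  v4si r;
  memcpy (&r, p, sizeof r);
  return r;
}

/* After lowering: a plain vector-typed MEM_REF, which later passes
   can CSE and forward-propagate like any other memory access.  */
static v4si
vld1q_s32_as_mem_ref (const int32_t *p)
{
  return *(const v4si *) p;
}
```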

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master? If OK, can it be committed for me? I have no commit rights.

Thanks,
Jirui

> -Original Message-
> From: Richard Biener 
> Sent: Thursday, September 16, 2021 2:59 PM
> To: Jirui Wu 
> Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; i...@airs.com; Richard
> Sandiford 
> Subject: Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to
> gimple
> 
> On Thu, 16 Sep 2021, Jirui Wu wrote:
> 
> > Hi all,
> >
> > This patch lowers the vld1 and vst1 variants of the store and load
> > neon builtins functions to gimple.
> >
> > The changes in this patch covers:
> > * Replaces calls to the vld1 and vst1 variants of the builtins
> > * Uses MEM_REF gimple assignments to generate better code
> > * Updates test cases to prevent over optimization
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? If OK can it be committed for me, I have no commit rights.
> 
> +   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> +   fold_build2 (MEM_REF,
> +   TREE_TYPE
> +   (gimple_call_lhs (stmt)),
> +   args[0], build_int_cst
> +   (TREE_TYPE (args[0]), 0)));
> 
> you are using TBAA info based on the formal argument type that might have
> pointer conversions stripped.  Instead you should use a type based on the
> specification of the intrinsics (or the builtins).
> 
> Likewise for the type of the access (mind alignment info there!).
> 
> Richard.
> 
> > Thanks,
> > Jirui
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-builtins.c
> (aarch64_general_gimple_fold_builtin):
> > lower vld1 and vst1 variants of the neon builtins
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/fmla_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/fmls_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/fmul_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mla_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mls_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/mul_intrinsic_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/simd/vmul_elem_1.c:
> > prevent over optimization
> > * gcc.target/aarch64/vclz.c:
> > replace macro with function to prevent over optimization
> > * gcc.target/aarch64/vneg_s.c:
> > replace macro with function to prevent over optimization
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 
119f67d4e4c9e70e9ab1de773b42a171fbdf423e..124fd35caa01ef4a83dae0626f83efb62c053bd1
 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -46,6 +46,7 @@
 #include "emit-rtl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "gimple-fold.h"
 
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
@@ -2387,6 +2388,59 @@ aarch64_general_fold_builtin (unsigned int fcode, tree 
type,
   return NULL_TREE;
 }
 
+enum aarch64_simd_type
+get_mem_type_for_load_store (unsigned int fcode)
+{
+  switch (fcode)
+  {
+VAR1 (LOAD1, ld1 , 0, LOAD, v8qi)
+VAR1 (STORE1, st1 , 0, STORE, v8qi)
+  return Int8x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v16qi)
+VAR1 (STORE1, st1 , 0, STORE, v16qi)
+  return Int8x16_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4hi)
+VAR1 (STORE1, st1 , 0, STORE, v4hi)
+  return Int16x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v8hi)
+VAR1 (STORE1, st1 , 0, STORE, v8hi)
+  return Int16x8_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2si)
+VAR1 (STORE1, st1 , 0, STORE, v2si)
+  return Int32x2_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4si)
+VAR1 (STORE1, st1 , 0, STORE, v4si)
+  return Int32x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v2di)
+VAR1 (STORE1, st1 , 0, STORE, v2di)
+  return Int64x2_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v4hf)
+VAR1 (STORE1, st1 , 0, STORE, v4hf)
+  return Float16x4_t;
+VAR1 (LOAD1, ld1 , 0, LOAD, v8hf)
+VAR1 (STORE1, st1 , 0, STORE, v8hf)
+  return Float16x8_t;
+VAR1 (LOAD1, ld1 , 0, 

FW: [PING] Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-09-24 Thread Jirui Wu via Gcc-patches
* match.pd: Generate IFN_TRUNC.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/merge_trunc1.c: New test.
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 17, 2021 9:13 AM
> > > To: Andrew Pinski 
> > > Cc: Jirui Wu ; Richard Sandiford 
> > > ; i...@airs.com; 
> > > gcc-patches@gcc.gnu.org; rguent...@suse.de
> > > Subject: Re: [Patch][GCC][middle-end] - Generate FRINTZ for
> > > (double)(int) under -ffast-math on aarch64
> > >
> > > On Mon, Aug 16, 2021 at 8:48 PM Andrew Pinski via Gcc-patches
> > >  wrote:
> > > >
> > > > On Mon, Aug 16, 2021 at 9:15 AM Jirui Wu via Gcc-patches 
> > > >  wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This patch generates FRINTZ instruction to optimize type casts.
> > > > >
> > > > > The changes in this patch covers:
> > > > > * Optimization of a FIX_TRUNC_EXPR cast inside a FLOAT_EXPR 
> > > > > using
> > > IFN_TRUNC.
> > > > > * Change of corresponding test cases.
> > > > >
> > > > > Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master? If OK can it be committed for me, I have no 
> > > > > commit
> rights.
> > > >
> > > > Is there a reason why you are doing the transformation manually 
> > > > inside forwprop rather than handling it inside match.pd?
> > > > Also can't this only be done for -ffast-math case?
> > >
> > > You definitely have to look at the intermediate type - that could 
> > > be a uint8_t or even a boolean type.  So unless the intermediate 
> > > type can represent all float values optimizing to trunc() is invalid.
> > > Also if you emit IFN_TRUNC you have to make sure there's target 
> > > support - we don't emit calls to a library
> > > trunc() from an internal function call (and we wouldn't want to 
> > > optimize it that way).
> > >
> > > Richard.
> > >
> > > >
> > > > Thanks,
> > > > Andrew Pinski
> > > >
> > > > >
> > > > > Thanks,
> > > > > Jirui
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * tree-ssa-forwprop.c (pass_forwprop::execute): 
> > > > > Optimize with
> frintz.
> > > > >
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/aarch64/fix_trunc1.c: Update to new expectation.
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
diff --git a/gcc/match.pd b/gcc/match.pd
index 19cbad7..72e8e91 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3487,6 +3487,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 >= inside_prec - !inside_unsignedp)
  (convert @0)))
 
+/* Detected a fix_trunc cast inside a float type cast,
+  use IFN_TRUNC to optimize.  */
+#if GIMPLE
+(simplify
+   (float (fix_trunc @0))
+   (if (flag_fp_int_builtin_inexact
+   && !flag_trapping_math
+   && types_match (type, TREE_TYPE (@0))
+   && direct_internal_fn_supported_p (IFN_TRUNC, type,
+ OPTIMIZE_FOR_BOTH))
+  (IFN_TRUNC @0)))
+#endif
+
 /* If we have a narrowing conversion to an integral type that is fed by a
BIT_AND_EXPR, we might be able to remove the BIT_AND_EXPR if it merely
masks off bits outside the final type (and nothing else).  */
diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c 
b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
new file mode 100644
index 000..0721706
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+float
+f1 (float x)
+{
+  int y = x;
+
+  return (float) y;
+}
+
+double
+f2 (double x)
+{
+  long y = x;
+
+  return (double) y;
+}
+
+float
+f3 (double x)
+{
+  int y = x;
+
+  return (float) y;
+}
+
+double
+f4 (float x)
+{
+  int y = x;
+
+  return (double) y;
+}
+
+/* { dg-final { scan-assembler "frintz\\ts\[0-9\]+, s\[0-9\]+" } } */
+/* { dg-final { scan-assembler "frintz\\td\[0-9\]+, d\[0-9\]+" } } */
+/* { dg-final { scan-assembler "fcvtzs\\tw\[0-9\]+, d\[0-9\]+" } } */
+/* { dg-final { scan-assembler "scvtf\\ts\[0-9\]+, w\[0-9\]+" } } */
+/* { dg-final { scan-assembler "fcvtzs\\tw\[0-9\]+, s\[0-9\]+" } } */
+/* { dg-final { scan-assembler "scvtf\\td\[0-9\]+, w\[0-9\]+" } } */
-- 
2.7.4



[Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-09-16 Thread Jirui Wu via Gcc-patches
Hi all,

This patch lowers the vld1 and vst1 variants of the
store and load neon builtins functions to gimple.

The changes in this patch covers:
* Replaces calls to the vld1 and vst1 variants of the builtins
* Uses MEM_REF gimple assignments to generate better code
* Updates test cases to prevent over optimization
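The st1 side of the lowering can be sketched the same way; v4si below is a stand-in (via GCC's generic vector extension) for the real aarch64 vector type, so this illustrates the MEM_REF form rather than the builtin's actual expansion.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for aarch64's int32x4_t.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* After lowering, the vst1 builtin call becomes a plain vector-typed
   MEM_REF on the left-hand side of a gimple assignment.  */
static void
vst1q_s32_as_mem_ref (int32_t *p, v4si v)
{
  *(v4si *) p = v;
}
```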

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master? If OK, can it be committed for me? I have no commit rights.

Thanks,
Jirui

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c 
(aarch64_general_gimple_fold_builtin):
lower vld1 and vst1 variants of the neon builtins

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fmla_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/fmls_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/fmul_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/mla_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/mls_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/mul_intrinsic_1.c:
prevent over optimization
* gcc.target/aarch64/simd/vmul_elem_1.c:
prevent over optimization
* gcc.target/aarch64/vclz.c:
replace macro with function to prevent over optimization
* gcc.target/aarch64/vneg_s.c:
replace macro with function to prevent over optimization
diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 
eef9fc0f4440d7db359e53a7b4e21e48cf2a65f4..027491414da16b66a7fe922a1b979d97f553b724
 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2382,6 +2382,31 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, 
gcall *stmt)
   1, args[0]);
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
+  /* Lower store and load neon builtins to gimple.  */
+  BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
+   if (!BYTES_BIG_ENDIAN)
+ {
+   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+   fold_build2 (MEM_REF,
+   TREE_TYPE
+   (gimple_call_lhs (stmt)),
+   args[0], build_int_cst
+   (TREE_TYPE (args[0]), 0)));
+ }
+   break;
+  BUILTIN_VALL_F16 (STORE1, st1, 0, STORE)
+   if (!BYTES_BIG_ENDIAN)
+ {
+ new_stmt = gimple_build_assign (fold_build2 (MEM_REF,
+  TREE_TYPE (gimple_call_arg
+(stmt, 1)),
+  gimple_call_arg (stmt, 0),
+  build_int_cst
+  (TREE_TYPE (gimple_call_arg
+ (stmt, 0)), 0)),
+  gimple_call_arg (stmt, 1));
+ }
+   break;
   BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10, ALL)
   BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10, ALL)
new_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
diff --git a/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c 
b/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
index 
59ad41ed0471b17418c395f31fbe666b60ec3623..bef31c45650dcd088b38a755083e6bd9fe530c52
 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
@@ -11,6 +11,7 @@ extern void abort (void);
 
 #define TEST_VMLA(q1, q2, size, in1_lanes, in2_lanes)  \
 static void\
+__attribute__((noipa,noinline))
\
 test_vfma##q1##_lane##q2##_f##size (float##size##_t * res, \
   const float##size##_t *in1,  \
   const float##size##_t *in2)  \
@@ -104,12 +105,12 @@ main (int argc, char **argv)
vfmaq_laneq_f32.  */
 /* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, 
v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 2 } } */
 
-/* vfma_lane_f64.  */
-/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, 
d\[0-9\]+\, d\[0-9\]+" 1 } } */
+/* vfma_lane_f64.
+   vfma_laneq_f64.  */
+/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, 
d\[0-9\]+\, d\[0-9\]+" 2 } } */
 
 /* vfmaq_lane_f64.
-   vfma_laneq_f64.
vfmaq_laneq_f64.  */
-/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, 
v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 } } */
+/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, 
v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 } } */
 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c 

[PING] Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-09-10 Thread Jirui Wu via Gcc-patches
Hi,

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577846.html

Ok for master? If OK, can it be committed for me? I have no commit rights.

Jirui Wu
-Original Message-
From: Jirui Wu 
Sent: Friday, September 3, 2021 12:39 PM
To: 'Richard Biener' 
Cc: Richard Biener ; Andrew Pinski 
; Richard Sandiford ; 
i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers 

Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under 
-ffast-math on aarch64

Ping

-Original Message-
From: Jirui Wu
Sent: Friday, August 20, 2021 4:28 PM
To: Richard Biener 
Cc: Richard Biener ; Andrew Pinski 
; Richard Sandiford ; 
i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers 

Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under 
-ffast-math on aarch64

> -Original Message-
> From: Richard Biener 
> Sent: Friday, August 20, 2021 8:15 AM
> To: Jirui Wu 
> Cc: Richard Biener ; Andrew Pinski 
> ; Richard Sandiford ; 
> i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers 
> 
> Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for
> (double)(int) under -ffast-math on aarch64
> 
> On Thu, 19 Aug 2021, Jirui Wu wrote:
> 
> > Hi all,
> >
> > This patch generates FRINTZ instruction to optimize type casts.
> >
> > The changes in this patch covers:
> > * Generate FRINTZ for (double)(int) casts.
> > * Add new test cases.
> >
> > The intermediate type is not checked according to the C99 spec.
> > Overflow of the integral part when casting floats to integers causes
> undefined behavior.
> > As a result, optimization to trunc() is not invalid.
> > I've confirmed that Boolean type does not match the matching condition.
> >
> > Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? If OK can it be committed for me, I have no commit rights.
> 
> +/* Detected a fix_trunc cast inside a float type cast,
> +   use IFN_TRUNC to optimize.  */
> +#if GIMPLE
> +(simplify
> +  (float (fix_trunc @0))
> +  (if (direct_internal_fn_supported_p (IFN_TRUNC, type,
> +  OPTIMIZE_FOR_BOTH)
> +   && flag_unsafe_math_optimizations
> +   && type == TREE_TYPE (@0))
> 
> types_match (type, TREE_TYPE (@0))
> 
> please.  Please perform cheap tests first (the flag test).
> 
> + (IFN_TRUNC @0)))
> +#endif
> 
> why only for GIMPLE?  I'm not sure flag_unsafe_math_optimizations is a 
> good test here.  If you say we can use undefined behavior of any 
> overflow of the fix_trunc operation what do we guard here?
> If it's Inf/NaN input then flag_finite_math_only would be more 
> appropriate, if it's behavior for -0. (I suppose trunc (-0.0) == -0.0 
> and thus "wrong") then a && !HONOR_SIGNED_ZEROS (type) is missing 
> instead.  If it's setting of FENV state and possibly trapping on 
> overflow (but it's undefined?!) then flag_trapping_math covers the 
> latter but we don't have any flag for eliding FENV state affecting 
> transforms, so there the kitchen-sink flag_unsafe_math_optimizations might 
> apply.
> 
> So - which is it?
> 
This change is only for GIMPLE because we can't test for the optab support 
without being in GIMPLE. direct_internal_fn_supported_p is defined only for 
GIMPLE. 

IFN_TRUNC's documentation says nothing about zero or NaN/Inf inputs,
so I think the correct guard is just flag_fp_int_builtin_inexact,
plus !flag_trapping_math, because the only exception the operation can
still raise is inexact.
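For in-range values the transform is simply a round toward zero. A minimal C illustration of the cast chain the pattern matches (the function name is mine; overflow of the intermediate integer is undefined behavior, which is what licenses the fold):

```c
#include <assert.h>

/* FLOAT_EXPR of a FIX_TRUNC_EXPR with matching outer and input float
   types: the shape the new match.pd pattern rewrites to IFN_TRUNC
   (a single FRINTZ on aarch64).  */
static double
cast_round_toward_zero (double x)
{
  return (double) (long long) x;
}
```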

The new pattern is moved next to the place you mentioned.

Ok for master? If OK, can it be committed for me? I have no commit rights.

Thanks,
Jirui
> Note there's also the pattern
> 
> /* Handle cases of two conversions in a row.  */ (for ocvt (convert 
> float
> fix_trunc)  (for icvt (convert float)
>   (simplify
>(ocvt (icvt@1 @0))
>(with
> {
> ...
> 
> which is related so please put the new pattern next to that (the set 
> of conversions handled there does not include (float (fix_trunc @0)))
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Jirui
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Generate IFN_TRUNC.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/merge_trunc1.c: New test.
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 17, 2021 9:13 AM
> > > To: Andrew Pinski 
> > > Cc: Jirui Wu ; Richard Sandiford 
> > > ; i...@airs.com; 
> > > gcc-patches@gcc.gnu.org; rguent...@suse.de
> > > Subject: Re: [Patch][GCC][middle-end] - Generate FRINTZ for
> > > (double)(int) under -f

RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-09-03 Thread Jirui Wu via Gcc-patches
Ping

-Original Message-
From: Jirui Wu 
Sent: Friday, August 20, 2021 4:28 PM
To: Richard Biener 
Cc: Richard Biener ; Andrew Pinski 
; Richard Sandiford ; 
i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers 

Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under 
-ffast-math on aarch64

> -Original Message-
> From: Richard Biener 
> Sent: Friday, August 20, 2021 8:15 AM
> To: Jirui Wu 
> Cc: Richard Biener ; Andrew Pinski 
> ; Richard Sandiford ; 
> i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers 
> 
> Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for 
> (double)(int) under -ffast-math on aarch64
> 
> On Thu, 19 Aug 2021, Jirui Wu wrote:
> 
> > Hi all,
> >
> > This patch generates FRINTZ instruction to optimize type casts.
> >
> > The changes in this patch covers:
> > * Generate FRINTZ for (double)(int) casts.
> > * Add new test cases.
> >
> > The intermediate type is not checked according to the C99 spec.
> > Overflow of the integral part when casting floats to integers causes
> undefined behavior.
> > As a result, optimization to trunc() is not invalid.
> > I've confirmed that Boolean type does not match the matching condition.
> >
> > Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? If OK can it be committed for me, I have no commit rights.
> 
> +/* Detected a fix_trunc cast inside a float type cast,
> +   use IFN_TRUNC to optimize.  */
> +#if GIMPLE
> +(simplify
> +  (float (fix_trunc @0))
> +  (if (direct_internal_fn_supported_p (IFN_TRUNC, type,
> +  OPTIMIZE_FOR_BOTH)
> +   && flag_unsafe_math_optimizations
> +   && type == TREE_TYPE (@0))
> 
> types_match (type, TREE_TYPE (@0))
> 
> please.  Please perform cheap tests first (the flag test).
> 
> + (IFN_TRUNC @0)))
> +#endif
> 
> why only for GIMPLE?  I'm not sure flag_unsafe_math_optimizations is a 
> good test here.  If you say we can use undefined behavior of any 
> overflow of the fix_trunc operation what do we guard here?
> If it's Inf/NaN input then flag_finite_math_only would be more 
> appropriate, if it's behavior for -0. (I suppose trunc (-0.0) == -0.0 
> and thus "wrong") then a && !HONOR_SIGNED_ZEROS (type) is missing 
> instead.  If it's setting of FENV state and possibly trapping on 
> overflow (but it's undefined?!) then flag_trapping_math covers the 
> latter but we don't have any flag for eliding FENV state affecting 
> transforms, so there the kitchen-sink flag_unsafe_math_optimizations might 
> apply.
> 
> So - which is it?
> 
This change is only for GIMPLE because we can't test for the optab support 
without being in GIMPLE. direct_internal_fn_supported_p is defined only for 
GIMPLE. 

IFN_TRUNC's documentation says nothing about zero or NaN/Inf inputs,
so I think the correct guard is just flag_fp_int_builtin_inexact,
plus !flag_trapping_math, because the only exception the operation can
still raise is inexact.

The new pattern is moved next to the place you mentioned.

Ok for master? If OK, can it be committed for me? I have no commit rights.

Thanks,
Jirui
> Note there's also the pattern
> 
> /* Handle cases of two conversions in a row.  */ (for ocvt (convert 
> float
> fix_trunc)  (for icvt (convert float)
>   (simplify
>(ocvt (icvt@1 @0))
>(with
> {
> ...
> 
> which is related so please put the new pattern next to that (the set 
> of conversions handled there does not include (float (fix_trunc @0)))
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Jirui
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Generate IFN_TRUNC.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/merge_trunc1.c: New test.
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 17, 2021 9:13 AM
> > > To: Andrew Pinski 
> > > Cc: Jirui Wu ; Richard Sandiford 
> > > ; i...@airs.com; 
> > > gcc-patches@gcc.gnu.org; rguent...@suse.de
> > > Subject: Re: [Patch][GCC][middle-end] - Generate FRINTZ for
> > > (double)(int) under -ffast-math on aarch64
> > >
> > > On Mon, Aug 16, 2021 at 8:48 PM Andrew Pinski via Gcc-patches 
> > >  wrote:
> > > >
> > > > On Mon, Aug 16, 2021 at 9:15 AM Jirui Wu via Gcc-patches 
> > > >  wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This patch generates FRINTZ instruction to optimize type casts.
> > > > >
> > > > > The changes i

RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-08-20 Thread Jirui Wu via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Friday, August 20, 2021 8:15 AM
> To: Jirui Wu 
> Cc: Richard Biener ; Andrew Pinski
> ; Richard Sandiford ;
> i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers
> 
> Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int)
> under -ffast-math on aarch64
> 
> On Thu, 19 Aug 2021, Jirui Wu wrote:
> 
> > Hi all,
> >
> > This patch generates FRINTZ instruction to optimize type casts.
> >
> > The changes in this patch covers:
> > * Generate FRINTZ for (double)(int) casts.
> > * Add new test cases.
> >
> > The intermediate type is not checked according to the C99 spec.
> > Overflow of the integral part when casting floats to integers causes
> undefined behavior.
> > As a result, optimization to trunc() is not invalid.
> > I've confirmed that Boolean type does not match the matching condition.
> >
> > Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? If OK can it be committed for me, I have no commit rights.
> 
> +/* Detected a fix_trunc cast inside a float type cast,
> +   use IFN_TRUNC to optimize.  */
> +#if GIMPLE
> +(simplify
> +  (float (fix_trunc @0))
> +  (if (direct_internal_fn_supported_p (IFN_TRUNC, type,
> +  OPTIMIZE_FOR_BOTH)
> +   && flag_unsafe_math_optimizations
> +   && type == TREE_TYPE (@0))
> 
> types_match (type, TREE_TYPE (@0))
> 
> please.  Please perform cheap tests first (the flag test).
> 
> + (IFN_TRUNC @0)))
> +#endif
> 
> why only for GIMPLE?  I'm not sure flag_unsafe_math_optimizations is a good
> test here.  If you say we can use undefined behavior of any overflow of the
> fix_trunc operation what do we guard here?
> If it's Inf/NaN input then flag_finite_math_only would be more appropriate, if
> it's behavior for -0. (I suppose trunc (-0.0) == -0.0 and thus "wrong") then a
> && !HONOR_SIGNED_ZEROS (type) is missing instead.  If it's setting of FENV
> state and possibly trapping on overflow (but it's undefined?!) then
> flag_trapping_math covers the latter but we don't have any flag for eliding
> FENV state affecting transforms, so there the kitchen-sink
> flag_unsafe_math_optimizations might apply.
> 
> So - which is it?
> 
This change is only for GIMPLE because we can't test for the optab support 
without being in GIMPLE. direct_internal_fn_supported_p is defined 
only for GIMPLE. 

IFN_TRUNC's documentation says nothing about zero or NaN/Inf inputs,
so I think the correct guard is just flag_fp_int_builtin_inexact,
plus !flag_trapping_math, because the only exception the operation can
still raise is inexact.

The new pattern is moved next to the place you mentioned.

Ok for master? If OK, can it be committed for me? I have no commit rights.

Thanks,
Jirui
> Note there's also the pattern
> 
> /* Handle cases of two conversions in a row.  */ (for ocvt (convert float
> fix_trunc)  (for icvt (convert float)
>   (simplify
>(ocvt (icvt@1 @0))
>(with
> {
> ...
> 
> which is related so please put the new pattern next to that (the set of
> conversions handled there does not include (float (fix_trunc @0)))
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Jirui
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Generate IFN_TRUNC.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/merge_trunc1.c: New test.
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 17, 2021 9:13 AM
> > > To: Andrew Pinski 
> > > Cc: Jirui Wu ; Richard Sandiford
> > > ; i...@airs.com; gcc-patches@gcc.gnu.org;
> > > rguent...@suse.de
> > > Subject: Re: [Patch][GCC][middle-end] - Generate FRINTZ for
> > > (double)(int) under -ffast-math on aarch64
> > >
> > > > On Mon, Aug 16, 2021 at 8:48 PM Andrew Pinski via Gcc-patches  wrote:
> > > >
> > > > On Mon, Aug 16, 2021 at 9:15 AM Jirui Wu via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This patch generates FRINTZ instruction to optimize type casts.
> > > > >
> > > > > The changes in this patch covers:
> > > > > * Optimization of a FIX_TRUNC_EXPR cast inside a FLOAT_EXPR using
> > > IFN_TRUNC.
> > > > > * Change of corresponding test cases.
> > > > >
> > > > > Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
>

RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-08-19 Thread Jirui Wu via Gcc-patches
Hi all,

This patch generates FRINTZ instruction to optimize type casts.

The changes in this patch covers:
* Generate FRINTZ for (double)(int) casts.
* Add new test cases.

The intermediate type is not checked: per the C99 spec, overflow of the
integral part when casting a float to an integer causes undefined
behavior, so the optimization to trunc() is not invalid.
I've confirmed that the boolean type does not satisfy the match condition.
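The types_match (type, TREE_TYPE (@0)) guard matters even without overflow: when the outer float type differs from the input type, the cast chain is not a truncation. A sketch (hypothetical function name) using a value that fits in int but is not exactly representable in float:

```c
#include <assert.h>

/* double -> int -> float: well defined (16777217 fits in int), but
   the final cast to the narrower float rounds 16777217 to 16777216,
   so the result differs from trunc (x) == 16777217.0.  types_match
   correctly rejects this case, since the outer type (float) differs
   from the input type (double).  */
static float
double_via_int_to_float (double x)
{
  int y = x;
  return (float) y;
}
```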

Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If OK, can it be committed for me? I have no commit rights.

Thanks,
Jirui

gcc/ChangeLog:

* match.pd: Generate IFN_TRUNC.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/merge_trunc1.c: New test.

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 17, 2021 9:13 AM
> To: Andrew Pinski 
> Cc: Jirui Wu ; Richard Sandiford
> ; i...@airs.com; gcc-patches@gcc.gnu.org;
> rguent...@suse.de
> Subject: Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int)
> under -ffast-math on aarch64
> 
> On Mon, Aug 16, 2021 at 8:48 PM Andrew Pinski via Gcc-patches  wrote:
> >
> > On Mon, Aug 16, 2021 at 9:15 AM Jirui Wu via Gcc-patches
> >  wrote:
> > >
> > > Hi all,
> > >
> > > This patch generates FRINTZ instruction to optimize type casts.
> > >
> > > The changes in this patch covers:
> > > * Optimization of a FIX_TRUNC_EXPR cast inside a FLOAT_EXPR using
> IFN_TRUNC.
> > > * Change of corresponding test cases.
> > >
> > > Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master? If OK can it be committed for me, I have no commit rights.
> >
> > Is there a reason why you are doing the transformation manually inside
> > forwprop rather than handling it inside match.pd?
> > Also can't this only be done for -ffast-math case?
> 
> You definitely have to look at the intermediate type - that could be a uint8_t
> or even a boolean type.  So unless the intermediate type can represent all
> float values optimizing to trunc() is invalid.  Also if you emit IFN_TRUNC you
> have to make sure there's target support - we don't emit calls to a library
> trunc() from an internal function call (and we wouldn't want to optimize it
> that way).
> 
> Richard.
> 
> >
> > Thanks,
> > Andrew Pinski
> >
> > >
> > > Thanks,
> > > Jirui
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-forwprop.c (pass_forwprop::execute): Optimize with 
> > > frintz.
> > >
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/fix_trunc1.c: Update to new expectation.




[Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-08-16 Thread Jirui Wu via Gcc-patches
Hi all,

This patch generates FRINTZ instruction to optimize type casts.

The changes in this patch covers:
* Optimization of a FIX_TRUNC_EXPR cast inside a FLOAT_EXPR using IFN_TRUNC.
* Change of corresponding test cases.

Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If OK, can it be committed for me? I have no commit rights.

Thanks,
Jirui

gcc/ChangeLog:

* tree-ssa-forwprop.c (pass_forwprop::execute): Optimize with frintz.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fix_trunc1.c: Update to new expectation.

