Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-07 Thread Hao Liu OS
> Yes, I can see that.  I think the patch is OK with a minor nit - can you
> please document the nothrow flag usage in TREE_CHREC in
> tree-core.h?  There's a big comment doing flags documentation:

Thanks, committed with the new documentation:
https://gcc.gnu.org/g:2efe3a7de0107618397264017fb045f237764cc7

Thanks,
Hao.


From: Richard Biener 
Sent: Thursday, December 7, 2023 22:12
To: Hao Liu OS
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree 
with a nonwrapping flag

On Thu, Dec 7, 2023 at 9:59 AM Hao Liu OS  wrote:
>
> > Can you try to do some statistics on say SPEC CPU?  I'm usually
> > building (with -j1) with -fopt-info-vec and diff build logs, you can then 
> > see
> > how many more loops (and which) we vectorize additionally?
>
> I tried this option with SPEC2017 intrate+fprate and count the "optimized: "
> lines. Five more loops are vectorized (aarch64-linux-gnu: O3/Ofast for 
> int/fp):
>
> | O3/Ofast| before | after | additionally vectorized |
> | --- | -- | - | --- |
> | 502.gcc_r   | 1075   | 1076  | reload1.c:1934  |
> | 510.parest_r| 9818   | 9819  | fe_dgp_monomial.cc:104  |
> | 523.xalancbmk_r | 4791   | 4824  | XMLReader.cpp:1650  |
> | 526.blender_r   | 4520   | 4522  | infback.c:441,inflate.c:983 |
>
> All of them access arrays with unsigned offsets, which are previously thought
> can be overflow. E.g., the case in 502.gcc:
>
> unsigned int i;
> unsigned int this_nregs = ...;
>
> for (j = 1; j < this_nregs; j++)
>   {
> this_cost += spill_add_cost[regno + j];
> if ((TEST_HARD_REG_BIT (not_usable, regno + j))
>   || TEST_HARD_REG_BIT (used_by_other_reload, regno + j))
>   ok = 0;
>   }
>
> However, as they are not hot, the performance is not affected. I measured the 
> build time, which is also not affected. With "-flto", more benchmarks (12 in 
> total) will be affected (details are not analyzed):
>
> | O3/Ofast + -flto | before | after | diff |
> |  | -- | - |  |
> | 502.gcc_r| 979| 980   | +1   |
> | 507.cactuBSSN_r  | 3454   | 3458  | +4   |
> | 508.namd_r   | 858| 857   | -1   |
> | 510.parest_r | 1575   | 1577  | +2   |
> | 511.povray_r | 810| 812   | +2   |
> | 521.wrf_r| 8769   | 8763  | -6   |
> | 523.xalancbmk_r  | 3959   | 3979  | +20  |
> | 526.blender_r| 4580   | 4575  | -5   |
> | 527.cam4_r   | 2371   | 2370  | -1   |
> | 538.imagick_r| 462| 461   | -1   |
> | 549.fotonik3d_r  | 436| 437   | +1   |
> | 554.roms_r   | 852| 851   | -1   |
>
> I think using unsigned index to access array should be rare. Programmers
> tend to use "for (int i; ...)" instead of unsigned values. But there may be
> special requirements. This opportunity is found in a real application, which
> has hot loops with such unsigned access pattern, and it can get huge
> improvements.

Yes, I can see that.  I think the patch is OK with a minor nit - can you
please document the nothrow flag usage in TREE_CHREC in
tree-core.h?  There's a big comment doing flags documentation:


/* The following table lists the uses of each of the above flags and
   for which types of nodes they are defined.
...

OK with that change.   Let's see how it goes ...

Thanks,
Richard.

> Thanks,
> Hao
>
> ____________
> From: Richard Biener 
> Sent: Wednesday, December 6, 2023 19:49
> To: Hao Liu OS
> Cc: GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree 
> with a nonwrapping flag
>
> On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS  
> wrote:
> >
> > Hi,
> >
> > Update the patch to fix problems in the test case:
> >  - add "-details" option to the dump command
> >  - add dg-require and target filters to avoid potential failures on 
> > platforms that don't support vectorization.
>
> Interesting simple trick - the downside is that this makes the
> recursive dependence
> of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
> set the flag on CHRECs that are not necessarily cached, so I'm not sure how
> effective this will be ...
>
> Can you try to do some statistics on say SPEC CPU?  I'm usually
> building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> how many more loops (and which) we vectorize additionally?
>
> Thanks,
> Richard.
>
> > Than

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-07 Thread Richard Biener
On Thu, Dec 7, 2023 at 9:59 AM Hao Liu OS  wrote:
>
> > Can you try to do some statistics on say SPEC CPU?  I'm usually
> > building (with -j1) with -fopt-info-vec and diff build logs, you can then 
> > see
> > how many more loops (and which) we vectorize additionally?
>
> I tried this option with SPEC2017 intrate+fprate and count the "optimized: "
> lines. Five more loops are vectorized (aarch64-linux-gnu: O3/Ofast for 
> int/fp):
>
> | O3/Ofast| before | after | additionally vectorized |
> | --- | -- | - | --- |
> | 502.gcc_r   | 1075   | 1076  | reload1.c:1934  |
> | 510.parest_r| 9818   | 9819  | fe_dgp_monomial.cc:104  |
> | 523.xalancbmk_r | 4791   | 4824  | XMLReader.cpp:1650  |
> | 526.blender_r   | 4520   | 4522  | infback.c:441,inflate.c:983 |
>
> All of them access arrays with unsigned offsets, which are previously thought
> can be overflow. E.g., the case in 502.gcc:
>
> unsigned int i;
> unsigned int this_nregs = ...;
>
> for (j = 1; j < this_nregs; j++)
>   {
> this_cost += spill_add_cost[regno + j];
> if ((TEST_HARD_REG_BIT (not_usable, regno + j))
>   || TEST_HARD_REG_BIT (used_by_other_reload, regno + j))
>   ok = 0;
>   }
>
> However, as they are not hot, the performance is not affected. I measured the 
> build time, which is also not affected. With "-flto", more benchmarks (12 in 
> total) will be affected (details are not analyzed):
>
> | O3/Ofast + -flto | before | after | diff |
> |  | -- | - |  |
> | 502.gcc_r| 979| 980   | +1   |
> | 507.cactuBSSN_r  | 3454   | 3458  | +4   |
> | 508.namd_r   | 858| 857   | -1   |
> | 510.parest_r | 1575   | 1577  | +2   |
> | 511.povray_r | 810| 812   | +2   |
> | 521.wrf_r| 8769   | 8763  | -6   |
> | 523.xalancbmk_r  | 3959   | 3979  | +20  |
> | 526.blender_r| 4580   | 4575  | -5   |
> | 527.cam4_r   | 2371   | 2370  | -1   |
> | 538.imagick_r| 462| 461   | -1   |
> | 549.fotonik3d_r  | 436| 437   | +1   |
> | 554.roms_r   | 852| 851   | -1   |
>
> I think using unsigned index to access array should be rare. Programmers
> tend to use "for (int i; ...)" instead of unsigned values. But there may be
> special requirements. This opportunity is found in a real application, which
> has hot loops with such unsigned access pattern, and it can get huge
> improvements.

Yes, I can see that.  I think the patch is OK with a minor nit - can you
please document the nothrow flag usage in TREE_CHREC in
tree-core.h?  There's a big comment doing flags documentation:


/* The following table lists the uses of each of the above flags and
   for which types of nodes they are defined.
...

OK with that change.   Let's see how it goes ...

Thanks,
Richard.

> Thanks,
> Hao
>
> ____________
> From: Richard Biener 
> Sent: Wednesday, December 6, 2023 19:49
> To: Hao Liu OS
> Cc: GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree 
> with a nonwrapping flag
>
> On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS  
> wrote:
> >
> > Hi,
> >
> > Update the patch to fix problems in the test case:
> >  - add "-details" option to the dump command
> >  - add dg-require and target filters to avoid potential failures on 
> > platforms that don't support vectorization.
>
> Interesting simple trick - the downside is that this makes the
> recursive dependence
> of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
> set the flag on CHRECs that are not necessarily cached, so I'm not sure how
> effective this will be ...
>
> Can you try to do some statistics on say SPEC CPU?  I'm usually
> building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> how many more loops (and which) we vectorize additionally?
>
> Thanks,
> Richard.
>
> > Thanks,
> > -Hao
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/112774
> > * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
> > printed with additional  info.
> > * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
> > nonwrapping_chrec_p to set and check the new flag respectively.
> > * tree-scalar-evolution.h: Likewise.
> > * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
> > infer_loop_bounds_from_pointer_ar

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-07 Thread Hao Liu OS
> Can you try to do some statistics on say SPEC CPU?  I'm usually
> building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> how many more loops (and which) we vectorize additionally?

I tried this option with SPEC2017 intrate+fprate and count the "optimized: "
lines. Five more loops are vectorized (aarch64-linux-gnu: O3/Ofast for int/fp):

| O3/Ofast| before | after | additionally vectorized |
| --- | -- | - | --- |
| 502.gcc_r   | 1075   | 1076  | reload1.c:1934  |
| 510.parest_r| 9818   | 9819  | fe_dgp_monomial.cc:104  |
| 523.xalancbmk_r | 4791   | 4824  | XMLReader.cpp:1650  |
| 526.blender_r   | 4520   | 4522  | infback.c:441,inflate.c:983 |

All of them access arrays with unsigned offsets, which are previously thought
can be overflow. E.g., the case in 502.gcc:

unsigned int i;
unsigned int this_nregs = ...;

for (j = 1; j < this_nregs; j++)
  {
this_cost += spill_add_cost[regno + j];
if ((TEST_HARD_REG_BIT (not_usable, regno + j))
  || TEST_HARD_REG_BIT (used_by_other_reload, regno + j))
  ok = 0;
  }

However, as they are not hot, the performance is not affected. I measured the 
build time, which is also not affected. With "-flto", more benchmarks (12 in 
total) will be affected (details are not analyzed):

| O3/Ofast + -flto | before | after | diff |
|  | -- | - |  |
| 502.gcc_r| 979| 980   | +1   |
| 507.cactuBSSN_r  | 3454   | 3458  | +4   |
| 508.namd_r   | 858| 857   | -1   |
| 510.parest_r | 1575   | 1577  | +2   |
| 511.povray_r | 810| 812   | +2   |
| 521.wrf_r| 8769   | 8763  | -6   |
| 523.xalancbmk_r  | 3959   | 3979  | +20  |
| 526.blender_r| 4580   | 4575  | -5   |
| 527.cam4_r   | 2371   | 2370  | -1   |
| 538.imagick_r| 462| 461   | -1   |
| 549.fotonik3d_r  | 436| 437   | +1   |
| 554.roms_r   | 852| 851   | -1   |

I think using unsigned index to access array should be rare. Programmers
tend to use "for (int i; ...)" instead of unsigned values. But there may be
special requirements. This opportunity is found in a real application, which
has hot loops with such unsigned access pattern, and it can get huge
improvements.

Thanks,
Hao


From: Richard Biener 
Sent: Wednesday, December 6, 2023 19:49
To: Hao Liu OS
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree 
with a nonwrapping flag

On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS  wrote:
>
> Hi,
>
> Update the patch to fix problems in the test case:
>  - add "-details" option to the dump command
>  - add dg-require and target filters to avoid potential failures on platforms 
> that don't support vectorization.

Interesting simple trick - the downside is that this makes the
recursive dependence
of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
set the flag on CHRECs that are not necessarily cached, so I'm not sure how
effective this will be ...

Can you try to do some statistics on say SPEC CPU?  I'm usually
building (with -j1) with -fopt-info-vec and diff build logs, you can then see
how many more loops (and which) we vectorize additionally?

Thanks,
Richard.

> Thanks,
> -Hao
>
> gcc/ChangeLog:
>
> PR tree-optimization/112774
> * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
> printed with additional  info.
> * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
> nonwrapping_chrec_p to set and check the new flag respectively.
> * tree-scalar-evolution.h: Likewise.
> * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
> infer_loop_bounds_from_pointer_arith, 
> infer_loop_bounds_from_signedness,
> scev_probably_wraps_p): call record_nonwrapping_chrec before
> record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is 
> set and
> return false from scev_probably_wraps_p.
> * tree-vect-loop.cc (vect_analyze_loop): call
> free_numbers_of_iterations_estimates explicitly.
> * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
> to represent the nonwrapping info.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/scev-16.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++
>  gcc/tree-pretty-print.cc|  2 +-
>  gcc/tree-scalar-evolution.cc| 24 
>  gcc/tree-scalar-evolution.h |  2 ++
>  gcc/tree-ssa-loop-niter.cc  | 21 +++

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-06 Thread Richard Biener
> +
> +bool nonwrapping_chrec_p (tree chrec)
> +{
> +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> +return false;
> +
> +  return CHREC_NOWRAP(chrec);
> +}
> +
>  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
>
>  static tree
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index a64ed78fe63..f57fde12ee2 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, 
> struct affine_iv *,
>bool);
>  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> +extern void record_nonwrapping_chrec (tree);
> +extern bool nonwrapping_chrec_p (tree);
>
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
> the loop is function's body.  */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 2098bef9a97..d465e0ed7e1 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void 
> *dta)
>
>/* If access is not executed on every iteration, we must ensure that 
> overlow
>   may not make the access valid later.  */
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> -  && scev_probably_wraps_p (NULL_TREE,
> -   initial_condition_in_loop_num (ev, loop->num),
> -   step, data->stmt, loop, true))
> -upper = false;
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> +{
> +  if (scev_probably_wraps_p (NULL_TREE,
> +initial_condition_in_loop_num (ev, 
> loop->num),
> +step, data->stmt, loop, true))
> +   upper = false;
> +}
> +  else
> +record_nonwrapping_chrec (ev);
>
>record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
> upper);
>return true;
> @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, 
> gimple *stmt)
>if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
>  low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE 
> (type)));
>
> +  record_nonwrapping_chrec (scev);
>record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, 
> gimple *stmt)
>high = wide_int_to_tree (type, r.upper_bound ());
>  }
>
> +  record_nonwrapping_chrec (scev);
>record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
>if (loop_exits_before_overflow (base, step, at_stmt, loop))
>  return false;
>
> +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., 
> the
> + above loop exits before overflow).  */
> +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> +return false;
> +
>/* At this point we still don't have a proof that the iv does not
>   overflow: give up.  */
>return true;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3df020d2228..7f278e2d8f4 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>  analysis are done under the assumptions.  */
>loop_constraint_set (loop, LOOP_C_FINITE);
>  }
> +  else
> +/* Clear the existing niter information to make sure the nonwrapping flag
> +   will be calculated and set propriately.  */
> +free_numbers_of_iterations_estimates (loop);
>
>auto_vector_modes vector_modes;
>/* Autodetect first vector size we try.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 086b55f0375..59af8920f02 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
>  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
>
>  /* Accessors for the chains of recurrences.  */
> -#define CHREC_LEFT(NODE)  TREE_OPERAND (POLYNOMIAL_CHREC_CHECK 
> (NODE), 0)
> -#define CHREC_RIGHT(NODE)     TREE_OPERAND (POLYNOMIAL_CHREC_CHECK 
> (NODE), 1)
> -#define CHREC_VARIABLE(NODE)  POLYNOMIAL_CHREC_CHECK 
> (NODE)->base.u.chrec_var
> +#define CHREC_LEFT(NODE)TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE),

Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-06 Thread Hao Liu OS
dx_infer_loop_bounds (tree base, tree *idx, void *dta)

   /* If access is not executed on every iteration, we must ensure that overlow
  may not make the access valid later.  */
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
-  && scev_probably_wraps_p (NULL_TREE,
-   initial_condition_in_loop_num (ev, loop->num),
-   step, data->stmt, loop, true))
-upper = false;
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
+{
+  if (scev_probably_wraps_p (NULL_TREE,
+initial_condition_in_loop_num (ev, loop->num),
+step, data->stmt, loop, true))
+   upper = false;
+}
+  else
+record_nonwrapping_chrec (ev);

   record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
upper);
   return true;
@@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, 
gimple *stmt)
   if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
 low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));

+  record_nonwrapping_chrec (scev);
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }

@@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, 
gimple *stmt)
   high = wide_int_to_tree (type, r.upper_bound ());
 }

+  record_nonwrapping_chrec (scev);
   record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
 }

@@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
   if (loop_exits_before_overflow (base, step, at_stmt, loop))
 return false;

+  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
+ above loop exits before overflow).  */
+  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
+return false;
+
   /* At this point we still don't have a proof that the iv does not
  overflow: give up.  */
   return true;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3df020d2228..7f278e2d8f4 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
*shared)
 analysis are done under the assumptions.  */
   loop_constraint_set (loop, LOOP_C_FINITE);
 }
+  else
+/* Clear the existing niter information to make sure the nonwrapping flag
+   will be calculated and set propriately.  */
+free_numbers_of_iterations_estimates (loop);

   auto_vector_modes vector_modes;
   /* Autodetect first vector size we try.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index 086b55f0375..59af8920f02 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
 #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))

 /* Accessors for the chains of recurrences.  */
-#define CHREC_LEFT(NODE)  TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 
0)
-#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 
1)
-#define CHREC_VARIABLE(NODE)  POLYNOMIAL_CHREC_CHECK 
(NODE)->base.u.chrec_var
+#define CHREC_LEFT(NODE)TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
+#define CHREC_RIGHT(NODE)   TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
+#define CHREC_VARIABLE(NODE)POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
+/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
+#define CHREC_NOWRAP(NODE)  POLYNOMIAL_CHREC_CHECK 
(NODE)->base.nothrow_flag

 /* LABEL_EXPR accessor. This gives access to the label associated with
the given label expression.  */
--
2.34.1


________
From: Hao Liu OS 
Sent: Monday, December 4, 2023 18:05
To: GCC-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with 
a nonwrapping flag

Loop vecotorization can not optimize following case due to SCEV is not affine
failure (i+offset may overflow):

int A[1024 * 2];

int foo (unsigned offset, unsigned N)
{
  int sum = 0;
  for (unsigned i = 0; i < N; i++)
sum += A[i + offset];
  return sum;
}

Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
can not overflow) by inferring from undefined behaviors like array access (see
record_nonwrapping_iv). But this information is not shared to SCEV yet. This
patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it
to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop
vectorization could succeed.

The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
"{offset, +, 1}_1" -> "{offset, +, 1}_1" (nw is short for nonwrapping). Two
SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to
set and check the flag

[PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

2023-12-04 Thread Hao Liu OS
Loop vecotorization can not optimize following case due to SCEV is not affine
failure (i+offset may overflow):

int A[1024 * 2];

int foo (unsigned offset, unsigned N) 
{
  int sum = 0;
  for (unsigned i = 0; i < N; i++)
sum += A[i + offset];
  return sum;
}

Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
can not overflow) by inferring from undefined behaviors like array access (see
record_nonwrapping_iv). But this information is not shared to SCEV yet. This
patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it 
to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop 
vectorization could succeed.

The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
"{offset, +, 1}_1" -> "{offset, +, 1}_1" (nw is short for nonwrapping). Two
SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to 
set and check the flag respectively.

However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
reset_scev_htab), which may not be synchronized with the calling of
free_numbers_of_iterations_estimates, which set the loop->estimate_state to
EST_NOT_COMPUTED and make sure the above inferring from array access is called.
In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
cache if the loop->estimate_state is not reset.
E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the 
estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
loop vectorization.

This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
vect_analyze_loop, which will make sure the flag is always set propriately in
loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
only), if there is more reasonable method, I will revert the simple fix and try
that.

This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions. 
OK for trunk?

---
The patch is as following:

[PATCH] SCEV: extend the chrec tree with a nonwrapping flag
 [PR112774]

The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
"{offset, +, 1}_1" to "{offset, +, 1}_1" (nw is short for nonwrapping).
Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
added to set and check the flag respectively.

As resetting the SCEV cache (i.e., the chrec trees) may not reset the
loop->estimate_state, free_numbers_of_iterations_estimates is called
explicitly in loop vectorization to make sure the flag can be
calculated propriately by niter.

gcc/ChangeLog:

PR tree-optimization/112774
* tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
printed with additional  info.
* tree-scalar-evolution.cc: add record_nonwrapping_chrec and
nonwrapping_chrec_p to set and check the new flag respectively.
* tree-scalar-evolution.h: Likewise.
* tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
scev_probably_wraps_p): call record_nonwrapping_chrec before
record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is 
set and
return false from scev_probably_wraps_p.
* tree-vect-loop.cc (vect_analyze_loop): call
free_numbers_of_iterations_estimates explicitly.
* gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
to represent the nonwrapping info.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/scev-16.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +
 gcc/tree-pretty-print.cc|  2 +-
 gcc/tree-scalar-evolution.cc| 24 
 gcc/tree-scalar-evolution.h |  2 ++
 gcc/tree-ssa-loop-niter.cc  | 21 -
 gcc/tree-vect-loop.cc   |  4 
 gcc/tree.h  |  8 +---
 7 files changed, 69 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c 
b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
new file mode 100644
index 000..96ea36e4c65
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-vect-scev" } */
+
+int A[1024 * 2];
+
+int foo (unsigned offset, unsigned N)
+{
+  int sum = 0;
+
+  for (unsigned i = 0; i < N; i++)
+sum += A[i + offset];
+
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
+/* { dg-final { scan-tree-dump-not "missed:  failed: evolution of offset is 
not affine" "vect" } } */
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 1fadd752d05..0dabb6d1580 100644
--- a/gcc/tree-pretty-print.cc
+++