RE: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases

2024-01-08 Thread Tamar Christina
> -----Original Message-----
> From: Richard Biener 
> Sent: Monday, January 8, 2024 11:29 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina 
> Subject: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases
> 
> The following avoids creating a niter peeling epilog more consistently,
> matching what peeling later uses for the skip_vector condition, in
> particular when versioning is required which then also ensures the
> vector loop is entered unless the epilog is vectorized.  This should
> ideally match LOOP_VINFO_VERSIONING_THRESHOLD, which is only computed
> later; some refactoring could make the two match better.
> 
> The patch also makes sure to adjust the upper bound of the epilogues
> when we do not have a skip edge around the vector loop.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  Tamar, does
> that look OK wrt early-breaks?

Yeah, the value looks correct.  I did find a few cases where the niters
should actually be higher for skip_vector, namely when one of the breaks
forces ncopies > 1 and we have a break condition that requires all values
to be true to continue.

The code is not wrong in that case, it just executes a completely useless
vector iteration.

But that's unrelated.  This looks correct because it means bound_scalar is
not set, in which case there's no difference between one and multiple exits.

Thanks,
Tamar

> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/113026
>   * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
>   Avoid an epilog in more cases.
>   * tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
>   epilogues niter upper bounds and estimates.
> 
>   * gcc.dg/torture/pr113026-1.c: New testcase.
>   * gcc.dg/torture/pr113026-2.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 
>  gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 +
>  gcc/tree-vect-loop-manip.cc   | 32 +++
>  gcc/tree-vect-loop.cc |  6 -
>  4 files changed, 66 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> new file mode 100644
> index 000..56dfef3b36c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall" } */
> +
> +char dst[16];
> +
> +void
> +foo (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +dst[i] = src[i]; /* { dg-bogus "" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> new file mode 100644
> index 000..b9d5857a403
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wall" } */
> +
> +char dst1[17];
> +void
> +foo1 (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +dst1[i] = src[i]; /* { dg-bogus "" } */
> +}
> +
> +char dst2[18];
> +void
> +foo2 (char *src, long n)
> +{
> +  for (long i = 0; i < n; i++)
> +dst2[i] = src[i]; /* { dg-bogus "" } */
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 9330183bfb9..927f76a0947 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3364,6 +3364,38 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>   bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
> bb_before_epilog = loop_preheader_edge (epilog)->src;
>   }
> +  else
> + {
> +   /* When we do not have a loop-around edge to the epilog we know
> +  the vector loop covered at least VF scalar iterations unless
> +  we have early breaks and the epilog will cover at most
> +  VF - 1 + gap peeling iterations.
> +  Update any known upper bound with this knowledge.  */
> +   if (! LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + {
> +   if (epilog->any_upper_bound)
> + epilog->nb_iterations_upper_bound -= lowest_vf;
> +   if (epilog->any_likely_upper_bound)
> + epilog->nb_iterations_likely_upper_bound -= lowest_vf;
> +   if (epilog->any_estimate)
> + epilog->nb_iterations_estimate -= lowest_vf;
> + }
> +   unsigned HOST_WIDE_INT const_vf;
> +   if

[PATCH] tree-optimization/113026 - avoid vector epilog in more cases

2024-01-08 Thread Richard Biener
The following avoids creating a niter peeling epilog more consistently,
matching what peeling later uses for the skip_vector condition, in
particular when versioning is required which then also ensures the
vector loop is entered unless the epilog is vectorized.  This should
ideally match LOOP_VINFO_VERSIONING_THRESHOLD, which is only computed
later; some refactoring could make the two match better.

The patch also makes sure to adjust the upper bound of the epilogues
when we do not have a skip edge around the vector loop.
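
To illustrate the bound adjustment described above, here is a minimal
standalone sketch.  It is not GCC code: tighten_epilog_bound is a
hypothetical helper, plain 64-bit integers stand in for the loop's
iteration-count fields, and the saturating subtraction is an assumption
of the sketch rather than something the patch does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a GCC function): tighten one epilog iteration
// bound for the case where no skip edge bypasses the vector loop.
//   bound        - bound inherited from the scalar loop
//   vf           - constant vectorization factor
//   gap_peel     - 1 if one iteration is peeled for gaps, else 0
//   early_breaks - whether the loop has early exits
std::uint64_t tighten_epilog_bound(std::uint64_t bound, std::uint64_t vf,
                                   std::uint64_t gap_peel, bool early_breaks)
{
  assert(vf >= 1);
  // Without early breaks the vector loop is known to have covered at
  // least VF scalar iterations, so they can be subtracted.  Saturating
  // at zero is an assumption of this sketch.
  if (!early_breaks)
    bound = bound >= vf ? bound - vf : 0;
  // In either case the epilog covers at most VF - 1 + gap iterations.
  return std::min(bound, vf - 1 + gap_peel);
}
```

With assumed values bound = 17 and vf = 16, the sketch yields 1 without
early breaks and 15 with them, since only the VF - 1 + gap cap applies in
the second case.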

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Tamar, does
that look OK wrt early-breaks?

Thanks,
Richard.

PR tree-optimization/113026
* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
Avoid an epilog in more cases.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
epilogues niter upper bounds and estimates.

* gcc.dg/torture/pr113026-1.c: New testcase.
* gcc.dg/torture/pr113026-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 
 gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 +
 gcc/tree-vect-loop-manip.cc   | 32 +++
 gcc/tree-vect-loop.cc |  6 -
 4 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
new file mode 100644
index 000..56dfef3b36c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */ 
+/* { dg-additional-options "-Wall" } */
+
+char dst[16];
+
+void
+foo (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
new file mode 100644
index 000..b9d5857a403
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */ 
+/* { dg-additional-options "-Wall" } */
+
+char dst1[17];
+void
+foo1 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst1[i] = src[i]; /* { dg-bogus "" } */
+}
+
+char dst2[18];
+void
+foo2 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst2[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 9330183bfb9..927f76a0947 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3364,6 +3364,38 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
  bb_before_epilog = loop_preheader_edge (epilog)->src;
}
+  else
+   {
+ /* When we do not have a loop-around edge to the epilog we know
+the vector loop covered at least VF scalar iterations unless
+we have early breaks and the epilog will cover at most
+VF - 1 + gap peeling iterations.
+Update any known upper bound with this knowledge.  */
+ if (! LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+   {
+ if (epilog->any_upper_bound)
+   epilog->nb_iterations_upper_bound -= lowest_vf;
+ if (epilog->any_likely_upper_bound)
+   epilog->nb_iterations_likely_upper_bound -= lowest_vf;
+ if (epilog->any_estimate)
+   epilog->nb_iterations_estimate -= lowest_vf;
+   }
+ unsigned HOST_WIDE_INT const_vf;
+ if (vf.is_constant (&const_vf))
+   {
+ const_vf += LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - 1;
+ if (epilog->any_upper_bound)
+   epilog->nb_iterations_upper_bound
+ = wi::umin (epilog->nb_iterations_upper_bound, const_vf);
+ if (epilog->any_likely_upper_bound)
+   epilog->nb_iterations_likely_upper_bound
+ = wi::umin (epilog->nb_iterations_likely_upper_bound,
+ const_vf);
+ if (epilog->any_estimate)
+   epilog->nb_iterations_estimate
+ = wi::umin (epilog->nb_iterations_estimate, const_vf);
+   }
+   }
 
   /* If loop is peeled for non-zero constant times, now niters refers to
 orig_niters - prolog_peeling, it won't overflow even the orig_niters
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a06771611ac..9dd573ef125 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1261,7 +1261,11 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
 the epilogue is unnecessary.  */
  && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
  || ((unsigned HOST_WIDE_INT) max_niter
- > (th / const_vf) * const_vf
+ /* We'd like to 

[PATCH] tree-optimization/113026 - avoid vector epilog in more cases

2023-12-15 Thread Richard Biener
The following avoids creating a niter peeling epilog more consistently,
matching what peeling later uses for the skip_vector condition, in
particular when versioning is required which then also ensures the
vector loop is entered unless the epilog is vectorized.  This should
ideally match LOOP_VINFO_VERSIONING_THRESHOLD, which is only computed
later; some refactoring could make the two match better.
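
As a rough illustration of the comparison the tree-vect-loop.cc hunk
changes, the sketch below contrasts the old and new right-hand sides.
The function names and the sample values (th = 3, VF = 16) are
assumptions for illustration only, and the sketch models just this one
comparison, not the whole of vect_need_peeling_or_partial_vectors_p.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Old right-hand side: the cost-model threshold rounded down to a
// multiple of VF.  For th < VF this is 0.
std::uint64_t old_limit(std::uint64_t th, std::uint64_t vf)
{
  assert(vf >= 1);
  return (th / vf) * vf;
}

// New right-hand side: the same rounding, but never below one full
// vector iteration (th is first raised to at least VF).
std::uint64_t new_limit(std::uint64_t th, std::uint64_t vf)
{
  assert(vf >= 1);
  return (std::max(th, vf) / vf) * vf;
}

// Hypothetical wrapper: true when this comparison alone would ask for
// a niter peeling epilog.
bool exceeds_limit(std::uint64_t max_niter, std::uint64_t limit)
{
  return max_niter > limit;
}
```

With the assumed th = 3 and VF = 16, a loop running at most 16 iterations
exceeds the old limit (16 > 0) but not the new one (16 > 16 is false), so
this comparison no longer asks for an epilog; a loop with at most 17
iterations still exceeds both.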

The patch also makes sure to adjust the upper bound of the epilogues
when we do not have a skip edge around the vector loop.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Tamar, I assume this will clash with early break vectorization
a bit so I'll defer until after that's in.

Thanks,
Richard.

PR tree-optimization/113026
* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
Avoid an epilog in more cases.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
epilogues niter upper bounds and estimates.

* gcc.dg/torture/pr113026-1.c: New testcase.
* gcc.dg/torture/pr113026-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 +++
 gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 ++
 gcc/tree-vect-loop-manip.cc   | 13 +
 gcc/tree-vect-loop.cc |  6 +-
 4 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
new file mode 100644
index 000..56dfef3b36c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */ 
+/* { dg-additional-options "-Wall" } */
+
+char dst[16];
+
+void
+foo (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
new file mode 100644
index 000..b9d5857a403
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */ 
+/* { dg-additional-options "-Wall" } */
+
+char dst1[17];
+void
+foo1 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst1[i] = src[i]; /* { dg-bogus "" } */
+}
+
+char dst2[18];
+void
+foo2 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst2[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index bcd90a331f5..07a30b7ee98 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3193,6 +3193,19 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
  bb_before_epilog = loop_preheader_edge (epilog)->src;
}
+  else
+   {
+ /* When we do not have a loop-around edge to the epilog we know
+the vector loop covered at least VF scalar iterations.  Update
+any known upper bound with this knowledge.  */
+ if (loop->any_upper_bound)
+   epilog->nb_iterations_upper_bound -= constant_lower_bound (vf);
+ if (loop->any_likely_upper_bound)
+   epilog->nb_iterations_likely_upper_bound -= constant_lower_bound (vf);
+ if (loop->any_estimate)
+   epilog->nb_iterations_estimate -= constant_lower_bound (vf);
+   }
+
   /* If loop is peeled for non-zero constant times, now niters refers to
 orig_niters - prolog_peeling, it won't overflow even the orig_niters
 overflows.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7a3db5f098b..a4dd2caa400 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1260,7 +1260,11 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
 the epilogue is unnecessary.  */
  && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
  || ((unsigned HOST_WIDE_INT) max_niter
- > (th / const_vf) * const_vf
+ /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
+but that's only computed later based on our result.
+The following is the most conservative approximation.  */
+ > (std::max ((unsigned HOST_WIDE_INT) th,
+  const_vf) / const_vf) * const_vf
 return true;
 
   return false;
-- 
2.35.3