RE: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf Of Richard Biener via Gcc-patches
Sent: Thursday, May 25, 2023 9:06 PM
To: Richard Sandiford 
Cc: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

On Thu, 25 May 2023, Richard Sandiford wrote:

> This looks good to me.  Just a couple of very minor cosmetic things:
> 
> juzhe.zh...@rivai.ai writes:
> > @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
> >   continue;
> >   }
> >  
> > -   /* See whether zero-based IV would ever generate all-false masks
> > -  or zero length before wrapping around.  */
> > -   bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> > -
> > -   /* Set up all controls for this group.  */
> > -   test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
> > -_seq,
> > -_seq,
> > -loop_cond_gsi, rgc,
> > -niters, niters_skip,
> > -might_wrap_p);
> > +   if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) || !iv_rgc
> > +   || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
> > +   != rgc->max_nscalars_per_iter * rgc->factor))
> 
> Coding style is to put each subcondition on a separate line when the 
> whole condition doesn't fit on a single line.  So:
> 
>   if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>   || !iv_rgc
>   || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>   != rgc->max_nscalars_per_iter * rgc->factor))
> 
> > @@ -2725,6 +2726,17 @@ start_over:
> >&& !vect_verify_loop_lens (loop_vinfo))
> >  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> >  
> > +  /* If we're vectorizing an loop that uses length "controls" and
> 
> s/an loop/a loop/  (Sorry for not noticing earlier.)
> 
> OK for trunk from my POV with those changes; no need to repost unless 
> your policies require it.  Please give Richi a chance to comment too 
> though.

LGTM as well.

Thanks,
Richard.


Re: Re: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread 钟居哲
Thanks Richard so much.
I have sent the V17 patch for commit (with the formatting fixed as you suggested).
You don't need to reply to it.

I am waiting for Richi's final approval.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-25 20:36
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support
This looks good to me.  Just a couple of very minor cosmetic things:
 
juzhe.zh...@rivai.ai writes:
> @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>continue;
>}
>  
> - /* See whether zero-based IV would ever generate all-false masks
> -or zero length before wrapping around.  */
> - bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> -
> - /* Set up all controls for this group.  */
> - test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
> -  _seq,
> -  _seq,
> -  loop_cond_gsi, rgc,
> -  niters, niters_skip,
> -  might_wrap_p);
> + if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) || !iv_rgc
> + || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
> + != rgc->max_nscalars_per_iter * rgc->factor))
 
Coding style is to put each subcondition on a separate line when the
whole condition doesn't fit on a single line.  So:
 
if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
|| !iv_rgc
|| (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
!= rgc->max_nscalars_per_iter * rgc->factor))
 
> @@ -2725,6 +2726,17 @@ start_over:
>&& !vect_verify_loop_lens (loop_vinfo))
>  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>  
> +  /* If we're vectorizing an loop that uses length "controls" and
 
s/an loop/a loop/  (Sorry for not noticing earlier.)
 
OK for trunk from my POV with those changes; no need to repost unless
your policies require it.  Please give Richi a chance to comment too
though.
 
Thanks for your patience with the review process.  The final result
seems pretty clean to me.
 
Richard
 


[PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch adds decrement IV support, following the flow designed by Richard:

(1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
call vect_set_loop_controls_directly.

(2) vect_set_loop_controls_directly calculates "step" as in your patch.
If rgc has 1 control, this step is the SSA name created for that control.
Otherwise the step is a fresh SSA name, as in your patch.

(3) vect_set_loop_controls_directly stores this step somewhere for later
use, probably in LOOP_VINFO.  Let's use "S" to refer to this stored step.

(4) After the vect_set_loop_controls_directly call above, and outside
the "if" statement that now contains vect_set_loop_controls_directly,
check whether rgc->controls.length () > 1.  If so, use
vect_adjust_loop_lens_control to set the controls based on S.
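As a rough illustration of the decrementing-IV idea, here is a scalar C model (not GCC code) of a length-controlled loop whose IV starts at the iteration count and is decremented by a variable step S on every iteration. It assumes S = MIN (remaining, VF) with a hypothetical fixed VF of 8; the flow above only says the step is computed "as in your patch", so the real step computation may differ. The function name add_one_decrementing_iv is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vectorization factor: the maximum number of elements
   one length control can cover in this model.  */
#define VF 8

/* Scalar model of a length-controlled loop with a decrementing IV.
   Each iteration computes the step S = MIN (remaining, VF), processes
   S elements, then decrements the IV by S (a variable amount).
   Returns the number of loop iterations executed.  */
static size_t
add_one_decrementing_iv (int8_t *x, size_t n)
{
  size_t remaining = n;  /* decrementing IV, starts at niters */
  size_t iters = 0;

  while (remaining > 0)
    {
      size_t step = remaining < VF ? remaining : VF;  /* S */
      assert (step > 0 && step <= VF);

      /* The "vector" body: only the first STEP lanes are active.  */
      for (size_t lane = 0; lane < step; lane++)
        x[lane] += 1;

      x += step;
      remaining -= step;  /* IV -= S; the loop exits when the IV is 0 */
      iters++;
    }
  return iters;
}
```

For n = 19 the model runs three iterations with steps 8, 8 and 3. Because S never exceeds the remaining count, the IV reaches exactly 0 on exit in this sketch rather than wrapping around.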

Then the only caller of vect_adjust_loop_lens_control is
vect_set_loop_condition_partial_vectors.  And the starting
step for vect_adjust_loop_lens_control is always S.
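Step (4) derives the lengths of a multi-control rgroup from the stored step S. A minimal sketch of one way to do that split, assuming each control covers at most a fixed number of items and the controls are filled in order, is shown below. split_step is a hypothetical helper for illustration only, not the actual vect_adjust_loop_lens_control.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: distribute a stored step S (the number of
   active items in this iteration) across NCTRLS length controls, each
   covering at most NITEMS_PER_CTRL items.  Control I gets
   MIN (MAX (S - I * NITEMS_PER_CTRL, 0), NITEMS_PER_CTRL).  */
static void
split_step (size_t s, size_t nitems_per_ctrl, size_t nctrls, size_t *lens)
{
  assert (s <= nitems_per_ctrl * nctrls);

  for (size_t i = 0; i < nctrls; i++)
    {
      size_t start = i * nitems_per_ctrl;
      /* Saturating subtraction: controls past S get length 0.  */
      size_t left = s > start ? s - start : 0;
      lens[i] = left < nitems_per_ctrl ? left : nitems_per_ctrl;
    }
}
```

With S = 19, 8 items per control and 3 controls, the lengths come out as 8, 8 and 3; with S = 5 they are 5, 0 and 0. The saturating subtraction keeps trailing controls at zero instead of letting an unsigned difference wrap.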

This patch has been well tested for single-rgroup and multiple-rgroup (SLP) cases and
passes all testcases in the RISC-V port.

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_adjust_loop_lens_control): New function.
(vect_set_loop_controls_directly): Add decrement IV support.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): New variable.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c: New test.

---
 .../rvv/autovec/partial/multiple_rgroup-3.c   | 288 ++
 .../rvv/autovec/partial/multiple_rgroup-4.c   |  75 +
 .../autovec/partial/multiple_rgroup_run-3.c   |  36 +++
 .../autovec/partial/multiple_rgroup_run-4.c   |  15 +
 gcc/tree-vect-loop-manip.cc   | 135 +++-
 gcc/tree-vect-loop.cc |  12 +
 gcc/tree-vectorizer.h |   8 +
 7 files changed, 557 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
new file mode 100644
index 000..9579749c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
@@ -0,0 +1,288 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax" } */
+
+#include <stdint.h>
+
+void __attribute__ ((noinline, noclone))
+f0 (int8_t *__restrict x, int16_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+    {
+      x[i + 0] += 1;
+      x[i + 1] += 2;
+      x[i + 2] += 3;
+      x[i + 3] += 4;
+      y[j + 0] += 1;
+      y[j + 1] += 2;
+      y[j + 2] += 3;
+      y[j + 3] += 4;
+      y[j + 4] += 5;
+      y[j + 5] += 6;
+      y[j + 6] += 7;
+      y[j + 7] += 8;
+    }
+}
+
+void __attribute__ ((optimize (0)))
+f0_init (int8_t *__restrict x, int8_t *__restrict x2, int16_t *__restrict y,
+         int16_t *__restrict y2, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+    {
+      x[i + 0] = i % 120;
+      x[i + 1] = i % 78;
+      x[i + 2] = i % 55;
+      x[i + 3] = i % 27;
+      y[j + 0] = j % 33;
+      y[j + 1] = j % 44;
+      y[j + 2] = j % 66;
+      y[j + 3] = j % 88;
+      y[j + 4] = j % 99;
+      y[j + 5] = j % 39;
+      y[j + 6] = j % 49;
+      y[j + 7] = j % 101;
+
+      x2[i + 0] = i % 120;
+      x2[i + 1] = i % 78;
+      x2[i + 2] = i % 55;
+      x2[i + 3] = i % 27;
+      y2[j + 0] = j % 33;
+      y2[j + 1] = j % 44;
+      y2[j + 2] = j % 66;
+      y2[j + 3] = j % 88;
+      y2[j + 4] = j % 99;
+      y2[j + 5] = j % 39;
+      y2[j + 6] = j % 49;
+      y2[j + 7] = j % 101;
+    }
+}
+
+void __attribute__ ((optimize (0)))
+f0_golden (int8_t *__restrict x, int16_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+    {
+      x[i + 0] += 1;
+      x[i + 1] += 2;
+      x[i + 2] += 3;
+      x[i + 3] += 4;
+      y[j + 0] += 1;
+      y[j + 1] += 2;
+      y[j + 2] += 3;
+      y[j + 3] += 4;
+      y[j + 4] += 5;
+      y[j + 5] += 6;
+      y[j + 6] += 7;
+      y[j + 7] += 8;
+    }
+}
+
+void __attribute__ ((optimize (0)))
+f0_check (int8_t *__restrict x, int8_t *__restrict x2, int16_t