RE: Re: [PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-16 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of juzhe.zh...@rivai.ai
Sent: Wednesday, August 16, 2023 9:23 AM
To: jeffreyalaw; gcc-patches
Cc: kito.cheng; Kito.cheng; Robin Dapp
Subject: Re: Re: [PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

Thanks Jeff.
I realize the quad_trunc/oct_trunc change is not necessary. I will remove that.

The middle-end support has been approved and is being tested on both x86 and
ARM; it will be committed soon.

I will commit this patch after the middle-end patch is committed.

Thanks.


juzhe.zh...@rivai.ai
 


Re: Re: [PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-15 Thread juzhe.zh...@rivai.ai
Thanks Jeff.
I realize the quad_trunc/oct_trunc change is not necessary. I will remove that.

The middle-end support has been approved and is being tested on both x86 and
ARM; it will be committed soon.

I will commit this patch after the middle-end patch is committed.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-08-15 22:18
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}
 
 
On 8/14/23 06:15, Juzhe-Zhong wrote:
> This patch depends on middle-end support:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627305.html
> 
> This patch allows us to auto-vectorize the following case:
> 
> #define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)                     \
>   void __attribute__ ((noinline, noclone))                             \
>   NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src,          \
>             MASKTYPE *__restrict cond, intptr_t n)                     \
>   {                                                                    \
>     for (intptr_t i = 0; i < n; ++i)                                   \
>       if (cond[i])                                                     \
>         dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]        \
>                    + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]  \
>                    + src[i * 8 + 6] + src[i * 8 + 7]);                 \
>   }
> 
> #define TEST2(NAME, OUTTYPE, INTYPE)                                   \
>   TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t)
> 
> #define TEST1(NAME, OUTTYPE)                                           \
>   TEST2 (NAME##_i32, OUTTYPE, int32_t)
> 
> #define TEST(NAME)                                                     \
>   TEST1 (NAME##_i32, int32_t)
> 
> TEST (test)
> 
> ASM:
> 
> test_i32_i32_f32_8:
> ble a3,zero,.L5
> .L3:
> vsetvli a4,a3,e8,mf4,ta,ma
> vle32.v v0,0(a2)
> vsetvli a5,zero,e32,m1,ta,ma
> vmsne.vi v0,v0,0
> vsetvli zero,a4,e32,m1,ta,ma
> vlseg8e32.v v8,(a1),v0.t
> vsetvli a5,zero,e32,m1,ta,ma
> slli a6,a4,2
> vadd.vv v1,v9,v8
> slli a7,a4,5
> vadd.vv v1,v1,v10
> sub a3,a3,a4
> vadd.vv v1,v1,v11
> vadd.vv v1,v1,v12
> vadd.vv v1,v1,v13
> vadd.vv v1,v1,v14
> vadd.vv v1,v1,v15
> vsetvli zero,a4,e32,m1,ta,ma
> vse32.v v1,0(a0),v0.t
> add a2,a2,a6
> add a1,a1,a7
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
> 
> gcc/ChangeLog:
> 
>  * config/riscv/autovec.md (vec_mask_len_load_lanes): 
> New pattern.
>  (vec_mask_len_store_lanes): Ditto.
>  (2): Fix pattern for ICE.
>  (2): Ditto.
>  * config/riscv/riscv-protos.h (expand_lanes_load_store): New 
> function.
>  * config/riscv/riscv-v.cc (get_mask_mode): Add tuple mode mask mode.
>  (expand_lanes_load_store): New function.
>  * config/riscv/vector-iterators.md: New iterator.
I would generally recommend sending independent fixes separately.  In 
particular the quad_trunc, oct_trunc changes seem like they should have 
been a separate patch.  But no need to resend this time.  Just try to 
break out distinct changes like those into their own patch.
 
OK, but obviously hold off committing until the generic support is 
approved and committed.
 
Thanks,
jeff
 
 


Re: [PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/14/23 06:15, Juzhe-Zhong wrote:

> This patch depends on middle-end support:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627305.html
> 
> This patch allows us to auto-vectorize the following case:
> 
> #define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)                     \
>   void __attribute__ ((noinline, noclone))                             \
>   NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src,          \
>             MASKTYPE *__restrict cond, intptr_t n)                     \
>   {                                                                    \
>     for (intptr_t i = 0; i < n; ++i)                                   \
>       if (cond[i])                                                     \
>         dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]        \
>                    + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]  \
>                    + src[i * 8 + 6] + src[i * 8 + 7]);                 \
>   }
> 
> #define TEST2(NAME, OUTTYPE, INTYPE)                                   \
>   TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t)
> 
> #define TEST1(NAME, OUTTYPE)                                           \
>   TEST2 (NAME##_i32, OUTTYPE, int32_t)
> 
> #define TEST(NAME)                                                     \
>   TEST1 (NAME##_i32, int32_t)
> 
> TEST (test)
> 
> ASM:
> 
> test_i32_i32_f32_8:
>         ble     a3,zero,.L5
> .L3:
>         vsetvli a4,a3,e8,mf4,ta,ma
>         vle32.v v0,0(a2)
>         vsetvli a5,zero,e32,m1,ta,ma
>         vmsne.vi v0,v0,0
>         vsetvli zero,a4,e32,m1,ta,ma
>         vlseg8e32.v v8,(a1),v0.t
>         vsetvli a5,zero,e32,m1,ta,ma
>         slli    a6,a4,2
>         vadd.vv v1,v9,v8
>         slli    a7,a4,5
>         vadd.vv v1,v1,v10
>         sub     a3,a3,a4
>         vadd.vv v1,v1,v11
>         vadd.vv v1,v1,v12
>         vadd.vv v1,v1,v13
>         vadd.vv v1,v1,v14
>         vadd.vv v1,v1,v15
>         vsetvli zero,a4,e32,m1,ta,ma
>         vse32.v v1,0(a0),v0.t
>         add     a2,a2,a6
>         add     a1,a1,a7
>         add     a0,a0,a6
>         bne     a3,zero,.L3
> .L5:
>         ret
> 
> gcc/ChangeLog:
> 
>  * config/riscv/autovec.md (vec_mask_len_load_lanes): New pattern.
>  (vec_mask_len_store_lanes): Ditto.
>  (2): Fix pattern for ICE.
>  (2): Ditto.
>  * config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
>  * config/riscv/riscv-v.cc (get_mask_mode): Add tuple mode mask mode.
>  (expand_lanes_load_store): New function.
>  * config/riscv/vector-iterators.md: New iterator.

I would generally recommend sending independent fixes separately.  In 
particular the quad_trunc, oct_trunc changes seem like they should have 
been a separate patch.  But no need to resend this time.  Just try to 
break out distinct changes like those into their own patch.


OK, but obviously hold off committing until the generic support is 
approved and committed.


Thanks,
jeff



[PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-14 Thread Juzhe-Zhong
This patch depends on middle-end support:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627305.html

This patch allows us to auto-vectorize the following case:

#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)                     \
  void __attribute__ ((noinline, noclone))                             \
  NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src,          \
            MASKTYPE *__restrict cond, intptr_t n)                     \
  {                                                                    \
    for (intptr_t i = 0; i < n; ++i)                                   \
      if (cond[i])                                                     \
        dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]        \
                   + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]  \
                   + src[i * 8 + 6] + src[i * 8 + 7]);                 \
  }

#define TEST2(NAME, OUTTYPE, INTYPE)                                   \
  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t)

#define TEST1(NAME, OUTTYPE)                                           \
  TEST2 (NAME##_i32, OUTTYPE, int32_t)

#define TEST(NAME)                                                     \
  TEST1 (NAME##_i32, int32_t)

TEST (test)
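
For readers who prefer to see the loop without the macro layers, the following
is a hand expansion of TEST (test), which should match the symbol name in the
assembly listing. The check_expansion helper and its input data are
hypothetical additions for illustration only and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hand expansion of TEST (test).  The name is built up as
   test -> test_i32 -> test_i32_i32 -> test_i32_i32_f32 -> ..._8.  */
void __attribute__ ((noinline, noclone))
test_i32_i32_f32_8 (int32_t *__restrict dest, int32_t *__restrict src,
                    int32_t *__restrict cond, intptr_t n)
{
  for (intptr_t i = 0; i < n; ++i)
    if (cond[i])
      dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]
                 + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]
                 + src[i * 8 + 6] + src[i * 8 + 7]);
}

/* Hypothetical self-check (an assumption, not from the patch).  */
void
check_expansion (void)
{
  enum { N = 5 };
  int32_t dest[N], src[N * 8], cond[N];

  for (int i = 0; i < N * 8; ++i)
    src[i] = i;
  for (int i = 0; i < N; ++i)
    {
      dest[i] = -1;      /* Sentinel: must survive where cond[i] == 0.  */
      cond[i] = i & 1;   /* Only odd elements are active.  */
    }

  test_i32_i32_f32_8 (dest, src, cond, N);

  /* Group i holds 8i .. 8i+7, whose sum is 64i + 28.  */
  for (int i = 0; i < N; ++i)
    assert (dest[i] == (cond[i] ? 64 * i + 28 : -1));
}
```

Each output element reads a group of eight consecutive inputs under a
per-element condition, which is the access shape a masked load-lanes
operation is meant to cover.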

ASM:

test_i32_i32_f32_8:
        ble     a3,zero,.L5
.L3:
        vsetvli a4,a3,e8,mf4,ta,ma
        vle32.v v0,0(a2)
        vsetvli a5,zero,e32,m1,ta,ma
        vmsne.vi v0,v0,0
        vsetvli zero,a4,e32,m1,ta,ma
        vlseg8e32.v v8,(a1),v0.t
        vsetvli a5,zero,e32,m1,ta,ma
        slli    a6,a4,2
        vadd.vv v1,v9,v8
        slli    a7,a4,5
        vadd.vv v1,v1,v10
        sub     a3,a3,a4
        vadd.vv v1,v1,v11
        vadd.vv v1,v1,v12
        vadd.vv v1,v1,v13
        vadd.vv v1,v1,v14
        vadd.vv v1,v1,v15
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v1,0(a0),v0.t
        add     a2,a2,a6
        add     a1,a1,a7
        add     a0,a0,a6
        bne     a3,zero,.L3
.L5:
        ret
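
As a reading aid for the listing above, here is a rough scalar model in C of
what each pass through .L3 appears to do. It is a sketch under assumptions,
not the code GCC generates: VLMAX is a hypothetical stand-in for the hardware
vector length, vsetvli is simplified to min (n, VLMAX), and the names
strip_mined_model, scalar_reference and check_model are invented for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed number of 32-bit elements per vector register (hypothetical).  */
#define VLMAX 4

/* The original scalar loop, used as the reference result.  */
void
scalar_reference (int32_t *dest, const int32_t *src,
                  const int32_t *cond, intptr_t n)
{
  for (intptr_t i = 0; i < n; ++i)
    if (cond[i])
      {
        int32_t sum = 0;
        for (int f = 0; f < 8; ++f)
          sum += src[i * 8 + f];
        dest[i] = sum;
      }
}

/* Scalar model of the vector loop, one iteration per pass through .L3.  */
void
strip_mined_model (int32_t *dest, const int32_t *src,
                   const int32_t *cond, intptr_t n)
{
  while (n > 0)                             /* ble / bne on a3  */
    {
      intptr_t vl = n < VLMAX ? n : VLMAX;  /* vsetvli a4,a3 (simplified)  */
      int32_t field[8][VLMAX];              /* models v8 .. v15  */
      int mask[VLMAX];                      /* models v0  */

      for (intptr_t j = 0; j < vl; ++j)
        mask[j] = cond[j] != 0;             /* vle32.v + vmsne.vi  */

      /* vlseg8e32.v v8,(a1),v0.t: field f of element j goes to
         register f, lane j; inactive lanes are not loaded.  */
      for (intptr_t j = 0; j < vl; ++j)
        if (mask[j])
          for (int f = 0; f < 8; ++f)
            field[f][j] = src[j * 8 + f];

      /* The vadd.vv chain is unmasked in the assembly, but only the
         active lanes reach memory through vse32.v ... v0.t.  */
      for (intptr_t j = 0; j < vl; ++j)
        if (mask[j])
          {
            int32_t sum = 0;
            for (int f = 0; f < 8; ++f)
              sum += field[f][j];
            dest[j] = sum;
          }

      cond += vl;                           /* add a2,a2,a6 (vl * 4 bytes)  */
      src += vl * 8;                        /* add a1,a1,a7 (vl * 32 bytes)  */
      dest += vl;                           /* add a0,a0,a6 (vl * 4 bytes)  */
      n -= vl;                              /* sub a3,a3,a4  */
    }
}

/* Hypothetical self-check: the model must agree with the reference.  */
void
check_model (void)
{
  enum { N = 11 };                          /* Not a multiple of VLMAX.  */
  int32_t src[N * 8], cond[N], want[N], got[N];

  for (int i = 0; i < N * 8; ++i)
    src[i] = i;
  for (int i = 0; i < N; ++i)
    {
      cond[i] = i % 3;
      want[i] = got[i] = -1;
    }

  scalar_reference (want, src, cond, N);
  strip_mined_model (got, src, cond, N);

  for (int i = 0; i < N; ++i)
    assert (want[i] == got[i]);
}
```

The two shift amounts in the listing correspond to the pointer bumps here:
slli a6,a4,2 is vl * 4 bytes for dest and cond, and slli a7,a4,5 is
vl * 32 bytes for src, since each element consumes eight 32-bit inputs.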

gcc/ChangeLog:

* config/riscv/autovec.md (vec_mask_len_load_lanes): New 
pattern.
(vec_mask_len_store_lanes): Ditto.
(2): Fix pattern for ICE.
(2): Ditto.
* config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
* config/riscv/riscv-v.cc (get_mask_mode): Add tuple mode mask mode.
(expand_lanes_load_store): New function.
* config/riscv/vector-iterators.md: New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Adapt 
tests.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add lanes test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-4.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-5.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-6.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-7.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-4.c: New 
test.