Reply: Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN

2023-11-20 Thread juzhe.zh...@rivai.ai
Hi, Richi.

The strided load/store patch has been posted for a while.
Can this feature go into GCC 14,
or should it be postponed to GCC 15?

Thanks.



juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Sent: 2023-11-16 15:21
To: 钟居哲; gcc-patches
Cc: richard.sandiford; rguenther
Subject: Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
Update: the CI test run has just finished.

Tested on aarch64 QEMU with no regressions.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-11-14 11:39
To: gcc-patches
CC: richard.sandiford; rguenther; Juzhe-Zhong
Subject: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN

Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN

2023-11-15 Thread juzhe.zh...@rivai.ai
Update: the CI test run has just finished.

Tested on aarch64 QEMU with no regressions.



juzhe.zh...@rivai.ai
 

[PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN

2023-11-13 Thread Juzhe-Zhong
This patch adds mask_len_strided_load/mask_len_strided_store.

The documentation has already been reviewed.

This patch adds OPTAB/IFN support as follows:

1. strided load

GIMPLE level:

v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)

is expanded (by internal-fn.cc) into:

v = mask_len_strided_load (ptr, stride, mask, len, bias)

2. strided store

GIMPLE level:

MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias)

is expanded (by internal-fn.cc) into:

mask_len_strided_store (ptr, stride, v, mask, len, bias)
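
For readers unfamiliar with the operands, the intended per-element behaviour
can be sketched as a scalar reference model in C. This is only an
illustration of the md.texi wording in this patch, not code from GCC. The
function names strided_load_ref/strided_store_ref, the int32_t element type,
the byte-array mask and the explicit nunits parameter are all assumptions
made for the sketch. The stride is treated as a byte offset, as the formula
"base + i * stride" suggests. Elements at or beyond len + bias are left
untouched here, although the documentation only says they are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of mask_len_strided_load for int32_t elements.
   ptr is the base address, stride is a byte stride, mask[i] != 0 means
   element i is active, and at most len + bias elements are processed.  */
static void
strided_load_ref (int32_t *dest, const char *ptr, ptrdiff_t stride,
                  const unsigned char *mask, size_t len, ptrdiff_t bias,
                  size_t nunits)
{
  size_t active = (size_t) ((ptrdiff_t) len + bias);
  for (size_t i = 0; i < nunits && i < active; i++)
    {
      if (mask[i])
        /* Element i is read from address ptr + i * stride.  */
        memcpy (&dest[i], ptr + (ptrdiff_t) i * stride, sizeof (int32_t));
      else
        /* The md.texi text in this patch says masked-off elements are zero.  */
        dest[i] = 0;
    }
}

/* Hypothetical scalar model of mask_len_strided_store: active elements of v
   are written to ptr + i * stride, everything else in memory is untouched.  */
static void
strided_store_ref (char *ptr, ptrdiff_t stride, const int32_t *v,
                   const unsigned char *mask, size_t len, ptrdiff_t bias,
                   size_t nunits)
{
  size_t active = (size_t) ((ptrdiff_t) len + bias);
  for (size_t i = 0; i < nunits && i < active; i++)
    if (mask[i])
      memcpy (ptr + (ptrdiff_t) i * stride, &v[i], sizeof (int32_t));
}
```

With a stride of 2 * sizeof (int32_t), a mask of {1, 0, 1, 1}, len = 3 and
bias = 0, the load model reads every second array element for lanes 0 and 2,
zeroes lane 1, and leaves lane 3 alone because it lies beyond len + bias.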

Bootstrapped and regression-tested on x86 with no regressions.

OK for trunk?
 
gcc/ChangeLog:

* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
* internal-fn.cc (strided_load_direct): Ditto.
(strided_store_direct): Ditto.
(expand_strided_store_optab_fn): Ditto.
(expand_strided_load_optab_fn): Ditto.
(direct_strided_load_optab_supported_p): Ditto.
(direct_strided_store_optab_supported_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
(MASK_LEN_STRIDED_STORE): Ditto.
* optabs.def (OPTAB_D): Ditto.

---
 gcc/doc/md.texi     | 27 +++
 gcc/internal-fn.cc  | 63 +
 gcc/internal-fn.def |  6 +
 gcc/optabs.def      |  2 ++
 4 files changed, 98 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5d86152e5dd..5dc76a1183c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
 be loaded from memory and clear if element @var{i} of the result should be undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode @var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
+The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index @var{i}, the load address is operand 1 + @var{i} * operand 2.
+Similar to @code{mask_len_load}, the instruction loads at most (operand 4 + operand 5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode @var{m} into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is a scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
+Operand 3 is the mask operand, operand 4 is the length operand and operand 5 is the bias operand.
+The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 0 as base and operand 1 as step.
+For each element index @var{i}, the store address is operand 0 + @var{i} * operand 1.
+Similar to @code{mask_len_store}, the instruction stores at most (operand 4 + operand 5) elements of (operand 2) to memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of (operand 2) should be stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5a998e794ad..bfb307684a9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -164,6 +164,7 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
 #define len_load_direct { -1, -1, false }
 #define mask_len_load_direct { -1, 4, false }
 #define mask_store_direct { 3, 2, false }
@@ -173,6 +174,7 @@ init_internal_fns ()
 #define