回复: Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
Hi, Richi. strided load/store has been posted for a while. Can this feature be available on GCC-14 ? Or postpone it to GCC-15 ? Thanks. juzhe.zh...@rivai.ai 发件人: juzhe.zh...@rivai.ai 发送时间: 2023-11-16 15:21 收件人: 钟居哲; gcc-patches 抄送: richard.sandiford; rguenther 主题: Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN Update just finished test CI. Tested on aarch64 QEMU no regression. juzhe.zh...@rivai.ai From: Juzhe-Zhong Date: 2023-11-14 11:39 To: gcc-patches CC: richard.sandiford; rguenther; Juzhe-Zhong Subject: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN This patch adds mask_len_strided_load/mask_len_strided_store. Document already has been reviewed. This patch adds OPTAB/IFN support as follows: 1. strided load GIMPLE level: v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) be expand (by internal-fn.cc) into: v = mask_len_strided_load (ptr, stried, mask, len, bias) 2. strided store GIMPLE leve: MASK_LEN_STRIED_STORE (ptr, stride, v, mask, len, bias) be expand (by internal-fn.cc) into: mask_len_stried_store (ptr, stride, v, mask, len, bias) Bootstrap and regression on X86 no regression. Ok for trunk ? gcc/ChangeLog: * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store. * internal-fn.cc (strided_load_direct): Ditto. (strided_store_direct): Ditto. (expand_strided_store_optab_fn): Ditto. (expand_strided_load_optab_fn): Ditto. (direct_strided_load_optab_supported_p): Ditto. (direct_strided_store_optab_supported_p): Ditto. (internal_fn_len_index): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_stored_value_index): Ditto. * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto. (MASK_LEN_STRIDED_STORE): Ditto. * optabs.def (OPTAB_D): Ditto. --- gcc/doc/md.texi | 27 +++ gcc/internal-fn.cc | 63 + gcc/internal-fn.def | 6 + gcc/optabs.def | 2 ++ 4 files changed, 98 insertions(+) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 5d86152e5dd..5dc76a1183c 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should be loaded from memory and clear if element @var{i} of the result should be undefined. Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored. +@cindex @code{mask_len_strided_load@var{m}} instruction pattern +@item @samp{mask_len_strided_load@var{m}} +Load several separate memory locations into a destination vector of mode @var{m}. +Operand 0 is a destination vector of mode @var{m}. +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode. +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand. +The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}} +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step. +For each element index i load address is operand 1 + @var{i} * operand 2. +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory. +Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should +be loaded from memory and clear if element @var{i} of the result should be zero. +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored. + @cindex @code{scatter_store@var{m}@var{n}} instruction pattern @item @samp{scatter_store@var{m}@var{n}} Store a vector of mode @var{m} into several distinct memory locations. @@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory. Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored. Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored. +@cindex @code{mask_len_strided_store@var{m}} instruction pattern +@item @samp{mask_len_strided_store@var{m}} +Store a vector of mode m into several distinct memory locations. +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode. +Operand 2 is the vector of values that should be stored, which is of mode @var{m}. +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand. +The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}} +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 1 as step. +For each element index i store address is operand 0 + @var{i} * operand 1. +Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of mask (operand 3) to memory. +Element @var{i} of the mask is set if element @var{i} of (operand 3) should be stored. +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored. + @cindex @code{vec_set@var{m}} instruction pattern @item @samp{vec_set@var{m}} Set given field in the vector value. Operand 0 is the vect
Re: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
Update just finished test CI. Tested on aarch64 QEMU no regression. juzhe.zh...@rivai.ai From: Juzhe-Zhong Date: 2023-11-14 11:39 To: gcc-patches CC: richard.sandiford; rguenther; Juzhe-Zhong Subject: [PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN This patch adds mask_len_strided_load/mask_len_strided_store. Document already has been reviewed. This patch adds OPTAB/IFN support as follows: 1. strided load GIMPLE level: v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) be expand (by internal-fn.cc) into: v = mask_len_strided_load (ptr, stried, mask, len, bias) 2. strided store GIMPLE leve: MASK_LEN_STRIED_STORE (ptr, stride, v, mask, len, bias) be expand (by internal-fn.cc) into: mask_len_stried_store (ptr, stride, v, mask, len, bias) Bootstrap and regression on X86 no regression. Ok for trunk ? gcc/ChangeLog: * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store. * internal-fn.cc (strided_load_direct): Ditto. (strided_store_direct): Ditto. (expand_strided_store_optab_fn): Ditto. (expand_strided_load_optab_fn): Ditto. (direct_strided_load_optab_supported_p): Ditto. (direct_strided_store_optab_supported_p): Ditto. (internal_fn_len_index): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_stored_value_index): Ditto. * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto. (MASK_LEN_STRIDED_STORE): Ditto. * optabs.def (OPTAB_D): Ditto. --- gcc/doc/md.texi | 27 +++ gcc/internal-fn.cc | 63 + gcc/internal-fn.def | 6 + gcc/optabs.def | 2 ++ 4 files changed, 98 insertions(+) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 5d86152e5dd..5dc76a1183c 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should be loaded from memory and clear if element @var{i} of the result should be undefined. Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored. +@cindex @code{mask_len_strided_load@var{m}} instruction pattern +@item @samp{mask_len_strided_load@var{m}} +Load several separate memory locations into a destination vector of mode @var{m}. +Operand 0 is a destination vector of mode @var{m}. +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode. +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand. +The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}} +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step. +For each element index i load address is operand 1 + @var{i} * operand 2. +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory. +Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should +be loaded from memory and clear if element @var{i} of the result should be zero. +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored. + @cindex @code{scatter_store@var{m}@var{n}} instruction pattern @item @samp{scatter_store@var{m}@var{n}} Store a vector of mode @var{m} into several distinct memory locations. @@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory. Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored. Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored. +@cindex @code{mask_len_strided_store@var{m}} instruction pattern +@item @samp{mask_len_strided_store@var{m}} +Store a vector of mode m into several distinct memory locations. +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode. +Operand 2 is the vector of values that should be stored, which is of mode @var{m}. +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand. +The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}} +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 1 as step. +For each element index i store address is operand 0 + @var{i} * operand 1. +Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of mask (operand 3) to memory. +Element @var{i} of the mask is set if element @var{i} of (operand 3) should be stored. +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored. + @cindex @code{vec_set@var{m}} instruction pattern @item @samp{vec_set@var{m}} Set given field in the vector value. Operand 0 is the vector to modify, diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 5a998e794ad..bfb307684a9 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -164,6 +164,7 @@ init_internal_fns () #define load_lanes_direct { -1, -1, false } #define mask_load_lanes_direct { -1, -1, false } #define gather_load_direct { 3, 1, false } +#define strided_load_direct { -
[PATCH] DOC/IFN/OPTAB: Add mask_len_strided_load/mask_len_strided_store DOC/OPTAB/IFN
This patch adds mask_len_strided_load/mask_len_strided_store. Document already has been reviewed. This patch adds OPTAB/IFN support as follows: 1. strided load GIMPLE level: v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias) be expand (by internal-fn.cc) into: v = mask_len_strided_load (ptr, stried, mask, len, bias) 2. strided store GIMPLE leve: MASK_LEN_STRIED_STORE (ptr, stride, v, mask, len, bias) be expand (by internal-fn.cc) into: mask_len_stried_store (ptr, stride, v, mask, len, bias) Bootstrap and regression on X86 no regression. Ok for trunk ? gcc/ChangeLog: * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store. * internal-fn.cc (strided_load_direct): Ditto. (strided_store_direct): Ditto. (expand_strided_store_optab_fn): Ditto. (expand_strided_load_optab_fn): Ditto. (direct_strided_load_optab_supported_p): Ditto. (direct_strided_store_optab_supported_p): Ditto. (internal_fn_len_index): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_stored_value_index): Ditto. * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto. (MASK_LEN_STRIDED_STORE): Ditto. * optabs.def (OPTAB_D): Ditto. --- gcc/doc/md.texi | 27 +++ gcc/internal-fn.cc | 63 + gcc/internal-fn.def | 6 + gcc/optabs.def | 2 ++ 4 files changed, 98 insertions(+) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 5d86152e5dd..5dc76a1183c 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the result should be loaded from memory and clear if element @var{i} of the result should be undefined. Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored. +@cindex @code{mask_len_strided_load@var{m}} instruction pattern +@item @samp{mask_len_strided_load@var{m}} +Load several separate memory locations into a destination vector of mode @var{m}. +Operand 0 is a destination vector of mode @var{m}. +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode. +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand. +The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}} +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step. +For each element index i load address is operand 1 + @var{i} * operand 2. +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 5) elements from memory. +Element @var{i} of the mask (operand 3) is set if element @var{i} of the result should +be loaded from memory and clear if element @var{i} of the result should be zero. +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored. + @cindex @code{scatter_store@var{m}@var{n}} instruction pattern @item @samp{scatter_store@var{m}@var{n}} Store a vector of mode @var{m} into several distinct memory locations. @@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory. Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored. Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored. +@cindex @code{mask_len_strided_store@var{m}} instruction pattern +@item @samp{mask_len_strided_store@var{m}} +Store a vector of mode m into several distinct memory locations. +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode. +Operand 2 is the vector of values that should be stored, which is of mode @var{m}. +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias operand. +The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}} +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 1 as step. +For each element index i store address is operand 0 + @var{i} * operand 1. +Similar to mask_len_store, the instruction stores at most (operand 4 + operand 5) elements of mask (operand 3) to memory. +Element @var{i} of the mask is set if element @var{i} of (operand 3) should be stored. +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored. + @cindex @code{vec_set@var{m}} instruction pattern @item @samp{vec_set@var{m}} Set given field in the vector value. Operand 0 is the vector to modify, diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 5a998e794ad..bfb307684a9 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -164,6 +164,7 @@ init_internal_fns () #define load_lanes_direct { -1, -1, false } #define mask_load_lanes_direct { -1, -1, false } #define gather_load_direct { 3, 1, false } +#define strided_load_direct { -1, -1, false } #define len_load_direct { -1, -1, false } #define mask_len_load_direct { -1, 4, false } #define mask_store_direct { 3, 2, false } @@ -173,6 +174,7 @@ init_internal_fns () #define