Hi Gerda,

Thanks for the optimization.

The other parts look fine; I just have some comments on insert_new_s16_elements_x8
(and the x4 variant):

1) The function name is not descriptive enough.

2) merge_block_tbl is a constant table; is it necessary to pass it in as a
parameter? It is not a template parameter here.

3) Is the register-combine + TBL method faster than using EXT directly?

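For reference, EXT builds a shifted window across two vectors in a single
instruction. A minimal scalar sketch of its semantics on 16-bit lanes, assuming
the behaviour of the vextq_s16 intrinsic (ext_s16_model is a hypothetical helper
written only for illustration; it is not code from the patch):

```c
#include <stdint.h>

/* Hypothetical scalar model of vextq_s16(lo, hi, n), i.e. EXT on 16-bit
 * lanes: the result is lanes n..7 of `lo` followed by lanes 0..n-1 of `hi`.
 * The real intrinsic requires n to be a compile-time constant in 0..7;
 * this model accepts it at run time for illustration only. */
static void ext_s16_model(const int16_t lo[8], const int16_t hi[8], int n,
                          int16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (i + n < 8) ? lo[i + n] : hi[i + n - 8];
}
```

For example, with lo = {0..7}, hi = {8..15} and n = 3 the result is
{3, 4, 5, 6, 7, 8, 9, 10}. Whether this beats the TBL-based sequence would need
to be measured on the target cores.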

Regards,

Chen

At 2025-04-15 17:36:30, "Gerda Zsejke More" <gerdazsejke.m...@arm.com> wrote:
>Hi,
>
>This patch series adds SVE intrinsic optimisations of interp functions.
>
>Many thanks,
>Gerda
>
>Gerda Zsejke More (4):
>  AArch64: Add SVE implementation of HBD interp_horiz_pp
>  AArch64: Add SVE implementation of HBD interp_horiz_ps
>  AArch64: Add SVE implementation of HBD interp_vert_ss
>  AArch64: Add SVE implementation of HBD interp_vert_pp
>
> source/common/CMakeLists.txt              |    2 +-
> source/common/aarch64/asm-primitives.cpp  |    2 +
> source/common/aarch64/filter-prim-sve.cpp | 1022 +++++++++++++++++++++
> source/common/aarch64/filter-prim-sve.h   |   37 +
> source/common/aarch64/neon-sve-bridge.h   |   12 +
> 5 files changed, 1074 insertions(+), 1 deletion(-)
> create mode 100644 source/common/aarch64/filter-prim-sve.cpp
> create mode 100644 source/common/aarch64/filter-prim-sve.h
>
>-- 
>2.39.5 (Apple Git-154)
>
>_______________________________________________
>x265-devel mailing list
>x265-devel@videolan.org
>https://mailman.videolan.org/listinfo/x265-devel