Hi Gerda,
Thanks for the optimisation. The other parts look fine; I just have some comments on insert_new_s16_elements_x8 (and x4):

1) The function name is not clear enough.
2) merge_block_tbl is a constant table; is it necessary to pass it in as a parameter? It is not a template parameter here.
3) Is the register combine + TBL method faster than using EXT directly?

Regards,
Chen

At 2025-04-15 17:36:30, "Gerda Zsejke More" <gerdazsejke.m...@arm.com> wrote:
>Hi,
>
>This patch series adds SVE intrinsic optimisations of interp functions.
>
>Many thanks,
>Gerda
>
>Gerda Zsejke More (4):
>  AArch64: Add SVE implementation of HBD interp_horiz_pp
>  AArch64: Add SVE implementation of HBD interp_horiz_ps
>  AArch64: Add SVE implementation of HBD interp_vert_ss
>  AArch64: Add SVE implementation of HBD interp_vert_pp
>
> source/common/CMakeLists.txt              |    2 +-
> source/common/aarch64/asm-primitives.cpp  |    2 +
> source/common/aarch64/filter-prim-sve.cpp | 1022 +++++++++++++++++++++
> source/common/aarch64/filter-prim-sve.h   |   37 +
> source/common/aarch64/neon-sve-bridge.h   |   12 +
> 5 files changed, 1074 insertions(+), 1 deletion(-)
> create mode 100644 source/common/aarch64/filter-prim-sve.cpp
> create mode 100644 source/common/aarch64/filter-prim-sve.h
>
>--
>2.39.5 (Apple Git-154)
>
>_______________________________________________
>x265-devel mailing list
>x265-devel@videolan.org
>https://mailman.videolan.org/listinfo/x265-devel