Hi Hari,
Thanks for the new patches. Most of them look good to me; I have just one comment.

[PATCH 04/14] AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
The performance results do not look good enough. Also, why is there a shortcut branch for the case (coeffIdx == 4)?

Regards,
Chen

At 2024-09-06 21:32:25, "Hari Limaye" <hari.lim...@arm.com> wrote:
>Hi,
>
>This patch series adds further optimised implementations of the ipfilter
>primitives, using Armv8.4 Neon DotProd and Armv8.6 Neon I8MM instructions.
>
>Relative performance numbers are in the individual commit messages.
>
>The series is based on the x265_git master branch.
>
>Many thanks,
>Hari
>
>George Steed (1):
>  testbench.cpp: Guard extensions based on architecture
>
>Hari Limaye (13):
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_hpp
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_hps
>  AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
>  AArch64: Add Armv8.4 Neon DotProd implementations of filter_hps
>  AArch64: Add Armv8.4 Neon DotProd implementation of interp_hv_pp
>  AArch64: Add Armv8.6 Neon I8MM feature detection
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_hpp
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_hps
>  AArch64: Add Armv8.6 Neon I8MM implementations of chroma_hpp
>  AArch64: Add Armv8.6 Neon I8MM implementation of interp_hv_pp
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_vps
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_vps
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_vpp
>
> build/README.txt                              |   23 +-
> source/CMakeLists.txt                         |   32 +-
> source/cmake/FindNEON_I8MM.cmake              |   21 +
> source/common/CMakeLists.txt                  |   14 +
> source/common/aarch64/asm-primitives.cpp      |   14 +
> source/common/aarch64/filter-neon-dotprod.cpp | 1131 +++++++++++++
> source/common/aarch64/filter-neon-dotprod.h   |   37 +
> source/common/aarch64/filter-neon-i8mm.cpp    | 1412 +++++++++++++++++
> source/common/aarch64/filter-neon-i8mm.h      |   37 +
> source/common/aarch64/mem-neon.h              |   16 +
> source/common/cpu.cpp                         |   18 +-
> source/test/testbench.cpp                     |    4 +
> source/x265.h                                 |    1 +
> 13 files changed, 2742 insertions(+), 18 deletions(-)
> create mode 100644 source/cmake/FindNEON_I8MM.cmake
> create mode 100644 source/common/aarch64/filter-neon-dotprod.cpp
> create mode 100644 source/common/aarch64/filter-neon-dotprod.h
> create mode 100644 source/common/aarch64/filter-neon-i8mm.cpp
> create mode 100644 source/common/aarch64/filter-neon-i8mm.h
>
>--
>2.42.1
>
>_______________________________________________
>x265-devel mailing list
>x265-devel@videolan.org
>https://mailman.videolan.org/listinfo/x265-devel