Hi,

This patch series optimizes the existing standard bit-depth (SBD) pixel_var Neon intrinsics implementation and removes the slower assembly implementation. It also adds a Neon DotProd intrinsics implementation of pixel_var for standard bit-depth, and Neon and SVE intrinsics implementations for high bit-depth (HBD).
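For readers who do not know the primitive, here is a minimal scalar sketch of what pixel_var is understood to compute for a square SBD block: the sum of the pixels and the sum of their squares. The function name pixel_var_sketch, the runtime size parameter, and the packing of both sums into one 64-bit value (sum in the low 32 bits, sum of squares in the high 32 bits) are assumptions made for illustration and are not taken from the patches.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar sketch, not code from this patch series.
// Accumulates the pixel sum and the sum of squared pixels over a
// size x size block of 8-bit (SBD) pixels.
uint64_t pixel_var_sketch(const uint8_t *pix, ptrdiff_t stride, int size)
{
    uint32_t sum = 0;
    uint32_t sqr = 0;

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            uint32_t p = pix[x];
            sum += p;
            sqr += p * p;
        }
        pix += stride; // advance to the next row of the block
    }

    // Assumed packing: sum in the low 32 bits, sum of squares in the
    // high 32 bits. For 8-bit pixels and blocks up to 64x64, neither
    // accumulator overflows 32 bits.
    return sum + (static_cast<uint64_t>(sqr) << 32);
}
```

A caller would unpack the two halves and derive the variance term from them, for example as sqr - sum * sum / (size * size).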
Many thanks,
Li

Li Zhang (4):
  AArch64: Optimize and clean up SBD pixel_var functions
  AArch64: Add HBD pixel_var Neon intrinscis implementations
  AArch64: Add SBD pixel_var Neon DotProd intrinsics implementations
  AArch64: Add HBD pixel_var SVE intrinsics implementations

 source/common/CMakeLists.txt                  |   4 +-
 source/common/aarch64/asm-primitives.cpp      |  14 +-
 source/common/aarch64/fun-decls.h             |  10 -
 source/common/aarch64/neon-sve-bridge.h       |   7 +
 .../aarch64/pixel-prim-neon-dotprod.cpp       | 111 ++++++++++
 source/common/aarch64/pixel-prim-sve.cpp      | 137 ++++++++++++
 source/common/aarch64/pixel-prim.cpp          | 197 +++++++++++++++---
 source/common/aarch64/pixel-prim.h            |   6 +
 source/common/aarch64/pixel-util-common.S     |  27 ---
 source/common/aarch64/pixel-util-sve2.S       | 195 -----------------
 source/common/aarch64/pixel-util.S            |  61 ------
 11 files changed, 434 insertions(+), 335 deletions(-)
 create mode 100644 source/common/aarch64/pixel-prim-neon-dotprod.cpp
 create mode 100644 source/common/aarch64/pixel-prim-sve.cpp

-- 
2.39.5 (Apple Git-154)

_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel