Hello,

This patch series optimizes the existing Neon intrinsics implementations of several AArch64 block copy primitives and adds intrinsics implementations for others. It also removes the Neon and SVE assembly implementations that are either slower or offer no performance benefit.

Many thanks,
Li

Li Zhang (8):
  AArch64: Optimize blockcopy_pp_neon intrinsics implementation
  AArch64: Optimize blockcopy_ps Neon intrinsics implementation
  AArch64: Implement blockcopy_ss primitives using Neon intrinsics
  AArch64: Implement blockcopy_sp primitives using Neon intrinsics
  AArch64: Optimize cpy1Dto2D_shl Neon intrinsics implementation
  AArch64: Optimize cpy2Dto1D_shl Neon intrinsics implementation
  AArch64: Implement cpy2Dto1D_shr using Neon intrinsics
  AArch64: Implement cpy1Dto2D_shr using Neon intrinsics

 source/common/CMakeLists.txt              |    2 +-
 source/common/aarch64/asm-primitives.cpp  |  180 ---
 source/common/aarch64/blockcopy8-common.S |   54 -
 source/common/aarch64/blockcopy8-sve.S    | 1346 ---------------------
 source/common/aarch64/blockcopy8.S        | 1049 ----------------
 source/common/aarch64/pixel-prim.cpp      |  358 +++++-
 6 files changed, 305 insertions(+), 2684 deletions(-)
 delete mode 100644 source/common/aarch64/blockcopy8-common.S

-- 
2.39.5 (Apple Git-154)

_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel