Hi Li,
Thanks for the patches; they look good to me. My only question is how much the performance improves after changing from asm to intrinsics.

Regards,
Chen

At 2025-06-18 02:22:25, "Li Zhang" <li.zha...@arm.com> wrote:
>Hi,
>
>This patch series optimizes the existing standard bit-depth pixel_var
>Neon intrinsics implementation, deletes the slower assembly
>implementation. It also adds Neon DotProd intrinsics implementation for
>the standard bit-depth and Neon, SVE intrinsics implementations for the
>high bit-depth of pixel_var function.
>
>Many thanks,
>Li
>
>Li Zhang (4):
>  AArch64: Optimize and clean up SBD pixel_var functions
>  AArch64: Add HBD pixel_var Neon intrinscis implementations
>  AArch64: Add SBD pixel_var Neon DotProd intrinsics implementations
>  AArch64: Add HBD pixel_var SVE intrinsics implementations
>
> source/common/CMakeLists.txt              |   4 +-
> source/common/aarch64/asm-primitives.cpp  |  14 +-
> source/common/aarch64/fun-decls.h         |  10 -
> source/common/aarch64/neon-sve-bridge.h   |   7 +
> .../aarch64/pixel-prim-neon-dotprod.cpp   | 111 ++++++++++
> source/common/aarch64/pixel-prim-sve.cpp  | 137 ++++++++++++
> source/common/aarch64/pixel-prim.cpp      | 197 +++++++++++++++---
> source/common/aarch64/pixel-prim.h        |   6 +
> source/common/aarch64/pixel-util-common.S |  27 ---
> source/common/aarch64/pixel-util-sve2.S   | 195 -----------------
> source/common/aarch64/pixel-util.S        |  61 ------
> 11 files changed, 434 insertions(+), 335 deletions(-)
> create mode 100644 source/common/aarch64/pixel-prim-neon-dotprod.cpp
> create mode 100644 source/common/aarch64/pixel-prim-sve.cpp
>
>--
>2.39.5 (Apple Git-154)
>
>_______________________________________________
>x265-devel mailing list
>x265-devel@videolan.org
>https://mailman.videolan.org/listinfo/x265-devel