[x265] [PATCH v2 0/4] AArch64: Add Neon optimisations of interp functions

Gerda Zsejke More Thu, 24 Apr 2025 03:01:20 -0700

Hello,

This is the v2 patch series, which addresses Chen's review comments.
Some notes on your suggestions:
1. I’ve renamed the functions to insert_row_into_window_s16x4 and
   insert_row_into_window_s16x8 for clarity.
2. Passing merge_block_tbl as a parameter improves readability and keeps
   the functions flexible. Since the functions are inlined, this approach
   doesn’t introduce any performance overhead.
3. TBL instructions are needed to construct the required vectors.
   Simple extraction isn't sufficient in this context.
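
As a rough illustration of point 3, the sketch below models the two
instructions in portable scalar C++. It is not code from the patch. The
names ext_model, tbl2_model and example_merge_tbl are hypothetical, and the
index table is an assumed layout (two groups of four 16-bit lanes, each
shifted by one lane with a new lane appended), not the actual
merge_block_tbl. It only shows that a contiguous extraction cannot produce
an interleaved result, while a table lookup can.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Scalar model of EXT: take 16 contiguous bytes, starting at byte `imm`,
// from the 32-byte concatenation lo:hi.
Bytes16 ext_model(const Bytes16 &lo, const Bytes16 &hi, unsigned imm)
{
    assert(imm < 16);
    Bytes16 out{};
    for (unsigned i = 0; i < 16; ++i)
    {
        unsigned src = i + imm;
        out[i] = src < 16 ? lo[src] : hi[src - 16];
    }
    return out;
}

// Scalar model of a two-register TBL: every output byte is selected
// independently from the 32-byte table t0:t1. Out-of-range indices give 0.
Bytes16 tbl2_model(const Bytes16 &t0, const Bytes16 &t1, const Bytes16 &idx)
{
    Bytes16 out{};
    for (unsigned i = 0; i < 16; ++i)
    {
        uint8_t k = idx[i];
        if (k < 16)
            out[i] = t0[k];
        else if (k < 32)
            out[i] = t1[k - 16];
        else
            out[i] = 0;
    }
    return out;
}

// Hypothetical index table (an assumption, not the patch's merge_block_tbl).
// With 16-bit lanes w0..w7 in the window and r0, r1 in the new row, it
// builds [w1 w2 w3 r0 | w5 w6 w7 r1]; each lane is two bytes.
const Bytes16 example_merge_tbl = { 2,  3,  4,  5,  6,  7, 16, 17,
                                   10, 11, 12, 13, 14, 15, 18, 19};
```

Calling tbl2_model(window, row, example_merge_tbl) picks bytes from both
inputs in a non-contiguous pattern, whereas ext_model can only return a
sliding 16-byte slice of the two inputs. Because the table is an ordinary
argument, the same helper can be reused with a different table.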


Hopefully, everything is good now.

Best regards,
Gerda

Gerda Zsejke More (4):
  AArch64: Add SVE implementation of HBD interp_horiz_pp
  AArch64: Add SVE implementation of HBD interp_horiz_ps
  AArch64: Add SVE implementation of HBD interp_vert_ss
  AArch64: Add SVE implementation of HBD interp_vert_pp

 source/common/CMakeLists.txt              |    2 +-
 source/common/aarch64/asm-primitives.cpp  |    2 +
 source/common/aarch64/filter-prim-sve.cpp | 1054 +++++++++++++++++++++
 source/common/aarch64/filter-prim-sve.h   |   37 +
 source/common/aarch64/neon-sve-bridge.h   |   12 +
 5 files changed, 1106 insertions(+), 1 deletion(-)
 create mode 100644 source/common/aarch64/filter-prim-sve.cpp
 create mode 100644 source/common/aarch64/filter-prim-sve.h

-- 
2.39.5 (Apple Git-154)

_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel
