Hi,

This patch series adds further optimised implementations of the
interpolation filter (ipfilter) primitives, using Armv8.4 Neon DotProd
and Armv8.6 Neon I8MM instructions.
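
For readers unfamiliar with these extensions, here is a rough scalar
model of how an 8-tap horizontal filter can be mapped onto 8-bit dot
products. It is an illustrative sketch only, not code from the patches:
the function names are hypothetical, the taps are the standard HEVC
half-pel luma coefficients, and the 128 bias trick on the signed
(SDOT-style) path is an assumption about how unsigned pixels are
typically handled, whereas a USDOT-style path can take them directly.

```cpp
#include <cstdint>

// HEVC half-pel luma taps (they sum to 64).
static const int8_t kHalfPelTaps[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

// Scalar model of one 32-bit lane of a signed dot product (SDOT-like):
// four int8 x int8 products accumulated into an int32.
static int32_t sdot_lane(int32_t acc, const int8_t *a, const int8_t *b)
{
    for (int i = 0; i < 4; i++)
        acc += int32_t(a[i]) * int32_t(b[i]);
    return acc;
}

// Scalar model of one lane of a mixed-sign dot product (USDOT-like):
// four uint8 x int8 products accumulated into an int32.
static int32_t usdot_lane(int32_t acc, const uint8_t *a, const int8_t *b)
{
    for (int i = 0; i < 4; i++)
        acc += int32_t(a[i]) * int32_t(b[i]);
    return acc;
}

// Plain reference: sum of tap * pixel over 8 samples.
int32_t filter8_reference(const uint8_t *src)
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += int32_t(kHalfPelTaps[i]) * int32_t(src[i]);
    return sum;
}

// Signed-only path: shift pixels into int8 range by subtracting 128,
// then compensate with 128 * sum(taps) = 128 * 64 in the accumulator.
int32_t filter8_dotprod_model(const uint8_t *src)
{
    int8_t s[8];
    for (int i = 0; i < 8; i++)
        s[i] = int8_t(int(src[i]) - 128);

    int32_t acc = 128 * 64; // correction for the -128 bias
    acc = sdot_lane(acc, s, kHalfPelTaps);
    acc = sdot_lane(acc, s + 4, kHalfPelTaps + 4);
    return acc;
}

// Mixed-sign path: unsigned pixels are used as-is, no bias needed.
int32_t filter8_i8mm_model(const uint8_t *src)
{
    int32_t acc = 0;
    acc = usdot_lane(acc, src, kHalfPelTaps);
    acc = usdot_lane(acc, src + 4, kHalfPelTaps + 4);
    return acc;
}
```

All three functions return the same unrounded filter sum for a given
8-pixel window; rounding and shifting back to pixel range are left out
of the sketch.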

Relative performance numbers are in the individual commit messages.

The series is based on the x265_git master branch.

Many thanks,
Hari

George Steed (1):
  testbench.cpp: Guard extensions based on architecture

Hari Limaye (13):
  AArch64: Add Armv8.4 Neon DotProd implementations of luma_hpp
  AArch64: Add Armv8.4 Neon DotProd implementations of luma_hps
  AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
  AArch64: Add Armv8.4 Neon DotProd implementations of filter_hps
  AArch64: Add Armv8.4 Neon DotProd implementation of interp_hv_pp
  AArch64: Add Armv8.6 Neon I8MM feature detection
  AArch64: Add Armv8.6 Neon I8MM implementations of luma_hpp
  AArch64: Add Armv8.6 Neon I8MM implementations of luma_hps
  AArch64: Add Armv8.6 Neon I8MM implementations of chroma_hpp
  AArch64: Add Armv8.6 Neon I8MM implementation of interp_hv_pp
  AArch64: Add Armv8.4 Neon DotProd implementations of luma_vps
  AArch64: Add Armv8.6 Neon I8MM implementations of luma_vps
  AArch64: Add Armv8.6 Neon I8MM implementations of luma_vpp

 build/README.txt                              |   23 +-
 source/CMakeLists.txt                         |   32 +-
 source/cmake/FindNEON_I8MM.cmake              |   21 +
 source/common/CMakeLists.txt                  |   14 +
 source/common/aarch64/asm-primitives.cpp      |   14 +
 source/common/aarch64/filter-neon-dotprod.cpp | 1131 +++++++++++++
 source/common/aarch64/filter-neon-dotprod.h   |   37 +
 source/common/aarch64/filter-neon-i8mm.cpp    | 1412 +++++++++++++++++
 source/common/aarch64/filter-neon-i8mm.h      |   37 +
 source/common/aarch64/mem-neon.h              |   16 +
 source/common/cpu.cpp                         |   18 +-
 source/test/testbench.cpp                     |    4 +
 source/x265.h                                 |    1 +
 13 files changed, 2742 insertions(+), 18 deletions(-)
 create mode 100644 source/cmake/FindNEON_I8MM.cmake
 create mode 100644 source/common/aarch64/filter-neon-dotprod.cpp
 create mode 100644 source/common/aarch64/filter-neon-dotprod.h
 create mode 100644 source/common/aarch64/filter-neon-i8mm.cpp
 create mode 100644 source/common/aarch64/filter-neon-i8mm.h

-- 
2.42.1
