[x265] [PATCH 0/2] AArch64: Fix SVE DCT implementations

Jonathan Wright Tue, 10 Jun 2025 10:50:07 -0700

Hi,

This patch series fixes bugs in the Arm SVE 16x16 and 32x32 DCT
implementations and mitigates part of the performance regression
caused by the fix. Both SVE DCT implementations remain
significantly faster than the equivalent Neon paths.

Note that the DCT unit tests did not catch these bugs. They were found
after differences in the encoded output videos were observed between
Arm and x86 with the veryslow, slower and slow encoding presets. With
these patches applied, the encoded output matches for all speed presets.

Thanks,
Jonathan

Jonathan Wright (2):
  AArch64: Fix SVE 16x16 and 32x32 DCT implementations
  AArch64: Specialize passes of 16x16 and 32x32 SVE DCTs

 source/common/aarch64/dct-prim-sve.cpp | 338 ++++++++++++++++++++++---
 1 file changed, 306 insertions(+), 32 deletions(-)

-- 
2.39.5 (Apple Git-154)

_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel
