Hi Hari & Jonathan,
Thank for the patches, I have some comments, [PATCH 1/9] Move C DCT implementations into X265_NS 1. These function will share for 8/10/12 bpp, if move into X265_NS, it will make duplicated copy 2. add new section "namespace X265_NS" before these functions are better than move, it affects code history record. [PATCH 3/9] AArch64: Optimise partialButterfly8_neon [PATCH 4/9] AArch64: Optimise partialButterfly16_neon [PATCH 5/9] AArch64: Optimise partialButterfly32_neon [PATCH 7/9] AArch64: Add SVE implementation of 8x8 DCT [PATCH 8/9] AArch64: Add SVE implementation of 16x16 DCT [PATCH 9/9] AArch64: Add SVE implementation of 32x32 DCT partialButterfly8_neon For size 8, butterfly E & O is necessary, but EE/EO is not a good idea, Odd spends 8 operators per line, Even spends 4 operators plus 1 temporary store and 2 prepare operators, totally 7 operators with dependency link, looks no performance benefits, especally SVE SVDOT may get more performance with Odd method. Code style mismatch in different code section, one line is better. + int32x4_t t01 = vpaddq_s32(vmull_s16(c1, O[j + 0]), + vmull_s16(c1, O[j + 1])); *** vs + t01 = vpaddq_s32(vmull_s16(c3, O[j + 0]), vmull_s16(c3, O[j + 1])); dct8_neon Better inline two of partialButterfly8_neon, it reduce some operators, such as int32x4_t c0 = vld1q_s32(t8_even[0]); 16x16 and 32x2 are similar const table may share in between Neon and Sve code Regards, Chen At 2024-08-22 23:17:50, "Hari Limaye" <hari.lim...@arm.com> wrote: >Move C implementations of DCT functions into the X265_NS namespace, and >remove the static modifier from their declarations, so that they can be >referenced from external code when linking to libx265. >--- > source/common/dct.cpp | 340 +++++++++++++++++++++--------------------- > 1 file changed, 170 insertions(+), 170 deletions(-) > >diff --git a/source/common/dct.cpp b/source/common/dct.cpp >index b102b6e31..d318b2c64 100644 >--- a/source/common/dct.cpp >+++ b/source/common/dct.cpp >@@ -439,176 +439,6 @@ static void partialButterfly4(const int16_t* src, >int16_t* dst, int shift, int l > } > } > >-static void dst4_c(const int16_t* src, int16_t* dst, intptr_t srcStride) >-{ >- const int shift_1st = 1 + X265_DEPTH - 8; >- const int shift_2nd = 8; >- >- ALIGN_VAR_32(int16_t, coef[4 * 4]); >- ALIGN_VAR_32(int16_t, block[4 * 4]); >- >- for (int i = 0; i < 4; i++) >- { >- memcpy(&block[i * 4], &src[i * srcStride], 4 * sizeof(int16_t)); >- } >- >- fastForwardDst(block, coef, shift_1st); >- fastForwardDst(coef, dst, shift_2nd); >-}
_______________________________________________ x265-devel mailing list x265-devel@videolan.org https://mailman.videolan.org/listinfo/x265-devel