Re: [x265] [PATCH 1/9] Move C DCT implementations into X265_NS

chen Thu, 22 Aug 2024 21:42:44 -0700

Hi Hari & Jonathan,




Thanks for the patches. I have some comments:




[PATCH 1/9] Move C DCT implementations into X265_NS

1. These functions are shared across the 8/10/12 bpp builds; if they are moved 
into X265_NS, each build will get its own duplicated copy.

2. Adding a new "namespace X265_NS" section before these functions is better 
than moving them, because moving them affects the code history record.
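
A minimal sketch of the approach in comment 2, assuming the surrounding helpers live in a file-local (anonymous) namespace. The fallback definition of X265_NS and the names scaleSketch and dst4Sketch are hypothetical stand-ins so that the fragment compiles on its own; they are not taken from dct.cpp.

```cpp
#include <cstdint>

// Assumption: the real build defines X265_NS; this fallback only keeps the
// sketch compilable standalone.
#ifndef X265_NS
#define X265_NS x265
#endif

namespace {
// Hypothetical file-local helper. It stays exactly where it is.
int32_t scaleSketch(int32_t v, int shift) { return v >> shift; }
} // anonymous namespace closed just above the function to be exported

// The namespace is opened in place, so the function body below keeps its
// position in the file and only the wrapper lines are new.
namespace X265_NS {
int32_t dst4Sketch(int32_t v) { return scaleSketch(v, 1); }
} // namespace X265_NS

namespace {
// Later file-local code continues unchanged (hypothetical).
int32_t otherHelperSketch(int32_t v) { return v + 1; }
} // anonymous namespace
```

With this layout the diff only adds the namespace open/close lines around each function instead of deleting and re-adding the function bodies.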




[PATCH 3/9] AArch64: Optimise partialButterfly8_neon

[PATCH 4/9] AArch64: Optimise partialButterfly16_neon

[PATCH 5/9] AArch64: Optimise partialButterfly32_neon

[PATCH 7/9] AArch64: Add SVE implementation of 8x8 DCT

[PATCH 8/9] AArch64: Add SVE implementation of 16x16 DCT

[PATCH 9/9] AArch64: Add SVE implementation of 32x32 DCT

partialButterfly8_neon
For size 8, the E & O butterfly is necessary, but EE/EO is not a good idea. 
Odd spends 8 operations per line; Even spends 4 operations plus 1 temporary 
store and 2 preparation operations, 7 operations in total with a dependency 
chain. That looks like no performance benefit, especially since SVE SVDOT may 
get more performance with the Odd method.
The code style is inconsistent between code sections; keeping the call on one line is better.
+        int32x4_t t01 = vpaddq_s32(vmull_s16(c1, O[j + 0]),
+                                   vmull_s16(c1, O[j + 1]));
*** vs
+        t01 = vpaddq_s32(vmull_s16(c3, O[j + 0]), vmull_s16(c3, O[j + 1]));
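
To make the E/O versus EE/EO comparison concrete, here is a scalar sketch of one 8-point row computed both ways. It is not the Neon code from the patch: the function names are hypothetical, the rounding add and shift are left out, and the coefficients are the first four columns of the HEVC 8-point DCT matrix. It only shows that the two decompositions give the same outputs; it does not measure cost.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// First four columns of the even rows (0, 2, 4, 6) of the 8-point matrix.
const int kEven[4][4] = {
    { 64,  64,  64,  64 },
    { 83,  36, -36, -83 },
    { 64, -64, -64,  64 },
    { 36, -83,  83, -36 },
};
// First four columns of the odd rows (1, 3, 5, 7).
const int kOdd[4][4] = {
    { 89,  75,  50,  18 },
    { 75, -18, -89, -50 },
    { 50, -89,  18,  75 },
    { 18, -50,  75, -89 },
};

using Row = std::array<int32_t, 8>;

// Shared first stage: E[k] = src[k] + src[7-k], O[k] = src[k] - src[7-k].
void splitEO(const int16_t* src, int32_t E[4], int32_t O[4])
{
    for (int k = 0; k < 4; k++)
    {
        E[k] = src[k] + src[7 - k];
        O[k] = src[k] - src[7 - k];
    }
}

// Four-term dot product of one coefficient row with a 4-element vector.
int32_t dot4(const int c[4], const int32_t v[4])
{
    return c[0] * v[0] + c[1] * v[1] + c[2] * v[2] + c[3] * v[3];
}

// Variant A: E/O followed by a second EE/EO split for the even outputs.
Row butterfly8WithEEEO(const int16_t* src)
{
    int32_t E[4], O[4];
    splitEO(src, E, O);
    const int32_t EE0 = E[0] + E[3], EO0 = E[0] - E[3];
    const int32_t EE1 = E[1] + E[2], EO1 = E[1] - E[2];
    Row dst{};
    dst[0] = kEven[0][0] * EE0 + kEven[0][1] * EE1;
    dst[4] = kEven[2][0] * EE0 + kEven[2][1] * EE1;
    dst[2] = kEven[1][0] * EO0 + kEven[1][1] * EO1;
    dst[6] = kEven[3][0] * EO0 + kEven[3][1] * EO1;
    for (int k = 0; k < 4; k++)
        dst[2 * k + 1] = dot4(kOdd[k], O);
    return dst;
}

// Variant B: E/O only; even outputs use the same dot-product form as odd.
Row butterfly8EOOnly(const int16_t* src)
{
    int32_t E[4], O[4];
    splitEO(src, E, O);
    Row dst{};
    for (int k = 0; k < 4; k++)
    {
        dst[2 * k]     = dot4(kEven[k], E);
        dst[2 * k + 1] = dot4(kOdd[k], O);
    }
    return dst;
}

// Convenience check that both variants agree on one input row.
void checkVariantsAgree(const int16_t* src)
{
    assert(butterfly8WithEEEO(src) == butterfly8EOOnly(src));
}
```

In variant B the even and odd halves have the same shape, which is the form that would map onto a dot-product instruction.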


dct8_neon
It would be better to inline the two partialButterfly8_neon calls; this removes 
some operations, such as 
int32x4_t c0 = vld1q_s32(t8_even[0]);
16x16 and 32x32 are similar.
The const tables could be shared between the Neon and SVE code.




Regards,
Chen

At 2024-08-22 23:17:50, "Hari Limaye" <hari.lim...@arm.com> wrote:
>Move C implementations of DCT functions into the X265_NS namespace, and
>remove the static modifier from their declarations, so that they can be
>referenced from external code when linking to libx265.
>---
> source/common/dct.cpp | 340 +++++++++++++++++++++---------------------
> 1 file changed, 170 insertions(+), 170 deletions(-)
>
>diff --git a/source/common/dct.cpp b/source/common/dct.cpp
>index b102b6e31..d318b2c64 100644
>--- a/source/common/dct.cpp
>+++ b/source/common/dct.cpp
>@@ -439,176 +439,6 @@ static void partialButterfly4(const int16_t* src, 
>int16_t* dst, int shift, int l
>     }
> }
> 
>-static void dst4_c(const int16_t* src, int16_t* dst, intptr_t srcStride)
>-{
>-    const int shift_1st = 1 + X265_DEPTH - 8;
>-    const int shift_2nd = 8;
>-
>-    ALIGN_VAR_32(int16_t, coef[4 * 4]);
>-    ALIGN_VAR_32(int16_t, block[4 * 4]);
>-
>-    for (int i = 0; i < 4; i++)
>-    {
>-        memcpy(&block[i * 4], &src[i * srcStride], 4 * sizeof(int16_t));
>-    }
>-
>-    fastForwardDst(block, coef, shift_1st);
>-    fastForwardDst(coef, dst, shift_2nd);
>-}

_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel
