Hi Hari,
Thank for the update, it looks good to me now. For the DCT optimize, we may keep your version, if we write hand-write version future, we may try again. Regards, Chen At 2024-08-27 23:10:18, "Hari Limaye" <hari.lim...@arm.com> wrote: >Hi Chen, > >Thank you for reviewing the patches. Please see replies below - we have pushed >this updated series to address your comments. > >>1. These function will share for 8/10/12 bpp, if move into X265_NS, it will >>make duplicated copy > >If I understand correctly, you are referring to the duplication that will >occur when building the "Multi-library interface"? I think this duplication >exists already in the current multi bitdepth build, its just that the symbols >are defined with internal linkage. By moving things into the X265_NS we don't >add any duplicated copies, although we do at least make the functions >reference-able from other code. Also if I understand correctly, this should >not be an issue for the C implementations - as when building the Multi-library >interface you should set EXPORT_C_API=OFF for all but one of the builds. > >>2. add new section "namespace X265_NS" before these functions are better than >>move, it affects code history record. > >Updated patch to do this. > >>partialButterfly8_neon >For size 8, butterfly E & O is necessary, but EE/EO is not a good idea, Odd >spends 8 operators per line, Even spends 4 operators plus 1 temporary store >and 2 prepare operators, totally 7 operators with dependency link, looks no >performance benefits, especally SVE SVDOT may get more performance with Odd >method. > >Regarding EE/EO not being a good idea in 8x8: register pressure is also a >consideration here, as well as how the end of the first pass interacts with >the start of the second (in terms of data positioning once they have been >inlined - since everything stays in registers rather than being stored >to/loaded from memory.) The current implementation performs better than the >suggested implementations as computing EE and EO reduces register pressure (an >issue 16-bit dot product cannot mitigate) and also allows the compiler to >optimize more effectively across both passes after they have been inlined. > >>Code style mismatch in different code section, one line is better. >>+ int32x4_t t01 = vpaddq_s32(vmull_s16(c1, O[j + 0]), >>+ vmull_s16(c1, O[j + 1])); >>*** vs >>+ t01 = vpaddq_s32(vmull_s16(c3, O[j + 0]), vmull_s16(c3, O[j + 1])); > >The reason for the mismatch here is because the second line goes over the >character limit of 80. This formatting is produced automatic formatting >software (uncrustify). > >>dct8_neon >>Better inline two of partialButterfly8_neon, it reduce some operators, such >>as int32x4_t c0 = vld1q_s32(t8_even[0]); >>16x16 and 32x2 are similar > >The compiler currently inlines these - we have updated patches to be more >explicit by using the `inline` keyword for the butterfly helper functions. > >>const table may share in between Neon and Sve code > >Updated with this done. > >Many thanks, >Hari > >Hari Limaye (5): > Move C DCT implementations into X265_NS > AArch64: Move Neon DCT implementations into X265_NS > AArch64: Optimise partialButterfly8_neon > AArch64: Optimise partialButterfly16_neon > AArch64: Optimise partialButterfly32_neon > >Jonathan Wright (4): > AArch64: Move Neon-SVE bridge helpers into dedicated header > AArch64: Add SVE implementation of 8x8 DCT > AArch64: Add SVE implementation of 16x16 DCT > AArch64: Add SVE implementation of 32x32 DCT > > source/common/CMakeLists.txt | 2 +- > source/common/aarch64/asm-primitives.cpp | 1 + > source/common/aarch64/dct-prim-sve.cpp | 491 ++++++++++++++++++++ > source/common/aarch64/dct-prim.cpp | 557 +++++++++++++++-------- > source/common/aarch64/dct-prim.h | 40 ++ > source/common/aarch64/neon-sve-bridge.h | 67 +++ > source/common/aarch64/sao-prim.h | 32 +- > source/common/dct.cpp | 22 +- > 8 files changed, 980 insertions(+), 232 deletions(-) > create mode 100644 source/common/aarch64/dct-prim-sve.cpp > create mode 100644 source/common/aarch64/neon-sve-bridge.h > >-- >2.42.1
_______________________________________________ x265-devel mailing list x265-devel@videolan.org https://mailman.videolan.org/listinfo/x265-devel