Hi Hari,
Thanks for the details. We can keep your current version; we may rewrite it in assembly in the future to improve performance.

Regards,
Chen

At 2024-09-09 16:27:51, "Hari Limaye" <hari.lim...@arm.com> wrote:
>Hi Chen,
>
>Thank you for reviewing the patches.
>
>Regarding the patch that you highlighted:
>  [PATCH 04/14] AArch64: Add Armv8.4 Neon DotProd implementations of
>  filter_hpp
>
>> performance result looks not good enough,
>The key result for this patch is the performance uplift for Neoverse N1
>(1.123x), as this machine does not support Neon I8MM instructions.
>The results for the other machines are stated for completeness - however these
>machines will instead run the Neon I8MM implementation:
>
>  https://mailman.videolan.org/pipermail/x265-devel/2024-September/013907.html
>
>the uplift from which is copied here:
>
>  Geomean uplift across all block sizes for chroma filters, relative to
>  Armv8.4 Neon DotProd implementations:
>
>    Neoverse N2: 1.402x
>    Neoverse V1: 1.214x
>    Neoverse V2: 1.289x
>
>>and why shortcut branch in case (coeffIdx == 4)?
>As the Armv8.0 Neon implementation can be highly specialized for coeffIdx of
>4, the Armv8.4 Neon DotProd implementation is not faster for this filter - so
>we dispatch to the Armv8.0 Neon implementation in this case.
>The uplift for the other values of coeffIdx from the Armv8.4 Neon DotProd
>implementation (on Neoverse N1) is significant.
>
>Many thanks,
>Hari
>
_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel