> +.macro filterPixelToShort_64xN h
> +function x265_filterPixelToShort_64x\h\()_neon
> +    add             x3, x3, x3
> +    sub             x3, x3, #0x40
> +    movi            v4.8h, #0xe0, lsl #8
> +.rept \h
>
> I guess unrolling N times is not a good idea: the code section becomes
> too large, so it will most probably cause cache flushes and misses.
Please find attached the amended patch, which now uses a loop instead of
the unrolled .rept block.

Ok to commit?

Thanks,
Sebastian
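For reference, here is a rough scalar sketch in C of what I understand the routine to compute, with the rows handled by a loop rather than replicated code. This is not the x265 source: the function name, the constant names and the element-based strides are made up for illustration, and the shift of 6 and offset of 8192 assume an 8-bit build (the offset is inferred from the 0xe000 constant loaded by the movi in the quoted hunk, which is -8192 as a signed 16-bit value).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for an 8-bit build; not taken from the patch itself. */
enum { SKETCH_SHIFT = 6, SKETCH_OFFSET = 8192 };

/* Hypothetical scalar reference: widen each pixel to int16, scale it and
 * subtract the offset. Strides are assumed to be in elements, not bytes. */
static void pixel_to_short_sketch(const uint8_t *src, ptrdiff_t src_stride,
                                  int16_t *dst, ptrdiff_t dst_stride,
                                  int width, int height)
{
    assert(src != NULL && dst != NULL);

    /* One pass over the rows: the body exists once in the binary,
     * however large the height is. */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)((src[x] << SKETCH_SHIFT) - SKETCH_OFFSET);
        src += src_stride;
        dst += dst_stride;
    }
}
```

With a row loop like this the code size stays constant for every block height, at the cost of one counter update and branch per row.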
0001-arm64-port-x265_filterPixelToShort_-_neon.patch