> move SUB follow by LD1 will hidden memory operator latency,

Thanks, that helped a little bit:
Before:
scale2D_64to32  86.83x  158.42  13756.12
After:
scale2D_64to32  87.00x  158.20  13764.38

(columns: speedup, optimized time, C time)

Added to the patch.
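For reference, the speedup column is the C time divided by the optimized time, and the size of the gain can be checked directly from the numbers above. Below is a small sketch that recomputes both. The helper names `speedup` and `pct_change` are hypothetical and only used for illustration. No units are assumed for the times; they are used exactly as printed.

```python
# Recompute the testbench ratios from the before/after rows for scale2D_64to32.
# Assumed column order (consistent with the arithmetic): speedup, optimized time, C time.

def speedup(opt_time: float, c_time: float) -> float:
    """Speedup of the optimized primitive over the C reference (C time / optimized time)."""
    return c_time / opt_time


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means the value went down."""
    return (after - before) / before * 100.0


before = {"opt": 158.42, "c": 13756.12}
after = {"opt": 158.20, "c": 13764.38}

print(f"before speedup: {speedup(before['opt'], before['c']):.2f}x")
print(f"after  speedup: {speedup(after['opt'], after['c']):.2f}x")
print(f"optimized time change: {pct_change(before['opt'], after['opt']):+.2f}%")
```

Recomputed from the rounded times, the after ratio comes out as 87.01x rather than the printed 87.00x. A difference in the last digit is expected because the printed times are themselves rounded. The optimized time drops by about 0.14%, which matches "a little bit".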
0001-arm64-port-scale1D_128to64-and-scale2D_64to32.patch
_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel