Hi George,

Thanks for the improved patch. I have a few small comments below.

At 2025-03-08 00:41:05, "George Steed" <george.st...@arm.com> wrote:
> source/common/aarch64/pixel-util.S | 94 +++++++++++++-----------------
> 1 file changed, 42 insertions(+), 52 deletions(-)
>
>diff --git a/source/common/aarch64/pixel-util.S b/source/common/aarch64/pixel-util.S
>index d8b3f4365..6635e52b1 100644
>--- a/source/common/aarch64/pixel-util.S
>+++ b/source/common/aarch64/pixel-util.S
>@@ -2213,27 +2213,25 @@ endfunc
> //     const uint16_t* scanCG4x4, // x6
> //     const int trSize)          // x7
> function PFX(scanPosLast_neon)
>-.Loop_spl:
>-    // position of current CG
>+    ldr q28, [x10]            // v28 = mask for pmovmskb
>+    add x10, x7, x7           // 2*x7
>+    add x11, x7, x7, lsl #1   // 3*x7
>+    add x9, x4, #1            // CG count
>+
>+1:

This is a GCC-style label; please keep the generic style for local labels.

>     // coeffFlag = reverse_bit(w15) in 16-bit
>-    rbit w12, w15
>-    lsr w12, w12, #16
>-    fmov s30, w12
>+    rbit w12, w13
>+    and w12, w12, #0xffff

Is this necessary?

>     strh w12, [x3], #2
>
>-    // compute coeffNum = popcount(coeffFlag)
>-    cnt v30.8b, v30.8b
>-    addp v30.8b, v30.8b, v30.8b
>-    fmov w6, s30
>-    sub x5, x5, x6

We do not need the 64-bit x5 here.

>-    strb w6, [x4], #1
>-
>-    cbnz x5, .Loop_spl
>+    cbnz x5, 1b

Same comment about x5 here.

_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel