# HG changeset patch
# User Praveen Tiwari
# Date 1383912504 -19800
# Node ID 227a5666e08869d36e07a75f3db95dd94c774715
# Parent d9fd009d7118c9cb76b84e94e01732b10d7bf313
blockcopy_sp_8x16, optimized asm code

diff -r d9fd009d7118 -r 227a5666e088 source/common/x86/blockcopy8.asm
--- a/source/common/x86/blockcopy8.asm	Fri Nov 08 17:28:57 2013 +0530
+++ b/source/common/x86/blockcopy8.asm	Fri Nov 08 17:38:24 2013 +0530
@@ -1215,54 +1215,47 @@
 ;-----------------------------------------------------------------------------
 %macro BLOCKCOPY_SP_W8_H8 2
 INIT_XMM sse2
-cglobal blockcopy_sp_%1x%2, 4, 7, 8, dest, destStride, src, srcStride
+cglobal blockcopy_sp_%1x%2, 4, 5, 8, dest, destStride, src, srcStride

-mov r6d, %2
+mov r4d, %2/8

 add r3, r3

-mova m0, [tab_Vm]
+.loop
+    movu m0, [r2]
+    movu m1, [r2 + r3]
+    movu m2, [r2 + 2 * r3]
+    lea r2, [r2 + 2 * r3]
+    movu m3, [r2 + r3]
+    movu m4, [r2 + 2 * r3]
+    lea r2, [r2 + 2 * r3]
+    movu m5, [r2 + r3]
+    movu m6, [r2 + 2 * r3]
+    lea r2, [r2 + 2 * r3]
+    movu m7, [r2 + r3]

-.loop
-    movu m1, [r2]
-    movu m2, [r2 + r3]
-    movu m3, [r2 + 2 * r3]
-    lea r4, [r2 + 2 * r3]
-    movu m4, [r4 + r3]
-    movu m5, [r4 + 2 * r3]
-    lea r4, [r4 + 2 * r3]
-    movu m6, [r4 + r3]
-    movu m7, [r4 + 2 * r3]
-    lea r5, [r4 + 2 * r3]
+    packuswb m0, m1
+    packuswb m2, m3
+    packuswb m4, m5
+    packuswb m6, m7

-    pshufb m1, m0
-    pshufb m2, m0
-    pshufb m3, m0
-    pshufb m4, m0
-    pshufb m5, m0
-    pshufb m6, m0
-    pshufb m7, m0
+    movlps [r0], m0
+    movhps [r0 + r1], m0
+    movlps [r0 + 2 * r1], m2
+    lea r0, [r0 + 2 * r1]
+    movhps [r0 + r1], m2
+    movlps [r0 + 2 * r1], m4
+    lea r0, [r0 + 2 * r1]
+    movhps [r0 + r1], m4
+    movlps [r0 + 2 * r1], m6
+    lea r0, [r0 + 2 * r1]
+    movhps [r0 + r1], m6

-    movh [r0], m1
-    movh [r0 + r1], m2
-    movh [r0 + 2 * r1], m3
-    lea r4, [r0 + 2 * r1]
-    movh [r4 + r1], m4
-    movh [r4 + 2 * r1], m5
-    lea r4, [r4 + 2 * r1]
-    movh [r4 + r1], m6
-    movh [r4 + 2 * r1], m7
+    lea r0, [r0 + 2 * r1]
+    lea r2, [r2 + 2 * r3]

-    movu m1, [r5 + r3]
-    pshufb m1, m0
-    lea r4, [r4 + 2 * r1]
-    movh [r4 + r1], m1
-
-    lea r0, [r0 + 8 * r1]
-    lea r2, [r2 + 8 * r3]
-
-    sub r6d, 8
-    jnz .loop
+    dec r4d
+    jnz .loop

 RET
 %endmacro
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
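
For readers who want to check the new loop by hand, here is a rough scalar
model in C of what the rewritten macro appears to compute. It is a sketch
under assumptions, not code from the x265 tree: the names clip_u8 and
blockcopy_sp_8xN_ref are hypothetical, src is assumed to hold 16-bit signed
samples (inferred from "add r3, r3" doubling srcStride and from packuswb
packing words to bytes), and both strides are taken in elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a signed 16-bit value to 0..255, which is what packuswb
 * does for each word lane (unsigned saturation). */
static uint8_t clip_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical scalar model of blockcopy_sp_8xN (not the x265 C primitive).
 * Strides are in elements; the asm doubles srcStride to get a byte offset. */
static void blockcopy_sp_8xN_ref(uint8_t *dest, ptrdiff_t destStride,
                                 const int16_t *src, ptrdiff_t srcStride,
                                 int height)
{
    /* The macro runs %2/8 iterations, so height must be a multiple of 8. */
    assert(height % 8 == 0);

    for (int block = 0; block < height / 8; block++) {
        /* One asm loop iteration: 8 rows of 8 samples each. */
        for (int row = 0; row < 8; row++) {
            for (int col = 0; col < 8; col++)
                dest[col] = clip_u8(src[col]);
            dest += destStride;
            src += srcStride;
        }
    }
}
```

Note that packuswb saturates, so out-of-range inputs clamp to 0 or 255. If
the removed pshufb path simply picked the low byte of each word (which
depends on the contents of tab_Vm, not shown in this patch), the two versions
would agree only for inputs already in 0..255.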
