At 2015-10-12 11:58:46,[email protected] wrote:
># HG changeset patch
># User Ramya Sriraman <[email protected]>
># Date 1443592336 -19800
>#      Wed Sep 30 11:22:16 2015 +0530
># Node ID f0d43eb655f0048fe5dd491ee4ad1c0d304b76f8
># Parent  b6156a08b1def3584647f26096866c1a0c11e54a
>asm: Add sse_ss for [16x16],[32x32] & [64x64] for 8bpp avx2
>
>diff -r b6156a08b1de -r f0d43eb655f0 source/common/x86/asm-primitives.cpp
>--- a/source/common/x86/asm-primitives.cpp Fri Oct 09 20:45:59 2015 +0530
>+++ b/source/common/x86/asm-primitives.cpp Wed Sep 30 11:22:16 2015 +0530
>@@ -2667,6 +2667,10 @@
> #if X86_64
>     if (cpuMask & X265_CPU_AVX2)
>     {
>+        p.cu[BLOCK_16x16].sse_ss = (pixel_sse_ss_t)PFX(pixel_ssd_ss_16x16_avx2);
>+        p.cu[BLOCK_32x32].sse_ss = (pixel_sse_ss_t)PFX(pixel_ssd_ss_32x32_avx2);
>+        p.cu[BLOCK_64x64].sse_ss = (pixel_sse_ss_t)PFX(pixel_ssd_ss_64x64_avx2);
>+
>         p.cu[BLOCK_16x16].var = PFX(pixel_var_16x16_avx2);
>         p.cu[BLOCK_32x32].var = PFX(pixel_var_32x32_avx2);
>         p.cu[BLOCK_64x64].var = PFX(pixel_var_64x64_avx2);
>diff -r b6156a08b1de -r f0d43eb655f0 source/common/x86/ssd-a.asm
>--- a/source/common/x86/ssd-a.asm Fri Oct 09 20:45:59 2015 +0530
>+++ b/source/common/x86/ssd-a.asm Wed Sep 30 11:22:16 2015 +0530
>@@ -1016,8 +1016,171 @@
> SSD_SS_32xN
> SSD_SS_48
> SSD_SS_64xN
>+
>+INIT_YMM avx2
>+cglobal pixel_ssd_ss_16x16, 4,4,3
>+    add         r1d, r1d
>+    add         r3d, r3d
>+    pxor        m2, m2
>+
>+    movu        m0, [r0]
>+    movu        m1, [r0 + r1]
>+    psubw       m0, [r2]
>+    psubw       m1, [r2 + r3]
>+    lea         r0, [r0 + 2 * r1]
>+    lea         r2, [r2 + 2 * r3]

We have spare registers, so r1 * 3 can be kept in one of them; that will reduce the number of LEA instructions.

>+    pmaddwd     m0, m0
>+    pmaddwd     m1, m1
>+    paddd       m2, m1
>+    paddd       m2, m0

Use m2 and m3 for the partial sums; that may break the long dependency chain.
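
To make the two suggestions above concrete, here is a rough scalar C++ model. It is not the AVX2 code and not a proposed replacement for it, only a sketch of the idea. The names row_ssd, ssd_ss_ref and ssd_ss_split are made up for illustration, and the strides are counted in int16_t elements (the asm works with byte strides after "add r1d, r1d"). It assumes the block size is a multiple of 4, which holds for 16, 32 and 64.

```cpp
#include <cstddef>
#include <cstdint>

// Sum of squared differences over one row of `width` 16-bit samples.
uint64_t row_ssd(const int16_t* a, const int16_t* b, int width)
{
    uint64_t sum = 0;
    for (int x = 0; x < width; x++)
    {
        int64_t d = (int64_t)a[x] - (int64_t)b[x];
        sum += (uint64_t)(d * d);
    }
    return sum;
}

// Baseline: a single accumulator, one row per step.
uint64_t ssd_ss_ref(const int16_t* a, ptrdiff_t strideA,
                    const int16_t* b, ptrdiff_t strideB, int size)
{
    uint64_t sum = 0;
    for (int y = 0; y < size; y++)
        sum += row_ssd(a + y * strideA, b + y * strideB, size);
    return sum;
}

// Sketch of the review suggestions:
//  - 3 * stride is computed once (like holding r1 * 3 in a spare register),
//    so four rows are addressed per pointer update instead of two;
//  - two independent partial sums (like m2 and m3) are merged only at the end.
uint64_t ssd_ss_split(const int16_t* a, ptrdiff_t strideA,
                      const int16_t* b, ptrdiff_t strideB, int size)
{
    const ptrdiff_t strideA3 = 3 * strideA;
    const ptrdiff_t strideB3 = 3 * strideB;
    uint64_t sum0 = 0;
    uint64_t sum1 = 0;

    for (int y = 0; y < size; y += 4)
    {
        sum0 += row_ssd(a, b, size);
        sum1 += row_ssd(a + strideA, b + strideB, size);
        sum0 += row_ssd(a + 2 * strideA, b + 2 * strideB, size);
        sum1 += row_ssd(a + strideA3, b + strideB3, size);
        a += 4 * strideA; // one pointer update per four rows
        b += 4 * strideB;
    }
    return sum0 + sum1;
}
```

Both functions should return the same value for any input, so the split version can be checked against the baseline. Whether the same change actually speeds up the AVX2 routine would have to be measured.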
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
