# HG changeset patch
# User Praveen Tiwari
# Date 1383835016 -19800
# Node ID 2848c581fbc3483c951610311e8ffb8ab66c1064
# Parent  fed3fbe5e9f1942da657957821a5d1bb396f3d37
asm code for blockfil_s, 32x32

diff -r fed3fbe5e9f1 -r 2848c581fbc3 source/common/x86/asm-primitives.cpp
--- a/source/common/x86/asm-primitives.cpp	Thu Nov 07 19:40:51 2013 +0530
+++ b/source/common/x86/asm-primitives.cpp	Thu Nov 07 20:06:56 2013 +0530
@@ -365,6 +365,7 @@
         p.blockfill_s[BLOCK_4x4] = x265_blockfil_s_4x4_sse2;
         p.blockfill_s[BLOCK_8x8] = x265_blockfil_s_8x8_sse2;
         p.blockfill_s[BLOCK_16x16] = x265_blockfil_s_16x16_sse2;
+        p.blockfill_s[BLOCK_32x32] = x265_blockfil_s_32x32_sse2;
 #if X86_64
         p.satd[LUMA_8x32] = x265_pixel_satd_8x32_sse2;
         p.satd[LUMA_16x4] = x265_pixel_satd_16x4_sse2;
diff -r fed3fbe5e9f1 -r 2848c581fbc3 source/common/x86/blockcopy8.asm
--- a/source/common/x86/blockcopy8.asm	Thu Nov 07 19:40:51 2013 +0530
+++ b/source/common/x86/blockcopy8.asm	Thu Nov 07 20:06:56 2013 +0530
@@ -1747,3 +1747,51 @@
 %endmacro
 
 BLOCKFIL_S_W16_H8 16, 16
+
+;-----------------------------------------------------------------------------
+; void blockfil_s_%1x%2(int16_t *dest, intptr_t destride, int16_t val)
+;-----------------------------------------------------------------------------
+%macro BLOCKFIL_S_W32_H4 2
+INIT_XMM sse2
+cglobal blockfil_s_%1x%2, 3, 5, 1, dest, destStride, val
+
+mov        r3d,     %2
+
+add        r1,      r1
+
+movd       m0,      r2d
+pshuflw    m0,      m0,    0
+pshufd     m0,      m0,    0
+
+.loop
+    movu    [r0],                  m0
+    movu    [r0 + 16],             m0
+    movu    [r0 + 32],             m0
+    movu    [r0 + 48],             m0
+
+    movu    [r0 + r1],             m0
+    movu    [r0 + r1 + 16],        m0
+    movu    [r0 + r1 + 32],        m0
+    movu    [r0 + r1 + 48],        m0
+
+    movu    [r0 + 2 * r1],         m0
+    movu    [r0 + 2 * r1 + 16],    m0
+    movu    [r0 + 2 * r1 + 32],    m0
+    movu    [r0 + 2 * r1 + 48],    m0
+
+    lea     r4,                    [r0 + 2 * r1]
+
+    movu    [r4 + r1],             m0
+    movu    [r4 + r1 + 16],        m0
+    movu    [r4 + r1 + 32],        m0
+    movu    [r4 + r1 + 48],        m0
+
+    lea     r0,                    [r0 + 4 * r1]
+
+    sub     r3d,                   4
+    jnz     .loop
+
+RET
+%endmacro
+
+BLOCKFIL_S_W32_H4 32, 32
diff -r fed3fbe5e9f1 -r 2848c581fbc3 source/common/x86/pixel.h
--- a/source/common/x86/pixel.h	Thu Nov 07 19:40:51 2013 +0530
+++ b/source/common/x86/pixel.h	Thu Nov 07 20:06:56 2013 +0530
@@ -269,6 +269,7 @@
 void x265_blockfil_s_4x4_sse2(int16_t *dst, intptr_t dstride, int16_t val);
 void x265_blockfil_s_8x8_sse2(int16_t *dst, intptr_t dstride, int16_t val);
 void x265_blockfil_s_16x16_sse2(int16_t *dst, intptr_t dstride, int16_t val);
+void x265_blockfil_s_32x32_sse2(int16_t *dst, intptr_t dstride, int16_t val);
 
 #undef DECL_PIXELS
 #undef DECL_SUF