At 2013-10-15 16:35:15, [email protected] wrote:
># HG changeset patch
># User Dnyaneshwar Gorade <[email protected]>
># Date 1381826069 -19800
>#      Tue Oct 15 14:04:29 2013 +0530
># Node ID 3cd533917aa110f7231abf6e0186e99b22dd4dcf
># Parent  1a85d8814346efdb984ea9eae24d1b06b973e9a8
>pixel-sse41.cpp: Modified PROCESS_SSE_SS4x1 macro with faster intrinsics.
>
>diff -r 1a85d8814346 -r 3cd533917aa1 source/common/vec/pixel-sse41.cpp
>--- a/source/common/vec/pixel-sse41.cpp Tue Oct 15 12:45:58 2013 +0530
>+++ b/source/common/vec/pixel-sse41.cpp Tue Oct 15 14:04:29 2013 +0530
>@@ -5331,10 +5331,8 @@
> #define PROCESS_SSE_SS4x1(BASE)\
>     m1 = _mm_loadu_si128((__m128i const*)(fenc + BASE)); \
>     n1 = _mm_loadu_si128((__m128i const*)(fref + BASE)); \
>-    sign1 = _mm_srai_epi16(m1, 15); \
>-    tmp1 = _mm_unpacklo_epi16(m1, sign1); \
>-    sign2 = _mm_srai_epi16(n1, 15); \
>-    tmp2 = _mm_unpacklo_epi16(n1, sign2); \
>+    tmp1= _mm_cvtepi16_epi32(m1); \
>+    tmp2= _mm_cvtepi16_epi32(n1); \
>     diff = _mm_sub_epi32(tmp1, tmp2); \
>     diff = _mm_mullo_epi32(diff, diff); \
>     sum = _mm_add_epi32(sum, diff)

Two suggestions:
1. Be careful when using SSE4 instructions with the VS compiler; it has many bugs.
2. Do we actually use the full 16-bit dynamic range? If not, we could use the PMADDWD instruction for better performance.
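
To illustrate the second suggestion, here is a rough sketch of how the macro body might look with PMADDWD (`_mm_madd_epi16`). It is not tested against the x265 tree. The helper names `sse_ss4x1_madd`, `hsum_epi32` and `sse_ss4x1_scalar` are made up for this example, and it assumes that every difference `fenc[i] - fref[i]` fits in a signed 16-bit value (roughly, the samples use at most 15 bits of range). If that assumption does not hold, the 16-bit subtraction wraps and the result is wrong. As a side effect, this version needs only SSE2, so it also avoids the SSE4 intrinsics mentioned in the first point.

```cpp
#include <emmintrin.h>  // SSE2: _mm_madd_epi16 (PMADDWD)
#include <cstdint>

// Hypothetical helper, not part of x265: adds the squared differences of
// one row of 4 int16_t samples to the running 32-bit sums in `sum`.
// ASSUMPTION: each fenc[i] - fref[i] fits in int16_t, otherwise
// _mm_sub_epi16 wraps around and the result is incorrect.
static inline __m128i sse_ss4x1_madd(const int16_t* fenc, const int16_t* fref, __m128i sum)
{
    // Load only the 4 samples we need; the upper 64 bits are zeroed.
    __m128i m1 = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(fenc));
    __m128i n1 = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(fref));

    // Subtract in 16 bits, so no widening to 32 bits is needed.
    __m128i diff = _mm_sub_epi16(m1, n1);

    // PMADDWD: multiplies 16-bit lanes pairwise and adds adjacent products,
    // giving [d0*d0 + d1*d1, d2*d2 + d3*d3, 0, 0] as 32-bit lanes.
    __m128i sq = _mm_madd_epi16(diff, diff);

    return _mm_add_epi32(sum, sq);
}

// Hypothetical helper: horizontal sum of the four 32-bit lanes.
static inline int32_t hsum_epi32(__m128i v)
{
    v = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
    v = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(v);
}

// Scalar reference for comparison.
static inline int32_t sse_ss4x1_scalar(const int16_t* fenc, const int16_t* fref)
{
    int32_t total = 0;
    for (int i = 0; i < 4; i++)
    {
        int32_t d = static_cast<int32_t>(fenc[i]) - static_cast<int32_t>(fref[i]);
        total += d * d;
    }
    return total;
}
```

This replaces two sign extensions, a 32-bit subtract and a PMULLD with one 16-bit subtract and one PMADDWD per row. The lane layout of the accumulator differs from the original macro (pairs are already summed), which does not matter as long as the lanes are only added together at the end.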

_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
