At 2014-02-19 18:53:47,dnyanesh...@multicorewareinc.com wrote:
># HG changeset patch
># User Dnyaneshwar G <dnyanesh...@multicorewareinc.com>
># Date 1392807092 -19800
>#      Wed Feb 19 16:21:32 2014 +0530
># Node ID cede20cde62ba0a96ac181bcf78a508097de0e7c
># Parent  6150985c3d535f0ea7a1dc0b8f3c69e65e30d25b
>asm-16bpp: code for addAvg luma and chroma all sizes
>
>+%if HIGH_BIT_DEPTH
>+INIT_XMM sse4
>+cglobal addAvg_2x4, 6,7,8, pSrc0, pSrc1, pDst, iStride0, iStride1, iDstStride
>+    mova m7, [pw_16400]
>+    mova m0, [pw_1023]

m7 and m0 are each used only once, so folding the memory operand into the
instruction that uses it gives shorter code.

>+    add r3, r3
>+    add r4, r4
>+    add r5, r5
>+
>+    movd m1, [r0]
>+    movd m2, [r0 + r3]
>+    movd m3, [r1]
>+    movd m4, [r1 + r4]
>+
>+    punpckldq m1, m2
>+    punpckldq m3, m4
>+
>+    lea r0, [r0 + 2 * r3]
>+    lea r1, [r1 + 2 * r4]
>+
>+    movd m2, [r0]
>+    movd m4, [r0 + r3]
>+    movd m5, [r1]
>+    movd m6, [r1 + r4]
>+
>+    punpckldq m2, m4
>+    punpckldq m5, m6
>+    punpcklqdq m1, m2
>+    punpcklqdq m3, m5
>+
>+    paddw m1, m3
>+    paddw m1, m7

m7 is 16400, so this add is very likely to overflow a signed 16-bit lane.
Please do a dynamic range analysis here.

>+    psraw m1, 5
>+    pxor m6, m6
>+    pmaxsw m1, m6
>+    pminsw m1, m0
>+
>+    movd [r2], m1
>+    pextrd [r2 + r5], m1, 1
>+    lea r2, [r2 + 2 * r5]
>+    pextrd [r2], m1, 2
>+    pextrd [r2 + r5], m1, 3
>+
>+    RET
>+
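As a starting point for the dynamic range analysis of `paddw m1, m7`, here is a rough scalar model in C of one 16-bit lane. It is only a sketch: the function names are made up for illustration and are not part of the patch, and the sample inputs assume (this is not stated in the patch) that a 10-bit pixel p reaches addAvg as (p << 4) - 8192. The constants 16400, 5 and 1023 are taken from the quoted code. Because `paddw` wraps modulo 2^16, only the final total matters, so the lane is exact whenever src0 + src1 + 16400 fits in a signed 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET    16400 /* pw_16400 from the patch */
#define SHIFT     5     /* psraw m1, 5 */
#define PIXEL_MAX 1023  /* pw_1023 from the patch */

/* Add two 16-bit lanes with wraparound, as paddw does. */
static int wrap16_add(int a, int b)
{
    uint16_t u = (uint16_t)((unsigned)a + (unsigned)b);
    return u >= 0x8000u ? (int)u - 0x10000 : (int)u;
}

/* Floor division by 2^shift, matching an arithmetic right shift (psraw). */
static int sra(int v, int shift)
{
    int d = 1 << shift;
    return v >= 0 ? v / d : -((-v + d - 1) / d);
}

/* Clamp to [0, PIXEL_MAX], as pmaxsw/pminsw do in the patch. */
static int clip(int v)
{
    return v < 0 ? 0 : (v > PIXEL_MAX ? PIXEL_MAX : v);
}

/* One lane as the SIMD code computes it (wrapping 16-bit adds). */
static int add_avg_lane16(int s0, int s1)
{
    int sum = wrap16_add(s0, s1);
    sum = wrap16_add(sum, OFFSET);
    return clip(sra(sum, SHIFT));
}

/* The same lane with wide intermediates, used as the reference. */
static int add_avg_lane32(int s0, int s1)
{
    return clip(sra(s0 + s1 + OFFSET, SHIFT));
}

/* The 16-bit path is exact when the full total fits in int16_t. */
static int sum_is_safe(int s0, int s1)
{
    int total = s0 + s1 + OFFSET;
    return total >= INT16_MIN && total <= INT16_MAX;
}

/* Hypothetical self-check over a few boundary inputs. */
void add_avg_self_check(void)
{
    /* Assumed unfiltered extremes: (1023 << 4) - 8192 and (0 << 4) - 8192. */
    assert(sum_is_safe(8176, 8176));
    assert(add_avg_lane16(8176, 8176) == add_avg_lane32(8176, 8176));
    assert(add_avg_lane16(-8192, -8192) == add_avg_lane32(-8192, -8192));

    /* Slightly larger inputs: the total reaches 32768 and the lane wraps. */
    assert(!sum_is_safe(8184, 8184));
    assert(add_avg_lane16(8184, 8184) != add_avg_lane32(8184, 8184));
}
```

Under this model the 16-bit path matches the reference as long as src0 + src1 stays within [-49168, 16367]. Whether the interpolation filters can push the intermediates outside that window is the part that still needs to be checked against the real input range.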
_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel