Does this patch need re-review? I am not seeing it in the public tree.

On Thu, Mar 19, 2015 at 11:44 AM, <[email protected]> wrote:
> # HG changeset patch
> # User Dnyaneshwar G <[email protected]>
> # Date 1426745611 -19800
> #      Thu Mar 19 11:43:31 2015 +0530
> # Node ID d2b99b5edfde84bd4ad22daaca6f87662d46c2df
> # Parent  2807d9a5a494de78340ab6d09867205b6676330b
> asm: addAvg avx2 code for chroma sizes width >= 8, reused code from luma
>
> AVX2:
> addAvg[  8x2]   7.79x    84.28    656.61
> addAvg[  8x6]  10.92x   334.48   3654.07
>
> SSE4:
> addAvg[  8x2]   7.50x    89.67    672.36
> addAvg[  8x6]  10.74x   342.18   3673.41
>
> diff -r 2807d9a5a494 -r d2b99b5edfde source/common/x86/asm-primitives.cpp
> --- a/source/common/x86/asm-primitives.cpp Thu Mar 19 11:22:06 2015 +0530
> +++ b/source/common/x86/asm-primitives.cpp Thu Mar 19 11:43:31 2015 +0530
> @@ -1446,6 +1446,26 @@
>      p.pu[LUMA_64x48].addAvg = x265_addAvg_64x48_avx2;
>      p.pu[LUMA_64x64].addAvg = x265_addAvg_64x64_avx2;
>
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].addAvg = x265_addAvg_8x2_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].addAvg = x265_addAvg_8x4_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].addAvg = x265_addAvg_8x6_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].addAvg = x265_addAvg_8x8_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].addAvg = x265_addAvg_8x16_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].addAvg = x265_addAvg_8x32_avx2;
> +
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].addAvg = x265_addAvg_12x16_avx2;
> +
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].addAvg = x265_addAvg_16x4_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].addAvg = x265_addAvg_16x8_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].addAvg = x265_addAvg_16x12_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].addAvg = x265_addAvg_16x16_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].addAvg = x265_addAvg_16x32_avx2;
> +
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].addAvg = x265_addAvg_32x8_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].addAvg = x265_addAvg_32x16_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].addAvg = x265_addAvg_32x24_avx2;
> +    p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].addAvg = x265_addAvg_32x32_avx2;
> +
>      p.cu[BLOCK_16x16].add_ps = x265_pixel_add_ps_16x16_avx2;
>      p.cu[BLOCK_32x32].add_ps = x265_pixel_add_ps_32x32_avx2;
>      p.cu[BLOCK_64x64].add_ps = x265_pixel_add_ps_64x64_avx2;
> diff -r 2807d9a5a494 -r d2b99b5edfde source/common/x86/mc-a.asm
> --- a/source/common/x86/mc-a.asm Thu Mar 19 11:22:06 2015 +0530
> +++ b/source/common/x86/mc-a.asm Thu Mar 19 11:43:31 2015 +0530
> @@ -1762,6 +1762,84 @@
>  ; addAvg avx2 code start
>
>
>  ;-----------------------------------------------------------------------------
>
> +INIT_YMM avx2
> +cglobal addAvg_8x2, 6,6,4, pSrc0, src0, src1, dst, src0Stride, src1tride, dstStride
> +    movu xm0, [r0]
> +    vinserti128 m0, m0, [r0 + 2 * r3], 1
> +
> +    movu xm2, [r1]
> +    vinserti128 m2, m2, [r1 + 2 * r4], 1
> +
> +    paddw m0, m2
> +    pmulhrsw m0, [pw_256]
> +    paddw m0, [pw_128]
> +
> +    packuswb m0, m0
> +    vextracti128 xm1, m0, 1
> +    movq [r2], xm0
> +    movq [r2 + r5], xm1
> +    RET
> +
> +cglobal addAvg_8x6, 6,6,6, pSrc0, src0, src1, dst, src0Stride, src1tride, dstStride
> +    mova m4, [pw_256]
> +    mova m5, [pw_128]
> +    add r3, r3
> +    add r4, r4
> +
> +    movu xm0, [r0]
> +    vinserti128 m0, m0, [r0 + r3], 1
> +
> +    movu xm2, [r1]
> +    vinserti128 m2, m2, [r1 + r4], 1
> +
> +    paddw m0, m2
> +    pmulhrsw m0, m4
> +    paddw m0, m5
> +
> +    packuswb m0, m0
> +    vextracti128 xm1, m0, 1
> +    movq [r2], xm0
> +    movq [r2 + r5], xm1
> +
> +    lea r2, [r2 + 2 * r5]
> +    lea r0, [r0 + 2 * r3]
> +    lea r1, [r1 + 2 * r4]
> +
> +    movu xm0, [r0]
> +    vinserti128 m0, m0, [r0+ r3], 1
> +
> +    movu xm2, [r1]
> +    vinserti128 m2, m2, [r1 + r4], 1
> +
> +    paddw m0, m2
> +    pmulhrsw m0, m4
> +    paddw m0, m5
> +
> +    packuswb m0, m0
> +    vextracti128 xm1, m0, 1
> +    movq [r2], xm0
> +    movq [r2 + r5], xm1
> +
> +    lea r2, [r2 + 2 * r5]
> +    lea r0, [r0 + 2 * r3]
> +    lea r1, [r1 + 2 * r4]
> +
> +    movu xm0, [r0]
> +    vinserti128 m0, m0, [r0 + r3], 1
> +
> +    movu xm2, [r1]
> +    vinserti128 m2, m2, [r1 + r4], 1
> +
> +    paddw m0, m2
> +    pmulhrsw m0, m4
> +    paddw m0, m5
> +
> +    packuswb m0, m0
> +    vextracti128 xm1, m0, 1
> +    movq [r2], xm0
> +    movq [r2 + r5], xm1
> +    RET
> +
>  %macro ADDAVG_W8_H4_AVX2 1
>  INIT_YMM avx2
>  cglobal addAvg_8x%1, 6,7,6, pSrc0, src0, src1, dst, src0Stride, src1tride, dstStride
>
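
For anyone picking up the re-review, here is a rough scalar sketch of what the quoted kernels appear to compute per pixel; it may help as a cross-check against the asm. It assumes 8-bit output, that pw_256 and pw_128 are vectors of the words 256 and 128 (as the names suggest), and that strides are counted in elements rather than bytes. The names add_avg_ref and clip_u8 are hypothetical and not part of x265. Multiplying by 256 with pmulhrsw is the same as (x + 64) >> 7, so each pixel should come out as clip(((src0 + src1 + 64) >> 7) + 128).

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 8-bit pixel range, mirroring the unsigned saturation of packuswb. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar model of the quoted 8-bit addAvg kernels.
 * Strides are in elements (an assumption), not bytes.
 * Note: the asm adds in 16-bit lanes (paddw wraps); this model uses int,
 * so it only matches for inputs whose sum fits in int16_t. */
static void add_avg_ref(const int16_t *src0, const int16_t *src1, uint8_t *dst,
                        ptrdiff_t src0_stride, ptrdiff_t src1_stride,
                        ptrdiff_t dst_stride, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = src0[x] + src1[x];
            /* pmulhrsw by 256: (sum * 256 + 0x4000) >> 15 == (sum + 64) >> 7.
             * Assumes >> on a negative int is an arithmetic shift, which is
             * implementation-defined in C but true on mainstream compilers. */
            int rounded = (sum + 64) >> 7;
            /* paddw with 128, then saturate to [0, 255] as packuswb does. */
            dst[x] = clip_u8(rounded + 128);
        }
        src0 += src0_stride;
        src1 += src1_stride;
        dst += dst_stride;
    }
}
```

Calling it with width 8 and height 2 or 6 on the same buffers used by the testbench should, under the assumptions above, give the same bytes as addAvg_8x2 and addAvg_8x6.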
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
