Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
Has this been fixed? Murugan - have you reproduced/fixed this issue?

On Sat, Feb 15, 2014 at 12:13 AM, Steve Borho st...@borho.org wrote:
> On Fri, Feb 14, 2014 at 12:39 PM, Steve Borho st...@borho.org wrote:
>> On Fri, Feb 14, 2014 at 4:41 AM, dnyanesh...@multicorewareinc.com wrote:
>>> # HG changeset patch
>>> # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
>>> # Date 1392374441 -19800
>>> #      Fri Feb 14 16:10:41 2014 +0530
>>> # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
>>> # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
>>> asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
>>
>> with this patch applied, if I fixup the elif problems, I get occasional dequant test failures on 8bpp mac.
>>
>> steve@zeppelin ./test/TestBench
>> Using random seed 52FE6216 8bpp
>> Testing primitives: SSE2
>> Testing primitives: SSE3
>> Testing primitives: SSSE3
>> Testing primitives: SSE4
>> dequant: Failed!
>
> Sorry, the dequant test failures appear to be caused by Murugan's testbench changes. I'm dequeuing those as well until we understand why the test is failing.
>
> --
> Steve Borho

___
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel
Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
I have also checked on a Mac machine, and I am not getting the dequant failure. I will send the patch again, as this one will not apply on the latest tip.

---
Dnyaneshwar G

On Wed, Feb 19, 2014 at 9:57 AM, Murugan Vairavel muru...@multicorewareinc.com wrote:
> Hi Deepthi,
>
> I tried to reproduce this, but it is working fine.
>
> On Wed, Feb 19, 2014 at 7:21 AM, Deepthi Nandakumar deep...@multicorewareinc.com wrote:
>> Has this been fixed? Murugan - have you reproduced/fixed this issue?
>>
>> On Sat, Feb 15, 2014 at 12:13 AM, Steve Borho st...@borho.org wrote:
>>> On Fri, Feb 14, 2014 at 12:39 PM, Steve Borho st...@borho.org wrote:
>>>> On Fri, Feb 14, 2014 at 4:41 AM, dnyanesh...@multicorewareinc.com wrote:
>>>>> # HG changeset patch
>>>>> # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
>>>>> # Date 1392374441 -19800
>>>>> #      Fri Feb 14 16:10:41 2014 +0530
>>>>> # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
>>>>> # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
>>>>> asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
>>>>
>>>> with this patch applied, if I fixup the elif problems, I get occasional dequant test failures on 8bpp mac.
>>>>
>>>> steve@zeppelin ./test/TestBench
>>>> Using random seed 52FE6216 8bpp
>>>> Testing primitives: SSE2
>>>> Testing primitives: SSE3
>>>> Testing primitives: SSSE3
>>>> Testing primitives: SSE4
>>>> dequant: Failed!
>>>
>>> Sorry, the dequant test failures appear to be caused by Murugan's testbench changes. I'm dequeuing those as well until we understand why the test is failing.
>>>
>>> --
>>> Steve Borho
>
> --
> With Regards,
> Murugan. V
> +919659287478
Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
On Wed, Feb 19, 2014 at 1:04 AM, dnyanesh...@multicorewareinc.com wrote:
> # HG changeset patch
> # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
> # Date 1392792673 -19800
> #      Wed Feb 19 12:21:13 2014 +0530
> # Node ID 6150985c3d535f0ea7a1dc0b8f3c69e65e30d25b
> # Parent  1a0d5b456b19e8f187290c662425080cfc870492
> asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives

pushed

> diff -r 1a0d5b456b19 -r 6150985c3d53 source/common/x86/asm-primitives.cpp
> --- a/source/common/x86/asm-primitives.cpp  Tue Feb 18 14:46:51 2014 -0600
> +++ b/source/common/x86/asm-primitives.cpp  Wed Feb 19 12:21:13 2014 +0530
> @@ -808,6 +808,10 @@
>          p.calcrecon[BLOCK_8x8] = x265_calcRecons8_sse2;
>          p.calcrecon[BLOCK_16x16] = x265_calcRecons16_sse2;
>          p.calcrecon[BLOCK_32x32] = x265_calcRecons32_sse2;
> +
> +        p.dct[DCT_4x4] = x265_dct4_sse2;
> +        p.idct[IDCT_4x4] = x265_idct4_sse2;
> +        p.idct[IDST_4x4] = x265_idst4_sse2;
>      }
>      if (cpuMask & X265_CPU_SSSE3)
>      {
> @@ -822,10 +826,12 @@
>          SETUP_INTRA_ANG32(2, 2, ssse3);
>          SETUP_INTRA_ANG32(34, 2, ssse3);
> +
> +        p.dct[DST_4x4] = x265_dst4_ssse3;
>      }
>      if (cpuMask & X265_CPU_SSE4)
>      {
> -
> +        p.dct[DCT_8x8] = x265_dct8_sse4;
>          p.quant = x265_quant_sse4;
>          p.dequant_normal = x265_dequant_normal_sse4;
>          p.cvt16to32_shl = x265_cvt16to32_shl_sse4;
> diff -r 1a0d5b456b19 -r 6150985c3d53 source/common/x86/const-a.asm
> --- a/source/common/x86/const-a.asm  Tue Feb 18 14:46:51 2014 -0600
> +++ b/source/common/x86/const-a.asm  Wed Feb 19 12:21:13 2014 +0530
> @@ -69,9 +69,10 @@
>  const pw_ppmmppmm, dw 1,1,-1,-1,1,1,-1,-1
>  const pw_pmpmpmpm, dw 1,-1,1,-1,1,-1,1,-1
>  const pw_pmmp,     dw 1,-1,-1,1,0,0,0,0
> -
>  const pd_1,        times 4 dd 1
>  const pd_2,        times 4 dd 2
> +const pd_4,        times 4 dd 4
> +const pd_8,        times 4 dd 8
>  const pd_16,       times 4 dd 16
>  const pd_32,       times 4 dd 32
>  const pd_64,       times 4 dd 64
> diff -r 1a0d5b456b19 -r 6150985c3d53 source/common/x86/dct8.asm
> --- a/source/common/x86/dct8.asm  Tue Feb 18 14:46:51 2014 -0600
> +++ b/source/common/x86/dct8.asm  Wed Feb 19 12:21:13 2014 +0530
> @@ -64,9 +64,12 @@
>  pb_unpackhlw1: db 0,1,8,9,2,3,10,11,4,5,12,13,6,7,14,15
>  SECTION .text
> -
>  cextern pd_1
>  cextern pd_2
> +cextern pd_4
> +cextern pd_8
> +cextern pd_16
> +cextern pd_32
>  cextern pd_64
>  cextern pd_128
>  cextern pd_256
> @@ -79,16 +82,21 @@
>  ;--
>  INIT_XMM sse2
>  cglobal dct4, 3, 4, 8
> -
> +%if BIT_DEPTH == 10
> +  %define DCT_SHIFT 3
> +  mova m7, [pd_4]
> +%elif BIT_DEPTH == 8
> +  %define DCT_SHIFT 1
> +  mova m7, [pd_1]
> +%else
> +  %error Unsupported BIT_DEPTH!
> +%endif
>      add         r2d, r2d
>      lea         r3, [tab_dct4]
>      mova        m4, [r3 + 0 * 16]
>      mova        m5, [r3 + 1 * 16]
>      mova        m6, [r3 + 2 * 16]
> -
> -    mova        m7, [pd_1]
> -
>      movh        m0, [r0 + 0 * r2]
>      movh        m1, [r0 + 1 * r2]
>      punpcklqdq  m0, m1
> @@ -107,27 +115,21 @@
>      paddw       m1, m2, m0
>      psubw       m2, m0
> -
>      pmaddwd     m0, m1, m4
>      paddd       m0, m7
> -    psrad       m0, 1
> -
> +    psrad       m0, DCT_SHIFT
>      pmaddwd     m3, m2, m5
>      paddd       m3, m7
> -    psrad       m3, 1
> -
> +    psrad       m3, DCT_SHIFT
>      packssdw    m0, m3
>      pshufd      m0, m0, 0xD8
>      pshufhw     m0, m0, 0xB1
> -
>      pmaddwd     m1, m6
>      paddd       m1, m7
> -    psrad       m1, 1
> -
> +    psrad       m1, DCT_SHIFT
>      pmaddwd     m2, [r3 + 3 * 16]
>      paddd       m2, m7
> -    psrad       m2, 1
> -
> +    psrad       m2, DCT_SHIFT
>      packssdw    m1, m2
>      pshufd      m1, m1, 0xD8
>      pshufhw     m1, m1, 0xB1
> @@ -179,7 +181,7 @@
>    %define IDCT4_OFFSET [pd_512]
>    %define IDCT4_SHIFT 10
>  %else
> -  %error Unsupport BIT_DEPTH!
> +  %error Unsupported BIT_DEPTH!
>  %endif
>      add         r2d, r2d
>      lea         r3, [tab_dct4]
> @@ -268,67 +270,60 @@
>  INIT_XMM ssse3
>  %if ARCH_X86_64
>  cglobal dst4, 3, 4, 8+2
> +  %define coef2 m8
> +  %define coef3 m9
>  %else ; ARCH_X86_64 = 0
>  cglobal dst4, 3, 4, 8
> +  %define coef2 [r3 + 2 * 16]
> +  %define coef3 [r3 + 3 * 16]
>  %endif ; ARCH_X86_64
> +%define coef0 m6
> +%define coef1 m7
> -%define coef0 m6
> -%define coef1 m7
> -%if ARCH_X86_64
> -%define coef2 m8
> -%define coef3 m9
> -%else
> -%define coef2 [r3 + 2 * 16]
> -%define coef3 [r3 + 3 * 16]
> +%if BIT_DEPTH == 8
> +  %define DST_SHIFT 1
> +  mova m5, [pd_1]
> +%elif BIT_DEPTH == 10
> +  %define DST_SHIFT 3
> +  mova m5, [pd_4]
>  %endif
> -
>      add         r2d, r2d
>      lea         r3, [tab_dst4]
> -
> -    mova        m5, [pd_1]
> -
>      mova        coef0, [r3 + 0 * 16]
>      mova        coef1, [r3 + 1 * 16]
>  %if ARCH_X86_64
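For readers following the bit-depth branches in the patch above: each shift is paired with a rounding constant equal to half the divisor (`pd_1` with shift 1, `pd_4` with shift 3). The following is only a scalar sketch of one SIMD lane of the `paddd` / `psrad` pair, not code from x265. The function names `firstStageShift`, `roundOffset` and `roundShift` are made up for illustration, and only the two bit depths handled by the patch are accepted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift used by the 4x4 dct/dst in the patch.
// BIT_DEPTH 8 -> 1, BIT_DEPTH 10 -> 3. Anything else is rejected,
// mirroring the %error branch in the asm.
inline int firstStageShift(int bitDepth)
{
    assert(bitDepth == 8 || bitDepth == 10);
    return bitDepth == 8 ? 1 : 3;
}

// Rounding offset added before the shift: half of (1 << shift).
// Gives 1 for shift 1 (pd_1) and 4 for shift 3 (pd_4).
inline int32_t roundOffset(int shift)
{
    return int32_t(1) << (shift - 1);
}

// Scalar equivalent of one lane of: paddd m0, m7 / psrad m0, DCT_SHIFT.
// Assumes >> on a negative int32_t is an arithmetic shift, like psrad
// (true on mainstream compilers, guaranteed since C++20).
inline int32_t roundShift(int32_t sum, int bitDepth)
{
    const int shift = firstStageShift(bitDepth);
    return (sum + roundOffset(shift)) >> shift;
}
```

With these definitions, `roundShift(5, 8)` adds 1 and shifts by 1, while `roundShift(5, 10)` adds 4 and shifts by 3, which is the difference between the 8bpp and 16bpp builds that the patch introduces.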
Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
right

At 2014-02-14 18:41:34, dnyanesh...@multicorewareinc.com wrote:
> # HG changeset patch
> # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
> # Date 1392374441 -19800
> #      Fri Feb 14 16:10:41 2014 +0530
> # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
> # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
> asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
On Fri, Feb 14, 2014 at 4:41 AM, dnyanesh...@multicorewareinc.com wrote:
> # HG changeset patch
> # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
> # Date 1392374441 -19800
> #      Fri Feb 14 16:10:41 2014 +0530
> # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
> # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
> asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives

with this patch applied, if I fixup the elif problems, I get occasional dequant test failures on 8bpp mac.

steve@zeppelin ./test/TestBench
Using random seed 52FE6216 8bpp
Testing primitives: SSE2
Testing primitives: SSE3
Testing primitives: SSSE3
Testing primitives: SSE4
dequant: Failed!

> diff -r ed310b17ff66 -r 831536babdc0 source/common/x86/asm-primitives.cpp
> --- a/source/common/x86/asm-primitives.cpp  Fri Feb 14 02:30:52 2014 -0600
> +++ b/source/common/x86/asm-primitives.cpp  Fri Feb 14 16:10:41 2014 +0530
> @@ -726,6 +726,10 @@
>          p.calcrecon[BLOCK_8x8] = x265_calcRecons8_sse2;
>          p.calcrecon[BLOCK_16x16] = x265_calcRecons16_sse2;
>          p.calcrecon[BLOCK_32x32] = x265_calcRecons32_sse2;
> +
> +        p.dct[DCT_4x4] = x265_dct4_sse2;
> +        p.idct[IDCT_4x4] = x265_idct4_sse2;
> +        p.idct[IDST_4x4] = x265_idst4_sse2;
>      }
>      if (cpuMask & X265_CPU_SSSE3)
>      {
> @@ -740,9 +744,12 @@
>          SETUP_INTRA_ANG32(2, 2, ssse3);
>          SETUP_INTRA_ANG32(34, 2, ssse3);
> +
> +        p.dct[DST_4x4] = x265_dst4_ssse3;
>      }
>      if (cpuMask & X265_CPU_SSE4)
>      {
> +        p.dct[DCT_8x8] = x265_dct8_sse4;
>          p.cvt16to32_shl = x265_cvt16to32_shl_sse4;
>          p.intra_pred[BLOCK_4x4][0] = x265_intra_pred_planar4_sse4;
> diff -r ed310b17ff66 -r 831536babdc0 source/common/x86/const-a.asm
> --- a/source/common/x86/const-a.asm  Fri Feb 14 02:30:52 2014 -0600
> +++ b/source/common/x86/const-a.asm  Fri Feb 14 16:10:41 2014 +0530
> @@ -72,6 +72,8 @@
>  const pd_1,        times 4 dd 1
>  const pd_2,        times 4 dd 2
> +const pd_4,        times 4 dd 4
> +const pd_8,        times 4 dd 8
>  const pd_16,       times 4 dd 16
>  const pd_32,       times 4 dd 32
>  const pd_64,       times 4 dd 64
> diff -r ed310b17ff66 -r 831536babdc0 source/common/x86/dct8.asm
> --- a/source/common/x86/dct8.asm  Fri Feb 14 02:30:52 2014 -0600
> +++ b/source/common/x86/dct8.asm  Fri Feb 14 16:10:41 2014 +0530
> @@ -67,6 +67,10 @@
>  cextern pd_1
>  cextern pd_2
> +cextern pd_4
> +cextern pd_8
> +cextern pd_16
> +cextern pd_32
>  cextern pd_64
>  cextern pd_128
>  cextern pd_256
> @@ -79,6 +83,15 @@
>  ;--
>  INIT_XMM sse2
>  cglobal dct4, 3, 4, 8
> +%if BIT_DEPTH == 10
> +  %define DCT_SHIFT 3
> +  mova m7, [pd_4]
> +%else if BIT_DEPTH == 8

%elif BIT_DEPTH == 8

> +  %define DCT_SHIFT 1
> +  mova m7, [pd_1]
> +%else
> +  %error Unsupported BIT_DEPTH!
> +%endif
>      add         r2d, r2d
>      lea         r3, [tab_dct4]
> @@ -87,8 +100,6 @@
>      mova        m5, [r3 + 1 * 16]
>      mova        m6, [r3 + 2 * 16]
> -    mova        m7, [pd_1]
> -
>      movh        m0, [r0 + 0 * r2]
>      movh        m1, [r0 + 1 * r2]
>      punpcklqdq  m0, m1
> @@ -110,11 +121,11 @@
>      pmaddwd     m0, m1, m4
>      paddd       m0, m7
> -    psrad       m0, 1
> +    psrad       m0, DCT_SHIFT
>      pmaddwd     m3, m2, m5
>      paddd       m3, m7
> -    psrad       m3, 1
> +    psrad       m3, DCT_SHIFT
>      packssdw    m0, m3
>      pshufd      m0, m0, 0xD8
> @@ -122,11 +133,11 @@
>      pmaddwd     m1, m6
>      paddd       m1, m7
> -    psrad       m1, 1
> +    psrad       m1, DCT_SHIFT
>      pmaddwd     m2, [r3 + 3 * 16]
>      paddd       m2, m7
> -    psrad       m2, 1
> +    psrad       m2, DCT_SHIFT
>      packssdw    m1, m2
>      pshufd      m1, m1, 0xD8
> @@ -179,7 +190,7 @@
>    %define IDCT4_OFFSET [pd_512]
>    %define IDCT4_SHIFT 10
>  %else
> -  %error Unsupport BIT_DEPTH!
> +  %error Unsupported BIT_DEPTH!
>  %endif
>      add         r2d, r2d
>      lea         r3, [tab_dct4]
> @@ -268,25 +279,28 @@
>  INIT_XMM ssse3
>  %if ARCH_X86_64
>  cglobal dst4, 3, 4, 8+2
> +  %define coef2 m8
> +  %define coef3 m9
>  %else ; ARCH_X86_64 = 0
>  cglobal dst4, 3, 4, 8
> +  %define coef2 [r3 + 2 * 16]
> +  %define coef3 [r3 + 3 * 16]
>  %endif ; ARCH_X86_64
> -%define coef0 m6
> -%define coef1 m7
> -%if ARCH_X86_64
> -%define coef2 m8
> -%define coef3 m9
> -%else
> -%define coef2 [r3 + 2 * 16]
> -%define coef3 [r3 + 3 * 16]
> -%endif
> +%define coef0 m6
> +%define coef1 m7
> +
> +%if BIT_DEPTH == 8
> +  %define DST_SHIFT 1
> +  mova m5, [pd_1]
> +%else if BIT_DEPTH == 10

%elif BIT_DEPTH == 10, there's one more of these below

> +  %define DST_SHIFT 3
> +  mova m5, [pd_4]
> +%endif
>      add         r2d, r2d
>      lea         r3, [tab_dst4]
> -    mova        m5, [pd_1]
> -
>      mova        coef0, [r3 + 0 * 16]
>      mova        coef1, [r3 + 1 * 16]
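The "elif problems" flagged in the review above are the `%else if` lines, which should be the single directive `%elif`. The C preprocessor has the same distinction between `#else` and `#elif`, so the intended three-way chain can be sketched in C++. This is an illustrative analogue only, not x265 code: the names `kDctShift` and `kDctRound` are hypothetical, and the default of 8 for `BIT_DEPTH` is an assumption added so the snippet compiles on its own.

```cpp
// Assumption for this sketch: BIT_DEPTH is normally supplied by the build;
// default it to 8 here so the file compiles standalone.
#ifndef BIT_DEPTH
#define BIT_DEPTH 8
#endif

#if BIT_DEPTH == 10
constexpr int kDctShift = 3;    // matches DCT_SHIFT 3 in the patch
#elif BIT_DEPTH == 8            // one directive; "#else if" is not an else-if
constexpr int kDctShift = 1;    // matches DCT_SHIFT 1 in the patch
#else
#error Unsupported BIT_DEPTH!
#endif

// Rounding constant paired with the shift (1 for pd_1, 4 for pd_4).
constexpr int kDctRound = 1 << (kDctShift - 1);

static_assert(kDctRound == 1 || kDctRound == 4, "only 8 and 10 bit handled");
```

Written this way, exactly one branch is selected and the final `#else` remains reachable as the error case for any other bit depth, which is the structure the corrected `%if` / `%elif` / `%else` / `%endif` block in the asm has.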
Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
On Fri, Feb 14, 2014 at 12:39 PM, Steve Borho st...@borho.org wrote:
> On Fri, Feb 14, 2014 at 4:41 AM, dnyanesh...@multicorewareinc.com wrote:
>> # HG changeset patch
>> # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
>> # Date 1392374441 -19800
>> #      Fri Feb 14 16:10:41 2014 +0530
>> # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
>> # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
>> asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives
>
> with this patch applied, if I fixup the elif problems, I get occasional dequant test failures on 8bpp mac.
>
> steve@zeppelin ./test/TestBench
> Using random seed 52FE6216 8bpp
> Testing primitives: SSE2
> Testing primitives: SSE3
> Testing primitives: SSSE3
> Testing primitives: SSE4
> dequant: Failed!

Sorry, the dequant test failures appear to be caused by Murugan's testbench changes. I'm dequeuing those as well until we understand why the test is failing.

--
Steve Borho