Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives

2014-02-18 Thread Deepthi Nandakumar
Has this been fixed? Murugan - have you reproduced/fixed this issue?


On Sat, Feb 15, 2014 at 12:13 AM, Steve Borho st...@borho.org wrote:




 On Fri, Feb 14, 2014 at 12:39 PM, Steve Borho st...@borho.org wrote:




 On Fri, Feb 14, 2014 at 4:41 AM, dnyanesh...@multicorewareinc.com wrote:

 # HG changeset patch
 # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
 # Date 1392374441 -19800
 #  Fri Feb 14 16:10:41 2014 +0530
 # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
 # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
 asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4
 primitives


 With this patch applied, if I fix up the elif problems, I get occasional
 dequant test failures on 8bpp Mac.

 steve@zeppelin ./test/TestBench

 Using random seed 52FE6216 8bpp

 Testing primitives: SSE2

 Testing primitives: SSE3

 Testing primitives: SSSE3

 Testing primitives: SSE4

 dequant: Failed!


 Sorry, the dequant test failures appear to be caused by Murugan's
 testbench changes.  I'm dequeuing those as well until we understand why the
 test is failing.

 --
 Steve Borho

 ___
 x265-devel mailing list
 x265-devel@videolan.org
 https://mailman.videolan.org/listinfo/x265-devel




Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives

2014-02-18 Thread Dnyaneshwar Gorade
I have also checked on a Mac machine, and I am not getting the dequant
failure. I will send the patch again, as this one will not apply on the
latest tip.

---
Dnyaneshwar G


On Wed, Feb 19, 2014 at 9:57 AM, Murugan Vairavel 
muru...@multicorewareinc.com wrote:

 Hi Deepthi,

 I tried to reproduce this, but it is working fine.


 On Wed, Feb 19, 2014 at 7:21 AM, Deepthi Nandakumar 
 deep...@multicorewareinc.com wrote:

 Has this been fixed? Murugan - have you reproduced/fixed this issue?


 On Sat, Feb 15, 2014 at 12:13 AM, Steve Borho st...@borho.org wrote:




 On Fri, Feb 14, 2014 at 12:39 PM, Steve Borho st...@borho.org wrote:




 On Fri, Feb 14, 2014 at 4:41 AM, dnyanesh...@multicorewareinc.com wrote:

 # HG changeset patch
 # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
 # Date 1392374441 -19800
 #  Fri Feb 14 16:10:41 2014 +0530
 # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
 # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
 asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and
 idst4x4 primitives


 With this patch applied, if I fix up the elif problems, I get occasional
 dequant test failures on 8bpp Mac.

 steve@zeppelin ./test/TestBench

 Using random seed 52FE6216 8bpp

 Testing primitives: SSE2

 Testing primitives: SSE3

 Testing primitives: SSSE3

 Testing primitives: SSE4

 dequant: Failed!


 Sorry, the dequant test failures appear to be caused by Murugan's
 testbench changes.  I'm dequeuing those as well until we understand why the
 test is failing.

 --
 Steve Borho





 --
 With Regards,

 Murugan. V
 +919659287478



Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives

2014-02-18 Thread Steve Borho
On Wed, Feb 19, 2014 at 1:04 AM, dnyanesh...@multicorewareinc.com wrote:

 # HG changeset patch
 # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
 # Date 1392792673 -19800
 #  Wed Feb 19 12:21:13 2014 +0530
 # Node ID 6150985c3d535f0ea7a1dc0b8f3c69e65e30d25b
 # Parent  1a0d5b456b19e8f187290c662425080cfc870492
 asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4
 primitives


pushed



 diff -r 1a0d5b456b19 -r 6150985c3d53 source/common/x86/asm-primitives.cpp
 --- a/source/common/x86/asm-primitives.cpp  Tue Feb 18 14:46:51 2014 -0600
 +++ b/source/common/x86/asm-primitives.cpp  Wed Feb 19 12:21:13 2014 +0530
 @@ -808,6 +808,10 @@
  p.calcrecon[BLOCK_8x8] = x265_calcRecons8_sse2;
  p.calcrecon[BLOCK_16x16] = x265_calcRecons16_sse2;
  p.calcrecon[BLOCK_32x32] = x265_calcRecons32_sse2;
 +
 +p.dct[DCT_4x4] = x265_dct4_sse2;
 +p.idct[IDCT_4x4] = x265_idct4_sse2;
 +p.idct[IDST_4x4] = x265_idst4_sse2;
  }
 if (cpuMask & X265_CPU_SSSE3)
  {
 @@ -822,10 +826,12 @@

  SETUP_INTRA_ANG32(2, 2, ssse3);
  SETUP_INTRA_ANG32(34, 2, ssse3);
 +
 +p.dct[DST_4x4] = x265_dst4_ssse3;
  }
 if (cpuMask & X265_CPU_SSE4)
  {
 -
 +p.dct[DCT_8x8] = x265_dct8_sse4;
  p.quant = x265_quant_sse4;
  p.dequant_normal = x265_dequant_normal_sse4;
  p.cvt16to32_shl = x265_cvt16to32_shl_sse4;
 diff -r 1a0d5b456b19 -r 6150985c3d53 source/common/x86/const-a.asm
 --- a/source/common/x86/const-a.asm Tue Feb 18 14:46:51 2014 -0600
 +++ b/source/common/x86/const-a.asm Wed Feb 19 12:21:13 2014 +0530
 @@ -69,9 +69,10 @@
  const pw_ppmmppmm, dw 1,1,-1,-1,1,1,-1,-1
  const pw_pmpmpmpm, dw 1,-1,1,-1,1,-1,1,-1
  const pw_pmmp, dw 1,-1,-1,1,0,0,0,0
 -
 const pd_1,    times 4 dd 1
 const pd_2,    times 4 dd 2
 +const pd_4,    times 4 dd 4
 +const pd_8,    times 4 dd 8
  const pd_16,   times 4 dd 16
  const pd_32,   times 4 dd 32
  const pd_64,   times 4 dd 64
 diff -r 1a0d5b456b19 -r 6150985c3d53 source/common/x86/dct8.asm
 --- a/source/common/x86/dct8.asm    Tue Feb 18 14:46:51 2014 -0600
 +++ b/source/common/x86/dct8.asm    Wed Feb 19 12:21:13 2014 +0530
 @@ -64,9 +64,12 @@
  pb_unpackhlw1:  db 0,1,8,9,2,3,10,11,4,5,12,13,6,7,14,15

  SECTION .text
 -
  cextern pd_1
  cextern pd_2
 +cextern pd_4
 +cextern pd_8
 +cextern pd_16
 +cextern pd_32
  cextern pd_64
  cextern pd_128
  cextern pd_256
 @@ -79,16 +82,21 @@
  ;--
  INIT_XMM sse2
  cglobal dct4, 3, 4, 8
 -
 +%if BIT_DEPTH == 10
 +  %define   DCT_SHIFT 3
 +  mova  m7, [pd_4]
 +%elif BIT_DEPTH == 8
 +  %define   DCT_SHIFT 1
 +  mova  m7, [pd_1]
 +%else
 +  %error Unsupported BIT_DEPTH!
 +%endif
  add r2d, r2d
  lea r3, [tab_dct4]

 mova        m4, [r3 + 0 * 16]
 mova        m5, [r3 + 1 * 16]
 mova        m6, [r3 + 2 * 16]
 -
 -mova        m7, [pd_1]
 -
 movh        m0, [r0 + 0 * r2]
 movh        m1, [r0 + 1 * r2]
  punpcklqdq  m0, m1
 @@ -107,27 +115,21 @@

  paddw   m1, m2, m0
  psubw   m2, m0
 -
  pmaddwd m0, m1, m4
  paddd   m0, m7
 -psrad   m0, 1
 -
 +psrad   m0, DCT_SHIFT
  pmaddwd m3, m2, m5
  paddd   m3, m7
 -psrad   m3, 1
 -
 +psrad   m3, DCT_SHIFT
 packssdw    m0, m3
  pshufd  m0, m0, 0xD8
  pshufhw m0, m0, 0xB1
 -
  pmaddwd m1, m6
  paddd   m1, m7
 -psrad   m1, 1
 -
 +psrad   m1, DCT_SHIFT
  pmaddwd m2, [r3 + 3 * 16]
  paddd   m2, m7
 -psrad   m2, 1
 -
 +psrad   m2, DCT_SHIFT
 packssdw    m1, m2
  pshufd  m1, m1, 0xD8
  pshufhw m1, m1, 0xB1
 @@ -179,7 +181,7 @@
%define IDCT4_OFFSET  [pd_512]
%define IDCT4_SHIFT   10
  %else
 -  %error Unsupport BIT_DEPTH!
 +  %error Unsupported BIT_DEPTH!
  %endif
  add r2d, r2d
  lea r3, [tab_dct4]
 @@ -268,67 +270,60 @@
  INIT_XMM ssse3
  %if ARCH_X86_64
  cglobal dst4, 3, 4, 8+2
 +  %define   coef2   m8
 +  %define   coef3   m9
  %else ; ARCH_X86_64 = 0
  cglobal dst4, 3, 4, 8
 +  %define   coef2   [r3 + 2 * 16]
 +  %define   coef3   [r3 + 3 * 16]
  %endif ; ARCH_X86_64
 +%define coef0   m6
 +%define coef1   m7

 -%define coef0   m6
 -%define coef1   m7
 -%if ARCH_X86_64
 -%define coef2   m8
 -%define coef3   m9
 -%else
 -%define coef2   [r3 + 2 * 16]
 -%define coef3   [r3 + 3 * 16]
 +%if BIT_DEPTH == 8
 +  %define   DST_SHIFT 1
 +  mova  m5, [pd_1]
 +%elif BIT_DEPTH == 10
 +  %define   DST_SHIFT 3
 +  mova  m5, [pd_4]
  %endif
 -
  add r2d, r2d
  lea r3, [tab_dst4]
 -
 -mova        m5, [pd_1]
 -
 mova        coef0, [r3 + 0 * 16]
 mova        coef1, [r3 + 1 * 16]
  %if ARCH_X86_64
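
For readers following the asm-primitives.cpp hunks above: the pattern is a table of function pointers that starts out pointing at C routines and is overwritten with assembly versions whenever the matching bit is set in cpuMask. The sketch below is only an illustration of that pattern. The flag values, the table layout, the function signature and the stand-in routines are all hypothetical, not the real x265 definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical capability bits; the real X265_CPU_* values are not shown in this thread.
enum : uint32_t { CPU_SSE2 = 1u << 0, CPU_SSSE3 = 1u << 1, CPU_SSE4 = 1u << 2 };

// Hypothetical slots in the forward-transform table.
enum { DCT_4x4, DST_4x4, DCT_8x8, NUM_DCTS };

// Simplified signature: each stand-in only reports its own name.
typedef std::string (*dct_t)();

struct Primitives { dct_t dct[NUM_DCTS]; };

static std::string dct4_c()     { return "dct4_c"; }
static std::string dst4_c()     { return "dst4_c"; }
static std::string dct8_c()     { return "dct8_c"; }
static std::string dct4_sse2()  { return "dct4_sse2"; }
static std::string dst4_ssse3() { return "dst4_ssse3"; }
static std::string dct8_sse4()  { return "dct8_sse4"; }

void setupPrimitives(Primitives& p, uint32_t cpuMask)
{
    // Start from the portable C versions.
    p.dct[DCT_4x4] = dct4_c;
    p.dct[DST_4x4] = dst4_c;
    p.dct[DCT_8x8] = dct8_c;

    // Each check is a bitwise AND against the capability mask.
    if (cpuMask & CPU_SSE2)
        p.dct[DCT_4x4] = dct4_sse2;
    if (cpuMask & CPU_SSSE3)
        p.dct[DST_4x4] = dst4_ssse3;
    if (cpuMask & CPU_SSE4)
        p.dct[DCT_8x8] = dct8_sse4;
}

// Calls whichever routine currently occupies the slot.
std::string run(const Primitives& p, int idx)
{
    assert(idx >= 0 && idx < NUM_DCTS);
    return p.dct[idx]();
}
```

With a mask of CPU_SSE2 | CPU_SSSE3, the 4x4 DCT and DST slots pick up the assembly stand-ins while the 8x8 DCT stays on the C fallback, which is why a routine only takes effect once it is assigned inside the right cpuMask block.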
 

Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives

2014-02-14 Thread chen
right

At 2014-02-14 18:41:34,dnyanesh...@multicorewareinc.com wrote:
# HG changeset patch
# User Dnyaneshwar G dnyanesh...@multicorewareinc.com
# Date 1392374441 -19800
#  Fri Feb 14 16:10:41 2014 +0530
# Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
# Parent  ed310b17ff6681f191c85341cf6efe7a50770143
asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 
primitives


Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives

2014-02-14 Thread Steve Borho
On Fri, Feb 14, 2014 at 4:41 AM, dnyanesh...@multicorewareinc.com wrote:

 # HG changeset patch
 # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
 # Date 1392374441 -19800
 #  Fri Feb 14 16:10:41 2014 +0530
 # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
 # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
 asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4
 primitives


With this patch applied, if I fix up the elif problems, I get occasional
dequant test failures on 8bpp Mac.

steve@zeppelin ./test/TestBench

Using random seed 52FE6216 8bpp

Testing primitives: SSE2

Testing primitives: SSE3

Testing primitives: SSSE3

Testing primitives: SSE4

dequant: Failed!
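
Since the failure is occasional, the seed printed in the log is what makes it reproducible. The sketch below shows the general idea of a seeded comparison between a reference routine and an optimized one. It is not the TestBench code: the function names, the dequant-like signature, the buffer size and the scale/shift values are all assumptions made for illustration, and dequant_noround is a deliberately broken stand-in.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical primitive signature, loosely modelled on a dequant routine.
typedef void (*dequant_t)(const int16_t* src, int32_t* dst, int num, int scale, int shift);

// Reference behaviour: scale, add the rounding offset, shift right.
static void dequant_ref(const int16_t* src, int32_t* dst, int num, int scale, int shift)
{
    const int32_t add = 1 << (shift - 1);
    for (int i = 0; i < num; i++)
        dst[i] = (src[i] * scale + add) >> shift;
}

// Deliberately wrong stand-in for an optimized routine: it omits the rounding offset.
static void dequant_noround(const int16_t* src, int32_t* dst, int num, int scale, int shift)
{
    for (int i = 0; i < num; i++)
        dst[i] = (src[i] * scale) >> shift;
}

// Returns true when both routines agree on the inputs derived from `seed`.
bool checkWithSeed(uint32_t seed, dequant_t ref, dequant_t opt)
{
    assert(ref && opt);
    std::mt19937 rng(seed);  // same seed, same input sequence on every run
    const int num = 64, scale = 40, shift = 6;

    std::vector<int16_t> src(num);
    for (int i = 0; i < num; i++)
        src[i] = static_cast<int16_t>(static_cast<int>(rng() % 65536u) - 32768);

    std::vector<int32_t> a(num), b(num);
    ref(src.data(), a.data(), num, scale, shift);
    opt(src.data(), b.data(), num, scale, shift);
    return a == b;
}
```

Calling checkWithSeed(0x52FE6216u, dequant_ref, dequant_noround) feeds both routines the same inputs on every run, so a mismatch seen once can be revisited under a debugger instead of waiting for it to recur.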



 diff -r ed310b17ff66 -r 831536babdc0 source/common/x86/asm-primitives.cpp
 --- a/source/common/x86/asm-primitives.cpp  Fri Feb 14 02:30:52 2014 -0600
 +++ b/source/common/x86/asm-primitives.cpp  Fri Feb 14 16:10:41 2014 +0530
 @@ -726,6 +726,10 @@
  p.calcrecon[BLOCK_8x8] = x265_calcRecons8_sse2;
  p.calcrecon[BLOCK_16x16] = x265_calcRecons16_sse2;
  p.calcrecon[BLOCK_32x32] = x265_calcRecons32_sse2;
 +
 +p.dct[DCT_4x4] = x265_dct4_sse2;
 +p.idct[IDCT_4x4] = x265_idct4_sse2;
 +p.idct[IDST_4x4] = x265_idst4_sse2;
  }
 if (cpuMask & X265_CPU_SSSE3)
  {
 @@ -740,9 +744,12 @@

  SETUP_INTRA_ANG32(2, 2, ssse3);
  SETUP_INTRA_ANG32(34, 2, ssse3);
 +
 +p.dct[DST_4x4] = x265_dst4_ssse3;
  }
 if (cpuMask & X265_CPU_SSE4)
  {
 +p.dct[DCT_8x8] = x265_dct8_sse4;
  p.cvt16to32_shl = x265_cvt16to32_shl_sse4;

  p.intra_pred[BLOCK_4x4][0] = x265_intra_pred_planar4_sse4;
 diff -r ed310b17ff66 -r 831536babdc0 source/common/x86/const-a.asm
 --- a/source/common/x86/const-a.asm Fri Feb 14 02:30:52 2014 -0600
 +++ b/source/common/x86/const-a.asm Fri Feb 14 16:10:41 2014 +0530
 @@ -72,6 +72,8 @@

 const pd_1,    times 4 dd 1
 const pd_2,    times 4 dd 2
 +const pd_4,    times 4 dd 4
 +const pd_8,    times 4 dd 8
  const pd_16,   times 4 dd 16
  const pd_32,   times 4 dd 32
  const pd_64,   times 4 dd 64
 diff -r ed310b17ff66 -r 831536babdc0 source/common/x86/dct8.asm
 --- a/source/common/x86/dct8.asm    Fri Feb 14 02:30:52 2014 -0600
 +++ b/source/common/x86/dct8.asm    Fri Feb 14 16:10:41 2014 +0530
 @@ -67,6 +67,10 @@

  cextern pd_1
  cextern pd_2
 +cextern pd_4
 +cextern pd_8
 +cextern pd_16
 +cextern pd_32
  cextern pd_64
  cextern pd_128
  cextern pd_256
 @@ -79,6 +83,15 @@
  ;--
  INIT_XMM sse2
  cglobal dct4, 3, 4, 8
 +%if BIT_DEPTH == 10
 +  %define   DCT_SHIFT 3
 +  mova  m7, [pd_4]
 +%else if BIT_DEPTH == 8


This should be "%elif BIT_DEPTH == 8".


 +  %define   DCT_SHIFT 1
 +  mova  m7, [pd_1]
 +%else
 +  %error Unsupported BIT_DEPTH!
 +%endif

  add r2d, r2d
  lea r3, [tab_dct4]
 @@ -87,8 +100,6 @@
 mova        m5, [r3 + 1 * 16]
 mova        m6, [r3 + 2 * 16]

 -mova        m7, [pd_1]
 -
 movh        m0, [r0 + 0 * r2]
 movh        m1, [r0 + 1 * r2]
  punpcklqdq  m0, m1
 @@ -110,11 +121,11 @@

  pmaddwd m0, m1, m4
  paddd   m0, m7
 -psrad   m0, 1
 +psrad   m0, DCT_SHIFT

  pmaddwd m3, m2, m5
  paddd   m3, m7
 -psrad   m3, 1
 +psrad   m3, DCT_SHIFT

 packssdw    m0, m3
  pshufd  m0, m0, 0xD8
 @@ -122,11 +133,11 @@

  pmaddwd m1, m6
  paddd   m1, m7
 -psrad   m1, 1
 +psrad   m1, DCT_SHIFT

  pmaddwd m2, [r3 + 3 * 16]
  paddd   m2, m7
 -psrad   m2, 1
 +psrad   m2, DCT_SHIFT

 packssdw    m1, m2
  pshufd  m1, m1, 0xD8
 @@ -179,7 +190,7 @@
%define IDCT4_OFFSET  [pd_512]
%define IDCT4_SHIFT   10
  %else
 -  %error Unsupport BIT_DEPTH!
 +  %error Unsupported BIT_DEPTH!
  %endif
  add r2d, r2d
  lea r3, [tab_dct4]
 @@ -268,25 +279,28 @@
  INIT_XMM ssse3
  %if ARCH_X86_64
  cglobal dst4, 3, 4, 8+2
 +  %define   coef2   m8
 +  %define   coef3   m9
  %else ; ARCH_X86_64 = 0
  cglobal dst4, 3, 4, 8
 +  %define   coef2   [r3 + 2 * 16]
 +  %define   coef3   [r3 + 3 * 16]
  %endif ; ARCH_X86_64

 -%define coef0   m6
 -%define coef1   m7
 -%if ARCH_X86_64
 -%define coef2   m8
 -%define coef3   m9
 -%else
 -%define coef2   [r3 + 2 * 16]
 -%define coef3   [r3 + 3 * 16]
 -%endif
 +%define coef0   m6
 +%define coef1   m7
 +
 +%if BIT_DEPTH == 8
 +  %define   DST_SHIFT 1
 +  mova  m5, [pd_1]
 +%else if BIT_DEPTH == 10


This should be "%elif BIT_DEPTH == 10"; there's one more of these below.


 +  %define   DST_SHIFT 3
 +  mova  m5, [pd_4]
 +%endif

  add r2d, r2d
  lea r3, [tab_dst4]

 -mova        m5, [pd_1]
 -
 mova        coef0, [r3 + 0 * 16]
 mova        coef1, [r3 + 1 * 16]
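
A note on where the DCT_SHIFT / DST_SHIFT values and the pd_1 / pd_4 constants in this patch appear to come from. Assuming the usual HEVC reference convention that the first-stage forward transform shift for an NxN block is log2(N) + bitDepth - 9, a 4x4 block gives a shift of 1 at 8 bits and 3 at 10 bits, and the rounding offset is half the divisor implied by that shift (1 and 4). The helper names below are hypothetical; this is a scalar model of the paddd/psrad pair, not code from x265.

```cpp
#include <cassert>
#include <cstdint>

// First-stage forward transform shift, assuming the convention
// shift = log2(N) + bitDepth - 9 for an NxN block.
constexpr int firstStageShift(int log2Size, int bitDepth)
{
    return log2Size + bitDepth - 9;
}

// Rounding offset: half of the divisor implied by the shift.
constexpr int32_t roundingOffset(int shift)
{
    return int32_t(1) << (shift - 1);
}

// Scalar model of "paddd reg, offset" followed by "psrad reg, shift".
// Relies on >> being an arithmetic shift for negative values, as psrad is.
inline int32_t roundShift(int32_t sum, int shift)
{
    assert(shift > 0);
    return (sum + roundingOffset(shift)) >> shift;
}
```

Under that assumption firstStageShift(2, 8) is 1 with an offset of 1, and firstStageShift(2, 10) is 3 with an offset of 4, which lines up with the pd_1 and pd_4 loads selected by the BIT_DEPTH checks in the patch.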
 

Re: [x265] [PATCH] asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4 primitives

2014-02-14 Thread Steve Borho
On Fri, Feb 14, 2014 at 12:39 PM, Steve Borho st...@borho.org wrote:




 On Fri, Feb 14, 2014 at 4:41 AM, dnyanesh...@multicorewareinc.com wrote:

 # HG changeset patch
 # User Dnyaneshwar G dnyanesh...@multicorewareinc.com
 # Date 1392374441 -19800
 #  Fri Feb 14 16:10:41 2014 +0530
 # Node ID 831536babdc08f1553a10754bf2a4f4af6aa1695
 # Parent  ed310b17ff6681f191c85341cf6efe7a50770143
 asm: added 16bpp support for dct[4x4, 8x8], idct4x4, dst4x4 and idst4x4
 primitives


 With this patch applied, if I fix up the elif problems, I get occasional
 dequant test failures on 8bpp Mac.

 steve@zeppelin ./test/TestBench

 Using random seed 52FE6216 8bpp

 Testing primitives: SSE2

 Testing primitives: SSE3

 Testing primitives: SSSE3

 Testing primitives: SSE4

 dequant: Failed!


Sorry, the dequant test failures appear to be caused by Murugan's testbench
changes.  I'm dequeuing those as well until we understand why the test is
failing.

--
Steve Borho