Re: [x265] [PATCH 0 of 6 ] SAO SSE4 asm code for HIGH_BIT_DEPTH

2015-06-22 Thread Dnyaneshwar Gorade
Okay. Will check IACA report and try pxor for m0 and buffer 1023. On Mon, Jun 22, 2015 at 8:24 PM, chen chenm...@163.com wrote: Right. Some comments: 'psignb X, [pb_128]' is equal to 'psubb X, 0, X'; in AVX2 the second form is faster, in SSE4 the choice depends on the IACA report. In PMINSW, you buffer ZERO

Re: [x265] [PATCH 0 of 6 ] SAO SSE4 asm code for HIGH_BIT_DEPTH

2015-06-22 Thread chen
Right. Some comments: 'psignb X, [pb_128]' is equal to 'psubb X, 0, X'; in AVX2 the second form is faster, in SSE4 the choice depends on the IACA report. In PMINSW, you buffer ZERO into M0 and use pw_1023 directly; could you try buffering pw_1023 and using PXOR to get ZERO? At 2015-06-22
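For readers following the review comment above, here is a minimal scalar sketch in C++ of the two points it raises. It models a single lane only and is not taken from the x265 source; the names psignb_lane, negate_lane and clip10 are hypothetical. It assumes pb_128 holds bytes of value 128 (negative when read as signed bytes) and pw_1023 holds words of value 1023, the 10-bit pixel maximum.

```cpp
#include <algorithm>
#include <cstdint>

// One-lane model of PSIGNB: negate a when the control byte is negative,
// return zero when the control is zero, pass a through otherwise.
inline int8_t psignb_lane(int8_t a, int8_t control)
{
    if (control < 0)
        return static_cast<int8_t>(static_cast<uint8_t>(0u - static_cast<uint8_t>(a)));
    if (control == 0)
        return 0;
    return a;
}

// One-lane model of subtracting from zero (PSUBB with a zeroed register),
// with the same 8-bit wraparound as the instruction.
inline int8_t negate_lane(int8_t a)
{
    return static_cast<int8_t>(static_cast<uint8_t>(0u - static_cast<uint8_t>(a)));
}

// One-lane model of the 10-bit clamp: PMAXSW against zero, then PMINSW
// against 1023. Which constant stays resident in a register and which is
// produced on the fly (PXOR reg, reg yields zero) is a scheduling choice
// and does not change the result.
inline int16_t clip10(int16_t x)
{
    const int16_t zero = 0;
    const int16_t maxVal = 1023;
    return std::min(std::max(x, zero), maxVal);
}
```

With a control byte of 128 (that is, -128 as a signed byte) the two negation models agree for every input, including -128, which wraps to itself in both.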

[x265] [PATCH] cmake: introduce multilib support for MSVC

2015-06-22 Thread Steve Borho
# HG changeset patch # User Min Chen # Date 1434999426 18000 # Mon Jun 22 13:57:06 2015 -0500 # Node ID 2cdab8d3b76066b18f32d1aa13c17fe50f9fa289 # Parent 83a7d824442455ba5e0a6b53ea68e6b7043845de cmake: introduce multilib support for MSVC Note, the multilib configuration exposes a bug in

[x265] [PATCH 4 of 4] asm: improve AVX2 sad_x4[32xN] by new faster algorithm

2015-06-22 Thread Min Chen
# HG changeset patch # User Min Chen chenm...@163.com # Date 1435019994 25200 # Node ID e3c31f11936b5e915ec773200e4c1b1d8db2730f # Parent 965b507acd52ad89160723253b770ef0036c71a5 asm: improve AVX2 sad_x4[32xN] by new faster algorithm Old: sad_x4[32x32] 41.91x 2379.45 99724.21

[x265] [PATCH 2 of 4] reduce shift operator on coeff remain code

2015-06-22 Thread Min Chen
# HG changeset patch # User Min Chen chenm...@163.com # Date 1435019988 25200 # Node ID 64167d1ad6d81c6c2d2ab762592d131860825fe9 # Parent b3915bc1febdeb620ecbb0fdef5583fa6c73c44e reduce shift operator on coeff remain code --- source/encoder/entropy.cpp |7 +-- 1 files changed, 5

Re: [x265] XP compatible x265_NS refactoring is incomplete

2015-06-22 Thread Stephen Hutchinson
On Mon, Jun 22, 2015 at 5:46 AM, Deepthi Nandakumar deep...@multicorewareinc.com wrote: I don't think the second error that was reported on XP with no-assembly has been fixed. Actually, the second part of issue #146 isn't Windows-specific at all, XP support enabled or not. It happens with any

[x265] [PATCH] cpu: fix multilib compiling for some rarer build options

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435013054 18000 # Mon Jun 22 17:44:14 2015 -0500 # Node ID dfdf378a3968a15a1465a3aa3098e507fb4f10e5 # Parent 76015a49e45cf979da0380382a31f11758cae26e cpu: fix multilib compiling for some rarer build options diff -r 76015a49e45c

Re: [x265] XP compatible x265_NS refactoring is incomplete

2015-06-22 Thread Steve Borho
On 06/22, Stephen Hutchinson wrote: On Mon, Jun 22, 2015 at 5:46 AM, Deepthi Nandakumar deep...@multicorewareinc.com wrote: I don't think the second error that was reported on XP with no-assembly has been fixed. Actually, the second part of issue #146 isn't Windows-specific at all, XP

[x265] [PATCH 3 of 4] asm: AVX2 of SAD_x4[32xN]

2015-06-22 Thread Min Chen
# HG changeset patch # User Min Chen chenm...@163.com # Date 1435019991 25200 # Node ID 965b507acd52ad89160723253b770ef0036c71a5 # Parent 64167d1ad6d81c6c2d2ab762592d131860825fe9 asm: AVX2 of SAD_x4[32xN] AVX: sad_x4[32x32] 36.69x 2843.87 104330.24 sad_x4[32x16] 35.67x 1547.93
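As background for the SAD_x4 patches, the following is a rough scalar reference in C++ for what a sad_x4 primitive computes: the sum of absolute differences between one source block and four candidate reference blocks sharing a stride. The function name sad_x4_ref, its parameter list and the 8-bit pixel type are assumptions made for this sketch and are not the x265 signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Scalar reference: SAD of one source block against four reference blocks.
// All names and the argument layout are illustrative only.
inline void sad_x4_ref(const uint8_t* src, ptrdiff_t srcStride,
                       const uint8_t* const ref[4], ptrdiff_t refStride,
                       int width, int height, int32_t res[4])
{
    assert(width > 0 && height > 0);
    for (int i = 0; i < 4; i++)
        res[i] = 0;

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // Load the source pixel once and reuse it for all four candidates.
            const int s = src[y * srcStride + x];
            for (int i = 0; i < 4; i++)
                res[i] += std::abs(s - ref[i][y * refStride + x]);
        }
    }
}
```

Reading each source pixel once for all four candidates is the reason an x4 variant exists at all; a SIMD version does the same with whole rows per load.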

[x265] [PATCH 1 of 4] faster algorithm to calculate ctxSet in codeCoeffNxN()

2015-06-22 Thread Min Chen
# HG changeset patch # User Min Chen chenm...@163.com # Date 1435019985 25200 # Node ID b3915bc1febdeb620ecbb0fdef5583fa6c73c44e # Parent 83a7d824442455ba5e0a6b53ea68e6b7043845de faster algorithm to calculate ctxSet in codeCoeffNxN() --- source/encoder/entropy.cpp |5 ++--- 1 files changed,

Re: [x265] XP compatible x265_NS refactoring is incomplete

2015-06-22 Thread Mario *LigH* Rohkrämer
Am 22.06.2015, 11:46 Uhr, schrieb Deepthi Nandakumar deep...@multicorewareinc.com: Ah, this was a typo. OK, with the fix patch (name space identifier all upper case), compiling passes. -- Fun and success! Mario *LigH* Rohkrämer mailto:cont...@ligh.de

Re: [x265] XP compatible x265_NS refactoring is incomplete

2015-06-22 Thread Deepthi Nandakumar
Ah, this was a typo. We don't have a Windows XP machine in our test farm, so we're hitting blind. I don't think the second error that was reported on XP with no-assembly has been fixed. On Mon, Jun 22, 2015 at 3:12 PM, Mario *LigH* Rohkrämer cont...@ligh.de wrote: A lot of warnings in XP

Re: [x265] XP compatible x265_NS refactoring is incomplete

2015-06-22 Thread Mario *LigH* Rohkrämer
These are compilation errors. I don't have a machine with XP running either; I just compile on Windows 7 SP1 x64 in XhmikosR's older Win32 MSYS environment with the following calls: cmake -G "MSYS Makefiles" -DWINXP_SUPPORT:BOOL=TRUE ../../source cmake -G "MSYS Makefiles"

[x265] XP compatible x265_NS refactoring is incomplete

2015-06-22 Thread Mario *LigH* Rohkrämer
A lot of warnings in XP compatible Win32 builds with GCC 4.8.2 (here the HIGH_BIT_DEPTH setup, the 8-bit fails too): + Scanning dependencies of target clean-generated Built target clean-generated -- cmake version 3.3.0-rc1 -- Detected x86 target processor -- Found Yasm 1.3.0 to build

[x265] [PATCH] cmake: make it more clear that the extra libs are only linked to the CLI

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435001649 18000 # Mon Jun 22 14:34:09 2015 -0500 # Node ID 537c5fd8f5d59f5a5f8b3c7a5616e856906eeb49 # Parent af4ab43b2cf0d240a76a100d5174c792ac50a393 cmake: make it more clear that the extra libs are only linked to the CLI this

Re: [x265] XP compatible x265_NS refactoring is incomplete

2015-06-22 Thread Stephen Hutchinson
On Mon, Jun 22, 2015 at 6:44 PM, Steve Borho st...@borho.org wrote: ninja? I didn't know that cmake generator worked on Windows. It does, either provided through MSys2 or compiled on its own. It's not as pretty to look at/compact when executed under MSys2's shell, though. Although I was

[x265] [PATCH 6 of 8] doc: fix dither docs on api page (not applicable if internal bit depth != 8)

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435028287 18000 # Mon Jun 22 21:58:07 2015 -0500 # Node ID 98d42fd1c07c746ed4c4257b36ef5d05fb0dc8f3 # Parent 25390edc448aef5ff1732bb077d080f2d254e0d2 doc: fix dither docs on api page (not applicable if internal bit depth != 8)

[x265] [PATCH 7 of 8] doc: nit

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435028295 18000 # Mon Jun 22 21:58:15 2015 -0500 # Node ID 05165a2be4f86a5454aa9da4ab7b8f86ad8a7764 # Parent 98d42fd1c07c746ed4c4257b36ef5d05fb0dc8f3 doc: nit diff -r 98d42fd1c07c -r 05165a2be4f8 doc/reST/api.rst ---

[x265] [PATCH 2 of 8] clarify 8bit/10bit in code comments

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435022545 18000 # Mon Jun 22 20:22:25 2015 -0500 # Node ID 78c562d38eea28d7b99cbb8ff4a228440c50ca2c # Parent d5376bf40aeab3eece8504045cbc603a145788b5 clarify 8bit/10bit in code comments diff -r d5376bf40aea -r 78c562d38eea

[x265] [PATCH 8 of 8] doc: attempt to document statically linked multi-library implications

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435028560 18000 # Mon Jun 22 22:02:40 2015 -0500 # Node ID 21183e6f9444662cccd84b4309cb9bc4889a65dc # Parent 05165a2be4f86a5454aa9da4ab7b8f86ad8a7764 doc: attempt to document statically linked multi-library implications This

[x265] [PATCH 3 of 8] param: clarify 8bit/10bit in logs

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435022674 18000 # Mon Jun 22 20:24:34 2015 -0500 # Node ID 901fc6837a25aca531c9056ed59a6e75126c4a53 # Parent 78c562d38eea28d7b99cbb8ff4a228440c50ca2c param: clarify 8bit/10bit in logs removes the internal bit-depth logging for

[x265] [PATCH 1 of 8] doc: replace 'bpp' in docs with 'bit' (do not imply pixels)

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435021884 18000 # Mon Jun 22 20:11:24 2015 -0500 # Node ID d5376bf40aeab3eece8504045cbc603a145788b5 # Parent dfdf378a3968a15a1465a3aa3098e507fb4f10e5 doc: replace 'bpp' in docs with 'bit' (do not imply pixels) diff -r

[x265] [PATCH 4 of 8] cmake: change msvc multilib build folders to 8bit/10bit

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435023159 18000 # Mon Jun 22 20:32:39 2015 -0500 # Node ID 0b36c393bb3a2ddaacad0c05dafd78aa3ca44682 # Parent 901fc6837a25aca531c9056ed59a6e75126c4a53 cmake: change msvc multilib build folders to 8bit/10bit diff -r 901fc6837a25

[x265] [PATCH 5 of 8] cmake: change multilib namespaces to x265_8bit/x265_10bit

2015-06-22 Thread Steve Borho
# HG changeset patch # User Steve Borho st...@borho.org # Date 1435023306 18000 # Mon Jun 22 20:35:06 2015 -0500 # Node ID 25390edc448aef5ff1732bb077d080f2d254e0d2 # Parent 0b36c393bb3a2ddaacad0c05dafd78aa3ca44682 cmake: change multilib namespaces to x265_8bit/x265_10bit diff -r

[x265] [PATCH 1 of 6] asm: 10bpp sse4 code for saoCuOrgE0, improved 8740c-974c, over C code

2015-06-22 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1434712676 -19800 # Fri Jun 19 16:47:56 2015 +0530 # Node ID a94e9a1f0fde08e060a9b52e3353ce2f242d9257 # Parent 83a7d824442455ba5e0a6b53ea68e6b7043845de asm: 10bpp sse4 code for saoCuOrgE0, improved 8740c-974c,

[x265] [PATCH 4 of 6] asm: 10bpp sse4 code for saoCuOrgE2

2015-06-22 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1434963191 -19800 # Mon Jun 22 14:23:11 2015 +0530 # Node ID f85c15cc0e1d70e63182b03e294c2778f598143d # Parent 558ffdc4e832061d99f1ec688fe1ae64db48642f asm: 10bpp sse4 code for saoCuOrgE2 Performance

[x265] [PATCH 6 of 6] asm: 10bpp sse4 code for saoCuOrgB0, improved 173346c-23127c over C code

2015-06-22 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1434973065 -19800 # Mon Jun 22 17:07:45 2015 +0530 # Node ID f282fc4d8915ec712f3915f387fc6018481fd467 # Parent a946a0178f57e65c02e6be2d9c6485c58658fe20 asm: 10bpp sse4 code for saoCuOrgB0, improved

[x265] [PATCH 5 of 6] asm: 10bpp sse4 code for saoCuOrgE3

2015-06-22 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1434969412 -19800 # Mon Jun 22 16:06:52 2015 +0530 # Node ID a946a0178f57e65c02e6be2d9c6485c58658fe20 # Parent f85c15cc0e1d70e63182b03e294c2778f598143d asm: 10bpp sse4 code for saoCuOrgE3 Performance

[x265] [PATCH 0 of 6 ] SAO SSE4 asm code for HIGH_BIT_DEPTH

2015-06-22 Thread dnyaneshwar
SAO_EO_0         8.97x    974.03    8740.81
SAO_EO_1        10.18x    492.67    5017.42
SAO_EO_1_2Rows  11.21x    900.82   10095.86
SAO_EO_2[0]      6.27x    207.22    1298.92
SAO_EO_2[1]      8.92x    555.20    4949.69
SAO_EO_3[0]      4.97x    236.72    1177.29
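The figures in this summary appear to be, per primitive, a speedup factor followed by two cycle counts. Assuming the smaller count is the assembly version and the larger is the C reference (which matches the "improved 8740c-974c, over C code" wording in patch 1 of 6), the speedup is simply their ratio. A small C++ sketch of that arithmetic, with the hypothetical helper name speedup:

```cpp
#include <cassert>

// Ratio of reference (C) cycles to optimized (asm) cycles.
// The column interpretation is an assumption, not stated in the message.
inline double speedup(double cCycles, double asmCycles)
{
    assert(asmCycles > 0.0);
    return cCycles / asmCycles;
}
```

For SAO_EO_0, speedup(8740.81, 974.03) comes out at roughly 8.97, which agrees with the reported 8.97x.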

[x265] [PATCH 2 of 6] asm: 10bpp sse4 code for saoCuOrgE1, improved 5017c-470c, over C code

2015-06-22 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1434948494 -19800 # Mon Jun 22 10:18:14 2015 +0530 # Node ID 1c02df66f093b5b1bacc8a1bbf9be2ef81591ad5 # Parent a94e9a1f0fde08e060a9b52e3353ce2f242d9257 asm: 10bpp sse4 code for saoCuOrgE1, improved 5017c-470c,

[x265] [PATCH] asm: pixelavg_pp[16xN] avx2 code for 10bpp

2015-06-22 Thread rajesh
# HG changeset patch # User Rajesh Paulraj raj...@multicorewareinc.com # Date 1434981872 -19800 # Mon Jun 22 19:34:32 2015 +0530 # Node ID d4c7638a0d5b842ca2657969b0f1a2bcd8a82d0b # Parent 83a7d824442455ba5e0a6b53ea68e6b7043845de asm: pixelavg_pp[16xN] avx2 code for 10bpp avx2: avg_pp[ 16x4]
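For context, a pixel-average primitive of this kind is usually a rounded mean of two blocks. The C++ sketch below assumes that behaviour, (a + b + 1) >> 1 per pixel, on 16-bit samples holding 10-bit values; the name pixelavg_ref and its parameters are hypothetical and do not reproduce the x265 signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference: rounded average of two blocks, written to dst.
// Assumes the rounding rule (a + b + 1) >> 1; names are illustrative.
inline void pixelavg_ref(uint16_t* dst, ptrdiff_t dstStride,
                         const uint16_t* src0, ptrdiff_t stride0,
                         const uint16_t* src1, ptrdiff_t stride1,
                         int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            // Widen to int so the sum cannot overflow 16 bits.
            const int a = src0[y * stride0 + x];
            const int b = src1[y * stride1 + x];
            dst[y * dstStride + x] = static_cast<uint16_t>((a + b + 1) >> 1);
        }
    }
}
```

The same rounding is what the x86 PAVGW instruction performs on unsigned words, which is why this operation maps well to SIMD.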