[x265] [PATCH] arm: Implement filterPixelToShort ARM NEON asm

2016-03-01 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1456831820 -19800 # Tue Mar 01 17:00:20 2016 +0530 # Node ID 61e51faf9e7ee1c8056ac2f66cf51da104bfa106 # Parent 79c00b9bc2b81afef2e41526fc3c390528f3174c arm: Implement filterPixelToShort ARM NEON asm d

[x265] [PATCH 0 of 3 ] Patch series for new primitive pelFilterChroma and ASM code

2016-02-26 Thread dnyaneshwar
Speed up = pelFilterChroma_Vertical : 600c -> 300c pelFilterChroma_Horizontal : 585c -> 160c

[x265] [PATCH 1 of 3] asm: separated pelFilterChroma function into horizontal & vertical primitives for asm

2016-02-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1456466613 -19800 # Fri Feb 26 11:33:33 2016 +0530 # Node ID 5ff8ee940ad7f4d34b106ae4999b996245c87919 # Parent 01782e7f0a8cb93efbe4ff1534602ff9055c8565 asm: separated pelFilterChroma function into hori

[x265] [PATCH 3 of 3] asm: asm code for pelFilterLumaStrong_V/H & pelFilterChroma_V/H for main10 & main12

2016-02-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1456466696 -19800 # Fri Feb 26 11:34:56 2016 +0530 # Node ID d7d0c03b5e6e7fd0258d609ad5e9f4d7c0a40390 # Parent 59d9eca3d144e71f11d509a5dd40b634bb9ab500 asm: asm code for pelFilterLumaStro

[x265] [PATCH] arm: Implement pixel_ssd_s ARM NEON asm

2016-02-25 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1456136894 -19800 # Mon Feb 22 15:58:14 2016 +0530 # Node ID ed3dd1a26cb5801e306db8f1d4a52cd1f4d6620b # Parent 4a1b8f3c0c7385ff19fd61133e0af4464510e9aa arm: Implement pixel_ssd_s ARM NEON asm d

[x265] [PATCH] arm: Implement pixel_sse_ss ARM NEON asm

2016-02-25 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1456382751 -19800 # Thu Feb 25 12:15:51 2016 +0530 # Node ID 4a1b8f3c0c7385ff19fd61133e0af4464510e9aa # Parent 45c0dbd43dec24608199362a86bfba6ef91cacca arm: Implement pixel_sse_ss ARM NEON asm d

[x265] [PATCH] arm: Implement pixel_sse_pp ARM NEON asm

2016-02-18 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1455794242 -19800 # Thu Feb 18 16:47:22 2016 +0530 # Node ID 5e4593ef30cc4bccc5eec2a0109b8dff397e5c93 # Parent b31fa1a4ef43697e163d17dda0f4650de45d6ff9 arm: Implement pixel_sse_pp ARM NEON asm d

[x265] [PATCH] arm: Implement pixel_var ARM NEON asm

2016-02-18 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1455793621 -19800 # Thu Feb 18 16:37:01 2016 +0530 # Node ID b31fa1a4ef43697e163d17dda0f4650de45d6ff9 # Parent cb8769b5ea70304d658173e02deb254fb8572bd6 arm: Implement pixel_var ARM NEON asm d

[x265] [PATCH] arm: Implement sad_x3 and sad_x4 ARM NEON asm

2016-02-15 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1455598958 -19800 # Tue Feb 16 10:32:38 2016 +0530 # Node ID ac6c535109a43e9cdb69f30db1143c06400a19f4 # Parent e3902c96c3c268ec4ab1a4976ee2feae7348b36f arm: Implement sad_x3 and sad_x4 ARM NEON asm d

Re: [x265] [PATCH] arm: Implement blockcopy_pp_NxN_neon

2016-02-11 Thread Dnyaneshwar Gorade
On Thu, Feb 11, 2016 at 5:30 PM, chen wrote: > > At 2016-02-11 17:54:45,radhakrish...@multicorewareinc.com wrote: > ># HG changeset patch > ># User radhakrish...@multicorewareinc.com > ># Date 1455183020 -19800 > ># Thu Feb 11 15:00:20 2016 +0530 > ># Node ID

[x265] [PATCH] threadpool: utilize all processors on embedded ARM platforms

2016-02-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1455010589 -19800 # Tue Feb 09 15:06:29 2016 +0530 # Node ID 18b83aaee1b56e2048a425c25a452aa62c39da89 # Parent 023e6051c4c63ab1633b2de0e8f37e6158796288 threadpool: utilize all processors on embedd

[x265] [PATCH] arm: Implement blockcopy_pp_16x16_neon. Modified include guards with ARM suffix

2016-02-02 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1454410744 -19800 # Tue Feb 02 16:29:04 2016 +0530 # Node ID 5463e2b9f37e4952bb16e94673c6fd2991243145 # Parent dc62b47dd0d98f732165345883edac55320baec1 arm: Implement blockcopy_pp_16x16_neon. Modified i

[x265] [PATCH] arm: Implement blockcopy_pp_16x16_neon. Modified include guards with ARM suffix

2016-02-01 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1454327470 -19800 # Mon Feb 01 17:21:10 2016 +0530 # Node ID 894e0fce5d14844d3c85cdb2a287f302fc8cffca # Parent dc62b47dd0d98f732165345883edac55320baec1 arm: Implement blockcopy_pp_16x16_neon. Modified i

[x265] [PATCH] testbench: port x264 stack & register check code for ARM arch

2016-01-27 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1453891819 -19800 # Wed Jan 27 16:20:19 2016 +0530 # Node ID 14c4806a24eb277d31fa77c1c906838ffcb62395 # Parent f548abe8eae8fb75513a85d1b09233e706c7b5ba testbench: port x264 stack & register check co

[x265] [PATCH] testbench: port x264 stack & register check code for ARM arch

2016-01-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1453872887 -19800 # Wed Jan 27 11:04:47 2016 +0530 # Node ID f98483674435cdb5cbd7acb655ee217feffdf976 # Parent f548abe8eae8fb75513a85d1b09233e706c7b5ba testbench: port x264 stack & register check co

[x265] [PATCH 2 of 2] asm: improved intra_ang8x8 modes 3 to 17 AVX2 asm over 20% than previous AVX2 asm

2016-01-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1449813841 -19800 # Fri Dec 11 11:34:01 2015 +0530 # Node ID ee47dd944e08ebb49fd54114979c65dadabfe0df # Parent 593a1907e915c9bad7bd3ff608a30770289c249a asm: improved intra_ang8x8 modes 3 to 17 AVX2 as

[x265] [PATCH 1 of 2] asm: move common constants into const-a.asm, remove unused constants

2016-01-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1449894370 -19800 # Sat Dec 12 09:56:10 2015 +0530 # Node ID 593a1907e915c9bad7bd3ff608a30770289c249a # Parent a5309338d1352978e79da6210a0d64eb88d60c8f asm: move common constants into const-a.asm,

[x265] [PATCH] testbench: setup testbench for ARM assembly

2016-01-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1452327300 -19800 # Sat Jan 09 13:45:00 2016 +0530 # Node ID a5309338d1352978e79da6210a0d64eb88d60c8f # Parent d94f6c2b45f87f5b4b10b4fa70f8a9bd03d3d1c2 testbench: setup testbench for ARM assembl

[x265] [PATCH] testbench: setup testbench for ARM assembly

2016-01-08 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1452321658 -19800 # Sat Jan 09 12:10:58 2016 +0530 # Node ID cd9318b1671bb24212321fcd005381e50642af4c # Parent d94f6c2b45f87f5b4b10b4fa70f8a9bd03d3d1c2 testbench: setup testbench for ARM assembl

Re: [x265] [PATCH] testbench: setup testbench for ARM assembly

2016-01-08 Thread Dnyaneshwar Gorade
Please ignore this patch; it needs a few small modifications. On Sat, Jan 9, 2016 at 12:12 PM, <dnyanesh...@multicorewareinc.com> wrote: > # HG changeset patch > # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> > # Date 1452321658 -19800 > # Sat Jan 09 12:10:5

[x265] [PATCH] enable arm-linux cross compile build

2016-01-05 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1450516372 -19800 # Sat Dec 19 14:42:52 2015 +0530 # Node ID d4de155912366fb831021c9f6a0fde6757a168d7 # Parent 25f78ff3d8efaa1e9d85bc3e718c887ec9afa557 enable arm-linux cross compile build d

[x265] [PATCH 2 of 3] asm: psyCost_pp avx2 asm code for main12

2015-12-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1448963172 -19800 # Tue Dec 01 15:16:12 2015 +0530 # Node ID 9357c1f448a7b987cebfd3cc5542cc6c65e63fe2 # Parent e2b07541670331ab0cd94b5f312f8f7cac893f92 asm: psyCost_pp avx2 asm code for main12 psy_c

[x265] [PATCH 1 of 3] asm: SA8D avx2 asm code for main12

2015-12-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar Gorade <gorad...@gmail.com> # Date 1449647037 -19800 # Wed Dec 09 13:13:57 2015 +0530 # Node ID e2b07541670331ab0cd94b5f312f8f7cac893f92 # Parent b80087c9bf25697c3d354d732323fc895a2ca11f asm: SA8D avx2 asm code for main12 sa8d[ 8x8] 4.70x

[x265] [PATCH 3 of 3] asm: fix dct[8x8] AVX2 asm for main12

2015-12-09 Thread dnyaneshwar
# HG changeset patch # User Aasaipriya Chandran # Date 1449648215 -19800 # Wed Dec 09 13:33:35 2015 +0530 # Node ID 9e3f71d784e59527a14702e83de474bc3f12fd15 # Parent 9357c1f448a7b987cebfd3cc5542cc6c65e63fe2 asm: fix dct[8x8] AVX2 asm for main12 diff -r

[x265] [PATCH] asm: move common constants into const-a.asm, remove unused constants

2015-12-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1449723720 -19800 # Thu Dec 10 10:32:00 2015 +0530 # Node ID ff08c87f20a7f3f36bfb0849bd2d10fc1f8da465 # Parent 33d04da2f68830ac51151cfbda8f38fb9a7e8bb9 asm: move common constants into const-a.asm,

Re: [x265] [PATCH 1 of 2] asm: SA8D avx2 asm code for main12

2015-12-08 Thread Dnyaneshwar Gorade
Thanks, Min. I am re-sending these two patches with the above modifications. On Wed, Dec 2, 2015 at 8:57 PM, chen <chenm...@163.com> wrote: > I suggest just keep one name of sa8d_avx2 > > At 2015-12-02 12:31:59,"Dnyaneshwar Gorade" < > dnyanesh...@multicorew

Re: [x265] [PATCH 1 of 2] asm: SA8D avx2 asm code for main12

2015-12-01 Thread Dnyaneshwar Gorade
> ># HG changeset patch > ># User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> > ># Date 1448962785 -19800 > ># Tue Dec 01 15:09:45 2015 +0530 > ># Node ID f8b0ce4e9f4092a38d8095961825e734a34f112e > ># Parent e2e507ffe752d6c193a219b242c433bdc55f39f

[x265] [PATCH 2 of 2] asm: psyCost_pp avx2 asm code for main12

2015-12-01 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1448963172 -19800 # Tue Dec 01 15:16:12 2015 +0530 # Node ID dbc004801f4734ba048a451d779c1c9c82f1b6ac # Parent f8b0ce4e9f4092a38d8095961825e734a34f112e asm: psyCost_pp avx2 asm code for main12 psy_c

[x265] [PATCH 1 of 2] asm: SA8D avx2 asm code for main12

2015-12-01 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1448962785 -19800 # Tue Dec 01 15:09:45 2015 +0530 # Node ID f8b0ce4e9f4092a38d8095961825e734a34f112e # Parent e2e507ffe752d6c193a219b242c433bdc55f39f7 asm: SA8D avx2 asm code for main12 sa8d[ 8x8]

[x265] [PATCH] use 32-bits multiply in mbtree_propagate_cost to avoid intraCost overflow

2015-11-24 Thread dnyaneshwar
# HG changeset patch # User Min Chen # Date 1447865933 21600 # Wed Nov 18 10:58:53 2015 -0600 # Node ID d4e8af415c2ea939f1c82cf2dc1561fee20847de # Parent ad15f3756ad888b99a4ba868b857e09909dae226 use 32-bits multiply in mbtree_propagate_cost to avoid intraCost overflow

[x265] [PATCH] asm: fix inconsistent crash due to unaligned NR buffer in denoiseDct SSE4 asm

2015-11-18 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1447829883 -19800 # Wed Nov 18 12:28:03 2015 +0530 # Node ID 653430a3de3f9ba342922ee6ea46d4cf52c1eb39 # Parent e8f9a60d4cd9e73c9f2baf05c2ccda5af1892b46 asm: fix inconsistent crash due to unaligned NR

[x265] [PATCH] asm: fix output change due to overflow in mbtree_propagate_cost 10bit asm

2015-11-17 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1447828315 -19800 # Wed Nov 18 12:01:55 2015 +0530 # Node ID 58c177d2e182e5b633670024c567b535eb49614f # Parent e8f9a60d4cd9e73c9f2baf05c2ccda5af1892b46 asm: fix output change due to ov

[x265] [PATCH] asm: fix intrapred_planar16x16 SSE4 code for main12

2015-11-04 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1446700839 -19800 # Thu Nov 05 10:50:39 2015 +0530 # Node ID 69bd13c0047d2c1a3b232bea40b72e436baa618e # Parent 3103afbd31fa9b26533f06202516a511ee221439 asm: fix intrapred_planar16x16 SSE4 code for

[x265] [PATCH] asm: fix mbtree_propagate_cost asm failure, fixes crash in OpenBSD

2015-11-04 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1446645042 -19800 # Wed Nov 04 19:20:42 2015 +0530 # Node ID 25bada1bb5494fc12d62e87d1b7b788307dd963f # Parent c11dd97a8b999414c60dceef8620d3d9055cf4c1 asm: fix mbtree_propagate_cost asm failure, fixes

Re: [x265] [PATCH] fix invalid Instruction Set provided in CLI if CPU doesn't support it

2015-11-02 Thread Dnyaneshwar Gorade
.org> wrote: > >> On 10/28, dnyanesh...@multicorewareinc.com wrote: >> > # HG changeset patch >> > # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> >> > # Date 1446021877 -19800 >> > # Wed Oct 28 14:14:37 2015 +0530 >> > # Node ID 975087370d14e9

[x265] [PATCH] fix invalid Instruction Set provided in CLI if CPU doesn't support it

2015-10-28 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1446021877 -19800 # Wed Oct 28 14:14:37 2015 +0530 # Node ID 975087370d14e90cd63edecb34fb4bf2feda2468 # Parent 6563218ce342c30bfd4f9bc172a1dab510e6e55b fix invalid Instruction Set provided in CLI

[x265] [PATCH] asm: fix intrapred_planar16x16 sse4 code for main12

2015-10-23 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1445588852 -19800 # Fri Oct 23 13:57:32 2015 +0530 # Node ID 0fb5a67c2f5ea4f3fe1a7e0dcbc0c5c117dd6dfc # Parent a7251c3e0ef810b95bb25be5371035208e36996d asm: fix intrapred_planar16x16 sse4 code for

Re: [x265] [PATCH] asm: fix intrapred_planar16x16 sse4 code for main12

2015-10-22 Thread Dnyaneshwar Gorade
On Wed, Oct 21, 2015 at 7:58 AM, chen <chenm...@163.com> wrote: > > > At 2015-10-20 18:38:56,dnyanesh...@multicorewareinc.com wrote: > ># HG changeset patch > ># User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> > ># Date 1445337446 -19800

[x265] [PATCH] asm: fix intrapred_planar16x16 sse4 code for main12

2015-10-20 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1445337446 -19800 # Tue Oct 20 16:07:26 2015 +0530 # Node ID 987b5f8c2c447dc5b0e410d37f6212470feecd1c # Parent f335a9a7b9083dcb2fc7a1cadc2dbeffdd6388f2 asm: fix intrapred_planar16x16 sse4 code for

[x265] [PATCH] asm: fix intrapred_planar16x16 sse4 code for main12

2015-10-19 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1445245458 -19800 # Mon Oct 19 14:34:18 2015 +0530 # Node ID 76d4fc7264a0d22218db30f65bb58095c294db1b # Parent 04575a459a160162391fcf1a12e8e6f2e81e95b4 asm: fix intrapred_planar16x16 sse4 code for

[x265] [PATCH] multilib: fix multiple definition of pelFilterLumaStrong_c

2015-10-15 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1444972708 -19800 # Fri Oct 16 10:48:28 2015 +0530 # Node ID 76a36eabd4be405fc4880d882499a754c3f190fa # Parent fe65544b6c40d7cd62c2b86275bf98b264b6edb0 multilib: fix multiple defi

[x265] [PATCH 2 of 2] asm: asm code for deblocking filter horizontal and vertical

2015-10-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1444286180 -19800 # Thu Oct 08 12:06:20 2015 +0530 # Node ID 86627e458e6e2e357fe1746067392c6984b8915f # Parent 38e4b94377fa6ffe57472c49ecff6c909ed4f6dc asm: asm code for deblocking filter hori

[x265] [PATCH 1 of 2] asm: separated deblocking filter into horizontal & vertical primitives for asm

2015-10-09 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1444121396 -19800 # Tue Oct 06 14:19:56 2015 +0530 # Node ID 38e4b94377fa6ffe57472c49ecff6c909ed4f6dc # Parent f8ad1ff7074aab85a6cf376886014c88f46b7275 asm: separated deblocking filter into hori

[x265] [PATCH] add 64-byte alignment macro, align NR buffer & Encoder class to cache line of 64-byte

2015-10-05 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1444107449 -19800 # Tue Oct 06 10:27:29 2015 +0530 # Node ID 93525c471023575d500c912284a3853ee8df8991 # Parent f8b8ebdc54578e6735216d8b9abce5ba80c05bd8 add 64-byte alignment macro, align NR buffer &

[x265] [PATCH] asm: avx2 code for sad_x3_32xN, improved over 40% than SSE

2015-09-24 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1443156551 -19800 # Fri Sep 25 10:19:11 2015 +0530 # Node ID 310d35ed0ba85174676d0b0bb91e6b8b5f475726 # Parent 975352b2c0223b9139aad233b43eaf2113ac8167 asm: avx2 code for sad_x3_32xN, improved over 40

Re: [x265] How can I enable the AVX2 version of DCT and IDCT?

2015-09-16 Thread Dnyaneshwar Gorade
mand prompt output if cpu capabilities info shows AVX2 instruction set or not. You can get the source code of DCT AVX2 functions in dct8.asm file. Regards, Dnyaneshwar On Wed, Sep 16, 2015 at 6:25 PM, Ximing Cheng <chengximing1...@gmail.com> wrote: > I read the source code of the /source/c
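The check described in this reply (looking for AVX2 in the encoder's reported CPU capabilities) can be sketched as a small script. This is only an illustration: the function name `has_capability` is hypothetical, and the sample log line is an assumed example of the capabilities output, not text taken from this thread.

```python
def has_capability(log_line: str, name: str) -> bool:
    """Return True if `name` is listed as a whole token after 'capabilities:'."""
    # partition() yields an empty tail when the marker is absent, so a line
    # without a capabilities list simply reports False.
    _, _, caps = log_line.partition("capabilities:")
    tokens = {tok.upper() for tok in caps.split()}
    return name.upper() in tokens


# Assumed sample of an encoder log line (illustrative only).
sample = "x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3"
print(has_capability(sample, "AVX2"))
```

Matching whole tokens rather than substrings keeps a CPU that only lists `AVX` from being reported as supporting `AVX2`.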

[x265] [PATCH 1 of 3] asm: AVX2 code for pixel_var primitive, improved over 40% than SSE

2015-09-10 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1441715051 -19800 # Tue Sep 08 17:54:11 2015 +0530 # Node ID 89c234e68523b05550b8c5197b83849544dc97d1 # Parent 365f7ed4d89628d49cd6af8d81d4edc01f73ffad asm: AVX2 code for pixel_var primitive, improved o

[x265] [PATCH 2 of 3] asm: avx2 code for sad_x3_32xN, improved over 40% than SSE

2015-09-10 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1441885683 -19800 # Thu Sep 10 17:18:03 2015 +0530 # Node ID 5b5d7438e90196d7974b9ceec2130b6c924e2342 # Parent abab4304e992b7addb65ad8fbdfe309ba57732a6 asm: avx2 code for sad_x3_32xN, improved over 40

[x265] [PATCH 3 of 3] asm: avx2 code for sad_x3_64xN, improved over 40% than SSE

2015-09-10 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1441886472 -19800 # Thu Sep 10 17:31:12 2015 +0530 # Node ID d31b9e8bdcf4f5fac2e3f0c567f1c90c1d19a382 # Parent 5b5d7438e90196d7974b9ceec2130b6c924e2342 asm: avx2 code for sad_x3_64xN, improved over 40

[x265] [PATCH] asm: fix crash as NR buffer is not aligned to 16-byte boundary

2015-09-10 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1441865435 -19800 # Thu Sep 10 11:40:35 2015 +0530 # Node ID abab4304e992b7addb65ad8fbdfe309ba57732a6 # Parent 89c234e68523b05550b8c5197b83849544dc97d1 asm: fix crash as NR buffer is not aligned to 1
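As background for this fix: aligned SSE loads and stores fault when the address is not a multiple of 16, so a buffer handed to such code must start on a 16-byte boundary. The sketch below shows the usual power-of-two alignment arithmetic on plain integers standing in for addresses. It is a generic illustration, not the change made by this patch, and the helper names are hypothetical.

```python
def is_aligned(addr: int, alignment: int) -> bool:
    """True if addr is a multiple of alignment (alignment must be a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0, "power of two required"
    return (addr & (alignment - 1)) == 0


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0, "power of two required"
    return (addr + alignment - 1) & ~(alignment - 1)


# Hypothetical address that is one byte past a 16-byte boundary.
addr = 0x1001
print(is_aligned(addr, 16), hex(align_up(addr, 16)))
```

The same arithmetic applies to a 64-byte cache line by passing 64 as the alignment.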

[x265] [PATCH 1 of 2] asm: avx2 asm for intra_ang32 mode 16 & 20

2015-09-07 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G <dnyanesh...@multicorewareinc.com> # Date 1441085487 -19800 # Tue Sep 01 11:01:27 2015 +0530 # Node ID 3238ecbdbdf551a69bcd0dfdf8391f6462db45ac # Parent e1adac00dce8e5641cbe9aec3d50a72261c308d9 asm: avx2 asm for intra_ang32 mode 16 & 2

[x265] [PATCH] asm: fix dynamic range of input to quant primitive

2015-08-27 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1440736935 -19800 # Fri Aug 28 10:12:15 2015 +0530 # Node ID dce85f739efeea842e490a0f555d4abdc89a5c80 # Parent 905c4f2e203ec082bd50b361865a7d4d297e45ce asm: fix dynamic range of input to quant primitive diff

[x265] [PATCH 5 of 7] asm: avx2 asm for intra_ang32 mode 12, 4758c-1474c

2015-08-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1440583211 -19800 # Wed Aug 26 15:30:11 2015 +0530 # Node ID cb3f520f9942080d05ca1b3ba2cae0c1b4bcb345 # Parent a27ac3b998f5677570a48285d22e1b771c08ab75 asm: avx2 asm for intra_ang32 mode 12, 4758c-1474c updated

[x265] [PATCH 7 of 7] asm: avx2 asm for intra_ang32 mode 14, 5600c-1400c

2015-08-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1440583506 -19800 # Wed Aug 26 15:35:06 2015 +0530 # Node ID 40ae6c49fa489dc995f78d93a35b441639e0847d # Parent 00b26e64fd2c42bcb9652668721f6953d8f2eb0f asm: avx2 asm for intra_ang32 mode 14, 5600c-1400c updated

[x265] [PATCH 3 of 7] asm: avx2 asm for intra_ang32 mode 11, 4550c-1326c

2015-08-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1440479904 -19800 # Tue Aug 25 10:48:24 2015 +0530 # Node ID 630bae9a91392fdf9a327673f7c00eeedf60139f # Parent 0409b136c208cb944fb76bfd400e76ba43e330a8 asm: avx2 asm for intra_ang32 mode 11, 4550c-1326c

[x265] [PATCH] asm: avx2 asm for intra_ang32 mode 15, 5700c-1600c

2015-08-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1440650636 -19800 # Thu Aug 27 10:13:56 2015 +0530 # Node ID 905c4f2e203ec082bd50b361865a7d4d297e45ce # Parent 40ae6c49fa489dc995f78d93a35b441639e0847d asm: avx2 asm for intra_ang32 mode 15, 5700c-1600c updated

Re: [x265] [PATCH] asm: disabled 10bpp AVX AVX2 primitives having less than 10% speed up over SSE

2015-08-19 Thread Dnyaneshwar Gorade
right.. you can send it to mailing list On Wed, Aug 19, 2015 at 3:26 PM, aasaipr...@multicorewareinc.com wrote: # HG changeset patch # User Aasaipriya Chandran aasaipr...@multicorewareinc.com # Date 1439972978 -19800 # Wed Aug 19 13:59:38 2015 +0530 # Node ID

Re: [x265] [PATCH] asm: disabled 10bpp AVX AVX2 primitives having less than 3% speed up over SSE

2015-08-18 Thread Dnyaneshwar Gorade
Right, but a small correction: inside the #if 0 ... #endif block, disable only the specific primitives, not all sizes (expand the macro and keep only those with less than 3% speed-up). On Tue, Aug 18, 2015 at 12:05 PM, aasaipr...@multicorewareinc.com wrote: # HG changeset patch # User Aasaipriya Chandran

Re: [x265] [PATCH 1 of 5] asm: AVX2 asm for intra_ang_32 mode 9, improved over 40% than SSE asm

2015-08-18 Thread Dnyaneshwar Gorade
? At 2015-08-18 12:11:35,dnyanesh...@multicorewareinc.com wrote: # HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439531917 -19800 # Fri Aug 14 11:28:37 2015 +0530 # Node ID 5ed23f786ea8f98e003189a537f960e4ff16201f # Parent

[x265] [PATCH 3 of 5] asm: avx2 asm for intra_ang32 mode 11, 4550c-1326c

2015-08-17 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439812025 -19800 # Mon Aug 17 17:17:05 2015 +0530 # Node ID 43c9ec65927666db1316efe63d112bd8f9cb5f35 # Parent 8752daab2f07711c556dfffa9a733b7278484479 asm: avx2 asm for intra_ang32 mode 11, 4550c-1326c diff

[x265] [PATCH 2 of 5] asm: AVX2 asm for intra_ang_32 mode 10, 816c-452c

2015-08-17 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439557064 -19800 # Fri Aug 14 18:27:44 2015 +0530 # Node ID 8752daab2f07711c556dfffa9a733b7278484479 # Parent 5ed23f786ea8f98e003189a537f960e4ff16201f asm: AVX2 asm for intra_ang_32 mode 10, 816c-452c diff -r

[x265] [PATCH 1 of 5] asm: AVX2 asm for intra_ang_32 mode 9, improved over 40% than SSE asm

2015-08-17 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439531917 -19800 # Fri Aug 14 11:28:37 2015 +0530 # Node ID 5ed23f786ea8f98e003189a537f960e4ff16201f # Parent 996ebce8c874fc511d495cee227d24413e99d0c1 asm: AVX2 asm for intra_ang_32 mode 9, improved over 40

Re: [x265] [PATCH] asm: disabled 10bpp AVX AVX2 primitives having less than 3% speed up over SSE

2015-08-17 Thread Dnyaneshwar Gorade
merge earlier patch (asm: disabled 10bpp AVX) into this one and send again to avoid confusion. 2015-08-17 17:44 GMT+05:30 aasaipr...@multicorewareinc.com: # HG changeset patch # User Aasaipriya Chandran aasaipr...@multicorewareinc.com # Date 1439813601 -19800 # Mon Aug 17 17:43:21

[x265] [PATCH 5 of 5] asm: optimized intra_ang16 mode 11 avx2 asm, 520c-370c

2015-08-17 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439816850 -19800 # Mon Aug 17 18:37:30 2015 +0530 # Node ID 6ff0bcad1688f5ee1e393c648739ed2ae7e79b61 # Parent e75f3a2f1d29f01ca2d71f1b8be970d471b5e1f6 asm: optimized intra_ang16 mode 11 avx2 asm, 520c-370c

[x265] [PATCH 4 of 5] asm: updated intra_ang_32 mode 25 AVX2 asm code, 1300c-1184c

2015-08-17 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439812477 -19800 # Mon Aug 17 17:24:37 2015 +0530 # Node ID e75f3a2f1d29f01ca2d71f1b8be970d471b5e1f6 # Parent 43c9ec65927666db1316efe63d112bd8f9cb5f35 asm: updated intra_ang_32 mode 25 AVX2 asm code, 1300c

Re: [x265] [PATCH 3 of 4] asm: fix bug in macro vpbroadcastd for case ymm, xmm

2015-08-13 Thread Dnyaneshwar Gorade
%2 movd %1 %+ xmm, %2 ; case vpbroadcastd ymm, rN vpbroadcastd %1, %1 %+ xmm %else vpbroadcastd %1, %2 ; case vpbroadcastd ymm, [memory addr] %endif %endmacro Thanks, Dnyaneshwar G On Thu, Aug 13, 2015 at 8:52 AM, Min Chen chenm...@163.com wrote: # HG

[x265] [PATCH 2 of 4] asm: AVX2 asm for intra_ang_32 mode 6, improved over 48% than SSE asm

2015-08-13 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439366905 -19800 # Wed Aug 12 13:38:25 2015 +0530 # Node ID 643a001494a42e65366cfa3e468cc0858955095f # Parent 07110baa95f1d53c8100929b16eafba3b16138d6 asm: AVX2 asm for intra_ang_32 mode 6, improved over 48

[x265] [PATCH 3 of 4] asm: AVX2 asm for intra_ang_32 mode 7, improved over 40% than SSE asm

2015-08-13 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439373105 -19800 # Wed Aug 12 15:21:45 2015 +0530 # Node ID c12d411014f68affea550ee640e26ba61f51e509 # Parent 643a001494a42e65366cfa3e468cc0858955095f asm: AVX2 asm for intra_ang_32 mode 7, improved over 40

[x265] [PATCH 1 of 4] asm: AVX2 asm for intra_ang_32 mode 5, improved over 48% than SSE asm

2015-08-13 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439297628 -19800 # Tue Aug 11 18:23:48 2015 +0530 # Node ID 07110baa95f1d53c8100929b16eafba3b16138d6 # Parent bc5a7c2ac38b06d2a232b983f10bc0394d252ad7 asm: AVX2 asm for intra_ang_32 mode 5, improved over 48

[x265] [PATCH] asm: AVX2 asm for intra_ang_32 mode 4, improved over 45% than SSE asm

2015-08-10 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1439209099 -19800 # Mon Aug 10 17:48:19 2015 +0530 # Branch stable # Node ID 1ae0654c996a3ccab15e384dc8a394c029094544 # Parent 4781e6cef251006db10e107b2916741572f7760a asm: AVX2 asm for intra_ang_32 mode 4

Re: [x265] [PATCH] asm: avx2 code for intra_ang_16 modes 3 33

2015-08-05 Thread Dnyaneshwar Gorade
This is a new algorithm for intra_ang16x16. Current AVX2 asm: 1075 cycles; new AVX2 asm: 827 cycles (a 23% improvement over the current AVX2 asm). On Thu, Aug 6, 2015 at 10:41 AM, Deepthi Nandakumar deep...@multicorewareinc.com wrote: Please be sure to mention what is the baseline - for instance, what is
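The 23% figure quoted in this reply follows directly from the two cycle counts. A minimal sketch of the arithmetic, with hypothetical helper names, assuming "improved" means the relative reduction in cycles against the stated baseline:

```python
def cycle_reduction_percent(old_cycles: float, new_cycles: float) -> float:
    """Relative reduction in cycle count, as a percentage of the baseline."""
    return (old_cycles - new_cycles) / old_cycles * 100.0


def speedup_factor(old_cycles: float, new_cycles: float) -> float:
    """How many times faster the new code is than the baseline."""
    return old_cycles / new_cycles


# Cycle counts quoted in the message: baseline AVX2 asm vs. new AVX2 asm.
print(f"{cycle_reduction_percent(1075, 827):.1f}% fewer cycles")
print(f"{speedup_factor(1075, 827):.2f}x speed-up")
```

Stating both numbers alongside the baseline avoids the ambiguity the reviewer asks about, since a percentage alone does not say what it is measured against.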

[x265] [PATCH] asm: disabled AVX AVX2 primitives having less than 3% speed up over SSE

2015-08-05 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1438757401 -19800 # Wed Aug 05 12:20:01 2015 +0530 # Node ID 3eb2ec5922be1cd934dec7f7ed886d03c0125ef5 # Parent 3fa7f6838098854de79d3800b2d775dabaf45705 asm: disabled AVX AVX2 primitives having less than 3

[x265] [PATCH] asm: updated avx2 algorithm for copy_ps 32xN 64xN, improved over 45% than SSE asm

2015-08-05 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1438767554 -19800 # Wed Aug 05 15:09:14 2015 +0530 # Node ID 377a996a8d74110f838ff2e3cef1c42781d6d730 # Parent 3eb2ec5922be1cd934dec7f7ed886d03c0125ef5 asm: updated avx2 algorithm for copy_ps 32xN 64xN

[x265] [PATCH] asm: disabled AVX primitives having less than 3% speed up over SSE

2015-08-04 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1438669788 -19800 # Tue Aug 04 11:59:48 2015 +0530 # Node ID fc84f3731e2c9eafc8164361b67422732f811008 # Parent 2b89c446b404ed20c0316efaab5b1e088289c0b4 asm: disabled AVX primitives having less than 3% speed up

[x265] [PATCH] asm: avx2 code for pixelavg_pp 32xN 64xN, improved over 40% than SSE

2015-08-03 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1438596650 -19800 # Mon Aug 03 15:40:50 2015 +0530 # Node ID 43fe4ec1c13a2514030010c2cd699382b67f65cb # Parent a3b72e2a25a7fc544b1b76e872eda012035bf4ac asm: avx2 code for pixelavg_pp 32xN 64xN, improved over

[x265] [PATCH] main12: added lambda tables based on qp values

2015-07-23 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1437640145 -19800 # Thu Jul 23 13:59:05 2015 +0530 # Node ID 0bdab1ab0e78684cbb3ecc4913e59d2b35b4e1b7 # Parent 42bc8575020b73d129d0bcef70c7cbe80a8b51df main12: added lambda tables based on qp values diff

[x265] [PATCH] asm: fix linux build error- cannot override register size

2015-07-13 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1436771870 -19800 # Mon Jul 13 12:47:50 2015 +0530 # Node ID 96eaae96478a252f46736416248ec8dcba618c7d # Parent 7cb28662875630da90d85d62b01d58f4c51f7e32 asm: fix linux build error- cannot override register size

[x265] [PATCH 3 of 3] asm: sse4 code for saoCuStatsE1, improved 320369c-151086c

2015-07-07 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1436252372 -19800 # Tue Jul 07 12:29:32 2015 +0530 # Node ID 25a8323b886f480347f4b0813f7ded18e579704a # Parent 235930aae11da04863e3fb13905e2d1d95e3dc0a asm: sse4 code for saoCuStatsE1, improved 320369c-151086c

[x265] [PATCH 2 of 3] asm: sse4 code for saoCuStatsE0, improved 250341c-147284c

2015-07-07 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1436251628 -19800 # Tue Jul 07 12:17:08 2015 +0530 # Node ID 235930aae11da04863e3fb13905e2d1d95e3dc0a # Parent e0166f09f332af72a83eb059d878044db15f59bd asm: sse4 code for saoCuStatsE0, improved 250341c-147284c

[x265] [PATCH] asm: fix 32-bit build error- undefined symbol r7d, r8d

2015-07-06 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1436183156 -19800 # Mon Jul 06 17:15:56 2015 +0530 # Node ID 45e56ef3de405a3f9c6451b46b876e3dc46aac38 # Parent bf57ce5d38d5208a491bf4192e389ab1eb4a4f32 asm: fix 32-bit build error- undefined symbol r7d, r8d

Re: [x265] Compiling 8-bit Win32 target fails: 64 registers r7/r8 used

2015-07-06 Thread Dnyaneshwar Gorade
sent a fix patch. Yes, it was caused by %ARCH_X86_64 removal. On Mon, Jul 6, 2015 at 5:20 PM, Mario *LigH* Rohkrämer cont...@ligh.de wrote: Possibly after a line with a check %if ARCH_X86_64 was removed? Win32 non-HBD still allows ASM. + [ 8%] Building ASM_YASM object

[x265] [PATCH 3 of 3] sao: created new primitive for saoCuStatsBO

2015-07-02 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435749680 -19800 # Wed Jul 01 16:51:20 2015 +0530 # Node ID 9fd6c4bca7695f847ff9a28a065122b840ecae5a # Parent 915d02816797d3c70004e652a13b3804571c251b sao: created new primitive for saoCuStatsBO diff -r

[x265] [PATCH 2 of 3] sao: created new primitive for saoCuStatsE0

2015-07-02 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435749632 -19800 # Wed Jul 01 16:50:32 2015 +0530 # Node ID 915d02816797d3c70004e652a13b3804571c251b # Parent 18151ada638dd19843551e2a6d5d8b2cc9bd28be sao: created new primitive for saoCuStatsE0 diff -r

[x265] [PATCH 1 of 3] sao: created new primitive for saoCuStatsE1

2015-07-02 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435749564 -19800 # Wed Jul 01 16:49:24 2015 +0530 # Node ID 18151ada638dd19843551e2a6d5d8b2cc9bd28be # Parent 76a314f91799c2dce6878c389503d2fe9007dbe8 sao: created new primitive for saoCuStatsE1 diff -r

[x265] [PATCH] asm: intra_filter4x4 avx2 code, improved 8bit: 141c-118c, 10bit: 121c-88c

2015-06-30 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435663360 -19800 # Tue Jun 30 16:52:40 2015 +0530 # Node ID 9340454d3b551f57ba9ce6a3f77fade041975e62 # Parent b1301944894051b9641006797e4d6253b277f3e4 asm: intra_filter4x4 avx2 code, improved 8bit: 141c-118c

[x265] [PATCH] asm: intra_filter 10bpp sse4 code

2015-06-29 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435578547 -19800 # Mon Jun 29 17:19:07 2015 +0530 # Node ID 60832369ebb4e1014b4080b27a0401f97af93958 # Parent 9feee64efa440c25f016d15ae982789e5393a77e asm: intra_filter 10bpp sse4 code Performance improved

[x265] [PATCH] asm: fix gcc build error, invalid size for operand 1

2015-06-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435307390 -19800 # Fri Jun 26 13:59:50 2015 +0530 # Node ID 504a42904fab2a43e4d8b5b65513db7a7dd30ee1 # Parent 1e5c4d155ab85e8e8dd199bb3515801766ea9e88 asm: fix gcc build error, invalid size for operand 1 diff

[x265] [PATCH 2 of 4] asm: intra_filter8x8 sse4 code, improved 990c-201c over C code

2015-06-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435323520 -19800 # Fri Jun 26 18:28:40 2015 +0530 # Node ID 93c31f8b404708cd39d00b85a07b2418794fc103 # Parent 44b574b61b29a3cfba99e8f0d06622e44a86df17 asm: intra_filter8x8 sse4 code, improved 990c-201c over C

[x265] [PATCH 1 of 4] asm: intra_filter4x4 sse4 code and added testbench support, improved 357c-141c over C code

2015-06-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435323067 -19800 # Fri Jun 26 18:21:07 2015 +0530 # Node ID 44b574b61b29a3cfba99e8f0d06622e44a86df17 # Parent d64227e54233d1646c55bcb4b0b831e5340009ed asm: intra_filter4x4 sse4 code and added testbench support

[x265] [PATCH 4 of 4] asm: intra_filter32x32 sse4 code, improved 4050c-652c over C code

2015-06-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435323958 -19800 # Fri Jun 26 18:35:58 2015 +0530 # Node ID e04bde60af516f6f016e3e6f37d5d64e97e589f3 # Parent 1995a55f1320a029fb423f23cbfd24555c258d09 asm: intra_filter32x32 sse4 code, improved 4050c-652c over

[x265] [PATCH 0 of 4 ] asm code and testbench support for intra_filter primitive

2015-06-26 Thread dnyaneshwar
primitive            speedup   asm (cycles)   C (cycles)
intra_filter_4x4     2.52x     141.82         357.20
intra_filter_8x8     4.79x     198.79         951.41
intra_filter_16x16   5.56x     351.03         1952.17
intra_filter_32x32   6.20x     652.82         4050.76

[x265] [PATCH 3 of 4] asm: intra_filter16x16 sse4 code, improved 1952c-351c over C code

2015-06-26 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435323720 -19800 # Fri Jun 26 18:32:00 2015 +0530 # Node ID 1995a55f1320a029fb423f23cbfd24555c258d09 # Parent 93c31f8b404708cd39d00b85a07b2418794fc103 asm: intra_filter16x16 sse4 code, improved 1952c-351c over

[x265] [PATCH 1 of 6] asm: 10bpp AVX2 code for saoCuOrgE0, improved 974c-690c over SSE

2015-06-25 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435212794 -19800 # Thu Jun 25 11:43:14 2015 +0530 # Node ID faec09e1ab60531924f2d919d4f283fa91bfec81 # Parent b1af4c36f48a4500a4912373ebcda9a5540b5c15 asm: 10bpp AVX2 code for saoCuOrgE0, improved 974c-690c

[x265] [PATCH 4 of 6] asm: 10bpp AVX2 code for saoCuOrgE2

2015-06-25 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435213857 -19800 # Thu Jun 25 12:00:57 2015 +0530 # Node ID 8b680fd502e08ec2cab4fff7f5833791bb5bfeef # Parent f43aa44673dcd8e96581c938cf22ad4bbb7657e3 asm: 10bpp AVX2 code for saoCuOrgE2 SAO_EO_2[0] 207c-166

[x265] [PATCH 3 of 6] asm: 10bpp AVX2 code for saoCuOrgE1_2Rows, improved 900c-614c over SSE

2015-06-25 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435213462 -19800 # Thu Jun 25 11:54:22 2015 +0530 # Node ID f43aa44673dcd8e96581c938cf22ad4bbb7657e3 # Parent 31da07b7198ca730bae37577d5053a3337477f7b asm: 10bpp AVX2 code for saoCuOrgE1_2Rows, improved 900c

[x265] [PATCH 6 of 6] asm: 10bpp AVX2 code for saoCuOrgB0, improved 23127c-15595c over SSE

2015-06-25 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435219949 -19800 # Thu Jun 25 13:42:29 2015 +0530 # Node ID f1ff5636cba3e2b714ceed86261362a53e8c6aca # Parent 85d5582eedd40e4227131bff366235e6dc2b361a asm: 10bpp AVX2 code for saoCuOrgB0, improved 23127c

[x265] [PATCH 5 of 6] asm: 10bpp AVX2 code for saoCuOrgE3

2015-06-25 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1435214505 -19800 # Thu Jun 25 12:11:45 2015 +0530 # Node ID 85d5582eedd40e4227131bff366235e6dc2b361a # Parent 8b680fd502e08ec2cab4fff7f5833791bb5bfeef asm: 10bpp AVX2 code for saoCuOrgE3 SAO_EO_3[0] 236c-195

Re: [x265] [PATCH 0 of 6 ] SAO SSE4 asm code for HIGH_BIT_DEPTH

2015-06-22 Thread Dnyaneshwar Gorade
Okay. Will check IACA report and try pxor for m0 and buffer 1023. On Mon, Jun 22, 2015 at 8:24 PM, chen chenm...@163.com wrote: right some comment: 'psignb X, [pb_128]' equal to 'psubb X, 0, X', in AVX2, second type faster, in SSE4, choice depends on IACA report in PMINSW, you buffer ZERO

[x265] [PATCH 1 of 6] asm: 10bpp sse4 code for saoCuOrgE0, improved 8740c-974c, over C code

2015-06-22 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1434712676 -19800 # Fri Jun 19 16:47:56 2015 +0530 # Node ID a94e9a1f0fde08e060a9b52e3353ce2f242d9257 # Parent 83a7d824442455ba5e0a6b53ea68e6b7043845de asm: 10bpp sse4 code for saoCuOrgE0, improved 8740c-974c

[x265] [PATCH 4 of 6] asm: 10bpp sse4 code for saoCuOrgE2

2015-06-22 Thread dnyaneshwar
# HG changeset patch # User Dnyaneshwar G dnyanesh...@multicorewareinc.com # Date 1434963191 -19800 # Mon Jun 22 14:23:11 2015 +0530 # Node ID f85c15cc0e1d70e63182b03e294c2778f598143d # Parent 558ffdc4e832061d99f1ec688fe1ae64db48642f asm: 10bpp sse4 code for saoCuOrgE2 Performance
