Re: [x265] [PATCH] encoder: Do not include CLL SEI message if empty

2018-11-06 Thread Praveen Tiwari
Hello Vittorio, Sorry for the late reply, all of us were on leave due to the Diwali festival in India. Thanks for the patch, I will run some basic tests and push the patch. Regards, Praveen On Wed, Nov 7, 2018 at 12:35 AM Vittorio Giovara wrote: > > > On Thu, Nov 1, 2018 at 5:34 PM Vittorio

Re: [x265] [PATCH] fix Issue #442: linking issue on non x86 platform

2018-10-31 Thread Praveen Tiwari
Thanks! I messed up the syntax. On Wed, Oct 31, 2018 at 5:45 PM Andrey Semashev wrote: > On 10/31/18 2:33 PM, prav...@multicorewareinc.com wrote: > > # HG changeset patch > > # User Praveen Tiwari > > # Date 1540983948 -19800 > > # Wed Oct 31 16:35:

Re: [x265] Original C++ code used for sad functions' assembly code in COST_MV?

2018-09-05 Thread Praveen Tiwari
Hello Jeffrey, You can find all C primitives in the source/common folder. SAD C primitives are in source/common/pixel.cpp. Thanks, Praveen On Wed, Sep 5, 2018 at 12:23 PM, Mario *LigH* Rohkrämer wrote: > Jeffrey Chen schrieb am 04.09.2018 um 23:57: > >> Hi, I would like to configure the sad
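For readers unfamiliar with the term, a SAD (sum of absolute differences) primitive can be sketched roughly as below. This is only an illustration: the name `sad_ref`, its template parameters, and its signature are assumptions made for this sketch and are not copied from source/common/pixel.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical reference SAD for a W x H block of 8-bit pixels.
// Names and signature are illustrative, not the actual x265 primitive.
template <int W, int H>
int sad_ref(const uint8_t* a, std::ptrdiff_t strideA,
            const uint8_t* b, std::ptrdiff_t strideB)
{
    int sum = 0;
    for (int y = 0; y < H; y++)
    {
        for (int x = 0; x < W; x++)
            sum += std::abs(a[x] - b[x]); // operands promote to int before subtracting
        a += strideA; // advance both blocks by one row
        b += strideB;
    }
    return sum;
}
```

Assembly versions of such a primitive are expected to return the same value as the C version for every block size, which is what makes the C code a useful reference.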

Re: [x265] Code performance issue

2018-06-04 Thread Praveen Tiwari
Hello Min, Thanks for the suggestion, we will run some tests and let you know if any change is required here. Thanks. Regards, Praveen Tiwari On Sat, Jun 2, 2018 at 9:18 AM, chen wrote: > There is a series of performance issues, such as, > > uint32_t sum = (uint32_t)pow((outOfBound

Re: [x265] [PATCH] threadpool.cpp: use WIN system call for popcount

2018-05-03 Thread Praveen Tiwari
, Andrey Semashev <andrey.semas...@gmail.com> wrote: > On Thu, May 3, 2018 at 7:37 PM, Pradeep Ramachandran > <prad...@multicorewareinc.com> wrote: > > > > On Thu, May 3, 2018 at 2:23 PM, <prav...@multicorewareinc.com> wrote: > >> > >>

Re: [x265] [PATCH 000 of 307 ] AVX-512 implementataion in x265: breaks 32-bit compilation

2018-04-11 Thread Praveen Tiwari
Thanks for reporting, we are looking at the issue, will send a fix soon. Regards, Praveen Tiwari On Thu, Apr 12, 2018 at 2:31 AM, Mario Rohkrämer <cont...@ligh.de> wrote: > Am 07.04.2018, 04:29 Uhr, schrieb <mythr...@multicorewareinc.com>: > > This series of patches enables

Re: [x265] [PATCH 000 of 307 ] AVX-512 implementataion in x265

2018-04-06 Thread Praveen Tiwari
Your request is on the way, soon we will share the performance related details. Thanks. Regards, Praveen Tiwari On Fri, Apr 6, 2018 at 9:36 PM, Vittorio Giovara <vittorio.giov...@gmail.com > wrote: > just curious, what kind of general speed improvement does this give? > I could have

Re: [x265] [PATCH] quant.cpp: 'rdoQuant_c' primitive for SIMD optimization

2017-11-27 Thread Praveen Tiwari
Please ignore this patch, I messed up an update. I will resend this soon. Thanks On Mon, Nov 27, 2017 at 5:11 PM, <prav...@multicorewareinc.com> wrote: > # HG changeset patch > # User Praveen Tiwari <prav...@multicorewareinc.com> > # Date 1511167656 -19800 > # Mon No

Re: [x265] [PATCH 2 of 2] x86: Change assembler from YASM to NASM

2017-11-21 Thread Praveen Tiwari
, Praveen Tiwari ___ x265-devel mailing list x265-devel@videolan.org https://mailman.videolan.org/listinfo/x265-devel

[x265] Fwd: [PATCH] intra: sse4 version of strong intra smoothing

2017-11-20 Thread Praveen Tiwari
-- Forwarded message -- From: chen Date: Tue, Nov 21, 2017 at 10:07 AM Subject: Re: [x265] [PATCH] intra: sse4 version of strong intra smoothing To: Development for x265 >diff -r a7c2f80c18af -r 973560d58dfb

[x265] Fwd: [PATCH 3 of 3] SEA motion search:integralv functions avx2 implementation

2017-05-02 Thread Praveen Tiwari
-- Forwarded message -- From: Date: Tue, May 2, 2017 at 3:16 PM Subject: [x265] [PATCH 3 of 3] SEA motion search:integralv functions avx2 implementation To: x265-devel@videolan.org # HG changeset patch # User Vignesh Vijayakumar # Date 1493121121

[x265] Fwd: [PATCH 2 of 3] SEA motion search:Add testbench for integralv functions

2017-05-02 Thread Praveen Tiwari
-- Forwarded message -- From: Date: 2017-05-02 15:16 GMT+05:30 Subject: [x265] [PATCH 2 of 3] SEA motion search:Add testbench for integralv functions To: x265-devel@videolan.org # HG changeset patch # User Vignesh Vijayakumar # Date 1493358749

[x265] Fwd: [PATCH 1 of 3] SEA motion search:Setup asm primitives for integral calculation

2017-05-02 Thread Praveen Tiwari
-- Forwarded message -- From: Date: Tue, May 2, 2017 at 3:16 PM Subject: [x265] [PATCH 1 of 3] SEA motion search:Setup asm primitives for integral calculation To: x265-devel@videolan.org # HG changeset patch # User Vignesh Vijayakumar # Date

Re: [x265] Interested in fast popcnt substitute below SSE4.2?

2017-03-01 Thread Praveen Tiwari
Hi Mario, Sorry for the late reply, you have shared interesting and useful information. Currently we are doing some experimental refactoring of the ASM code base, so it might take some time. Hoping to receive more posts like this. Regards, Praveen Tiwari On Wed, Mar 1, 2017 at 8:21 PM, Mario
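The proposal in the original message is cut off in this listing, so the following is not the method from the thread. It is a sketch of one well-known software fallback for a population count on CPUs without the POPCNT instruction, using parallel bit sums; the function name `popcount64_swar` is an assumption made for this example.

```cpp
#include <cstdint>

// Hypothetical portable popcount (no POPCNT instruction required).
// Classic parallel bit-sum approach; not taken from the x265 sources.
inline int popcount64_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           // 2-bit partial sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           // 8-bit partial sums
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);           // add all bytes into the top byte
}
```

Whether this beats a table lookup or a compiler builtin depends on the target CPU and compiler, so it would need to be measured rather than assumed.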

Re: [x265] [PATCH 1 of 9] pcs: update design to have 'm_achivedFps' for every PCS Instance

2016-11-17 Thread Praveen Tiwari
Please, ignore this patch. Thanks. On Thu, Nov 17, 2016 at 8:51 PM, <prav...@multicorewareinc.com> wrote: > # HG changeset patch > # User Praveen Tiwari <prav...@multicorewareinc.com> > # Date 1479128885 -19800 > # Mon Nov 14 18:38:05 2016 +0530 >

Re: [x265] [PATCH] [multi-lib] Support 8+10+12 bits in single DLL (Workaround)

2016-09-23 Thread Praveen Tiwari
bit-depth version and compare with one bit-depth version, > but the output are still matched in both 10 and 12 bit. > > Regards, > Min > > At 2016-09-22 14:39:50,"Praveen Tiwari" <prav...@multicorewareinc.com> > wrote: > > Hi Min, > > After this pat

Re: [x265] [PATCH] [multi-lib] Support 8+10+12 bits in single DLL (Workaround)

2016-09-22 Thread Praveen Tiwari
Hi Min, After this patch the outputs change; tested with the following command line for 10-bit and 12-bit outputs. --input=NebutaFestival_2560x1600_60_10bit_crop.yuv --input-res=2560x1600 --fps=60 --numa-pools="NULL" --output-depth=12 --hash=1 -o NFOut12.hevc Regards, Praveen On Thu, Sep

Re: [x265] [PATCH] threadpool.cpp: fix default pool param behaviour, if NULL or “” (default) x265 will use all available threads on each NUMA node

2016-09-08 Thread Praveen Tiwari
Please ignore this; this behaviour is not required for Linux systems. Thanks. Regards, Praveen On Wed, Sep 7, 2016 at 5:19 PM, <prav...@multicorewareinc.com> wrote: > # HG changeset patch > # User Praveen Tiwari <prav...@multicorewareinc.com> > # Date 1473246754 -19800 >

Re: [x265] [PATCH] threadpool: fix warning: ‘int popCount(uint64_t)’ defined but not used [-Wunused-function]

2016-05-30 Thread Praveen Tiwari
h https://patches.videolan.org/patch/13495/ (it fixes > also this warning)? > > > W dniu 2016-05-30 o 14:45, prav...@multicorewareinc.com pisze: > > # HG changeset patch > > # User Praveen Tiwari <prav...@multicorewareinc.com> > > # Date 1

Re: [x265] [PATCH 1 of 7] threadpool.cpp: get correct CPU count for multisocket machines -> windows system fix

2016-05-23 Thread Praveen Tiwari
rav...@multicorewareinc.com> wrote: > # HG changeset patch > # User Praveen Tiwari <prav...@multicorewareinc.com> > # Date 1463655478 -19800 > # Thu May 19 16:27:58 2016 +0530 > # Node ID 9a6ab28b736e1167ac26977d7da8ab2d23cc296f > # Parent aca781339b4c8dae94ff7da73f18cd44

Re: [x265] [PATCH] ThreadPool.cpp: fix getCpuCount function for windows systems

2016-05-20 Thread Praveen Tiwari
Please ignore this; sending an updated patch. Thanks. Regards, Praveen On Tue, May 17, 2016 at 7:17 PM, <prav...@multicorewareinc.com> wrote: > # HG changeset patch > # User Praveen Tiwari <prav...@multicorewareinc.com> > # Date 1463492830 -19800 > # Tue May 17 19:17:1

Re: [x265] [PATCH] ThreadPool.cpp: fix core count for windows machines

2016-05-20 Thread Praveen Tiwari
Please ignore this; sending an updated patch. Thanks. Regards, Praveen On Tue, May 17, 2016 at 8:01 PM, Pradeep Ramachandran < prad...@multicorewareinc.com> wrote: > > On Tue, May 17, 2016 at 7:07 PM, <prav...@multicorewareinc.com> wrote: > >> # HG changeset patch >

Re: [x265] [PATCH] motion.cpp: optimize 'X265_DIA_SEARCH' byeliminating costly branch instructions

2016-03-08 Thread praveen tiwari
reinc.com wrote: ># HG changeset patch ># User Praveen Tiwari <prav...@multicorewareinc.com> ># Date 1457448163 -19800 ># Tue Mar 08 20:12:43 2016 +0530 ># Node ID 519441d72cf723dc3b279a91a6080f329729cb49 ># Parent 0e1b6472c05e3a53538d8e064e502d8a7508eb6e >motion.c

Re: [x265] [PATCH] param: cleanup, print reconfigured param option along with its old and configured value

2016-03-07 Thread Praveen Tiwari
Please ignore the patch; it needs an update. Thanks. Regards, Praveen On Tue, Mar 8, 2016 at 10:57 AM, <prav...@multicorewareinc.com> wrote: > # HG changeset patch > # User Praveen Tiwari <prav...@multicorewareinc.com> > # Date 1457356750 -19800 > # Mon Mar 07 18:49:1

[x265] Fwd: [PATCH] asm: avx2 code for weight_sp() 16bpp

2015-06-30 Thread Praveen Tiwari
-- Forwarded message -- From: aasaipr...@multicorewareinc.com Date: Mon, Jun 29, 2015 at 4:51 PM Subject: [x265] [PATCH] asm: avx2 code for weight_sp() 16bpp To: x265-devel@videolan.org # HG changeset patch # User Aasaipriya Chandran aasaipr...@multicorewareinc.com # Date

Re: [x265] Fwd: [PATCH] asm: pixelavg_pp[8xN] avx2 code for 10bpp

2015-06-29 Thread Praveen Tiwari
on this. On Fri, Jun 26, 2015 at 5:31 PM, Praveen Tiwari prav...@multicorewareinc.com wrote: -- Forwarded message -- From: raj...@multicorewareinc.com Date: Fri, Jun 26, 2015 at 3:14 PM Subject: [x265] [PATCH] asm: pixelavg_pp[8xN] avx2 code for 10bpp To: x265-devel

[x265] Fwd: [PATCH] asm: pixelavg_pp[8xN] avx2 code for 10bpp

2015-06-26 Thread Praveen Tiwari
-- Forwarded message -- From: raj...@multicorewareinc.com Date: Fri, Jun 26, 2015 at 3:14 PM Subject: [x265] [PATCH] asm: pixelavg_pp[8xN] avx2 code for 10bpp To: x265-devel@videolan.org # HG changeset patch # User Rajesh Paulraj raj...@multicorewareinc.com # Date 1435311076

Re: [x265] Fwd: [PATCH] asm: pixelavg_pp[8xN] avx2 code for 10bpp

2015-06-26 Thread Praveen Tiwari
wrote: I tried using vinserti128, but that performs worse than this one, so I kept this version. On Fri, Jun 26, 2015 at 3:37 PM, Praveen Tiwari prav...@multicorewareinc.com wrote: -- Forwarded message -- From: raj...@multicorewareinc.com Date: Fri, Jun 26

[x265] Fwd: [PATCH] asm: pixelavg_pp[8xN] avx2 code for 10bpp

2015-06-26 Thread Praveen Tiwari
-- Forwarded message -- From: raj...@multicorewareinc.com Date: Fri, Jun 26, 2015 at 3:14 PM Subject: [x265] [PATCH] asm: pixelavg_pp[8xN] avx2 code for 10bpp To: x265-devel@videolan.org # HG changeset patch # User Rajesh Paulraj raj...@multicorewareinc.com # Date 1435311076

Re: [x265] [PATCH 1 of 3] asm: intra_pred_ang32_33 improved by ~35% over SSE4

2015-03-26 Thread Praveen Tiwari
Please ignore the duplicate (second) patch, sent by mistake. Regards, Praveen On Fri, Mar 27, 2015 at 10:41 AM, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav...@multicorewareinc.com # Date 1427356204 -19800 # Thu Mar 26 13:20:04 2015 +0530 # Branch

Re: [x265] [PATCH 2 of 3] asm: intra_pred_ang32_25 improved by ~53% over SSE4

2015-03-26 Thread Praveen Tiwari
Please ignore the duplicate (second) patch, sent by mistake. Regards, Praveen On Fri, Mar 27, 2015 at 10:41 AM, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav...@multicorewareinc.com # Date 142736 -19800 # Thu Mar 26 14:23:20 2015 +0530 # Branch

Re: [x265] [PATCH] asm: intra_pred_ang16_25

2015-03-12 Thread Praveen Tiwari
Please ignore, need to add performance data in commit message. Regards, Praveen On Thu, Mar 12, 2015 at 6:50 PM, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav...@multicorewareinc.com # Date 1426165765 -19800 # Node ID

Re: [x265] [PATCH] asm-avx2: inra_pred, align const

2015-03-11 Thread Praveen Tiwari
Updated this patch on tip. Thanks, Praveen On Tue, Mar 10, 2015 at 10:53 AM, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav...@multicorewareinc.com # Date 1425964751 -19800 # Node ID f97dfb483647d573cbcab9a4f007ac2aa89c9066 # Parent

[x265] Fwd: [PATCH] asm: avx2 code for sad[32x32] for 8bpp

2015-03-11 Thread Praveen Tiwari
-- Forwarded message -- From: sumala...@multicorewareinc.com Date: Wed, Mar 11, 2015 at 2:24 PM Subject: [x265] [PATCH] asm: avx2 code for sad[32x32] for 8bpp To: x265-devel@videolan.org # HG changeset patch # User Sumalatha Polureddy sumala...@multicorewareinc.com # Date

[x265] Fwd: [PATCH] asm-avx2: intra_pred_ang8_11

2015-03-11 Thread Praveen Tiwari
the compiler will not use two 'mova' instructions internally rather than just one? Can we depend on the compiler here for this optimization? Even the syntax of 'vpermd' does not allow this. At 2015-03-10 13:58:50,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav

[x265] Fwd: [PATCH] asm: intra_pred_ang16_34

2015-03-10 Thread Praveen Tiwari
-- Forwarded message -- From: chen chenm...@163.com Date: Wed, Mar 11, 2015 at 6:32 AM Subject: Re: [x265] [PATCH] asm: intra_pred_ang16_34 To: Development for x265 x265-devel@videolan.org same speed to old version This avx2 version of asm code eliminates following instruction

[x265] Fwd: [PATCH] asm: intra_pred_ang16_2

2015-03-10 Thread Praveen Tiwari
-- Forwarded message -- From: chen chenm...@163.com Date: Wed, Mar 11, 2015 at 6:32 AM Subject: Re: [x265] [PATCH] asm: intra_pred_ang16_2 To: Development for x265 x265-devel@videolan.org same speed to old version This avx2 version of asm code eliminates following instruction on

[x265] Fwd: [PATCH] asm: intra_pred_ang8_24 8bpp, improved 206.33c - 177.70c over SSE version

2015-03-10 Thread Praveen Tiwari
-- Forwarded message -- From: chen chenm...@163.com Date: Wed, Mar 11, 2015 at 6:09 AM Subject: Re: [x265] [PATCH] asm: intra_pred_ang8_24 8bpp, improved 206.33c - 177.70c over SSE version To: Development for x265 x265-devel@videolan.org +c_ang8_mode_24: db 5, 27, 5, 27, 5,

Re: [x265] [PATCH] asm-avx2: intra_pred_ang8_25, (42.92x)

2015-03-09 Thread Praveen Tiwari
Updated the code with more optimization. Regards, Praveen On Sat, Mar 7, 2015 at 3:31 AM, chen chenm...@163.com wrote: right At 2015-03-06 14:16:23,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav...@multicorewareinc.com # Date 1425622433 -19800

Re: [x265] [PATCH] asm-avx2: intra_pred_ang8_11, (51.84x)

2015-03-09 Thread Praveen Tiwari
Update the patch with more optimization. Regards, Praveen On Sat, Mar 7, 2015 at 3:40 AM, chen chenm...@163.com wrote: right At 2015-03-06 15:50:38,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav...@multicorewareinc.com # Date 1425628229 -19800

Re: [x265] [PATCH] asm-avx2: intra_pred_ang8_24, (40.05x)

2015-03-09 Thread Praveen Tiwari
Updated the patch as per suggestions. Regards, Praveen On Sat, Mar 7, 2015 at 3:57 AM, chen chenm...@163.com wrote: At 2015-03-06 17:24:05,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav...@multicorewareinc.com # Date 1425633836 -19800 # Node ID

[x265] Fwd: Fwd: [PATCH Review Only] asm-avx2: intra_pred_ang8_33, improved 265.79c - 185.43c over sse4 asm code

2015-02-26 Thread Praveen Tiwari
,Praveen Tiwari prav...@multicorewareinc.com wrote: -- Forwarded message -- From: chen chenm...@163.com Date: Wed, Feb 25, 2015 at 7:38 PM Subject: Re: [x265] [PATCH Review Only] asm-avx2: intra_pred_ang8_33, improved 265.79c - 185.43c over sse4 asm code To: Development for x265

[x265] Fwd: [PATCH Review Only] asm-avx2: intra_pred_ang8_33, improved 265.79c - 185.43c over sse4 asm code

2015-02-25 Thread Praveen Tiwari
,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari prav...@multicorewareinc.com # Date 1424854196 -19800 # Node ID 177fe9372668b4824c291e967349664766688179 # Parent 02bac78bde961d60d180e59b5260fad93b98d9b4 asm-avx2: intra_pred_ang8_33, improved 265.79c - 185.43c over sse4

[x265] Fwd: [PATCH] blockcopy_pp_12x32: SSE2 asm code optimization

2015-02-06 Thread Praveen Tiwari
-- Forwarded message -- From: chen chenm...@163.com Date: Thu, Feb 5, 2015 at 5:55 PM Subject: Re: [x265] [PATCH] blockcopy_pp_12x32: SSE2 asm code optimization To: Development for x265 x265-devel@videolan.org this code is right but could you try using a general register move (rN,

Re: [x265] [PATCH] blockfill_s_8x8 sse2 asm code optimization

2015-02-02 Thread Praveen Tiwari
Sent updated patch. Thanks. Regards, Praveen On Mon, Feb 2, 2015 at 4:39 PM, chen chenm...@163.com wrote: At 2015-02-02 16:55:16,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1422867249 -19800 # Branch stable # Node ID

Re: [x265] [PATCH] add testbench for psyCost_ss and asm for psyCost_ss_4x4: improve 1989c-515c

2015-01-09 Thread Praveen Tiwari
If it is only 64x64, then it is definitely a range issue when we finally accumulate the sum of all SAD calculations. It becomes more obvious with 64x64 because more accumulations happen there. An algorithm issue would have shown up in other partitions as well. Regards, Praveen On Fri, Jan 9, 2015 at
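The range argument can be checked with simple arithmetic. The sketch below assumes, purely for illustration, 8-bit absolute differences (at most 255 each) and an unsigned accumulator of a given bit width; the real primitive's data types are not shown in the thread, and the helper names `worstCaseSum` and `fitsUnsigned` are hypothetical.

```cpp
#include <cstdint>

// Worst-case total when accumulating width*height values, each at most maxDiff.
// Hypothetical helper for illustration only.
constexpr uint64_t worstCaseSum(int width, int height, uint32_t maxDiff)
{
    return static_cast<uint64_t>(width) * height * maxDiff;
}

// True if `value` is representable in an unsigned accumulator of `bits` bits.
constexpr bool fitsUnsigned(uint64_t value, int bits)
{
    return bits >= 64 || value < (uint64_t{1} << bits);
}
```

Under these assumptions a 16x16 block sums to at most 16 * 16 * 255 = 65,280, which still fits in 16 bits, while a 64x64 block can reach 64 * 64 * 255 = 1,044,480 and needs a wider accumulator. The more values are accumulated, the sooner a narrow accumulator runs out of range.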

Re: [x265] [PATCH] asm: luma_vpp[16x32, 16x64] in avx2: improve 3875c-2488c, 7499c-4915c

2014-11-20 Thread Praveen Tiwari
A table named tab_LumaCoeffVer_32 is already in the file; redefining it here will cause a build error. Please verify and update the patch. On Thu, Nov 20, 2014 at 2:49 PM, Divya Manivannan di...@multicorewareinc.com wrote: # HG changeset patch # User Divya Manivannan di...@multicorewareinc.com #

[x265] Fwd: [PATCH] refactorizaton of the transform/quant path

2014-11-19 Thread Praveen Tiwari
patch # User Praveen Tiwari # Date 1416299427 -19800 # Node ID 706fa4af912bc1610478de8f09a651ae3e58624c # Parent 2f0062f0791b822fa932712a56e6b0a14e976d91 refactorizaton of the transform/quant path. This patch involves scaling down the DCT/IDCT coefficients from int32_t to int16_t as they can

[x265] Fwd: [PATCH] refactorizaton of the transform/quant path

2014-11-19 Thread Praveen Tiwari
patch # User Praveen Tiwari # Date 1416299427 -19800 # Node ID 706fa4af912bc1610478de8f09a651ae3e58624c # Parent 2f0062f0791b822fa932712a56e6b0a14e976d91 refactorizaton of the transform/quant path. This patch involves scaling down the DCT/IDCT coefficients from int32_t to int16_t as they can

Re: [x265] [PATCH] disable denoiseDct asm code until fixed for Mac OS

2014-11-19 Thread Praveen Tiwari
changeset patch # User Praveen Tiwari # Date 1416402744 -19800 # Node ID 0ef14321fb144362b609d51f2d7c58f7db757ceb # Parent 706fa4af912bc1610478de8f09a651ae3e58624c disable denoiseDct asm code until fixed for Mac OS with denoise disabled, it finds the next failing primitive: $ ./test

Re: [x265] [PATCH 3 of 3] asm: AVX2 version luma_vpp[4x4], improve 391c - 302c

2014-11-03 Thread Praveen Tiwari
Crashing on vc11-x86-8bpp, Release mode. Min, can you check your code ? Regards, Praveen On Fri, Oct 31, 2014 at 4:16 AM, Min Chen chenm...@163.com wrote: # HG changeset patch # User Min Chen chenm...@163.com # Date 1414709200 25200 # Node ID 5d0b20f6e4de0b59b8c3306793c7267e01b9a41b #

[x265] Fwd: [PATCH] weight_pp avx2 asm code, improved from 8608.65 cycles to 5138.09 cycles over sse version of asm code

2014-10-16 Thread Praveen Tiwari
-16 17:20:13,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1413451199 -19800 # Node ID 858be8d7d7176ab6c6d01cf92d00c8478fe99b34 # Parent 79702581ec824a2a375aebe228d69c3930aeea96 weight_pp avx2 asm code, improved from 8608.65 cycles to 5138.09 cycles over sse

Re: [x265] [PATCH] noiseReduction: make noiseReduction deterministic for a given number of frameEncoders

2014-10-14 Thread Praveen Tiwari
It seems we missed something here. I tested this patch at my end: the outputs are deterministic with --pmode but still non-deterministic without the --pmode option. Steve/Deepthi, please verify at your end before pushing it. I used the following cli: y4mInputs\park_joy_1280x720p50.y4m --tune=ssim --psnr

[x265] Fwd: [PATCH] denoiseDct: unit test code

2014-09-16 Thread Praveen Tiwari
-- Forwarded message -- From: Steve Borho st...@borho.org Date: Mon, Sep 15, 2014 at 4:28 PM Subject: Re: [x265] [PATCH] denoiseDct: unit test code To: Development for x265 x265-devel@videolan.org On 09/15, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen

Re: [x265] [PATCH] copy_cnt: enable avx2 version of asm code

2014-09-11 Thread Praveen Tiwari
You can push 16x16 and 32x32 as well; their performance is good, but they need a bit more improvement. I will send an improvement patch soon. Regards, Praveen Tiwari On Thu, Sep 11, 2014 at 11:29 AM, Deepthi Nandakumar deep...@multicorewareinc.com wrote: Would be better to combine this asm

Re: [x265] [PATCH] removed copy_cnt_4 avx2 asm code: SSE version is eualy faster

2014-09-11 Thread Praveen Tiwari
Ignore it, I need to correct the commit message. Regards, Praveen Tiwari On Thu, Sep 11, 2014 at 4:41 PM, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1410433904 -19800 # Node ID 5740ec22db67267bfca97fbba07ef9239802d2b0 # Parent

[x265] Fwd: Fwd: [PATCH] copy_cnt_4: faster AVX2 code

2014-09-10 Thread Praveen Tiwari
-- Forwarded message -- From: chen chenm...@163.com Date: Wed, Sep 10, 2014 at 12:14 PM Subject: Re: [x265] Fwd: [PATCH] copy_cnt_4: faster AVX2 code To: Development for x265 x265-devel@videolan.org At 2014-09-10 09:34:31,Praveen Tiwari prav...@multicorewareinc.com wrote

[x265] Fwd: [PATCH] copy_cnt_4: faster AVX2 code

2014-09-09 Thread Praveen Tiwari
vinserti128 ? At 2014-09-09 16:37:23,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1410251834 -19800 # Node ID d011073f35258cb2f0ad95db6038c2d9fb840b27 # Parent ebb84e9dbb0fa0e8c4c9304b2efd57f8ac3d0c05 copy_cnt_4: faster AVX2 code diff -r ebb84e9dbb0f -r

Re: [x265] [PATCH] count_nonzero primitive, downscaling quantCoeff from int32_t* to int16_t*

2014-08-12 Thread Praveen Tiwari
Thanks, just sent a fix for it. Regards, Praveen On Tue, Aug 12, 2014 at 7:18 PM, chen chenm...@163.com wrote: -X265_CHECK((int)numSig == primitives.count_nonzero(coeff, 1 << log2TrSize * 2), "numSig differ\n"); +/* This section of code is to safely convert int32_t
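As a rough picture of what a count_nonzero primitive computes after the change to 16-bit coefficients named in the subject line, here is a minimal reference sketch. The function name `count_nonzero_ref` and its exact signature are assumptions for illustration; the actual x265 primitive may differ.

```cpp
#include <cstdint>

// Hypothetical reference version: counts the non-zero entries among
// numCoeff 16-bit quantized coefficients. Not the actual x265 code.
int count_nonzero_ref(const int16_t* coeff, int numCoeff)
{
    int count = 0;
    for (int i = 0; i < numCoeff; i++)
        count += (coeff[i] != 0); // the comparison yields 0 or 1
    return count;
}
```

For a square transform block with log2TrSize = 2, the coefficient count is 1 << (2 * 2) = 16; multiplication binds tighter than the shift in C++, so `1 << log2TrSize * 2` means `1 << (log2TrSize * 2)`.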

Re: [x265] x265: uncommon behavior by changing the 8-point DCT matrix

2014-06-10 Thread Praveen Tiwari
I think you are testing with asm code enabled. Assembly code has its own table; it has nothing to do with the constant 'g_t8' in source/Lib/TLibCommon/TComRom.cpp (only for C code). Check the dct8.asm file for the asm tables. Regards, Praveen Tiwari On Wed, May 28, 2014 at 5:15 AM, Paulo André Oliveira

Re: [x265] Fwd: [PATCH] noise reduction feature, ported from x264

2014-05-12 Thread Praveen Tiwari
(1), W(5), W(1), W(3), W(1), W(5), W(1), W(4), W(5), W(2), W(5), W(4), W(5), W(2), W(5), W(3), W(1), W(5), W(1), W(3), W(1), W(5), W(1) What is the logic behind such an arrangement? Regards, Praveen Tiwari On Sat, May 10, 2014 at 8:12 AM, Jason Garrett-Glaser ja...@x264.com wrote

[x265] Fwd: [PATCH] noise reduction feature, ported from x264

2014-05-08 Thread Praveen Tiwari
-- Forwarded message -- From: Jason Garrett-Glaser ja...@x264.com Date: Thu, May 8, 2014 at 5:08 PM Subject: Re: [x265] [PATCH] noise reduction feature, ported from x264 To: Development for x265 x265-devel@videolan.org This only seems to have 4x4 and 8x8 transform sizes; how does

Re: [x265] [PATCH] all_angs_pred_32x32, asm code improvement

2014-02-27 Thread Praveen Tiwari
This is a new patch with the same changes in other modes, but I gave it the same commit message; perhaps that's why it seems confusing. Do I need to send it as an attachment? On Thu, Feb 27, 2014 at 4:28 PM, Deepthi Nandakumar deep...@multicorewareinc.com wrote: The earlier patch was pushed, Praveen. Can

Re: [x265] [PATCH] all_angs_pred_32x32, asm code improvement

2014-02-26 Thread Praveen Tiwari
Oh, that was just left in by mistake. I commented out the old code to test the correctness of the new code; I will update the patch. On Thu, Feb 27, 2014 at 3:33 AM, chen chenm...@163.com wrote: At 2014-02-26 20:28:52,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1393417704

Re: [x265] [PATCH] all_angs_pred_4x4, mova replace with pxor

2013-12-04 Thread Praveen Tiwari
Min, I have sent the updated full patch. Regards, Praveen Tiwari On Wed, Dec 4, 2013 at 8:58 PM, chen chenm...@163.com wrote: can you send a full patch, not patch to patch At 2013-12-04 22:50:05,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date

Re: [x265] [PATCH] asm-primitives.cpp, removed temporary function pointer initialization, generated through macro calls

2013-11-22 Thread Praveen Tiwari
Sorry, I removed the wrong pointer initialization. I will fix it in the next patch; don't merge it. On Fri, Nov 22, 2013 at 4:34 PM, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1385118266 -19800 # Node ID f2b8bcaf435c00d835cd4389063ed09d22e7be28 # Parent

Re: [x265] [PATCH] asm code for pixeladd_ps_4x4 and testbench integration

2013-11-20 Thread Praveen Tiwari
Merged, sent implementation. Regards, Praveen Tiwari On Wed, Nov 20, 2013 at 6:08 PM, chen chenm...@163.com wrote: At 2013-11-20 19:45:24,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1384947915 -19800 # Node ID

Re: [x265] [PATCH] bug fix in blockcopy_pp_4x4

2013-11-12 Thread Praveen Tiwari
Please ignore this patch; the old code is also fine. It is some other bug. Regards, Praveen Tiwari On Tue, Nov 12, 2013 at 3:09 PM, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1384249182 -19800 # Node ID 40695de368b6c890fa27a08c8e5a277c9682149c # Parent

Re: [x265] [PATCH] asm code for blockcopy_ps, 8x6, 8x16 and 8x32

2013-11-11 Thread Praveen Tiwari
I mistyped one partition size: instead of 8x6 it should be 8x8; the rest are correct. Regards, Praveen Tiwari On Mon, Nov 11, 2013 at 2:58 PM, prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1384162089 -19800 # Node ID

Re: [x265] [PATCH] asm code for blockcopy_ps_16x4

2013-11-11 Thread Praveen Tiwari
Fixed. Regards, Praveen Tiwari On Mon, Nov 11, 2013 at 4:06 PM, chen chenm...@163.com wrote: +movu m1, [r2] +punpcklbw m2, m1, m0 There is a hidden register copy here; try to avoid it with SSE4.1 pmovzxbw m2, m1 +movu [r0], m2

Re: [x265] [PATCH] asm code for blockcopy_ps_2x4

2013-11-11 Thread Praveen Tiwari
Replaced. Regards, Praveen Tiwari On Mon, Nov 11, 2013 at 7:02 PM, chen chenm...@163.com wrote: +movd m0,[r2] +pmovzxbw m0,m0 +pextrd [r0], m0, 0 same as movd

Re: [x265] [PATCH] asm code for blockcopy_ps_24x32

2013-11-11 Thread Praveen Tiwari
Sent Patch. Regards, Praveen Tiwari On Mon, Nov 11, 2013 at 6:54 PM, chen chenm...@163.com wrote: +;- +; void blockcopy_ps_%1x%2(int16_t *dest, intptr_t destStride, pixel *src, intptr_t srcStride

[x265] Fwd: [PATCH] blockcopy_sp_4x8, optimized asm code

2013-11-08 Thread Praveen Tiwari
# User Praveen Tiwari # Date 1383903250 -19800 # Node ID 1e6bf52b6e3471b81e636569daa667f6dec9838a # Parent 44ac213169c906eab5cba6b4aba876391b81da99 blockcopy_sp_4x8, optimized asm code diff -r 44ac213169c9 -r 1e6bf52b6e34 source/common/x86/blockcopy8.asm --- a/source/common/x86/blockcopy8.asm Fri

[x265] Fwd: [PATCH] blockcopy_sp_8x2, optimized asm code

2013-11-08 Thread Praveen Tiwari
-- Forwarded message -- From: chen chenm...@163.com Date: Fri, Nov 8, 2013 at 4:30 PM Subject: Re: [x265] [PATCH] blockcopy_sp_8x2, optimized asm code To: Development for x265 x265-devel@videolan.org +movh [r0], m0 +movhps [r0 + r1], m0 change movh to movlps is

[x265] Fwd: [PATCH] blockcopy_sp_16xN, optimized asm code

2013-11-08 Thread Praveen Tiwari
for .asm files? At 2013-11-08 21:32:05,prav...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1383917516 -19800 # Node ID 662664f0863b38b838a15867745c5564f574fb09 # Parent 227a5666e08869d36e07a75f3db95dd94c774715 blockcopy_sp_16xN, optimized asm code diff -r 227a5666e088

[x265] Fwd: [PATCH] added pixelsub_ps C primitive and function pointer creation

2013-11-07 Thread Praveen Tiwari
...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1383807695 -19800 # Node ID 34ba8955747b66dcf3471fa216d15b97a3b07e0c # Parent 93cccbe49a93dd4c054ef06aca76974948793613 added pixelsub_ps C primitive and function pointer creation diff -r 93cccbe49a93 -r 34ba8955747b

Re: [x265] [PATCH] asm code for blockfil_s, 16x16

2013-11-07 Thread Praveen Tiwari
Applied to code. Regards, Praveen Tiwari On Thu, Nov 7, 2013 at 8:09 PM, chen chenm...@163.com wrote: +mov r3d, %2 %2/8 + + sub r3d, 8 + jnz .loop dec r3d

[x265] Fwd: [PATCH] asm code for blockfil_s, 4x4

2013-11-07 Thread Praveen Tiwari
# User Praveen Tiwari # Date 1383828996 -19800 # Node ID f2af7af43dfcb08135a08e755f654314a89efae7 # Parent d71f86b1c58b4fc9f8a3ffeaaef45c60f8bcc468 asm code for blockfil_s, 4x4 blockfill has two l Actually I named all pointers with blockfill (two l) and function with blockfil (one l), perhaps

Re: [x265] [PATCH] asm code for blockcopy_sp, 6x8

2013-11-06 Thread Praveen Tiwari
Fixed. Regards, Praveen Tiwari On Wed, Nov 6, 2013 at 8:09 PM, chen chenm...@163.com wrote: + movd [r0 + 2 * r1], m3 + pextrw r6, m3, 2 + mov [r0 + 2 * r1 + 4], r6w SSE4.1 support below: pextrw [r0 + 2 * r1 + 4], m3, 2

Re: [x265] [PATCH] asm: assembly code for pixel_sad_12x16

2013-10-30 Thread Praveen Tiwari
-- Forwarded message -- From: dnyanesh...@multicorewareinc.com Date: Wed, Oct 30, 2013 at 7:47 PM Subject: [x265] [PATCH] asm: assembly code for pixel_sad_12x16 To: x265-devel@videolan.org # HG changeset patch # User Dnyaneshwar Gorade dnyanesh...@multicorewareinc.com # Date

[x265] Fwd: [PATCH] assembly code for pixel_sad_x3_24x32

2013-10-30 Thread Praveen Tiwari
-- Forwarded message -- From: yuva...@multicorewareinc.com Date: Wed, Oct 30, 2013 at 2:38 PM Subject: [x265] [PATCH] assembly code for pixel_sad_x3_24x32 To: x265-devel@videolan.org # HG changeset patch # User Yuvaraj Venkatesh yuva...@multicorewareinc.com # Date 1383124045

[x265] Fwd: [PATCH 4 of 4] asm: interp_8tap_v_sp for ipfilter_sp[FILTER_V_S_P_8]

2013-10-28 Thread Praveen Tiwari
-- Forwarded message -- From: Steve Borho st...@borho.org Date: Mon, Oct 28, 2013 at 11:55 PM Subject: Re: [x265] [PATCH 4 of 4] asm: interp_8tap_v_sp for ipfilter_sp[FILTER_V_S_P_8] To: Development for x265 x265-devel@videolan.org On Mon, Oct 28, 2013 at 9:24 AM, Min Chen

[x265] Fwd: [PATCH] check_IPFilterChroma_primitive, stride made equal to min width 2, fix for 2XN block

2013-10-17 Thread Praveen Tiwari
I tried using stride 64 for both the source and dest buffers, which is perfectly reasonable, and the 2xN primitives failed their unit test which tells me they need to be fixed prior to using them in the encoder. Sent patch for fix.

[x265] Fwd: [PATCH] Added C primitive and unit test code for chroma filter

2013-10-15 Thread Praveen Tiwari
+template<int N, int width> +void interp_horiz_pp(pixel *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int height, int coeffIdx) +{ +int cStride = 1; +short const * coeff= g_chromaFilter[coeffIdx]; +src -= (N / 2 - 1) * cStride; +coeffIdx; +int offset; +

Re: [x265] [PATCH REVIEW Only ] chroma 4XN block, coeffIdex insted of coeff pointer

2013-10-11 Thread Praveen Tiwari
...@multicorewareinc.com wrote: # HG changeset patch # User Praveen Tiwari # Date 1381510220 -19800 # Node ID 5a9160e8b0bdc3117c2417bc29453077488efd8e # Parent c6d89dc62e191f56f63dbcb1781a6494da50a70d chroma 4XN block, coeffIdex insted of coeff pointer diff -r c6d89dc62e19 -r 5a9160e8b0bd source/common/x86

Re: [x265] [PATCH REVIEW Only ] chroma 4XN block, coeffIdex insted of coeff pointer

2013-10-11 Thread Praveen Tiwari
ohh... It will be mova coef2, [tab_coeff + coeffIdx * 16]. On Fri, Oct 11, 2013 at 11:21 PM, Praveen Tiwari prav...@multicorewareinc.com wrote: I just forgot to change the line mova coef2, [tab_coeff + 16] (I was just testing for coeffIdex 1 ) I will make

[x265] Fwd: [PATCH] replace pixelsub_sp vector class function with intrinsic

2013-10-04 Thread Praveen Tiwari
for (int x = 0; x < bx; x += 16) { -Vec16uc word0, word1; -Vec8s word3, word4; -word0.load_a(src0 + x); -word1.load_a(src1 + x); -word3 = extend_low(word0) - extend_low(word1); -

Re: [x265] [PATCH] asm code for ipfilterH_pp, 4 tap filter

2013-09-28 Thread Praveen Tiwari
Suppose that during execution the width is less than 8, e.g. 5; then we would like to run only the code section that handles the remaining width (_end_col:), not the whole code (the multiple-of-8 part and the remaining-width part; it would be computed twice in this case, corrupting some (8 - widthleft) dst[] old
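The split between a multiple-of-8 main loop and a remaining-width tail can be sketched in scalar C++ as follows. This is a hypothetical stand-in, not the filter from the patch: `invertRow` simply inverts pixels so that the control flow is the only thing on display.

```cpp
#include <cstdint>

// Hypothetical row operation (pixel inversion) used only to show the
// main-loop / tail-loop split; it is not the ipfilterH_pp code.
void invertRow(const uint8_t* src, uint8_t* dst, int width)
{
    int x = 0;
    // Main part: full groups of 8 pixels, where the SIMD path would run.
    // For width < 8 this loop body never executes.
    for (; x + 8 <= width; x += 8)
        for (int i = 0; i < 8; i++)
            dst[x + i] = static_cast<uint8_t>(255 - src[x + i]);
    // Tail part: the remaining (width % 8) pixels, written one at a time
    // so that nothing beyond dst[width - 1] is modified.
    for (; x < width; x++)
        dst[x] = static_cast<uint8_t>(255 - src[x]);
}
```

With a width of 5 only the tail loop runs, and the bytes from dst[5] onward keep their old values, which is the behaviour the message asks for.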