Re: [x265] [PATCH 0/8] AArch64 SAD/SADxN Optimisations

2024-05-28 Thread chen
you please take a look at the performance with LD1R? Regards, Chen At 2024-05-28 18:03:43, "Hari Limaye" wrote: >Hi Chen, > >Thank you for reviewing the patches. > >>In this case, replacing LD1 with LDR+ADD brings no benefit > >Here, the existing instruction `ld1 {v0.s

Re: [x265] [PATCH 0/8] AArch64 SAD/SADxN Optimisations

2024-05-24 Thread chen
Hi Hari, These 8 patches look good; my only comment is on the code below: .macro SAD_START_4 f -ld1 {v0.s}[0], [x0], x1 +ldr s0, [x0] +ldr s1, [x2] +add x0, x0, x1 +add x2, x2, x3
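For reference, the computation behind a 4-wide SAD macro such as SAD_START_4 can be modelled in scalar C. This sketch assumes 8-bit pixels. The name `sad_4xN_ref` is hypothetical, and the code is not x265's own C primitive.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of a 4xN sum of absolute differences. */
static int sad_4xN_ref(const uint8_t *pix1, intptr_t stride1,
                       const uint8_t *pix2, intptr_t stride2, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++) {
        /* A 4-pixel row is one 32-bit load ("ld1 {v0.s}[0]" or "ldr s0"). */
        for (int x = 0; x < 4; x++)
            sum += abs(pix1[x] - pix2[x]);
        /* Pointer advance: the post-increment of LD1 or the separate ADD. */
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

The loads and the pointer advances are independent, which is why the two forms in the patch (post-indexed LD1 versus LDR plus a separate ADD) compute the same thing. They differ only in scheduling.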

Re: [x265] [PATCH 0/7] AArch64 saoCuStats Optimisations

2024-05-22 Thread chen
Hi Hari, The new patches look good to me now; thank you for your patches. Regards, Chen At 2024-05-23 03:09:26, "Hari Limaye" wrote: >Hi Chen, > >Thank you for reviewing the patches. > >>In signOf_neon >>>+ // signOf(a - b) = -(a > b) |
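The sign of a difference can be computed without branches. Below are two sketches:

- The plain C form is one common formulation.
- The mask form mimics NEON compares, which yield all-ones (-1) lanes for true.

Both function names are hypothetical. The patch's exact sequence is truncated in the snippet above.

```c
/* Branchless sign of (a - b): 1, 0 or -1. In C a comparison yields 0 or 1. */
static inline int sign_of_diff(int a, int b)
{
    return (a > b) - (a < b);
}

/* Same result with NEON-style masks: a vector compare such as CMGT
 * produces -1 (all bits set) for true and 0 for false. */
static inline int sign_of_diff_masks(int a, int b)
{
    int gt = -(a > b);   /* -1 where a > b, else 0 */
    int lt = -(a < b);   /* -1 where a < b, else 0 */
    return lt - gt;      /* 1 if a > b, -1 if a < b, 0 otherwise */
}
```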

Re: [x265] [PATCH 0/7] AArch64 saoCuStats Optimisations

2024-05-21 Thread chen
e it is a similar algorithm to Neon. Regards, Chen At 2024-05-21 00:14:35, "Hari Limaye" wrote: >Hi, > >This patch-series adds AArch64 Neon, SVE, and SVE2 implementations of >the saoCuStats function primitives for low and high bitdepth. > >This series is based on the pre

Re: [x265] [PATCH 01/12] AArch64: Fix costCoeffNxN test on Apple Silicon

2024-05-06 Thread chen
Hi Hari Limaye, Thank you for fixing the AArch64 build issues; these 12 patches look good to me. Regards, Chen At 2024-05-03 05:19:36, "Hari Limaye" wrote: >The assembly routine x265_costCoeffNxN_neon is buggy and produces an >incorrect result on Apple Silicon, causing the

Re: [x265] Fwd: NASM 2.15.03 (MYS2/MinGW) throws a huge amount of macro warnings

2023-05-23 Thread chen
Hello, Could you please try my local patch? Regards, Min Chen 2023-05-21 17:27:35,"Mario *LigH* Rohkrämer" >Almost 3 years later, NASM version 2.16.01, and still no solution, >nobody is responsible for "just" warnings. > >-- > >Fun and success!

Re: [x265] ARM patches

2022-11-05 Thread chen
Hi, I don't have an OS X environment, so I can only guess at the reason. GCC and LLVM use different symbol prefixes. We use "[private_prefix %+ _entropyStateBits]" in the x86 assembly code to handle these differences, but the aarch64 code uses "movrel x1, x265_entropyStateBits"; I didn't find these little

Re: [x265] PING [PATCH] aarch64: replace ldr pseudo-instruction with adrp+add

2022-10-18 Thread chen
verify patch? Regards, Min Chen diff --git a/source/CMakeLists.txt b/source/CMakeLists.txt index 13e4750de..80f8e59a9 100755 --- a/source/CMakeLists.txt +++ b/source/CMakeLists.txt @@ -266,6 +266,9 @@ if(GCC) add_definitions(-DHAVE_NEON) endif() endif

Re: [x265] PING [PATCH] aarch64: replace ldr pseudo-instruction with adrp+add

2022-10-06 Thread chen
Hi Song, I mean that the current tree already supports this adrp+add mode with the compile option -DPIC, so we do not need to patch the code. Regards, Min Chen 2022-10-06 06:42:33,"Fangrui Song" Hi Min, sorry but I just saw your question. I do not understand the request. adrp+add is jus

Re: [x265] PING [PATCH] aarch64: replace ldr pseudo-instruction with adrp+add

2022-09-24 Thread chen
'-fPIC -DPIC' Regards, Min Chen At 2022-09-24 15:21:35, "Fangrui Song" wrote: >Ping. The breaks lld build and some binutils configurating defaulting >to disallow text relocations. > >On 2022-08-29, Fangrui Song wrote: >>On 2022-08-29, Fangrui Song wrote: >>>O

Re: [x265] [PATCH] aarch64: replace ldr pseudo-instruction with adrp+add

2022-08-30 Thread chen
suggest keeping the current code, so users can configure it themselves. btw: ENABLE_PIC appears to work only with GCC, so I think we need to take a look at these options on the Apple platform. Regards, Min Chen At 2022-08-30 13:42:39, "Fangrui Song" wrote: >On 2022-08-29, Fangrui Song wrote:

Re: [x265] [PATCH] aarch64: replace ldr pseudo-instruction with adrp+add

2022-08-29 Thread chen
Hi Song, Thank you for your patch. However, the ':lo12:' syntax depends on the compiler, so the more general LDR is better here. Regards, Min Chen At 2022-08-30 02:33:37, "Fangrui Song" wrote: >The ldr pseudo-instruction uses a literal pool, which is less efficient >and d

Re: [x265] Some more Arm64 patches to bring performance up on Graviton processors

2022-03-25 Thread chen
-#if X86_64 +#if X86_64 || defined(__aarch64__) [MC] This is right, but to be more generic, we can check with sizeof(long*)==8. The others are fine. Regards, Min Chen 2022-03-25 00:24:01,"Pop, Sebastian" Hi, Please find attached a few more changes that bring up the p
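A note on the suggested check: the preprocessor does not evaluate `sizeof`, so `#if sizeof(long*) == 8` does not compile. A sizeof test works in ordinary code or in a compile-time assertion.

The sketch below shows a preprocessor-level alternative in C. It assumes `<stdint.h>` provides `uintptr_t`. The macro `PTR_IS_64BIT` and the helper `pointer_is_64bit` are hypothetical names. In C++ the assertion would be spelled `static_assert`.

```c
#include <stdint.h>

/* Preprocessor-level test: compare the limits from <stdint.h>. */
#if defined(UINTPTR_MAX) && UINTPTR_MAX == UINT64_MAX
#define PTR_IS_64BIT 1
#else
#define PTR_IS_64BIT 0
#endif

/* Compile-time check (C11) that the preprocessor result agrees with sizeof. */
_Static_assert((sizeof(void *) == 8) == (PTR_IS_64BIT == 1),
               "pointer-width detection disagrees with sizeof");

/* Ordinary code may use sizeof directly; the compiler folds it to a constant. */
static int pointer_is_64bit(void)
{
    return sizeof(void *) == 8;
}
```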

Re: [x265] [arm64] port costCoeffNxN

2022-03-10 Thread chen
sig, If we allow the data in absCoeff to be stored sparsely, we can process all 16 elements in parallel. Regards, Min Chen At 2022-03-05 04:24:09, "Pop, Sebastian" wrote: Thanks Min Chen for your feedback. Please see attached a patch that avoids one transfer from N

Re: [x265] Wrong version info?

2022-03-10 Thread chen
Hi Roger, Both version numbers look right. Branch stable is 1 commit ahead of tag 3.5, and branch master is further ahead, so the version numbers are 3.5+1 and 3.5+34; the other part is the git hash. Regards, Min Chen At 2022-03-11 12:41:44, "Roger Pack" wrote: >Hello. >As a note if

Re: [x265] [arm64] port costCoeffNxN

2022-03-02 Thread chen
algorithm problems; for example, we spend many instructions on absCoeff[numNonZero]. If we allow sparse zeros inside the array, we can eliminate many instructions. Regards, Min Chen At 2022-03-02 07:28:15, "Pop, Sebastian" wrote: Hi, the attached patch fixes the registra
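The two absCoeff layouts discussed in this thread can be contrasted in scalar C:

- **Compacted:** each write position depends on how many earlier elements were non-zero.
- **Sparse:** every element is independent, so all 16 can be handled in parallel.

This is a hypothetical illustration with invented function names, not x265's costCoeffNxN code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Compacted: only non-zero magnitudes are stored. The write index
 * (numNonZero) depends on every earlier element, a serial dependency. */
static int compact_abs(const int16_t coef[16], uint16_t absCoeff[16])
{
    int numNonZero = 0;
    for (int i = 0; i < 16; i++)
        if (coef[i] != 0)
            absCoeff[numNonZero++] = (uint16_t)abs(coef[i]);
    return numNonZero;
}

/* Sparse: every slot is written, zeros included, so each element is
 * independent and maps onto a single vector ABS over all 16 lanes. */
static void sparse_abs(const int16_t coef[16], uint16_t absCoeff[16])
{
    for (int i = 0; i < 16; i++)
        absCoeff[i] = (uint16_t)abs(coef[i]);
}
```

The trade-off is that consumers of the sparse array must skip or tolerate the zero entries themselves.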

Re: [x265] [arm64] Status and combined patch

2022-01-28 Thread chen
Hi Sebastian, Thank you for your contribution; I have no more comments now. Regards, Min Chen 2022-01-29 02:35:24,"Pop, Sebastian" Hi, > [MC] how about CMHI with a vector register that holds zeros? This works wonderfully, thanks for the suggestion! Perform

Re: [x265] [arm64] Status and combined patch

2022-01-27 Thread chen
Hi Sebastian, Thank you for explaining further; my comments are inline. At 2022-01-28 10:08:36, "Pop, Sebastian" wrote: Hi Min Chen, Thank you for your review comments, that helped improve the performance of scanPosLast on arm64: scanPosLast 5.46x

Re: [x265] [arm64] Status and combined patch

2022-01-21 Thread chen
Hi Sebastian, Thank you for your contribution. I reviewed it and made some comments; could you please take a look? Regards, Min Chen At 2022-01-19 23:25:30, "Pop, Sebastian" wrote: Hi Gopi, Please find attached a patch that ports scanPosLast to arm64 NEON. s

Re: [x265] x265 bug report

2021-11-24 Thread chen
Hi Nathan, Ah, I had the same question a couple of years ago. The root cause lies in the type of the intermediate variables: the sign bit will affect the high part of the combined variable. So if you change sum_t to sum2_t for variables A & B, you will get the correct result. Regards, Min Chen At 2021-11-25 1
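The effect described above can be reproduced in isolation. The sketch assumes the 8-bit typedefs `sum_t = uint16_t` and `sum2_t = uint32_t`, with two 16-bit lanes packed in one word. The helper names are hypothetical.

Packed arithmetic relies on `word == lo + hi * 2^16 (mod 2^32)`. A 16-bit intermediate drops the sign extension of a negative low lane, so the high lane decodes as one too large.

```c
#include <stdint.h>

typedef uint16_t sum_t;   /* assumed 8-bit-build lane type */
typedef uint32_t sum2_t;  /* assumed packed two-lane type  */
#define BITS_PER_SUM 16

/* Low lane passes through a full-width intermediate: sign-extended. */
static sum2_t pack_via_sum2(int lo, int hi)
{
    sum2_t a = (sum2_t)lo;
    return a + ((sum2_t)hi << BITS_PER_SUM);
}

/* Low lane passes through a 16-bit intermediate: no sign extension. */
static sum2_t pack_via_sum(int lo, int hi)
{
    sum_t a = (sum_t)lo;
    return a + ((sum2_t)hi << BITS_PER_SUM);
}

static int sign_extend16(uint32_t v)
{
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int)v - 0x10000 : (int)v;
}

/* Decode the high lane assuming word == lo + hi * 2^16 (mod 2^32). */
static int high_lane(sum2_t w)
{
    int lo = sign_extend16(w);
    return sign_extend16((w - (sum2_t)lo) >> BITS_PER_SUM);
}
```

With `lo = -3, hi = 5`, the sum2_t path decodes the high lane as 5, while the sum_t path decodes it as 6. The two paths agree whenever the low lane is non-negative.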

Re: [x265] missing files while compile x265-devel

2021-09-13 Thread chen
Hi, stdint.h is a standard C/C++ header file. sys/time.h depends on the OS, but I guess it is not necessary at runtime; it is mostly used for collecting performance data. memory.h may be omitted if you declare memory-management functions such as malloc() in other headers. Regards, Min Chen At 2021

Re: [x265] [arm64] port ssim_4x4x2_core

2021-08-07 Thread chen
Hi, Code looks good. My only comment is that UADALP is slower; we can adjust the order of the summation to avoid it. Regards, Min Chen 2021-08-07 02:01:13,"Pop, Sebastian" Hi, the attached patch ports to arm64 the following kernel: ssim_4x4x2_core 30.69x 13.39 410.85 Ok
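For context, here is a scalar model of an SSIM 4x4x2 core in the style of the x264/x265 C reference. The name and signature are assumptions, and 8-bit pixels are assumed.

The four accumulators are independent integer sums, and integer addition is associative. The order in which partial sums are combined can therefore be rearranged without changing the result. That freedom is what allows a vector version to avoid an accumulating instruction such as UADALP.

```c
#include <stdint.h>

/* For two horizontally adjacent 4x4 blocks, accumulate
 * s1 = sum(a), s2 = sum(b), ss = sum(a*a + b*b), s12 = sum(a*b). */
static void ssim_4x4x2_core_ref(const uint8_t *pix1, intptr_t stride1,
                                const uint8_t *pix2, intptr_t stride2,
                                int sums[2][4])
{
    for (int z = 0; z < 2; z++) {
        uint32_t s1 = 0, s2 = 0, ss = 0, s12 = 0;
        for (int y = 0; y < 4; y++)
            for (int x = 0; x < 4; x++) {
                int a = pix1[x + y * stride1];
                int b = pix2[x + y * stride2];
                s1 += a;
                s2 += b;
                ss += a * a + b * b;
                s12 += a * b;
            }
        sums[z][0] = (int)s1;
        sums[z][1] = (int)s2;
        sums[z][2] = (int)ss;
        sums[z][3] = (int)s12;
        pix1 += 4;   /* move to the next 4x4 block on the right */
        pix2 += 4;
    }
}
```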

Re: [x265] [arm64] port scale1D_128to64 and scale2D_64to32

2021-07-30 Thread chen
, Min Chen At 2021-07-31 12:14:29, "Pop, Sebastian" wrote: Hi, Please let me know if you have ideas on how to make this code faster. I tried to remove the stall by fetching more memory earlier, still no change in performance: // void scale2D_64to32(pixel* dst, const pixel* src

Re: [x265] [arm64] port scale1D_128to64 and scale2D_64to32

2021-07-30 Thread chen
Hi, The code looks good. There is little performance change because of a pipeline stall: the two LD1s can't hide the latency penalty. But it is not a big problem, and we saved code size. Could you please make a standalone patch? I think a patch on top of a patch is not a good idea. Regards, Min Chen At 2021-07-31 02:27:36

Re: [x265] [arm64] port scale1D_128to64 and scale2D_64to32

2021-07-29 Thread chen
to LD1+ADDLP. btw: excuse me, the other patches need more time; I will probably review them on the weekend. Regards, Min Chen 2021-07-30 06:13:34,"Pop, Sebastian" Hi, the attached patch ports to arm64 the following kernels: scale1D_128to64 68.89x 12.06
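Here is a scalar sketch of the 2:1 downscale. It assumes 8-bit pixels and that scale2D_64to32 averages each 2x2 block with rounding. The name `scale2D_64to32_ref` is hypothetical. The horizontal pair sums are exactly what a pairwise widening add (UADDLP) produces, which is why LD1 followed by ADDLP fits this kernel.

```c
#include <stdint.h>

/* Each output pixel is the rounded mean of a 2x2 block of a 64x64 input. */
static void scale2D_64to32_ref(uint8_t *dst, const uint8_t *src, intptr_t stride)
{
    for (int y = 0; y < 32; y++) {
        const uint8_t *row0 = src + (2 * y) * stride;
        const uint8_t *row1 = row0 + stride;
        for (int x = 0; x < 32; x++) {
            int sum = row0[2 * x] + row0[2 * x + 1]    /* pair sum, row 0 */
                    + row1[2 * x] + row1[2 * x + 1];   /* pair sum, row 1 */
            dst[y * 32 + x] = (uint8_t)((sum + 2) >> 2);
        }
    }
}
```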

Re: [x265] [arm64] port addAvg

2021-07-27 Thread chen
Hi Sebastian, Looks good now, thanks. Regards, Min chen At 2021-07-27 23:50:19, "Pop, Sebastian" wrote: Thanks Min Chen for your reviews. In the attached patch I used dup instead of memory load, and I rescheduled some of the instructions to avoid pipel

Re: [x265] [arm64] port addAvg

2021-07-27 Thread chen
Hi, I have just a few comments. +.macro addAvg_start +lsl x3, x3, #1 +lsl x4, x4, #1 +movrel x11, addAvg_offset +ld1 {v30.8h}, [x11] All of the values in addAvg_offset are 0x40, so why not use DUP? +add v0.8h, v0.8h,

Re: [x265] [arm64] port cpy2Dto1D_{shl, shr} and cpy1Dto2D_{shl, shr}

2021-07-27 Thread chen
Looks good, thanks. 2021-07-27 02:53:10,"Pop, Sebastian" Hi, the attached patch ports to arm64 the following kernels: cpy2Dto1D_shl[4x4] 15.69x 6.73 105.60 cpy2Dto1D_shr[4x4] 12.97x 6.65 86.28 cpy2Dto1D_shl[8x8] 43.32x 8.85 383.16

Re: [x265] [arm64] port count_nonzero, blkfill, and copy_{ss, sp, ps}

2021-07-24 Thread chen
the W12 is in the range [-16,0]. Please also remember that W0 is the low part of X0, and the result in register S4 is an int32. Everything else in the patch looks good. Regards, Min Chen At 2021-07-25 13:31:06, "Pop, Sebastian" wrote: Hi, > You didn't see an improvement because you still use USHR, after C

Re: [x265] [arm64] port sad_x{3,4}

2021-07-23 Thread chen
Hi, That's my fault; I missed this part of the SAD, so your code is fine now, thank you. Regards, Min Chen At 2021-07-24 03:54:46, "Pop, Sebastian" wrote: Hi Min Chen, thanks for your reviews. > +.macro SAD_X_END_64 x > +uaddlp v16.4s, v16.8h > The

Re: [x265] [arm64] port avg_pp

2021-07-23 Thread chen
Looks good 2021-07-24 07:04:03,"Pop, Sebastian" Hi, the attached patch ports to arm64 the following kernels: avg_pp[ 4x4] 8.50x 8.85 75.21 avg_pp_aligned[ 4x4] 8.49x 8.89 75.46 avg_pp[ 8x8] 29.12x 11.61 338.01

Re: [x265] [arm64] port count_nonzero, blkfill, and copy_{ss, sp, ps}

2021-07-23 Thread chen
At 2021-07-24 05:23:44, "Pop, Sebastian" wrote: Hi, > +fmov w12, s4 > +neg w12, w12 > +add w0, w12, #16 > (-w12) + 16 equals 16 - w12; loading #16 into w0 may execute in parallel with > FMOV. I see a small improvement with this change.

Re: [x265] [arm64] port sad_x{3,4}

2021-07-22 Thread chen
[x6], #4 +st1 {v17.s}[0], [x6], #4 +st1 {v18.s}[0], [x6], #4 I guess STP could store two results in one cycle. Regards, Min Chen 2021-07-22 14:30:50,"Pop, Sebastian" Hi, the attached patch ports to arm64 the following kernels: sad_x3[
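For context, an sad_x3 primitive compares one encoder block against three candidate reference blocks in a single pass and writes three results. This scalar model rests on assumptions: the signature, the fixed encoder stride `FENC_STRIDE = 64` and 8-bit pixels.

```c
#include <stdint.h>
#include <stdlib.h>

#define FENC_STRIDE 64   /* assumed stride of the encoder source block */

static void sad_x3_ref(int w, int h, const uint8_t *fenc,
                       const uint8_t *ref0, const uint8_t *ref1,
                       const uint8_t *ref2, intptr_t refStride, int32_t res[3])
{
    res[0] = res[1] = res[2] = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            res[0] += abs(fenc[x] - ref0[x]);
            res[1] += abs(fenc[x] - ref1[x]);
            res[2] += abs(fenc[x] - ref2[x]);
        }
        fenc += FENC_STRIDE;
        ref0 += refStride;
        ref1 += refStride;
        ref2 += refStride;
    }
    /* res[] is contiguous, so the three final 32-bit stores can be merged,
     * for example into one pair store plus one single store. */
}
```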

Re: [x265] [arm64] port sad

2021-07-20 Thread chen
. Regards, Min Chn At 2021-07-20 12:45:03, "Pop, Sebastian" wrote: Thanks Min Chen for your reviews. I tried your suggestion to remove one of the FP->GPR transfers. With the following patch I do not see any improvement for the 64x routines, and the number of instructions rem

Re: [x265] [arm64] port sad

2021-07-17 Thread chen
,v16 Regards, Min Chen 2021-07-17 04:44:05,"Pop, Sebastian" Hi, the attached patch ports to arm64 the following kernels: sad[ 4x4] 10.11x 6.50 65.72 sad[ 8x8] 28.95x 8.50 246.00 sad[ 8x4] 23.03x 5.45

Re: [x265] [arm64] port LUMA_VPP_4xN

2021-07-07 Thread chen
Hi Sebastian, It looks good, thanks. Regards, Min Chen At 2021-07-08 02:20:01, "Pop, Sebastian" wrote: Attached the amended patch with movi. That improved performance, thanks! I have seen the cmp/br pattern several times. We can do the reordering tuning after all the i

Re: [x265] [arm64] port LUMA_VPP_4xN

2021-07-06 Thread chen
odata. Please see the attached patch. Sebastian From: x265-devel on behalf of chen Reply-To: Development for x265 Date: Friday, July 2, 2021 at 8:11 PM To: Development for x265 Subject: RE: [EXTERNAL] [x265] [arm64] port LUMA_VPP_4xN | CAUTION: This email originate

Re: [x265] [arm64] port LUMA_VPP_4xN

2021-07-02 Thread chen
Hi, I put my comments inline, thanks. btw: I found another improvement for this patch. +eor v17.16b, v17.16b, v17.16b The register-clearing operation can be replaced by MOVI. At 2021-07-03 02:43:07, "Pop, Sebastian" wrote: Hi, thanks for your review. > +#ifdef __MACH__ > +# define

Re: [x265] [arm64] port LUMA_VPP_4xN

2021-07-01 Thread chen
Hello, Thank you for your patch; I have made some comments. +#ifdef __MACH__ +# define MACH +#else +# define MACH # It is not a good idea to bypass .const_data +ld1 {v0.s}[0], [x0], x1 +ld1 {v0.s}[1], [x0], x1 +ushll v0.8h, v0.8b, #0 ... +//
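LUMA_VPP is the vertical 8-tap luma interpolation filter (pixel in, pixel out). The scalar sketch below uses the HEVC luma filter taps. The function name and signature are hypothetical, 8-bit pixels are assumed, and it is not the patch's code.

```c
#include <stdint.h>

/* HEVC luma interpolation taps for the four quarter-sample phases;
 * each row sums to 64. */
static const int16_t luma_filter[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

static uint8_t clip_u8(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* src points at the row aligned with the first output row; the taps
 * reach 3 rows above and 4 rows below each output row. */
static void luma_vpp_4xN_ref(const uint8_t *src, intptr_t srcStride,
                             uint8_t *dst, intptr_t dstStride,
                             int h, int coeffIdx)
{
    const int16_t *c = luma_filter[coeffIdx];
    src -= 3 * srcStride;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 4; x++) {
            int sum = 0;
            for (int t = 0; t < 8; t++)
                sum += src[x + t * srcStride] * c[t];
            dst[x] = clip_u8((sum + 32) >> 6);   /* round, divide by 64 */
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

Because every tap set sums to 64, a flat input passes through unchanged for any phase. Phase 0 reduces to a plain copy.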

Re: [x265] [arm64] port filterPixelToShort

2021-06-24 Thread chen
I have no comments on this patch, thanks. 2021-06-25 01:45:03,"Pop, Sebastian" Added one missing function: convert_p2s[48x64] 1.56x 300.44 469.25 ___ x265-devel mailing list x265-devel@videolan.org

Re: [x265] [arm64] port filterPixelToShort

2021-06-23 Thread chen
The patch looks good; no more modifications are necessary, thanks. btw: you didn't see a change with CBNZ, and I guess there are two reasons. One is that 'sub x9' is also in the first part of the loop; I would rather move these independent instructions into the pipeline stall slots. The second is that the loop count is not large enough

Re: [x265] [arm64] port filterPixelToShort

2021-06-23 Thread chen
It looks good to me, thanks. btw: ARM64 has the new instructions CBZ / CBNZ. At 2021-06-24 10:11:32, "Pop, Sebastian" wrote: I added the following change in the attached patch. It has better performance with ldp as it allows to re-schedule the instructions in independent ways: function

Re: [x265] [arm64] port filterPixelToShort

2021-06-23 Thread chen
You are welcome. On your CPU, ldp is still slower, so we can keep the original version and improve it again in the future. This version looks good to me; thank you for your contribution. At 2021-06-24 10:01:40, "Pop, Sebastian" wrote: Thanks again Chen for your careful review and recom

Re: [x265] [arm64] port filterPixelToShort

2021-06-23 Thread chen
Could you please also try the comments in my last email? thanks. At 2021-06-24 09:09:09, "Pop, Sebastian" wrote: > +.macro filterPixelToShort_64xN h > +function x265_filterPixelToShort_64x\h\()_neon > +add x3, x3, x3 > +sub x3, x3, #0x40 > +movi

Re: [x265] [arm64] port filterPixelToShort

2021-06-23 Thread chen
Thank you for your response; comments inline. At 2021-06-24 08:57:20, "Pop, Sebastian" wrote: Hi Chen, Thanks for your review! > +function x265_filterPixelToShort_4x4_neon > +add x3, x3, x3 > +movi v2.8h, #0xe0, lsl #8 > are you compiler do

Re: [x265] [arm64] port filterPixelToShort

2021-06-23 Thread chen
Hi Sebastian, thanks for your patch. I have some comments. +function x265_filterPixelToShort_4x4_neon +add x3, x3, x3 +movi v2.8h, #0xe0, lsl #8 Does your compiler not handle the constant 0xe000 automatically? It is more readable +ld1 {v0.s}[0],
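The 0xe000 constant is -8192 viewed as a 16-bit lane. Below is a scalar sketch of filterPixelToShort for 8-bit input. It assumes x265-style constants `IF_INTERNAL_PREC = 14` and `IF_INTERNAL_OFFS = 8192`. Each pixel is scaled to 14-bit precision and re-centred around zero, and the function name is hypothetical.

```c
#include <stdint.h>

#define IF_INTERNAL_PREC 14                              /* assumed */
#define IF_INTERNAL_OFFS (1 << (IF_INTERNAL_PREC - 1))   /* 8192, assumed */

static void pixel_to_short_ref(const uint8_t *src, intptr_t srcStride,
                               int16_t *dst, intptr_t dstStride, int w, int h)
{
    const int shift = IF_INTERNAL_PREC - 8;   /* 6 for 8-bit pixels */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            /* Adding -8192 (0xE000 in a 16-bit lane) is the vector constant. */
            dst[x] = (int16_t)((src[x] << shift) - IF_INTERNAL_OFFS);
        src += srcStride;
        dst += dstStride;
    }
}
```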

Re: [x265] NASM 2.15.03 (MYS2/MinGW) throws a huge amount of macro warnings

2021-04-09 Thread chen
Hello, Sorry for the delay. I have fixed these warnings with the new version of NASM in my local tree, but I don't know how to merge it into the current x265 tree; please wait for the x265 team to fix these issues. Regards, Min Chen At 2021-04-09 15:48:37, "Mario *LigH* Rohkrämer" wrote: >

Re: [x265] NASM 2.15.03 (MYS2/MinGW) throws a huge amount of macro warnings

2020-09-03 Thread chen
Regards, Min Chen At 2020-09-03 23:56:13, "Nomis101" wrote: >Am 03.09.20 um 15:28 schrieb Mario *LigH* Rohkrämer: >> In the meantime, MSYS2 provides NASM 2.15.04; same output. >> > >I had a patch for this in this list. Maybe you could try if this patch will

Re: [x265] arm64 neon asm optimizations

2020-08-26 Thread chen
Hi Gopi, thank you for helping review these patches. At 2020-08-27 00:42:11, "Gopi Satykrishna Akisetty" wrote: Hi Min, On Thu, Aug 20, 2020 at 7:48 AM chen wrote: Hi Damiano, Thank you for the information. I took a quick look; it is based on intrinsics, and the performance strongly depends o

Re: [x265] arm64 neon asm optimizations

2020-08-19 Thread chen
leave that start at September 2020. Regards, Min Chen At 2020-08-20 00:01:52, "Damiano Galassi" wrote: >Hi, Apple contributed to the HandBrake project a x265 patch >with a bunch of neon asm to improve x265 performance on Apple’s upcoming ARM >Macs, >but I don’t have the

Re: [x265] [PATCH] Add aarch64 support - Part 2

2020-03-17 Thread chen
Hi Xiyuan, I have forwarded the email to you directly. Regards, Min Chen 2020-03-18 09:38:18,"Xiyuan Wang" Hi chen we didn't receive your reply about Part-1, can you resend it? Maybe the content is too large and the mail list blocked it. You can just quote the code

Re: [x265] [PATCH] Add aarch64 support - Part 2

2020-03-17 Thread chen
a/source/common/pixel.cpp b/source/common/pixel.cpp index 99b84449c..e4f890cd5 100644 --- a/source/common/pixel.cpp +++ b/source/common/pixel.cpp @@ -5,6 +5,7 @@ * Mandar Gurav * Mahesh Pittala * Min Chen + * Hongbin Liu * * This program is free software

Re: [x265] [PATCH]Add: Auto AQ Mode

2020-02-27 Thread chen
At 2020-02-27 16:59:18, "Niranjan Bala" wrote: +double computeBrightnessIntensity(pixel *inPlane, int width, int height, intptr_t stride) +{ +pixel* rowStart = inPlane; Qualifying it with a const prefix may be better. +double count = 0; Why declare it as double? + +for (int i = 0; i <

[x265] [PATCH] Improve all_angs_pred_c by remove unnecessary transpose

2019-11-04 Thread chen
From 7e495390396d6a55f95ad4649e46b56fd7d2ef1c Mon Sep 17 00:00:00 2001 From: Min Chen Date: Mon, 4 Nov 2019 16:21:20 +0800 Subject: [PATCH] Improve all_angs_pred_c by remove unnecessary transpose --- source/common/intrapred.cpp | 22 +++--- 1 file changed, 3 insertions(+), 19

Re: [x265] [x265 patch] Adaptive Frame Duplication

2019-09-22 Thread chen
At 2019-09-23 12:50:22, "Akil" wrote: # HG changeset patch # User Akil Ayyappan # Date 1568370446 -19800 # Fri Sep 13 15:57:26 2019 +0530 # Node ID 531f6b03eed0a40a38d3589dec03f14743293146 # Parent c4b098f973e6b0ee4aee3bf0d7b54da4e2734d42 Adaptive Frame duplication +uint32_t y = 0; +

Re: [x265] why fail to build x265 with vmaf

2019-09-08 Thread chen
Could you please try #include ? At 2019-09-09 10:18:13, "qw" wrote: Hi, The latest vmaf source code is used, but I still fail to build x265. Below is the error message: Scanning dependencies of target common [ 1%] Building ASM_NASM object common/CMakeFiles/common.dir/x86/pixel-a.asm.o [

Re: [x265] [x265 patch] New AQ mode with Variance and Edge information

2019-07-15 Thread chen
ead. At 2019-07-15 13:58:53, "Akil" wrote: Thanks for your suggestions, Chen. Have added the matrix in comments. That should make the code more readable. Regarding the last point, I think (rowNum+X)*stride cannot be replaced by a constant since it tends to change every time. On Fri, Jul 12,

Re: [x265] [x265 patch] New AQ mode with Variance and Edge information

2019-07-11 Thread chen
On Wed, Jul 10, 2019 at 3:41 PM Akil wrote: # HG changeset patch # User Akil Ayyappan # Date 1561035091 -19800 # Thu Jun 20 18:21:31 2019 +0530 # Node ID d25c33cc2b748401c5e908af445a0a110e26c3cf # Parent 4f6dde51a5db4f9229bddb60db176f16ac98f505 AQ: New AQ mode with Variance and Edge

Re: [x265] how to build x265 that supports both 8bit and 10bit

2019-05-18 Thread chen
The debug info affects the compiler's code generation, so a little performance is lost, but we can ignore that since it is not much. Regards, Min At 2019-05-18 22:09:06, "qw" wrote: If I want to build x265 with release and ,debug info, I will choose the option of CMAKE_BUILD_TYPE=RelWithDebInfo. Is

Re: [x265] how to build x265 that supports both 8bit and 10bit

2019-05-16 Thread chen
Hi, Could you please try it with multilib.bat? It is Steve's idea: we build the library twice with different bit depths and combine these libraries into one multi-bit-depth library. Regards, Min Chen At 2019-05-17 11:06:13, "qw" wrote: hi, I read x265 source code, and find one function, as s

Re: [x265] [PATCH x265] Add AVX2 assembly code for normFactor primitive.

2019-03-07 Thread chen
Just say it works. First of all, the expected algorithm is the square of (x >> shift). The input is 8 bits (I assume we are talking about 8bpp; 16bpp is similar), and 8 bits multiplied by 8 bits gives a 16-bit result. The function works at the CU level, and blockSize is at most 64, i.e. 6 bits. So, we can decide the

[x265] Original C++ code used for sad functions' assembly code in COST_MV?

2018-09-04 Thread Jeffrey Chen
Hi, I would like to configure the sad function in COST_MV for another platform. However, the assembly code would not be supported on the other platform. Where can I find the original programming language code that was made into the assembly language code?

[x265] reference patch to remove unnecessary pow(x,2)

2018-07-27 Thread chen
This patch removes the unnecessary pow() and abs() calls. 0001-improve-pow-x-2.patch Description: Binary data

[x265] non-optimized chroma p2s[] on ARM platform

2018-07-17 Thread chen
I found that p.chroma[X265_CSP_I420].pu[i].p2s was not initialized on the ARM platform, so all of them run the C model; I guess these functions could reuse NEON's convert_p2s[*].

Re: [x265] bug in IntraPred [DC]

2018-07-04 Thread chen
Please ignore my previous email: the initial value of dcVal is width, so this module has no bug. Sorry for the disturbance.
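Why starting dcVal at `width` is correct: the HEVC DC value is the rounded mean of the `width` samples above and the `width` samples to the left. The division is by `2 * width`, so its rounding term is `width`. Below is a scalar sketch with illustrative names, assuming 8-bit samples and a power-of-two width.

```c
#include <stdint.h>

static int dc_value(const uint8_t *above, const uint8_t *left, int width)
{
    int log2w = 0;
    while ((1 << log2w) < width)
        log2w++;
    int dcVal = width;                 /* rounding term = (2 * width) / 2 */
    for (int i = 0; i < width; i++)
        dcVal += above[i] + left[i];
    return dcVal >> (log2w + 1);       /* divide by 2 * width */
}
```

For example, four samples of 1 above and four samples of 2 on the left give (4 + 4 + 8) >> 3 = 2. This is the mean of 1.5, rounded up.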

[x265] bug in IntraPred [DC]

2018-07-04 Thread chen
Hi, There is a long-standing bug in our intra prediction DC mode; see details: HM // Function for calculating DC value of the reference samples used in Intra prediction //NOTE: Bit-Limit - 25-bit source Pel

[x265] Another performance issue on ARM code

2018-06-11 Thread chen
I found some issues in the ARM code that I didn't point out in time; that's my failure. For example, this redundant code in x265_pixel_add_ps_4x4_neon: vmov.u16 q10, #255 veor.u16 q11, q11 veor.u16 d3, d3 veor.u16 d5, d5 btw: the ARM build was broken after

[x265] Code performance issue

2018-06-01 Thread chen
There is a series of performance issues, such as: uint32_t sum = (uint32_t)pow((outOfBound >> 2), 2); Do you just want the square of a small integer?
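Squaring a small integer needs only an integer multiply. pow() converts to double, calls a general-purpose routine and converts the result back. The variable name `outOfBound` follows the snippet. The surrounding function `square_of_scaled` is hypothetical.

```c
#include <stdint.h>

static uint32_t square_of_scaled(uint32_t outOfBound)
{
    uint32_t v = outOfBound >> 2;
    return v * v;   /* mathematically equal to pow(v, 2) while v * v fits in 32 bits */
}
```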

Re: [x265] [PATCH 300 of 307] x86: AVX512 'count_nonzero_16x16' avx-512 kernel, 22% speedup over avx2

2018-04-06 Thread chen
Sorry, I missed a line; resending with an additional comment. At 2018-04-07 01:27:34, "chen" <chenm...@163.com> wrote: At 2018-04-06 21:17:37, mythr...@multicorewareinc.com wrote: ># HG changeset patch ># User Jayashree ># Date 1517283539 28800 ># Mon Jan 29 19

Re: [x265] [PATCH 300 of 307] x86: AVX512 'count_nonzero_16x16' avx-512 kernel, 22% speedup over avx2

2018-04-06 Thread chen
At 2018-04-06 21:17:37, mythr...@multicorewareinc.com wrote: ># HG changeset patch ># User Jayashree ># Date 1517283539 28800 ># Mon Jan 29 19:38:59 2018 -0800 ># Node ID 3c6e5ce07dbca7f967e4b5b62fe450979da3bf81 ># Parent 624c83571d1df840e1206c46e589044fbf87ff32 >x86: AVX512

Re: [x265] unsigned promotion prevents encoding frames with negative strides

2018-01-17 Thread chen
Hi, Thank you for reporting this bug. I think the root cause is not sizeof(); a negative stride is invalid in the encoder/decoder core. To avoid such invalid input parameters, x264 inserts a middle layer that converts color spaces and images, but x265 doesn't. Of course, a crash is the worst way to
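The promotion named in the thread subject works like this in general. The sketch is not the reporter's code, and the helper names are hypothetical. When a negative `int` stride meets `sizeof`, whose type is `size_t`, the stride is converted to an unsigned type before the multiply.

```c
#include <stddef.h>
#include <stdint.h>

static size_t byte_step_unsigned(int stride)
{
    /* int * size_t: stride is converted to size_t, so -64 becomes a huge
     * positive value and the product wraps modulo SIZE_MAX + 1. */
    return stride * sizeof(uint16_t);
}

static intptr_t byte_step_signed(int stride)
{
    /* Casting sizeof to a signed type keeps the arithmetic signed. */
    return (intptr_t)stride * (intptr_t)sizeof(uint16_t);
}
```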

Re: [x265] [PATCH] Use atomic bit test and set/reset operations on x86

2018-01-10 Thread chen
At 2018-01-11 00:06:29, "Andrey Semashev" <andrey.semas...@gmail.com> wrote: >On 01/10/18 18:53, chen wrote: >> Hi Andrey, >> >> Our code rule prohibit inline assembly, especially the patch used GCC >> extension syntax. > >Ok, I

Re: [x265] [PATCH] Use atomic bit test and set/reset operations on x86

2018-01-10 Thread chen
Hi Andrey, Our coding rules prohibit inline assembly, especially since the patch uses GCC extension syntax. The "lock" prefix will lock the CPU bus, which carries a greater penalty on multi-core systems. Thanks, Min At 2018-01-10 23:30:06, "Andrey Semashev" wrote: >Any
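Atomic bit set/reset can be written without inline assembly using C11 `<stdatomic.h>`. In C++ the equivalents are `std::atomic<uint32_t>::fetch_or` and `fetch_and`. This is an illustration, not the patch's code, and the function names are hypothetical.

On x86 the compiler still emits a locked read-modify-write, so the cost of the atomic operation itself is not avoided.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Returns the previous value of the bit. */
static int atomic_bit_test_and_set(_Atomic uint32_t *word, unsigned bit)
{
    uint32_t mask = (uint32_t)1 << bit;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Returns the previous value of the bit. */
static int atomic_bit_test_and_reset(_Atomic uint32_t *word, unsigned bit)
{
    uint32_t mask = (uint32_t)1 << bit;
    return (atomic_fetch_and(word, ~mask) & mask) != 0;
}
```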

Re: [x265] [PATCH] intra: sse4 version of strong intrasmoothing

2017-11-29 Thread chen
SSSE3 pmulhrsw can also improve on pmullw+paddw+psraw. At 2017-11-28 23:57:50, "Ximing Cheng" wrote: ># HG changeset patch ># User Ximing Cheng ># Date 1511862059 -28800 ># Tue Nov 28 17:40:59 2017 +0800 ># Node ID
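One lane of PMULHRSW computes `((a * b >> 14) + 1) >> 1`, a rounded `(a * b) / 2^15`. Take a rounding shift by 6 of a single weighted term, `(a * c + 32) >> 6`. It equals `pmulhrsw(a, c << 9)` whenever `c << 9` fits in a signed 16-bit lane (0 <= c <= 63).

A sum of two weighted terms, as in strong intra smoothing, is rounded once by the three-instruction form but per product by pmulhrsw. That case would need checking separately. The function names below are hypothetical scalar models.

```c
#include <stdint.h>

/* Scalar model of one PMULHRSW lane (ignores the -32768 * -32768 corner case). */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t t = ((int32_t)a * b) >> 14;
    return (int16_t)((t + 1) >> 1);
}

/* The multiply / add-rounding / arithmetic-shift sequence it can replace. */
static int16_t mul_round_shift6(int16_t a, int16_t c)
{
    return (int16_t)(((int32_t)a * c + 32) >> 6);
}
```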

Re: [x265] [PATCH] intra: sse4 version of strong intrasmoothing

2017-11-28 Thread chen
I have a few comments. At 2017-11-28 23:57:50, "Ximing Cheng" wrote: >diff -r b24454f3ff6d -r 9cd0cf6e2fd8 source/common/x86/const-a.asm >--- a/source/common/x86/const-a.asm Wed Nov 22 22:00:48 2017 +0530 >+++ b/source/common/x86/const-a.asm Tue Nov 28 17:40:59

Re: [x265] [PATCH] intra: sse4 version of strong intra smoothing

2017-11-20 Thread chen
>diff -r a7c2f80c18af -r 973560d58dfb source/common/x86/intrapred8.asm >--- a/source/common/x86/intrapred8.asm Mon Nov 20 14:31:22 2017 +0530 >+++ b/source/common/x86/intrapred8.asm Tue Nov 21 03:10:14 2017 +0800 >@@ -22313,11 +22313,144 @@ > mov [r1 + 64], r3b ;

[x265] [PATCH] fix build error on VS2008 ( ambiguous on pow() )

2017-06-28 Thread chen
From 360c25c6198e7aaa3a9f0ad611d99f94a1ea6347 Mon Sep 17 00:00:00 2001 From: Min Chen <chenm...@163.com> Date: Wed, 28 Jun 2017 11:54:05 -0500 Subject: [PATCH] fix build error on VS2008 ( ambiguous on pow() ) --- source/encoder/slicetype.cpp |3 ++- 1 files changed, 2 insertions

Re: [x265] [PATCH] avx2: 'integral4v' asm code -> 7.48x faster than 'C' version

2017-05-08 Thread chen
Hi Guillaume, Our development platform is Visual Studio, whose compiler can't auto-vectorize. We also can't assume users have an advanced compiler on their computers. Regards, Min At 2017-05-08 19:36:24,"Guillaume POIRIER" wrote: >Hello Praveen Tiwari, > >Just for curiosity,

Re: [x265] x265 crashes/gets stuck when giving more than '--slices 16'

2017-03-14 Thread chen
Good morning Michael, I restricted the number of slices because we have a limited number of output NAL buffers. Every slice needs an independent NAL, but the SPS/PPS/VPS also allocate at least one NAL, so I limited slices to (MAX_NAL_UNITS - 1). Best regards, Min At 2017-03-14

[x265] fix logic timing bug

2016-11-23 Thread chen
# HG changeset patch # User Min Chen <chenm...@163.com> # Date 1479924604 21600 # Node ID c5ea19f5852aadd42bedd1d9fe4eb4b350a31e73 # Parent a895b6344a82f2b5a0f8bc4ba7a913f0c40d114d fix logic timing bug --- source/encoder/framefilter.cpp | 11 --- 1 files changed, 8 insertions

[x265] cleanup debug code

2016-11-16 Thread chen
# HG changeset patch # User Min Chen <min.c...@multicorewareinc.com> # Date 1479317016 21600 # Node ID 99a4a2d29d5c2b997745b06e5954a03bc080478f # Parent 4c1652f3884fba9fab4c589dd057b12e6bf33d5b cleanup debug code --- source/encoder/sao.cpp |4 +--- 1 files changed, 1 insertions

[x265] [slices] restrict mv never beyond boundary in both slices and non-slices mode

2016-11-01 Thread chen
# HG changeset patch # User Min Chen <min.c...@multicorewareinc.com> # Date 1478030336 18000 # Node ID 201758801366fb5e5b59710d87f4b8da911d6b73 # Parent 5fe7ac3068ebedc3d58451518c54c501e3c41103 [slices] restrict mv never beyond boundary in both slices and non-slices mode --- source/e

Re: [x265] [slices] fix multi-slices output non-determination bug

2016-11-01 Thread chen
2016-11-01 11:40:45,"Pradeep Ramachandran" <prad...@multicorewareinc.com> : On Mon, Oct 31, 2016 at 11:03 PM, chen <chenm...@163.com> wrote: # HG changeset patch # User Min Chen <min.c...@multicorewareinc.com> # Date 1477935084 18000 # Node ID 9be03f087899

[x265] [slices] fix multi-slices output non-determination bug

2016-10-31 Thread chen
# HG changeset patch # User Min Chen <min.c...@multicorewareinc.com> # Date 1477935084 18000 # Node ID 9be03f08789954f772a50f26485a9c96ca745497 # Parent b08109b3701e9b86010c5a5ed0ad7b3d6a051911 [slices] fix multi-slices output non-determination bug --- source/common/common.h

[x265] [PATCH] [slices] allow number of slices more than rows (Issue #300-3)

2016-10-27 Thread chen
From e697fcd5fa0d36b33d42d01c2845ca36533dbd96 Mon Sep 17 00:00:00 2001 From: Min Chen <min.c...@multicorewareinc.com> Date: Thu, 27 Oct 2016 11:11:09 -0500 Subject: [PATCH] [slices] allow number of slices more than rows (Issue #300-3) --- source/common/param.cpp|2 -- source/e

Re: [x265] [PATCH] [PPC] support option --no-asm to disable Altivec

2016-10-25 Thread chen
All of his original files are in another patch, which is very large, so the mailing list blocks it until you approve it. At 2016-10-25 11:59:45,"Pradeep Ramachandran" <prad...@multicorewareinc.com> wrote: On Tue, Oct 25, 2016 at 2:59 AM, chen <chenm...@16

[x265] [PATCH] [PPC] GPL v2 copyright header

2016-10-24 Thread chen
From 1bea85513646e4d9d992bbe326a9cb3275ec313a Mon Sep 17 00:00:00 2001 From: Min Chen <min.c...@multicorewareinc.com> Date: Mon, 24 Oct 2016 16:38:55 -0500 Subject: [PATCH] [PPC] GPL v2 copyright header --- source/common/ppc/dct_altivec.cpp | 24 source/

[x265] [PATCH] [PPC] support option --no-asm to disable Altivec

2016-10-24 Thread chen
From d23527c6204921b782ef8bc2f1a69de88163202a Mon Sep 17 00:00:00 2001 From: Min Chen <min.c...@multicorewareinc.com> Date: Mon, 24 Oct 2016 16:27:35 -0500 Subject: [PATCH] [PPC] support option --no-asm to disable Altivec --- source/CMakeLists.txt|2 +- source/common/c

Re: [x265] Tile support in x265

2016-10-12 Thread chen
Thank you for helping reply to that message. I am the developer of WPP and Slices; motion vectors are now restricted to slice boundaries, and I will also apply the same restrictions to Tiles. In the future, we will add a new user option to allow MVs beyond the boundary. Please note that it is a low-priority task in

Re: [x265] x265-devel Digest, Vol 40, Issue 26

2016-09-28 Thread chen
+0800 (CST) From: chen <chenm...@163.com> To: "Development for x265" <x265-devel@videolan.org> Subject: Re: [x265] Optimize slice QP in PPS for x265 Message-ID: <2e196b48.110e.1576c622432.coremail.chenm...@163.com> Content-Type: text/plain; charset="gbk"

Re: [x265] Optimize slice QP in PPS for x265

2016-09-27 Thread chen
Hello Xuefeng, Your idea is good: in a low-bitrate environment, the MVs and headers are the most important part of the bitstream. I took a look at your code, and there seem to be some problems. You calculate the correlation between sliceQp and the QP range (which is [0, 51] without range extensions), so you will get a constant

Re: [x265] [PATCH] frameFilter: check for reconRowFlag

2016-09-27 Thread chen
This patch introduces a logic bug: m_reconRowFlag and numRowFinished are used to enable the SAO filter when all rows are finished. At 2016-09-27 19:17:16,as...@multicorewareinc.com wrote: ># HG changeset patch ># User Ashok Kumar Mishra ># Date 1474974965 -19800 ># Tue Sep 27

Re: [x265] [PATCH] [multi-lib] Support 8+10+12 bits in single DLL (Workaround)

2016-09-24 Thread chen
nd binary both are different. I applied you patch build once (like 8 bit build) and collected all depth outputs (8, 10 and 12), compared with three builds of x265 i.e 8 bit, 10 bit and 12 bit. Regards, Praveen On Fri, Sep 23, 2016 at 2:47 AM, chen <chenm...@163.com> wrote: Hi Prave

Re: [x265] [PATCH] [multi-lib] Support 8+10+12 bits in single DLL (Workaround)

2016-09-23 Thread chen
depth outputs (8, 10 and 12), compared with three builds of x265 i.e 8 bit, 10 bit and 12 bit. Regards, Praveen On Fri, Sep 23, 2016 at 2:47 AM, chen <chenm...@163.com> wrote: Hi Praveen, I tested your command line on my VS2008 build. I built three bit-depth versions and compared them with

Re: [x265] [PATCH] [multi-lib] Support 8+10+12 bits in single DLL (Workaround)

2016-09-22 Thread chen
On Thu, Sep 15, 2016 at 1:55 AM, chen <chenm...@163.com> wrote: From ea50e494473623ed0dbff2907194aaf268dc449a Mon Sep 17 00:00:00 2001 From: Min Chen <min.c...@multicorewareinc.com> Date: Wed, 14 Sep 2016 15:23:38 -0500 Subject: [PATCH] [multi-lib] Support 8+10+12 bits in single DLL (Wo

[x265] [PATCH] [multi-lib] Support 8+10+12 bits in single DLL (Workaround)

2016-09-14 Thread chen
From ea50e494473623ed0dbff2907194aaf268dc449a Mon Sep 17 00:00:00 2001 From: Min Chen <min.c...@multicorewareinc.com> Date: Wed, 14 Sep 2016 15:23:38 -0500 Subject: [PATCH] [multi-lib] Support 8+10+12 bits in single DLL (Workaround) --- source/CMakeLists.txt

[x265] [PATCH] [slice] fix help information defaule value mistake

2016-09-13 Thread chen
From ea93a3ddb7e8c7e106955acef56f6df72a15587a Mon Sep 17 00:00:00 2001 From: Min Chen <min.c...@multicorewareinc.com> Date: Tue, 13 Sep 2016 10:59:09 -0500 Subject: [PATCH] [slice] fix help information defaule value mistake --- source/x265cli.h |2 +- 1 files changed, 1 insertions

Re: [x265] [PATCH 1 of 2] [slice] slice feature in help menu

2016-09-13 Thread chen
Thank you for pointing out my mistake; I forgot to check the default value field. I have fixed this bug now. At 2016-09-13 15:12:33,"Mario *LigH* Rohkrämer" <cont...@ligh.de> wrote: >On 07.09.2016, 22:27, Min Chen <chenm...@163.com> wrote: > >> +H0(" --[n

[x265] [PATCH] [slice] verify untest path and enable it

2016-09-12 Thread chen
From dc6d861fd8f91c90e6bbdee366cfb7df5fdf183f Mon Sep 17 00:00:00 2001 From: Min Chen <min.c...@multicorewareinc.com> Date: Mon, 12 Sep 2016 13:18:32 -0500 Subject: [PATCH] [slice] verify untest path and enable it --- source/encoder/frameencoder.cpp |2 +- 1 files changed, 1 inse

[x265] [PATCH] [slice] cleanup debug code

2016-09-07 Thread chen
From e409325885d196b53d9824ee861867a696e6df51 Mon Sep 17 00:00:00 2001 From: Min Chen <min.c...@multicorewareinc.com> Date: Wed, 7 Sep 2016 15:25:49 -0500 Subject: [PATCH] [slice] cleanup debug code --- source/encoder/frameencoder.cpp | 10 +- source/encoder/framefilter.cpp
