[FFmpeg-devel] [PATCH 6/6] avcodec/hevcdsp: Add NEON optimization for idct16x16

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/hevcdsp_idct_neon.S | 241 + libavcodec/arm/hevcdsp_init_neon.c | 2 + 2 files changed, 243 insertions(+) diff --git

[FFmpeg-devel] [PATCH 0/6] Optimize HEVC decoding on ARM (32bit) platform

2017-11-22 Thread Shengbin Meng
NEON optimization for sao avcodec/hevcdsp: Add NEON optimization for idct16x16 Shengbin Meng (1): avcodec/hevcdsp: Add NEON optimization for whole-pixel interpolation libavcodec/arm/Makefile|4 +- libavcodec/arm/hevcdsp_epel_neon.S | 2078

[FFmpeg-devel] [PATCH 1/6] avcodec/hevcdsp: Add NEON optimization for qpel weighted mode

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/hevcdsp_init_neon.c | 66 + libavcodec/arm/hevcdsp_qpel_neon.S | 509 + 2 files changed, 575 insertions(+) diff --git

[FFmpeg-devel] [PATCH 5/6] avcodec/hevcdsp: Add NEON optimization for sao

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/Makefile| 3 +- libavcodec/arm/hevcdsp_init_neon.c | 62 + libavcodec/arm/hevcdsp_sao_neon.S | 181 + 3 files

[FFmpeg-devel] [PATCH 2/6] avcodec/hevcdsp: Add NEON optimization for epel

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/Makefile|3 +- libavcodec/arm/hevcdsp_epel_neon.S | 2068 libavcodec/arm/hevcdsp_init_neon.c | 459 3 files

[FFmpeg-devel] [PATCH 4/6] avcodec/hevcdsp: Use pre-load (pld) to optimize data loading

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/hevcdsp_epel_neon.S | 10 ++ libavcodec/arm/hevcdsp_qpel_neon.S | 24 2 files changed, 30 insertions(+), 4 deletions(-) diff --git

[FFmpeg-devel] [PATCH 3/6] avcodec/hevcdsp: Add NEON optimization for whole-pixel interpolation

2017-11-22 Thread Shengbin Meng
New code is written for qpel; and then code for qpel is reused for epel, because whole-pixel interpolation in qpel and epel are identical. Signed-off-by: Shengbin Meng <shengbinm...@gmail.com> --- libavcodec/arm/hevcdsp_init_neon.c | 106 ++ libavcodec/arm/hevcdsp_qpel_

Re: [FFmpeg-devel] [PATCH 1/6] avcodec/hevcdsp: Add NEON optimization for qpel weighted mode

2017-11-22 Thread Shengbin Meng
> On 22 Nov 2017, at 20:26, Michael Niedermayer <mich...@niedermayer.cc> wrote: > > On Wed, Nov 22, 2017 at 07:12:01PM +0800, Shengbin Meng wrote: >> From: Meng Wang <wangmeng.k...@bytedance.com> >> >> Signed-off-by: Meng Wang <wangmeng.k.

[FFmpeg-devel] [PATCH v2 5/5] avcodec/hevcdsp: Add NEON optimization for sao

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/Makefile| 3 +- libavcodec/arm/hevcdsp_init_neon.c | 62 + libavcodec/arm/hevcdsp_sao_neon.S | 181 + 3 files

[FFmpeg-devel] [PATCH v2 4/5] avcodec/hevcdsp: Use pre-load (pld) to optimize data loading

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/hevcdsp_epel_neon.S | 10 ++ libavcodec/arm/hevcdsp_qpel_neon.S | 24 2 files changed, 30 insertions(+), 4 deletions(-) diff --git

[FFmpeg-devel] [PATCH v2 1/5] avcodec/hevcdsp: Add NEON optimization for qpel weighted mode

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/hevcdsp_init_neon.c | 67 + libavcodec/arm/hevcdsp_qpel_neon.S | 509 + 2 files changed, 576 insertions(+) diff --git

[FFmpeg-devel] [PATCH v2 2/5] avcodec/hevcdsp: Add NEON optimization for epel

2017-11-22 Thread Shengbin Meng
From: Meng Wang Signed-off-by: Meng Wang --- libavcodec/arm/Makefile|3 +- libavcodec/arm/hevcdsp_epel_neon.S | 2068 libavcodec/arm/hevcdsp_init_neon.c | 458 3 files

[FFmpeg-devel] [PATCH v2 3/5] avcodec/hevcdsp: Add NEON optimization for whole-pixel interpolation

2017-11-22 Thread Shengbin Meng
New code is written for qpel; and then code for qpel is reused for epel, because whole-pixel interpolation in qpel and epel are identical. Signed-off-by: Shengbin Meng <shengbinm...@gmail.com> --- libavcodec/arm/hevcdsp_init_neon.c | 107 ++ libavcodec/arm/hevcdsp_qpel_

Re: [FFmpeg-devel] [PATCH] 8-bit hevc decoding optimization on aarch64 with neon

2017-11-21 Thread Shengbin Meng
> On 19 Nov 2017, at 01:35, Rafal Dabrowa wrote: > > > This is a proposal of performance optimizations for 8-bit > hevc video decoding on aarch64 platform with neon (simd) extension. Nice to see the work for aarch64! We are also in the process of doing NEON

[FFmpeg-devel] HEVC ARM optimization

2017-10-20 Thread Shengbin Meng
Hi, I’d like to know if anyone is doing or interested in ARM optimization for the native HEVC decoder in FFmpeg? We can see that some time-consuming operations in HEVC decoding have not been optimized using NEON, e.g., qpel and epel interpolation, SAO, IDCT of large blocks. I have some

Re: [FFmpeg-devel] [PATCH v3] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-04-08 Thread Shengbin Meng
LGTM. Regards, Shengbin Meng > On 27 Mar 2018, at 20:43, Yingming Fan <yingming...@gmail.com> wrote: > > From: Meng Wang <wangmeng.k...@bytedance.com> > > Signed-off-by: Meng Wang <wangmeng.k...@bytedance.com> > --- > This v3 patch removed unused codes 's

Re: [FFmpeg-devel] [PATCH] checkasm/hevc_mc : add hevc_mc for checkasm

2018-04-17 Thread Shengbin Meng
> On Apr 9, 2018, at 10:12, Yingming Fan wrote: > > From: Yingming Fan > > --- > Hi, there. > I plan to submit our arm32 neon codes for qpel and epel. > While before this i will submit hevc_mc checkasm codes. > This hevc_mc checkasm codes check

Re: [FFmpeg-devel] [PATCH v2] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-25 Thread Shengbin Meng
> On 22 Mar 2018, at 20:51, Yingming Fan wrote: > > From: Meng Wang > > Signed-off-by: Meng Wang > --- > This v2 patch remove unused codes 'stride_dst /= sizeof(uint8_t);' compared > to v1. V1 have this codes

Re: [FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-22 Thread Shengbin Meng
The code looks good to me. I think the wrapper is fine, because that part of the code is not suitable for NEON assembly. But you can remove the use of `sizeof(uint8_t)` as suggested by Carl. Shengbin Meng > On 19 Mar 2018, at 12:41, Yingming Fan <yingming...@gmail.com> wrote

Re: [FFmpeg-devel] [PATCH] avcodec/arm/hevcdsp_sao : add NEON optimization for sao

2018-03-22 Thread Shengbin Meng
Hi, By checkasm benchmark, I can see a speedup of ~3x for band mode and ~6x for edge mode on my device (the device has an aarch64 CPU, but I configured ffmpeg with `--arch=arm`). And FATE passed as well. Results of a checkasm run: $ ./tests/checkasm/checkasm --test=hevc_sao --bench $ sudo