From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_idct_neon.S | 241 +
libavcodec/arm/hevcdsp_init_neon.c | 2 +
2 files changed, 243 insertions(+)
diff --git
NEON optimization for sao
avcodec/hevcdsp: Add NEON optimization for idct16x16
Shengbin Meng (1):
avcodec/hevcdsp: Add NEON optimization for whole-pixel interpolation
libavcodec/arm/Makefile | 4 +-
libavcodec/arm/hevcdsp_epel_neon.S | 2078
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_init_neon.c | 66 +
libavcodec/arm/hevcdsp_qpel_neon.S | 509 +
2 files changed, 575 insertions(+)
diff --git
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/Makefile | 3 +-
libavcodec/arm/hevcdsp_init_neon.c | 62 +
libavcodec/arm/hevcdsp_sao_neon.S | 181 +
3 files
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/Makefile | 3 +-
libavcodec/arm/hevcdsp_epel_neon.S | 2068
libavcodec/arm/hevcdsp_init_neon.c | 459
3 files
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_epel_neon.S | 10 ++
libavcodec/arm/hevcdsp_qpel_neon.S | 24
2 files changed, 30 insertions(+), 4 deletions(-)
diff --git
New code is written for qpel, and that code is then reused for epel,
because whole-pixel interpolation is identical in qpel and epel.
Signed-off-by: Shengbin Meng <shengbinm...@gmail.com>
---
libavcodec/arm/hevcdsp_init_neon.c | 106 ++
libavcodec/arm/hevcdsp_qpel_
> On 22 Nov 2017, at 20:26, Michael Niedermayer <mich...@niedermayer.cc> wrote:
>
> On Wed, Nov 22, 2017 at 07:12:01PM +0800, Shengbin Meng wrote:
>> From: Meng Wang <wangmeng.k...@bytedance.com>
>>
>> Signed-off-by: Meng Wang <wangmeng.k.
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/Makefile | 3 +-
libavcodec/arm/hevcdsp_init_neon.c | 62 +
libavcodec/arm/hevcdsp_sao_neon.S | 181 +
3 files
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_epel_neon.S | 10 ++
libavcodec/arm/hevcdsp_qpel_neon.S | 24
2 files changed, 30 insertions(+), 4 deletions(-)
diff --git
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/hevcdsp_init_neon.c | 67 +
libavcodec/arm/hevcdsp_qpel_neon.S | 509 +
2 files changed, 576 insertions(+)
diff --git
From: Meng Wang
Signed-off-by: Meng Wang
---
libavcodec/arm/Makefile | 3 +-
libavcodec/arm/hevcdsp_epel_neon.S | 2068
libavcodec/arm/hevcdsp_init_neon.c | 458
3 files
New code is written for qpel, and that code is then reused for epel,
because whole-pixel interpolation is identical in qpel and epel.
Signed-off-by: Shengbin Meng <shengbinm...@gmail.com>
---
libavcodec/arm/hevcdsp_init_neon.c | 107 ++
libavcodec/arm/hevcdsp_qpel_
> On 19 Nov 2017, at 01:35, Rafal Dabrowa wrote:
>
>
> This is a proposal of performance optimizations for 8-bit
> hevc video decoding on aarch64 platform with neon (simd) extension.
Nice to see the work for aarch64!
We are also in the process of doing NEON
Hi,
I’d like to know if anyone is working on, or interested in, ARM optimization
for the native HEVC decoder in FFmpeg?
We can see that some time-consuming operations in HEVC decoding have not been
optimized using NEON, e.g., qpel and epel interpolation, SAO, and IDCT of large
blocks.
I have some
LGTM.
Regards,
Shengbin Meng
> On 27 Mar 2018, at 20:43, Yingming Fan <yingming...@gmail.com> wrote:
>
> From: Meng Wang <wangmeng.k...@bytedance.com>
>
> Signed-off-by: Meng Wang <wangmeng.k...@bytedance.com>
> ---
> This v3 patch removed unused codes 's
> On Apr 9, 2018, at 10:12, Yingming Fan wrote:
>
> From: Yingming Fan
>
> ---
> Hi, there.
> I plan to submit our arm32 NEON code for qpel and epel.
> Before that, I will submit the hevc_mc checkasm code.
> This hevc_mc checkasm codes check
> On 22 Mar 2018, at 20:51, Yingming Fan wrote:
>
> From: Meng Wang
>
> Signed-off-by: Meng Wang
> ---
> This v2 patch removes the unused code 'stride_dst /= sizeof(uint8_t);' compared
> to v1. V1 had this code
The code looks good to me. I think the wrapper is fine, because that part of
the code is not suitable for NEON assembly.
But you can remove the use of `sizeof(uint8_t)`, as suggested by Carl.
Shengbin Meng
> On 19 Mar 2018, at 12:41, Yingming Fan <yingming...@gmail.com> wrote
Hi,
With the checkasm benchmark, I can see a speedup of ~3x for band mode and ~6x for
edge mode on my device (the device has an aarch64 CPU, but I configured ffmpeg
with `--arch=arm`). FATE passed as well.
Results of a checkasm run:
$ ./tests/checkasm/checkasm --test=hevc_sao --bench
$ sudo